Tuesday, July 31, 2018

How to paginate faster in PostgreSQL with big offset values

I was surprised to learn how inefficient pagination can be when it's done with LIMIT and OFFSET.

Everything is fine and dandy as long as the OFFSET value is in the hundreds and you are dealing with a relatively small dataset. With huge datasets (5-10 million+ records), performance degrades quickly as the offset value increases.

The Problem

The inefficiency creeps in because of the work required to shift the results past a large offset. Even in the presence of an index, the database must still read through all the skipped rows to count them off. An index helps when we filter a column by a value, but here we want a certain number of rows irrespective of their column values.

Moreover, rows can vary in size on disk and some may be marked for deletion, so the database cannot use simple arithmetic to find a location on disk at which to begin reading results.

The Solution

It's best to demonstrate the solution with an example. Let's say we have a table called "events" with a primary key column "id". We are fetching 30 records per page and now want to skip 100,000 records and get the next 30. The query to do this would look like this:
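The query itself is not shown in the text; a plain LIMIT/OFFSET version along these lines is what is being described (the `SELECT *` column list is an assumption for illustration):

```sql
-- Skip the first 100000 rows, then return the next 30.
SELECT *
FROM events
ORDER BY id
LIMIT 30 OFFSET 100000;
```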

This query would be pretty slow for the reasons mentioned above. To get around the problem we can tweak the query as follows, and it should start running faster (unbelievably fast).
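The tweaked query is also not shown; a sketch of the idea, assuming "id" values are sequential so that skipping 100,000 rows corresponds to `id > 100000` (in practice you would plug in the last id seen on the previous page):

```sql
-- Jump straight past the first 100000 rows via the primary-key index,
-- assuming sequential ids with no gaps.
SELECT *
FROM events
WHERE id > 100000
ORDER BY id
LIMIT 30;
```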

The reason for this significant improvement in performance is the WHERE clause on "id". The database can use the primary-key index to jump straight to the given row and then fetch the next 30 records!
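This technique is usually called keyset pagination: each page remembers the last "id" it saw and asks for rows after it. A minimal runnable sketch of that loop, using an in-memory SQLite database as a stand-in for PostgreSQL (table and column names mirror the "events" example; the 1,000-row dataset is made up for illustration):

```python
import sqlite3

# In-memory SQLite stand-in for the PostgreSQL "events" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (id, payload) VALUES (?, ?)",
    [(i, f"event-{i}") for i in range(1, 1001)],
)

def fetch_page(last_seen_id, page_size=30):
    """Fetch the next page of events after the given id (keyset pagination)."""
    return conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, page_size),
    ).fetchall()

# Walk through the whole table page by page, always passing along
# the last id seen instead of an ever-growing OFFSET.
last_id = 0
pages = 0
while True:
    page = fetch_page(last_id)
    if not page:
        break
    last_id = page[-1][0]  # remember the last id for the next request
    pages += 1

print(pages)    # 1000 rows at 30 per page -> 34 pages
print(last_id)  # 1000
```

Because each page's WHERE clause is an index lookup, every page costs roughly the same, no matter how deep into the dataset you are.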

There you have it: a simple tweak to a query can be the difference between a "crawling query" and a "blazing fast query"!
Have some Fun!