Did you know pagination with offset
is very troublesome but easy to avoid?
Offset
instructs the DBMS skip the first N results of a query. However, the database must still fetch these rows from the disk and bring them in order before it can send the following ones.
This is not an implementation problem, it’s the way offset
is designed:
…the rows in the derived table are first sorted according to the <order by clause> and then limited by dropping the number of rows specified in the <result offset clause> from the beginning…
SQL:2023, Part 2, §4.17.3 Derived tables
There crucial point here is that offset
takes only one single parameter: the number of rows to be dropped. No more context. The only thing the database can do with this number is fetching and dropping that many rows. In other words, big offsets
impose a lot of work on the database—no matter whether SQL or NoSQL.
But the trouble with offset
doesn’t stop here: think about what happens if a new row is inserted between fetching two pages?
When using offset
➌ to skip the previously fetched entries❶, you’ll get duplicates in case there were new rows inserted between fetching two pages➋. There are other anomalies possible too, this is just the most common one.
Warning
The idea to use the number of rows seen to skip over them later is simply wrong.0
This is not even a database problem, it is the way frameworks implement pagination: they just say which page number to fetch or how many rows to skip. With this information alone, no database can do any better.
Also note that the offset problem comes in may different syntaxes:
The offset keyword
The 2-parameter
limit [offset,] limit
(the 1 parameterlimit
is fine)Lower-bound filtering based on row numbering (e.g.
row_number()
,rownum
, …).
The root problem all these methods have in common is that they just provide a number of rows to be dropped—no more context. In this article, I’m using offset
to refer to any of these methods.
Life Without OFFSET
Now imagine a world without these problems. As it turns out, living without offset
is quite simple: just use a where
clause that selects only data you haven’t seen yet.
For that, we exploit the fact that we work on an ordered set—you do have an order by
clause, ain’t you? Once there is a definite sort order, we can use a simple filter to only select what follows the entry we have see last:
SELECT ...
FROM ...
WHERE ...
AND id < ?last_seen_id
ORDER BY id DESC
FETCH FIRST 10 ROWS ONLY
This is the basic recipe. It gets more interesting when sorting on multiple columns, but the idea is the same. This recipe is also applicable to many NoSQL systems.
This approach—called seek method or keyset pagination—solves the problem of drifting results as illustrated above and is even faster than offset
. If you’d like to know what happens inside the database when using offset
or keyset pagination, have a look at these slides (benchmarks, benchmarks!):
On slide 43 you can also see that keyset pagination has some limitations: most notably that you cannot directly navigate to arbitrary pages. However, this is not a problem when using infinite scrolling. Showing page number to click on is a poor navigation interface anyway—IMHO.
If you want to read more about how to properly implement keyset pagination in SQL, please read this article. Even if you are not involved with SQL, it’s worth reading that article before starting to implement anything.
But the Frameworks…
The main reason to prefer offset
over keyset pagination is the lack of tool support. Most tools offer pagination based on offset
, but don’t offer any convenient way to use keyset pagination.
Please note that keyset pagination affects the whole technology stack up to the JavaScript running in the browser doing AJAX for infinite scrolling: instead of passing a simple page number to the server, you must pass a full keyset (often multiple columns) down to the server.
The hall of fame of frameworks that do support keyset pagination is constantly growing:
- JavaScript
- Python
- Java / JVM
- PHP
- .NET
- Ruby
- Perl
This is where I need your help. If you are maintaining a framework that is somehow involved with pagination, I ask you, I urge you, I beg you, to build in native support for keyset pagination too. If you have any questions about the details, I’m happy to help (forum, contact form, Twitter)!
Even if you are just using software that should support keyset pagination such as a content management system or webshop, let the maintainers know about it. You might just file a feature request (link to this page) or, if possible, supply a patch. Again, I’m happy to help out getting the details right.