Sat 02 April 2016

Django and Django REST Framework use SQL COUNT in pagination.
As your database grows SQL COUNT becomes too slow. Fortunately the frameworks
are well designed and allow to customize a way items are count.
Let me illustrate that on a typical "books" example.

It has been working fine for some time but authors are so productive and wrote thousands of books.
As a result alice.books.count() takes a few seconds to execute.
One option is to store books count in Author model which might be helpful in
other SQL queries. Another one is to keep count somewhere else, e.g., Redis.

Note that books_count is nullable due to existing data on production servers.
Setting a default value 0 on schema migration would take long to update
every author record, also it's hard to tell whether an author record is
already migrated (books_count is calculated or not).

Now we have Author.books_count field which should keep track of new books.
You can implement that based on Django signals or SQL triggers.

Signals might fail, even with send_robust. Updating a books count in
a signal receiver slows down an application (two to three DB queries and
acquiring a DB lock on author record):

Moving an update logic from a signal receiver to Celery task doesn't look
promising -- books_count would be inconsistent between a task trigger and
its execution or simply Celery task might fail. Another issue is that
a model signal is not "attached" to DB transaction. For example,
a Celery task is run before a DB transaction is committed.
That is usually quick fixed with task_name.apply_async(countdown=1)
to delay a task execution, but I would rather recommend something like
django-transaction-hooks.

What about SQL trigger option? A bad thing is that a business logic leaks from an app to DB.
But with a good documentation and tests it's maintainable: SQL code is stored in
a data migration module.

You can either explicitly list fields to update author.save(update_fields=['name'])
or use django-save-the-change. Let's document that in the model docstring.

fromsave_the_change.mixinsimportSaveTheChangeclassAuthor(SaveTheChange,models.Model):"""Author, e.g., Terry Pratchett. :attribute books_count: How many books writer has. SQL COUNT is expensive operation, so we store calculated value and update it by SQL trigger (check a data migration module for details). It's important to save only fields that were updated in the model. Otherwise SQL trigger's results are overwritten by Django ORM. For example: 1. author is requested with a lock (books_count = 1) 2. new book is created 3. SQL trigger updates author's books_count field (now it is 2) 4. author instance is saved with the old value of books_count = 1. SaveTheChange mixin helps to prevent it. """

To benefit from books_count field in Django REST Framework we need
a custom pagination class which implements Paginator.count property.
The idea is to extract author ID from paginator's SQL, query a books count
from Author model and return it, instead of default
Book.objects.filter(author_id=author_id).count().

fromdjango.core.paginatorimportPaginatorfromrest_frameworkimportpaginationfromrest_framework.viewsetsimportReadOnlyModelViewSetclassBookViewSet(ReadOnlyModelViewSet):pagination_class=BookPaginationclassBookPagination(pagination.PageNumberPagination):django_paginator_class=CachedBookCountPaginatorclassCachedBookCountPaginator(Paginator):@cached_propertydefcount(self):"""Return the total number of books, across all pages. It parses a SQL and learns what author ID was requested based on ``self.object_list.query``. After that we can get a cached books count from Author model. """# There is query.where, but I could't find an author ID easily.# Moreover query.where internals might be changed.sql=str(self.object_list.query)author_id=self._get_author_id_from_sql(sql)author=Author.objects.get(pk=author_id).only('books_count')# In case we got unsynced author, we fallback to SQL COUNT.ifauthor.books_countisNone:returnself.query_count()returnauthor.books_countdefquery_count(self):"""Request books count from DB. We need this method to facilitate testing (mocks). """returnsuper(CachedBookCountPaginator,self).count@classmethoddef_get_author_id_from_sql(cls,sql):pass