Django Queries Optimization

Have you ever asKED yourself, is this the best way of doing this query? or is there a more efficient way?

When we have a project with one or more recurrent or long blocking queries we need to take into big consideration that this query might slow down our application or even make the database crash.

Some of this times the trick is to use the available options of the ORM and some times the best way is to write a SQL customized query, but how to know what’s the best in each case?

In this article I’ll show you a series of best practices recommended by Django documentation and by experience so you will have an idea of some of the options out there to optimize your code!

Before deciding what to use you have to know that you may want to optimize the queries for speed or for memory or both, depending on your requirements. But sometimes optimizing for one will affect the other. Also, the tasks that are processed by the database might not have the same resources cost than those made by a Python process. It is up to you to decide what your priorities are and where the balance must lie so you can improve the server resources according to it.

Having the Following Schema

First than nothing you have to understand that Querysets are Lazy, but what does it means?

It means that the act of creating or declaring a Queryset doesn’t involve any database activity. So you can add filters and conditions line after line and the Queryset won’t be executed until you actually ask for the information, for example:

Now in these lines of code, the database gets only one hit in the line print(user_query) which is very useful when you want to add a lot of conditions before the execution but be careful because there are some other methods that might hit the database right away, like:

It also means that you have to distinguish which queries are cached so it won’t hit the database again and which will no matter if the query is repeated for example:

So if you need to iterate over a huge dataset and then perform some action with the values a posible solution is to use an iterator().

It will hit the database just once fetching the matching rows but use iterator() with caution, and make sure that your code is organized to avoid repeated evaluation of the same huge Queryset.

A way to avoid the excess of memory usage when the dataset is huge is to execute a boolean condition before populating the whole dataset.

But don’t abuse of the exists(), count() methods because each time you call them a new hit is performed.

We also could retrieve everything at once if we know we’ll certainly need it, to do this we can use prefetch_related() or select_related() so we can get the related fields in other objects and the correct object all at once.

This will return a Queryset that will automatically retrieve, in a single batch, related objects for each of the specified lookups.

Now you have select_related() that returns a Queryset that will look for the foreign-key relationships, selecting additional related-object data when it executes its query. This is a performance booster which results in a single more complex query but means later use of foreign-key relationships won’t require database queries, for example if you need all the user for the team.

So rather than doing that you can do this:

Another way to make a query faster is to retrieve individual objects using a unique, indexed column. As in any other ORM, the query will be quicker because of the underlying database index. Also, the query runs much slower when there are multiple objects matching the filter; having a unique constraint on the column guarantees this will never happen.

Don’t order results if you don’t care. Sometimes Django orders the dataset even if we didn’t ask for that but we can disable it on a Queryset by calling order_by() with no parameters.

Use defer() and only() if  you only need certain fields on a dataset to avoid loading all of the fields. Note that if you do use them later, the ORM will have to go and get them in a separate query, making this a pessimization if you use it inappropriately.

Logging All SQL Queries

A pretty useful tool is the ORM logger, this will enable you to inspect all of the queries and also get the time of the execution. To enable this you’ll need to add a few things to your LOGGING configuration in your settings.py. First, you’ll need a handler. The example below logs everything at DEBUG level to the console.

Next, you’ll need to add a logger that logs to this handler:

Once the above is set up, you should see a stream of SQL queries in your console output every time you hit the database.

Conclusion

There isn’t a perfect query or a general answer to utilizing the ORM. You should first seek to make your code clear and then work on optimizing it and take into big consideration that some practices may speed up the database processes but spend lots of memory so, be careful and analyze what is the best for your app and your resources.

Related Posts