Is autovacuum aggressive?

I covered vacuum basics in my earlier post. In this post, I'm covering the importance of aggressive autovacuum. Database Administrators who just started their career might have doubts in their mind like;

1. Can't we just disable autovacuum, to avoid overhead on performance?

2. Why not scheduled vacuum works?

3. Why autovacuum needs aggressive tuning?

Once the above questions solved, will have a new question in mind,

4. Okay, How to tune it aggressively?

Let me clear the above 3, first; In traditional projects, some people feel differently, if it hurts disable it. Then schedule a manual vacuum during nonpeak hours. Once they disable it they can see some improvement in their performance obviously, but only for some couple of minutes or hours based on their load. Later it starts hurting more badly. Disabling autovacuum is just you are welcoming new performance issues, and increasing bloats massively.

Second; By running periodic manual vacuums on idle times, will get best results in idle times only, but in busy times you will face issues.

Third; The PostgreSQL query planner runs based on the system stats, which need to be updated regularly to select the best plan for the queries. If autovacuum well tuned, your planner will get the best plans based on the updated stats, and you can't see performance issues like before.

How to tune?

Basically, autovacuum settings can be tuned in two ways;

1. Global level setting (which can be modified in postgresql.conf or by alter system)

2. Table level settings.

The default autovacuum settings work fine for small and medium sized databases. For larger you need aggressive tuning. If the system has a similar load on all the tables in the database, then global level settings should work. Otherwise, you have to go with table level settings. The basic logic behind choosing the autovacuum triggering is based on the object's update/delete frequency. For instance, if my object has 10k/sec updates or deletes i.e., every minute 600,000 bloats created. We can get this information from the catalog view called "pg_stat_all_tables". Let us assume the autovacuum runs very rarely based on the current settings, which will impact the performance. Because, by the time autovacuum triggered it has more dead tuples to clean, which consumes some time. Of course, autovacuum doesn't lock the object but you can see a little delay in the query execution. If you tune the autovacuum settings to run more often your bloat will clear very fast as the number of dead tuples less, similarly, the stats also be updated more often.

The below autovacuum settings are very crucial.

1. autovacuum_vacuum_threshold

2. autovacuum_vacuum_scale_factor

3. autovacuum_vacuum_cost_delay

4. autovacuum_vacuum_cost_limit

5. autovacuum_max_workers

6. autovacuum_naptime

As mentioned these settings should be proportional to the count of dead tuples. In the next post, I will cover the tuning part. Stay tuned...