Achieving Correct Bloat Estimates of JSON Data in PostgreSQL

by Oleksandr Shulgin - 13 Nov 2015

Recently my team realized that tables in one of our PostgreSQL databases were showing colossal amounts of bloat: up to 40 out of 54 GB total (75%) for bigger tables, and up to 95% of 17 GB total for smaller ones. Having accurate estimates is important to make certain database administration decisions aimed at reclaiming disk space and improving database performance, but these numbers were so unusual that we wanted to investigate first.

For the table and index bloat estimations, we used queries based on the PostgreSQL Wiki’s database-bloat query. Only certain tables were affected, pointing to problems with the tables themselves.

To investigate possible causes for these really unusual bloat estimation numbers, we used the pgstattuple extension, which allows you to do a full table scan and compute the actual summed tuple lengths. This process is slow, so you don’t want to run it repeatedly — but it’s accurate. From it we learned that dead tuples and free space weren’t responsible for our total bloat issues.
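The extension ships with PostgreSQL's contrib modules. A quick sketch of how such a check looks (the table name here is just a placeholder):

```sql
-- pgstattuple performs a full scan of the relation and reports exact numbers
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Returns table_len, tuple_len, dead_tuple_percent, free_percent, etc.
SELECT * FROM pgstattuple('some_bloated_table');
```

Comparing dead_tuple_percent and free_percent against the estimated bloat is what told us that dead tuples and free space were not the culprits.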

We then turned our attention to the actual table schema. Our discovery: All the bloated tables included columns of type JSON.

As you can see from the estimation queries, pg_stats.avg_width is the foundation for calculating average tuple width — and, therefore, for estimating the total size that the relation should (ideally) occupy. For fixed-length types such as int, char(N), etc., the avg_width equals the type length, and thus it’s always accurate. For variable-length types, the avg_width is the average of the tuples’ lengths as examined by the latest ANALYZE. (To generate these statistics, you need at least one successful ANALYZE on the table.)

Checking the avg_width for our JSON columns revealed a surprise: No statistics were recorded for these columns at all! This possibly explained why the estimates for these tables were so off-base. Now the question remained: Why were there no statistics for the JSON-typed columns?

After reading the code in analyze.c in the PostgreSQL source code, we realized that column statistics are gathered only if an equality operator is defined for the column’s data type. However, no equality operator is defined for type JSON, and for good reason: You probably don’t want whitespace differences to affect the comparison of otherwise equivalent JSON values. The disadvantage: There’s no way to make such a comparison efficient, since you’d need to parse both JSON strings in order to compare them semantically.

Verifying Assumptions

In order to verify that statistics aren’t gathered for columns of type JSON, create a test table like this one —

CREATE TABLE testjson (id INT, tx TEXT, js JSON);

— and, after populating it with some data and running ANALYZE, query the pg_statistic system table:

=# SELECT staattnum, stawidth
     FROM pg_statistic AS s
     JOIN pg_class AS c ON s.starelid = c.oid
    WHERE c.relname = 'testjson'
    ORDER BY 1;

 staattnum | stawidth
-----------+----------
         1 |        4
         2 |        5
(2 rows)

The pg_stats used in the estimation queries is a system view built on top of this table. As you’ll notice, no entry for the third column of type JSON was created.

PostgreSQL 9.4 and later come with type JSONB, which among other interesting properties, has the equality operator defined. In our problem case, the database in question had been upgraded to 9.4. But switching ~100 GB of tables over to a different column type — and that’s only on a single database shard, out of eight — was not a viable option.

If only for the sake of statistics calculation, you can enhance your plain-text JSON type with an equality operator. Just ensure that the operator is exclusively reserved for this particular use case, so that an accidental JSON comparison in your queries produces an error rather than a meaningless result.

The Straightforward Way

Create a schema with a descriptive name, and revoke public access to it:

CREATE SCHEMA dontuse;
REVOKE USAGE ON SCHEMA dontuse FROM PUBLIC;

Now you need a comparison function, and an operator on top of it. JSON is just a text type with a syntax check on input, so we can cast the parameters to TEXT and compare those:
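A sketch of what the function and operator might look like (the function name json_eq and the use of LANGUAGE sql are our choice here):

```sql
-- Compare two JSON values by their text representation (whitespace-sensitive,
-- but good enough for statistics gathering)
CREATE FUNCTION dontuse.json_eq(json, json) RETURNS boolean
    LANGUAGE sql IMMUTABLE STRICT
    AS $$ SELECT $1::text = $2::text $$;

-- An equality operator living in the restricted schema
CREATE OPERATOR dontuse.= (
    PROCEDURE = dontuse.json_eq,
    LEFTARG   = json,
    RIGHTARG  = json
);
```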

Declare the operator class as the default for this type. Otherwise, the ANALYZE command won’t pick up the columns for which we want statistics.
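A minimal sketch of such an operator class, assuming an equality operator dontuse.= as described above (the class name is hypothetical; the hash access method is the simplest choice because it needs only the equality operator):

```sql
-- A default operator class is what lets ANALYZE find the equality operator
CREATE OPERATOR CLASS dontuse.json_hash_ops
    DEFAULT FOR TYPE json USING hash AS
    OPERATOR 1 dontuse.=;
```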

After re-analyzing the table, you can finally find the missing statistics entry for the column.
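For example, re-using the test table from above:

```sql
ANALYZE testjson;

SELECT staattnum, stawidth
  FROM pg_statistic AS s
  JOIN pg_class AS c ON s.starelid = c.oid
 WHERE c.relname = 'testjson'
 ORDER BY 1;
-- A row with staattnum = 3 (the js column) should now be present.
```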

Case Closed?

If you now try to analyze a reasonably populated table (two million rows in our test), the process takes about 19 seconds(!), whereas it takes only 200 milliseconds when the js column is of type TEXT instead of JSON. You can observe this by using the ALTER TABLE … ALTER COLUMN … TYPE command to change the type of the JSON column to TEXT and back. Apparently this penalty is due to the overhead of the SQL function we’ve defined.
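Such a round-trip might look like this (the USING clauses spell out the casts explicitly):

```sql
ALTER TABLE testjson ALTER COLUMN js TYPE text USING js::text;
ANALYZE testjson;   -- fast: plain TEXT statistics

ALTER TABLE testjson ALTER COLUMN js TYPE json USING js::json;
ANALYZE testjson;   -- slow again: goes through the SQL comparison function
```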

Tweaking the definition of the SQL function only reduces the time spent in ANALYZE to some 9-10 seconds. Not much of a gain, if you consider the runtime on the TEXT column.

Can We Do Better?

Well, there are at least two possibilities: Either write the comparison function in C, taking advantage of the fact that JSON type’s internal representation is the same as that of type TEXT; or force the use of TEXT type’s equality comparison function for the defined operator.

Writing an external function is not hard, but it requires making an external loadable module, installing it on all affected systems, maintaining it, etc.

Using the existing function sounds more promising. Unfortunately, this cannot be done directly with the CREATE OPERATOR command:
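For instance, pointing the operator directly at the built-in texteq function fails because the argument types don’t match; a sketch of the kind of failure you run into:

```sql
CREATE OPERATOR dontuse.= (
    PROCEDURE = texteq,   -- the TEXT type's equality function
    LEFTARG   = json,
    RIGHTARG  = json
);
-- ERROR:  function texteq(json, json) does not exist
```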

What does work, however, is direct modification of the pg_operator catalog entry for our newly added operator. This way, we can avoid the overhead of an SQL function call and two type casts on the input arguments. Don’t try this at home, though:
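A sketch of such a catalog update (it assumes the operator was created with json arguments on both sides, and requires superuser privileges):

```sql
-- Point our JSON equality operator directly at the TEXT equality function.
-- This works only because JSON shares TEXT's on-disk representation.
UPDATE pg_operator
   SET oprcode = 'texteq'::regproc
 WHERE oprname  = '='
   AND oprleft  = 114   -- 114 is the OID of type JSON
   AND oprright = 114;
```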

In the example above, 114 is the OID of type JSON. If you test the ANALYZE command on the test table, it should now take only around 300 milliseconds. A great result? Well, that depends.

Digging Even Deeper

Potential performance problems can result if the STATISTICS setting on the JSON column is increased, or if one bumps up the per-cluster default_statistics_target setting.

If the statistics target changes from 100 (the default) to 1000, for example, the ANALYZE on the test table suddenly takes about 15 seconds again. The ANALYZE algorithm tries to compute the most common values for our column, even though there’s no defined ordering operator (such as <) that could be used to efficiently sort data samples. We can mitigate this effect by using a different access method — for example, btree — when creating the custom operator class:
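A sketch of such a btree operator class (the function, operator, and class names are hypothetical; dontuse.= is the equality operator created earlier):

```sql
-- btree comparison support: returns < 0, 0, or > 0, like strcmp()
CREATE FUNCTION dontuse.json_cmp(json, json) RETURNS integer
    LANGUAGE sql IMMUTABLE STRICT
    AS $$ SELECT bttextcmp($1::text, $2::text) $$;

CREATE FUNCTION dontuse.json_lt(json, json) RETURNS boolean
    LANGUAGE sql IMMUTABLE STRICT
    AS $$ SELECT $1::text < $2::text $$;

CREATE OPERATOR dontuse.< (
    PROCEDURE = dontuse.json_lt,
    LEFTARG   = json,
    RIGHTARG  = json
);

CREATE OPERATOR CLASS dontuse.json_btree_ops
    DEFAULT FOR TYPE json USING btree AS
    OPERATOR 1 dontuse.<,
    OPERATOR 3 dontuse.=,
    FUNCTION 1 dontuse.json_cmp(json, json);
```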

The exact definition of functions behind the operators = and < is not important here. For the purpose of ANALYZE they will never be called, but both operators are required for ANALYZE to even consider the JSON columns. The function that will be called is the support routine, provided in the last clause of the command above.

If we go the safe route again and write the support function in SQL, we might end up with ~2.5 seconds on our default statistics target of 100. If we go the hacky way and override the support function to use the one provided for type TEXT, we can get the reasonable 300 milliseconds on statistics target 100 — and we’re back to only 2.5 seconds with target 1000.

For completeness, here’s a command that you also should not try at home:
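A sketch of that override, assuming a btree operator class (and thus an operator family of the same name) called json_btree_ops; the name is hypothetical:

```sql
-- Replace the SQL support function with the TEXT type's C-level comparison
-- function. Again, this relies on JSON and TEXT sharing a representation.
UPDATE pg_amproc
   SET amproc = 'bttextcmp'::regproc
 WHERE amprocfamily = (SELECT oid FROM pg_opfamily
                        WHERE opfname = 'json_btree_ops')
   AND amprocnum = 1;
```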

The Final Chord

Since PostgreSQL 9.2, you can provide a second support routine for the btree access method. This will be called only once per sort operation to obtain C-callable comparison functions; if you use it, you might also notice another performance boost. But bttextsortsupport, a function for the TEXT type, doesn’t appear in core PostgreSQL until version 9.5, so you can’t actually use it as in the above example.

Yet another possibility is to provide a custom ANALYZE function for the type. This frees you to do anything, but also poses disadvantages: It must be written in C, and will require you to alter a system catalog (this time for the built-in JSON type itself to set the typanalyze attribute). Also, this hacky catalog change will not make it through pg_upgrade.

Now, the proper fix can only be implemented in PostgreSQL core: even if a type doesn’t provide an equality operator, the average column width can still be estimated. The upcoming PostgreSQL 9.5 release will have this fix incorporated.