I missed it completely, but on 24th of March 2017, Alvaro Herrera committed patch:

Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/.173334..horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Ideriha Takeshi
Discussion: https://postgr.es/m/.4080608@fuzzy.cz
https://postgr.es/m/.ixlaueanxegqd5gr@alvherre.pgsql

So what is this about? The phrase from the first line, "Implement multivariate n-distinct coefficients", seems pretty opaque.

In reality it's rather simple. As you know, PostgreSQL chooses how to execute a query (or rather, how to get the results) based on estimates of how many rows will match particular parts of the query.

So far PostgreSQL used only single-column statistics, and if it had to produce an estimate for conditions on two columns, it involved some black magic (well, math, but it's pretty similar to magic).
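To make the "black magic" a bit less magical, here is a minimal sketch in Python (not PostgreSQL code) of why per-column statistics can mislead a group-count estimate. The data set and the variable names are made up for illustration, and the "naive" formula is a deliberately simplified stand-in for what the planner does; the real estimator is more elaborate.

```python
# Hypothetical table with two columns where b is fully determined by a
# (b = a % 10), so the columns are strongly correlated.
rows = [(a, a % 10) for a in range(100)] * 50  # 5000 rows in total

# What single-column statistics know: distinct values per column.
ndistinct_a = len({a for a, _ in rows})
ndistinct_b = len({b for _, b in rows})

# Simplified estimate of the number of groups for GROUP BY a, b when the
# columns are assumed independent: multiply the per-column distinct counts
# and cap the result at the number of rows.
naive_groups = min(ndistinct_a * ndistinct_b, len(rows))

# What a multivariate n-distinct coefficient is meant to capture:
# the number of distinct (a, b) combinations that actually occur.
actual_groups = len(set(rows))

print("naive estimate:", naive_groups)
print("actual groups: ", actual_groups)
```

In this made-up case the naive estimate is ten times too high (1000 groups versus 100 real ones), which is the kind of error that can lead to over-allocating memory for a hashed aggregate.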

The first query shows a clearly improved estimate. The second, not so much. Why? To be honest, no idea. If someone can enlighten me, that would be really great. Either way, these are just the first steps towards proper multi-column statistics, so I'm very happy about it and enthusiastic about the future, even if some bit of the functionality doesn't yet seem to work the way I expected. It's a long road, but I'm very grateful that we started it. Thanks a lot to all involved.

One comment

The multivariate feature in PG 10 only records a single integer indicating the correlation of the columns; it doesn't record the probability of specific column combinations. That feature is hopefully coming in PG 11.

Your second-to-last query hit a common combination (both ending in 1) while your last query has no matching rows (the last digit is different).