Zonal statistics is a technique to summarize the values of a raster dataset
overlapped by a set of vector geometries.
The analysis can answer queries such as
"Average elevation of each nation park" or "Maximum temperature by state".

My goal in this article is to demonstrate a PostGIS implementation of zonal stats and
compare the results and runtime performance to a reference Python implementation.

Compared to attribute table screenshot above, the results are identical for all columns.
That isn't too surprising given that both approachs use GDAL's rasterization API under the hood.

Performance is a different story. The zonal stats query took 81.90 seconds, roughly 34x slower than the Python code for
the equivalent result.

Thoughts

In terms of the expressiveness of the two approaches, I can see the appeal of both Python code and SQL queries.
Of course this will be personal preference depending on your background and familiarity with the environments.
The Python API hides the implementation details and is more flexible, with more statistics options and rasterization strategies.
But the SQL approach covers the common use case in a declarative query; it exposes the implementation
details yet remains very readable.

The performance impact is significant enough to be a deal breaker for PostGIS.
I haven't delved into the issues too closely;
There might some obvious ways to optimize this query but I haven't found anything as of writing this.
PostGIS experts, please get in touch if you find any speedups that I could consider here!

Performance combined with the additional overhead of managing postgres instances and data imports
tells me that running zonal stats in postgis will not be a great option unless you're already running PostgreSQL.
If your application is already committed to postgres and you want to integrate zonal stats tightly into
your data management strategy, it could be a viable approach.
For example, you could create a TRIGGER or an asyncronous worker via LISTEN/NOTIFY to ensure run zonal statistics is run each time a new feature is inserted into your vector table.

For most other zonal stats use cases, using rasterstats against local files or in-memory Python data will be a faster with less data management overhead.