Istore: PostgreSQL Documents for Analytical Workloads

Inspired by the PostgreSQL key/value data-type hstore, we developed the istore extension with
support for operators like + and aggregates like SUM for semi-structured integer-based data.

While the hstore allows arbitrary textual-data as its keys and values, in an istore document
both keys and values are represented and stored as integers. Therefore istore fits nicely in an
analytical workload. User journeys, cohort or funnel data, distributional data and many other
scenarios can be efficiently modeled and stored in PostgreSQL using istore.

The extension comes with two data types: istore and bigistore, the former having int and the
latter bigint as values; keys being int for both. This article demonstrates the efficiency of
istore and some of its applications through two examples - aggregating logs and analyzing event
funnels.

This creates a sample table with 5M rows. In each row, you store the information that on day X
in segment Y, the event Z hit count number of times, and this brought revenue amount of
revenue.

Let’s now say you want to look at the hit-counts per event ID and revenue per event ID
distributions for each (date, segment) pair.

You could define a table with two istore fields per (date, segment). The first field would have event
IDs as keys and hit-counts as values, and the other field would have event IDs as keys and revenue as values:

SQL aggregation and division of istore documents

Typically, you’d be interested in aggregated
distributions for all event IDs instead of just a single event ID. Let’s say you want the revenue per single event-hit for each
event ID. With the non-istore setup, you could write:

This illustrates how you can use the SQL SUM aggregate-function to perform aggregations on
istore data. The result from the SUM application would be an istore with event IDs as keys and the revenues and counts as values, respectively. The istore / istore division operator will subsequently
result in an istore with event IDs as keys and the desired ratios as values.

If you prefer the result as a set instead of an istore, you can simply apply the each(istore)
on the result from the division, the same way you would with an hstore.

Note again the improved efficiency of the istore v.s. non-istore data model.

Filtering istore documents

Suppose you want a report of all segments that triggered event ID 5 at least once, but never triggered
event ID 100.

Using the istore ? integer operator to check if a given key exists in an istore might be
intuitive from using PostreSQL’s hstore or json types. And the compact(istore) function
returns an istore with all pairs with value 0 removed.

Sum up istore values together

Suppose you now need the total count of events hit by all segments in all time.