A columnar store (vertical representation of data) achieves better performance than the classical horizontal representation due to three factors:

Reduced size of fetched data: only the columns involved in a query are accessed.

Vector operations: applying an operator to a set of values (a tile) minimizes interpretation cost.
The SIMD instructions of modern processors further accelerate vector operations.

Data compression: compression can certainly be applied to whole records as well, but compressing each column independently gives much better results without significant extra CPU overhead. For example, even a simple compression algorithm such as RLE
(run-length encoding) not only reduces the space used but also minimizes the number of operations performed.

There are several database systems based on the vertical data model (Vertica, SciDB, ...), as well as extensions to existing DBMSes, such as
the "Oracle In-Memory Option". This plug-in aims to provide such functionality for PostgreSQL.

As the abbreviation suggests (IMCS: In-Memory Columnar Store), this plugin adds an in-memory columnar store to PostgreSQL.
The vertical representation of data is therefore complementary to the standard horizontal representation.
Data is imported into a PostgreSQL database in the usual way and stored in a normal table.
Columns from this table are then fetched and stored in shared memory. IMCS provides many timeseries functions which can be used for data
analysis. Operations on timeseries are performed in vector mode, reaching the maximal possible speed for such operations.
IMCS also makes it possible to parallelize execution of some queries (for example, calculation of aggregates) and utilize all CPU cores. Together, these three factors (in-memory location of data, vector operations, and parallel query execution) can make
some queries more than 100 times faster than standard PostgreSQL queries.

To make access to timeseries as convenient as possible, IMCS provides a generator of access functions. You specify the name of the source table or view (from which data will be imported), the name of the timestamp field (the main key by which timeseries elements are accessed) and,
optionally, a timeseries identifier. The last one needs some explanation. In some cases all data from the table should be placed in a single timeseries. For example, assume that we collect data about phone calls (date, duration, caller, callee, ...): this is a single timeseries. But in trading systems, for example, there is separate data (ticks) associated with each symbol, so there are separate timeseries for ABB, GOOG, IBM, YHOO, ... In this case the security identifier (symbol) can serve as the timeseries identifier.

IMCS supports the following element types for timeseries: "char", int2, int4, date, int8, time, timestamp, money, float4, float8, bpchar.
All timeseries elements must have the same size, so only fixed-size character types are supported: for example char(10), but not varchar. It is possible, however, to map varying-size strings to integer identifiers using the IMCS dictionary.
This greatly reduces the space used by the columnar store and reduces query execution time (manipulating integers is more efficient than manipulating strings).
Certainly this approach works only if the cardinality of such a column is not too large: the dictionary should fit in memory.
The size of the dictionary can be specified using the "imcs.dictionary.size" parameter; the default value is 64kb.
If the dictionary size is less than or equal to 64kb, IMCS uses a two-byte integer to store the string identifier; if it is larger than 64kb,
a four-byte identifier is used. Please notice that the same dictionary is used for all tables and columns, so the dictionary size should be greater than or equal to the total cardinality of all unlimited varchar columns loaded into the columnar store.
IMCS automatically converts strings to identifiers and vice versa in the output/input functions, but you can also explicitly translate an identifier to a string using the cs_code2str function.
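A hypothetical illustration, assuming the phone-call table from the earlier example has been registered with cs_create (the table name Calls and its columns are assumptions):

```sql
-- Caller is an unlimited varchar column: IMCS stores dictionary codes
-- for it instead of the strings themselves.
-- In postgresql.conf the dictionary can be enlarged if the column
-- cardinality requires it: imcs.dictionary.size = 1MB

-- Translate the integer dictionary codes back to strings explicitly
select cs_code2str(Caller) from Calls_get();
```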

IMCS is also not able to represent NULL values. It is not enforced that fields of the source table be declared NOT NULL, but an attempt to insert a NULL value into a timeseries will cause an error (or, optionally, NULL can be substituted with zero). Please use default values instead of NULLs.

Given all this information, IMCS generates the corresponding types and functions for loading/appending/accessing the timeseries.
Assume that we have a table Quote. After calling cs_create('Quote', 'Day', 'Symbol')
we get a Quote_load() function for loading data from the table into memory,
a Quote_get(symbol char(10), low date, high date) function for fetching/slicing the corresponding timeseries, and triggers which keep track of updates to the Quote table and propagate these changes to the timeseries.
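A hypothetical end-to-end sketch (only the Symbol and Day columns are named in the text; the rest of the Quote column list is an assumption):

```sql
-- Hypothetical source table of daily stock quotes
create table Quote (
    Symbol char(10),
    Day    date,
    Open   real,
    High   real,
    Low    real,
    Close  real,
    Volume integer
);

-- Generate the columnar-store API: Day is the timestamp field,
-- Symbol identifies the individual timeseries
select cs_create('Quote', 'Day', 'Symbol');

-- Load the table into shared memory, then fetch a slice of the IBM timeseries
select Quote_load();
select Quote_get('IBM', '2013-01-01', '2013-12-31');
```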

There are two ways of synchronizing original table and timeseries:

Automatic, using triggers: all inserts/deletes in the original table are immediately reflected in the timeseries.

Manual, using explicit invocation of the load/append/delete methods.

Execution of load() is significantly more efficient than propagation of updates using triggers, mostly because of the slowness of PL/pgSQL.
Also please notice that, being stored in shared memory, timeseries have to be reloaded after a restart of
the server. Unfortunately PostgreSQL doesn't support database-level triggers (like "after startup on database" in Oracle).
IMCS provides two alternatives: use autoload mode or load data manually. In autoload mode, data is automatically loaded from the table into the columnar store on demand, when it is first accessed by any query. Please notice that for large tables loading data can take a substantial amount of time and so increase the execution time of the query that initiated the load (which can confuse a user who expects this query to complete very fast).
Fortunately, database servers are not restarted frequently...

When data is loaded from the table, records are sorted by timestamp and inserted in ascending order.
You can append data to an existing timeseries, but the timestamps of the inserted elements must be greater than those already loaded.
When a timeseries is populated using the insert trigger, it is necessary to ensure that data is inserted into the table in ascending timestamp order; otherwise an out-of-order error will be reported while inserting the element into the timeseries.

The TABLE_get functions return a row of type TABLE_timeseries (this type is also generated by IMCS) which has the same columns as the original table, but the type of these columns is timeseries. So it is possible to refer to these timeseries as to any other columns and apply timeseries functions to them. For example:
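A hypothetical grand-aggregate query over the Quote table from the cs_create example (cs_max is assumed to be one of the cs_* grand-aggregate functions):

```sql
-- High is a timeseries column of the record returned by Quote_get
select cs_max(High) from Quote_get('IBM');
```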

The result of the query above is a scalar value (because a grand aggregate is used). But most timeseries functions take a timeseries as input and also return a timeseries. For example, the result of the query below is a timeseries:

select cs_filter(cs_gt(Close, Open), Day) from Quote_get('IBM');
When you print the result of this query on the screen (for example, by running it in psql), it will be represented as a large string literal in braces: 'date:{01/01/2010, 01/02/2010,...}'.
Certainly this is not convenient for really large timeseries and may even cause memory exhaustion.
Alternatively, it is possible to change the vertical representation back to the horizontal one using the TABLE_project or cs_project functions. The produced tuples can then be accessed in the normal way using all SQL facilities.
For example, it is possible to sort them or perform further grouping/filtering.
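A hypothetical sketch, assuming the Quote table from the cs_create example (the exact calling convention of Quote_project may differ):

```sql
-- Back to horizontal rows; ordinary SQL (sorting, limiting, grouping) applies again
select (Quote_project(q.*)).* from Quote_get('IBM') q;
```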

This function generates all API functions, types and triggers for the specified table or view table_name.
These can later be removed using the table_name_drop function. timestamp_id is the name of the timestamp field by which timeseries elements are sorted in ascending order, allowing time slices to be extracted efficiently. timeseries_id is an optional field identifying a timeseries. For quotes, for example, it can be a symbol name. If this field is specified, then a separate timeseries will be maintained for each symbol. If the autoupdate parameter is true, then IMCS creates triggers which automatically update the timeseries when data is added to or deleted from the source table. Alternatively, it is possible to explicitly load/append/delete timeseries data. Please notice that explicit bulk update/delete is significantly more efficient than row-level updates performed by a trigger. If the columnar store interface for a table was generated with autoupdate=false, the triggers are still generated but disabled; you can enable them later using the "alter table TABLE enable trigger user" command. Since views cannot have row-level BEFORE or AFTER triggers in PostgreSQL, IMCS doesn't generate them if table_name is a view.
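For example, for the hypothetical Quote table generated with autoupdate=false, the disabled triggers can be enabled later with:

```sql
alter table Quote enable trigger user;
```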

function cs_delete_all() returns bigint

Deletes all timeseries in the columnar store. This function is the most efficient way to clean up the columnar store.
Please notice that PostgreSQL doesn't allow shared memory to be freed, so it will still be in use; but it can be reused by subsequent
allocation requests of the columnar store. This function returns the total number of removed elements (in all timeseries).

Populates timeseries with data from a PostgreSQL table. If the already_sorted parameter is true, records in the table are assumed
to be stored in the proper (ascending timestamp) order; otherwise IMCS adds an "order by" clause
to the select statement. Please notice that PostgreSQL vacuuming can change the original order of the records, so disable vacuuming for the
table if you want to preserve insert order. The optional filter parameter specifies an additional selection criterion for table records, allowing only a subset of the table to be included in the timeseries.
In particular, it can be used to append new data to an existing timeseries.
This function returns the number of inserted timeseries elements. If filter is not specified, this function loads data from the table only if the timeseries is not yet initialized. If filter is not null, this function always tries to load data, assuming that the programmer has specified a proper filter condition that avoids duplicates and preserves the timeseries order. If filter is null and the timeseries is already initialized, this function does nothing and immediately returns zero.
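A sketch of the two loading modes, assuming the hypothetical Quote table and the parameter names (already_sorted, filter) described above:

```sql
-- Initial load: does nothing if the timeseries is already initialized
select Quote_load();

-- Explicit incremental load: always executed; the filter must exclude
-- already-loaded rows and preserve timestamp order (programmer's responsibility)
select Quote_load(filter := 'Day > date ''2013-06-30''');
```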

function TABLE_is_loaded() returns bool

Checks whether data was already loaded into the columnar store. If you just need to ensure that data is loaded,
there is no need to call this function: you can always call TABLE_load, which performs this check
itself and does nothing if data was already loaded. But if the behavior of your application depends on the state of
the columnar store, this function may be useful.

function TABLE_append(start_from TIMESTAMP_TYPE) returns bigint

Appends to the timeseries those records of the source table whose timestamp is greater than or equal to start_from.
Use this function if the on-update trigger is disabled (autoupdate=false in the parameters of cs_create).
Please also notice that this function is implemented in PL/pgSQL and so is significantly slower than TABLE_load with an equivalent filter condition.
This function returns the number of added timeseries elements.

function TABLE_truncate() returns void

Truncates all timeseries for this table. This is the most efficient way to delete the vertical representation of a specific table.
If you need to delete all data in the columnar store, use the cs_delete_all() function instead.

Makes a horizontal projection of the timeseries. The optional positions parameter specifies the positions of the selected timeseries elements.
If the positions parameter is omitted, all timeseries elements are transformed to the horizontal representation.
This function is the opposite of TABLE_get(): get transforms the horizontal representation to the vertical one, and project performs the backward transformation. This function can be used only if the number of columns returned by TABLE_get()
and the element types of the corresponding timeseries are unchanged. For example, it is possible to run a query like this:
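A hypothetical example (cs_filter_pos is assumed to return the positions of elements satisfying the condition, and the positions parameter of Quote_project to select those positions):

```sql
-- Filter in vertical form, then project only the matching positions back to rows
select (Quote_project(q.*, cs_filter_pos(cs_gt(q.Close, q.Open)))).*
  from Quote_get('IBM') q;
```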

Returns the vertical representation of the whole table or of a time slice of it. The returned record contains the same columns as a record of the original table, but they have timeseries types instead of the original scalar types. These columns can be used in timeseries functions (cs_*). If the high or low parameter is not null, it specifies the upper/lower inclusive boundary for the timestamp value; if either or both parameters are omitted, the corresponding boundary is open. It is possible to limit the number of selected elements by specifying the limit_ts parameter (if the low boundary is open, the last limit_ts elements are selected; otherwise the first limit_ts elements are selected).

Returns the vertical representation of the whole table or of a horizontal slice of it. The returned record contains the same columns as a record of the original table, but they have timeseries types instead of the original scalar types. These columns can be used in timeseries functions (cs_*).
The from_pos parameter specifies the start position in the timeseries (inclusive) and the till_pos parameter specifies the end position (inclusive). If till_pos is omitted, the subsequence spans to the end of the timeseries. Both from_pos and till_pos can be negative, in which case the position is counted from the end of the timeseries, i.e. TABLE_span(from_pos:=-1) extracts the last element of the timeseries.

Deletes timeseries elements belonging to the specified interval. If the high or low parameter is not null, it specifies the upper/lower inclusive boundary for the timestamp value; if either or both parameters are null, the corresponding boundary is open. This function returns the number of deleted elements.

Deletes timeseries elements from the beginning up to the specified timestamp till (inclusive), or deletes all elements if this parameter is null/omitted.
This function is equivalent to TABLE_delete(null, till). IMCS provides a separate function for it because this is expected to be the
most frequent case of deleting elements from a timeseries: it corresponds to shifting the data window when new elements are appended and
outdated ones are thrown away. This function returns the number of deleted elements.

Joins the timestamps with another, unsorted timeseries. It returns the positions of elements in this timeseries whose timestamps match the corresponding elements of the joined timeseries. The semantics of matching depends on the value of the direction parameter:

If direction is less than zero, this timestamp should be less than or equal to the other timestamp (locate the timeseries element preceding the timestamp).

If direction is zero, this timestamp should be equal to the other timestamp (exact match of timestamps).

If direction is greater than zero, this timestamp should be greater than or equal to the other timestamp (locate the timeseries element succeeding the timestamp).

Returns the timeseries with the specified identifier for the corresponding table, or a time slice of it. The returned record contains the same columns as a record of the original table, but they have timeseries types instead of the original scalar types. These columns can be used in timeseries functions (cs_*). If the high or low parameter is not null, it specifies the upper/lower inclusive boundary for the timestamp value; if either or both parameters are omitted, the corresponding boundary is open. It is possible to limit the number of selected elements by specifying the limit_ts parameter (if the low boundary is open, the last limit_ts elements are selected; otherwise the first limit_ts elements are selected).

Does the same as the function described above, but for an array of timeseries identifiers. For each timeseries identifier this function returns a TABLE_timeseries record, so the output will contain as many rows as there are identifiers.

Returns the timeseries with the specified identifier for the corresponding table, or a horizontal slice of it. The returned record contains the same columns as a record of the original table, but they have timeseries types instead of the original scalar types. These columns can be used in timeseries functions (cs_*).
The from_pos parameter specifies the start position in the timeseries (inclusive) and the till_pos parameter specifies the end position (inclusive). If till_pos is omitted, the subsequence spans to the end of the timeseries. Both from_pos and till_pos can be negative, in which case the position is counted from the end of the timeseries, i.e. TABLE_span(id, from_pos:=-1) extracts the last element of the timeseries.

Does the same as the function described above, but for an array of timeseries identifiers. For each timeseries identifier this function returns a TABLE_timeseries record, so the output will contain as many rows as there are identifiers.

Concatenates slices of timeseries for the specified identifiers. The returned record contains the same columns as a record of the original table, but they have timeseries types instead of the original scalar types. Each such timeseries is the concatenation of the slices of the timeseries for all specified identifiers. These columns can be used in timeseries functions (cs_*). If the high or low parameter is not null, it specifies the upper/lower inclusive boundary for the timestamp value; if either or both parameters are omitted, the corresponding boundary is open.

Deletes timeseries elements belonging to the specified interval. If the high or low parameter is not null, it specifies the upper/lower inclusive boundary for the timestamp value; if either or both parameters are null, the corresponding boundary is open. This function returns the number of deleted elements.

Deletes timeseries elements from the beginning up to the specified timestamp till (inclusive), or deletes all elements if this parameter is null/omitted.
This function is equivalent to TABLE_delete(id, null, till). IMCS provides a separate function for it because this is expected to be the
most frequent case of deleting elements from a timeseries: it corresponds to shifting the data window when new elements are appended and
outdated ones are thrown away. This function returns the number of deleted elements.

Joins the timestamps with another, unsorted timeseries. It returns the positions of elements in this timeseries whose timestamps match the corresponding elements of the joined timeseries. The semantics of matching depends on the value of the direction parameter:

If direction is less than zero, this timestamp should be less than or equal to the other timestamp (locate the timeseries element preceding the timestamp).

If direction is zero, this timestamp should be equal to the other timestamp (exact match of timestamps).

If direction is greater than zero, this timestamp should be greater than or equal to the other timestamp (locate the timeseries element succeeding the timestamp).

Creates a timeseries from a string, for example '{1,2,3,4,5}'. The type of the timeseries is specified by the elem_type parameter. For timeseries of characters it is also necessary to specify the size of a timeseries element: elem_size.
Please notice that PostgreSQL allows an implicit cast from a string to the target type using this type's input function, but in this case information about the timeseries element type and size must be encoded in the string: 'int4:{1,2,3,4,5}'.
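For instance (hypothetical invocations; cs_sum is assumed to be one of the cs_* grand aggregates accepting a timeseries argument):

```sql
select cs_parse('{1,2,3,4,5}', 'int4');

-- equivalent via the implicit cast, with the element type encoded in the literal
select cs_sum('int4:{1,2,3,4,5}');
```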

Binary operations on timeseries. These functions take two timeseries arguments and return a result timeseries.
IMCS tries to automatically adjust the types of the input arguments (for example, if one timeseries has element type "int8" and the other "float8", the first one will be converted to "float8").
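For instance (a sketch; the exact printed output format is not shown here):

```sql
-- the int8 operand is implicitly converted to float8 before the addition
select cs_add('int8:{1,2,3}', 'float8:{0.5,0.5,0.5}');
```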


function cs_add(timeseries,timeseries) returns timeseries

Adds elements of two timeseries

function cs_sub(timeseries,timeseries) returns timeseries

Subtracts elements of two timeseries

function cs_mul(timeseries,timeseries) returns timeseries

Multiplies elements of two timeseries

function cs_div(timeseries,timeseries) returns timeseries

Divides elements of two timeseries

function cs_pow(timeseries,timeseries) returns timeseries

Raises each element of the first timeseries to the power given by the corresponding element of the second timeseries

function cs_and(timeseries,timeseries) returns timeseries

Bitwise AND of elements of two integer or boolean timeseries

function cs_or(timeseries,timeseries) returns timeseries

Bitwise OR of elements of two integer or boolean timeseries

function cs_xor(timeseries,timeseries) returns timeseries

Bitwise XOR of elements of two integer or boolean timeseries

function cs_eq(timeseries,timeseries) returns timeseries

Checks if each element of the first timeseries is equal to the corresponding element of the second timeseries

function cs_ne(timeseries,timeseries) returns timeseries

Checks if each element of the first timeseries is not equal to the corresponding element of the second timeseries

function cs_gt(timeseries,timeseries) returns timeseries

Checks if each element of the first timeseries is greater than the corresponding element of the second timeseries

function cs_ge(timeseries,timeseries) returns timeseries

Checks if each element of the first timeseries is greater than or equal to the corresponding element of the second timeseries

function cs_lt(timeseries,timeseries) returns timeseries

Checks if each element of the first timeseries is less than the corresponding element of the second timeseries

function cs_le(timeseries,timeseries) returns timeseries

Checks if each element of the first timeseries is less than or equal to the corresponding element of the second timeseries

function cs_maxof(timeseries,timeseries) returns timeseries

Element-wise maximum of the two timeseries

function cs_minof(timeseries,timeseries) returns timeseries

Element-wise minimum of the two timeseries

function cs_like(timeseries,pattern text) returns timeseries

Finds elements of a character timeseries matching the specified pattern (case-sensitive). The matching rules are the same as for the PostgreSQL LIKE predicate.

function cs_ilike(timeseries,pattern text) returns timeseries

Finds elements of a character timeseries matching the specified pattern (case-insensitive). The matching rules are the same as for the PostgreSQL ILIKE predicate.

function cs_cat(timeseries,timeseries) returns timeseries

Concatenates elements of two timeseries. The input timeseries can have any element type; the result is always a timeseries of characters whose element size
equals the sum of the element sizes of the concatenated timeseries. For example, cs_cat('bpchar1:{a,b,c}', 'bpchar1:{x,y,z}') = 'bpchar2:{ax,by,cz}'.
When concatenating character strings whose actual length is smaller than the fixed element size, the result will contain the filler character ('\0').
So if the element size of the concatenated timeseries in the example above is 3, the result will be E'{a\\000\\000x\\000\\000,b\\000\\000y\\000\\000,c\\000\\000z\\000\\000}'. If you prefer to get '{ax,by,cz}', please use cs_add instead of cs_cat.
The cs_cat function is intended for concatenating group-by keys (character or numeric) for aggregation.

Concatenates two timeseries. The result of this function is a timeseries containing the elements of both the head and tail timeseries.
For example, cs_concat('int4:{1,2,3}','int4:{4,5,6}') = 'int4:{1,2,3,4,5,6}'.
The head or tail parameter may be null; in that case cs_concat just returns the non-null timeseries.

Functions in this group take two timeseries arguments and calculate a single scalar value as the result.
IMCS tries to automatically adjust the types of the input arguments (for example, if one timeseries has element type "int8" and the other "float8", the first one will be converted to "float8").

Converts a timeseries to an array. TYPE should be one of "char", int2, int4, date, int8, time, timestamp, money, float4, float8, bpchar and should match the element type of the converted timeseries. Please notice that the array is constructed in memory, so a large timeseries can cause memory overflow.

Converts an array to a timeseries. This function creates a timeseries iterator over the input array, allowing any timeseries functions to be applied to it. The element type of the result timeseries is the same as the element type of the array.
The optional elem_size parameter is needed only for text arrays; it should specify the maximal size of an array element.

Extracts a subsequence from the timeseries. The from_pos parameter specifies the start position of the subsequence (inclusive) and the till_pos parameter specifies the end position (inclusive). If till_pos is omitted, the subsequence spans to the end of the timeseries. Both from_pos and till_pos can be negative, in which case the position is counted from the end of the timeseries, i.e. cs_limit(s, from_pos:=-1) extracts the last element of the timeseries.

function cs_head(timeseries, n bigint default 1) returns timeseries

Extracts the first n elements of the timeseries. This function is equivalent to cs_limit(0, n-1).

function cs_tail(timeseries, n bigint default 1) returns timeseries

Extracts the last n elements of the timeseries. This function is equivalent to cs_limit(-n).

Chooses one of two alternatives: if an element of the cond boolean timeseries is true, the corresponding element of the then_ts timeseries is used, otherwise the element of the else_ts timeseries. All timeseries are traversed at the same speed: if we take an element from the then_ts timeseries, the corresponding element of the else_ts timeseries is skipped.
For example, cs_iif('char:{1,0,1}', 'float4:{1.0,2.0,3.0}', 'float4:{0.1,0.2,0.3}') = 'float4:{1.0,0.2,3.0}'

Conditional computation: if an element of the cond boolean timeseries is true, the next element of the then_ts timeseries is taken, otherwise the next element of the else_ts timeseries. Unlike cs_iif, the then/else timeseries are accessed only on demand, so the number of elements fetched from each timeseries depends on the condition.
For example, cs_if('char:{1,0,1}', 'float4:{1.0,2.0,3.0}', 'float4:{0.1,0.2,0.3}') = 'float4:{1.0,0.1,2.0}'

Transforms the vertical representation (all timeseries elements or just the elements at the specified positions) to the horizontal representation. This is a more generic version of TABLE_project which can be applied to an arbitrary set of columns.
But since the result row is anonymous, it is not possible to unnest it using the PostgreSQL ().* clause.
In PostgreSQL 9.3 it is possible to use cs_project in the FROM list (lateral join), providing an alias with
a description of the returned columns.
Concerning the optional disable_caching parameter, please read the section Projection issues.

This is a specialized version of cs_project for transposing the result of the hash_agg_* functions,
which return two timeseries: the first with values of the aggregate and the second with values of the group-by key.
cs_project_agg transforms this result into a set of cs_agg_result rows consisting of two columns: (agg_val float8, group_by bytea). If several keys were combined into the group-by key, it can be split back into separate values using the cs_cut or cs_as functions.
Concerning the optional disable_caching parameter, please read the section Projection issues.

Checks if the timeseries contains no elements. This function is usually more efficient than cs_count() <> 0, except when a filter is applied to a large timeseries and relatively few elements satisfy the filter condition (unlike cs_empty, cs_count can be executed in parallel).

Functions calculating aggregates for each group.
Groups are identified by sequences of elements with the same value in the group-by timeseries.
It is not mandatory to sort this timeseries, but you should realize that sequences of the same value in different parts of the timeseries form different groups. For example, there are four groups in the timeseries '{1, 1, 2, 1, 1, 1, 2, 2}': ('{1, 1}', '{2}', '{1, 1, 1}', '{2, 2}'). If you want to aggregate over all timeseries elements with the same value, use the cs_hash_* functions instead.


function cs_group_count(timeseries) returns timeseries

Returns the number of elements in each group (sequences of repeated values)

Aggregation is done over a window: N subsequent elements of the timeseries, where N is the window size. At each step the window moves one position forward, so the result timeseries has the same number of elements as the input timeseries. The first N-1 elements of the result are calculated for windows smaller than N; you can use cs_limit(cs_window_AGG(input, N), N-1) to skip these elements.
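For example, a 20-element moving average with the warm-up values skipped (cs_window_avg is assumed to be one of the cs_window_* aggregates, and the Quote table to exist as in the earlier sketch):

```sql
select cs_limit(cs_window_avg(Close, 20), 19) from Quote_get('IBM');
```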

The Average True Range (ATR) indicator with a window_size period.
Formula: ATR[i] = (ATR[i-1]*(n-1) + TR[i])/n, where n = min(i+1, window_size).
The first window_size-1 elements of the result can be skipped to get a correct ATR sequence.

Aggregation with group-by. These functions perform grouping and aggregation similar to SQL. All elements of the group_by sequence with the same value form a single group. This is done using a hash function, so the cs_hash_* aggregates require additional memory for building the hash table. These functions have two out parameters, i.e. they return two timeseries:
the first contains the calculated aggregates, the second the corresponding group keys.

If it is necessary to group by more than one key, it is possible to use cs_cat (or the || SQL operator) to concatenate several columns. Later, the cs_cut or cs_as functions can be used to split the concatenated value back into its components.
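A hypothetical sketch of grouping by a combined key (cs_hash_sum is assumed to be one of the cs_hash_* aggregates):

```sql
-- Two-key grouping: each (int4, int4) pair concatenated by cs_cat
-- forms one packed group-by key; the values are summed per key
select cs_hash_sum('float8:{10,20,30,40}',
                   cs_cat('int4:{1,1,2,2}', 'int4:{1,2,1,2}'));
```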

Counts the number of elements having the same value. This function has two output parameters: the count timeseries contains the counters and the groups timeseries contains the corresponding group key values. So the result of cs_hash_count('float4:{1,3,1,4,2,2}') will be ('int8:{2,2,1,1}', 'float4:{1,2,3,4}').

Counts the number of duplicates in each group. Groups are identified by the group_by timeseries.
This function has two output parameters: the count timeseries contains the counters and the groups timeseries contains the corresponding group key values. The min_occurrences parameter specifies the minimal number of occurrences of an element in each group; it should be a positive number. With the default value 1 of min_occurrences, this function calculates the number of distinct values.

Calculates the average value for each group. Groups are identified by the group_by timeseries.
This function has two output parameters: the avg timeseries contains the calculated averages and the groups timeseries contains the corresponding group key values.

Calculates the bitwise OR of the elements in each group. Groups are identified by the group_by timeseries.
This function has two output parameters: the sum timeseries contains the calculated bitmasks and the groups timeseries contains the corresponding group key values.

Calculates the bitwise AND of the elements in each group. Groups are identified by the group_by timeseries.
This function has two output parameters: the sum timeseries contains the calculated bitmasks and the groups timeseries contains the corresponding group key values.

Calculates q-quantiles for a timeseries of a scalar element type.
The quantiles are the data values marking the boundaries between consecutive subsets.
This function returns a timeseries with q_num+1 values of the same type as the input timeseries.
For example, cs_quantile('float4:{10,3,0,3,4,5,9,11,7,3,3}', 2) = 'float4:{0,4,11}'

Builds a histogram for the input timeseries. The minimal (inclusive) and maximal (exclusive) values for the input timeseries should be specified, as well as the number of intervals (histogram columns). Values outside the specified range [min_value, max_value) are ignored. The number of intervals should not be greater than the tile size. Execution of these functions can be parallelized.

Finds the positions in the input timeseries where it crosses zero. If first_cross_direction is positive, the search starts with the first upward crossing; if negative, with the first downward crossing; if zero, the direction doesn't matter (the first crossing can be in either direction).
For example, cs_cross('float4:{1,2,-1,0.5,0.6,0.0,0.1,0.3,-5}', 0) = 'int8:{2,3,7}'

Finds the positions of extrema (local minima and maxima) in the input timeseries. If first_extremum is positive, the search starts with the first local maximum; if negative, with the first local minimum; if zero, either.
For example, cs_extrema('float4:{1,2,3,2,1,0,0,1,1,2,4,0}', 0) = 'int8:{2,6,10}'

Stretches the vals timeseries to the length of the first timeseries: elements of the vals timeseries are repeated while the corresponding timestamp (from timeseries ts2) is larger than the timestamp from ts1. For example, cs_stretch('int4:{1,2,3,4,5}', 'int4:{2,4}', 'float4:{1.1,2.2}', 1.0) = 'float4:{1.1,2.2,2.2,1.0,1.0}'. This function can be used to calculate split-adjusted prices: revert the timeseries of splits, calculate the cumulative product, stretch it and multiply it by the price.

Injects missing elements into the vals timeseries (associated with ts2) so that the timestamps of ts1 and ts2 match. For example cs_stretch0('int4:{1,2,3,5}', 'int4:{2,3,4}', 'float4:{1.1,1.2,1.3}', 0.0) = 'float4:{0.0,1.1,1.2,1.3,0.0}'. This function may be useful if we need to perform operations on trading data for different symbols, and this data may contain holes (no trading for a particular symbol on a given date).
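The injection can be sketched as a walk over the merged timestamps (again a sketch of the documented semantics, reproducing the example above):

```python
def stretch0(ts1, ts2, vals, filler):
    # Walk the sorted union of timestamps; emit the vals element
    # where a ts2 timestamp exists, and the filler elsewhere.
    merged = sorted(set(ts1) | set(ts2))
    ts2_to_val = dict(zip(ts2, vals))
    return [ts2_to_val.get(t, filler) for t in merged]

print(stretch0([1, 2, 3, 5], [2, 3, 4], [1.1, 1.2, 1.3], 0.0))
# [0.0, 1.1, 1.2, 1.3, 0.0]
```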

Gets values from the third timeseries corresponding to the timestamp in ts2 closest to each timestamp in ts1. For example cs_asof_join('int4:{4,9}', 'int4:{1,3,6,10}', 'float4:{0.1,0.3,0.6,1.0}') = 'float4:{0.3,1.0}'.
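Per the example, "closest" here means nearest in either direction (not the usual "latest not after" as-of rule). A sketch (tie-breaking toward the earlier timestamp is an assumption):

```python
import bisect

def asof_join(ts1, ts2, vals):
    # For each timestamp in ts1, pick the vals element at the ts2
    # position with the minimal absolute timestamp difference.
    res = []
    for t in ts1:
        i = bisect.bisect_left(ts2, t)
        # Candidates: the neighbours around the insertion point.
        best = min(
            (j for j in (i - 1, i) if 0 <= j < len(ts2)),
            key=lambda j: abs(ts2[j] - t),
        )
        res.append(vals[best])
    return res

print(asof_join([4, 9], [1, 3, 6, 10], [0.1, 0.3, 0.6, 1.0]))
# [0.3, 1.0]
```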

Splits a binary string into components. This function is the reverse of cs_cat, which may be needed to construct
a combined group-by key for aggregate functions. The format string describes the types of the components. Each type is specified by one letter followed by the field length. The supported types are listed below:

Format specification    PostgreSQL type
i1                      "char"
i2                      int2
i4                      int4
i8                      int8
f4                      float4
f8                      float8
d4                      date
t8                      time
T8                      timestamp
m8                      money
cN                      char(N)

For example, the format string 'i4f4c10' corresponds to a row with one integer, one float, and one character component of length 10.
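Parsing such a format string into (type letter, length) pairs can be sketched with a small helper (the helper name is hypothetical, not part of the IMCS API):

```python
import re

def parse_format(fmt):
    # Split a format string like 'i4f4c10' into (type letter, length)
    # pairs, e.g. [('i', 4), ('f', 4), ('c', 10)].
    pairs = re.findall(r'([ifdtTmc])(\d+)', fmt)
    return [(t, int(n)) for t, n in pairs]

print(parse_format('i4f4c10'))  # [('i', 4), ('f', 4), ('c', 10)]
```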

function cs_as(str bytea, type_name cstring) returns record

Yet another function splitting a binary string into components. This function is the reverse of cs_cat, which may be needed to construct a combined group-by key for aggregate functions. The type_name parameter specifies the composite type whose components will be fetched from the input string. Below is an example of using this function:

And since PostgreSQL has no information about the columns, you cannot use the (cs_as(...)).* clause to extract the columns of the row.
But you can create a function returning the proper type and bind it to the cs_as C implementation:

In the first two cases a constant timeseries (a timeseries containing the same value) is implicitly created for the right operand using the cs_const function.
In the third case a timeseries is created from a string literal using the cs_parse function.
And in the last case the conversion to the timeseries type is implicitly made by PostgreSQL using the input function of this type.

There are several functions in the IMCS API that return a row or a set of rows: cs_hash_*, cs_project*, cs_as.
PostgreSQL provides two ways of decomposing a compound type into columns:

select (foo()).*

select * from foo()

Unfortunately case 1) is implemented very inefficiently: the function is called as many times as there are columns in the returned row
(see the discussion of this question at StackOverflow).
For example, the cs_hash_sum function has two output parameters: sum and groups.
Output parameters are actually returned by PostgreSQL as an anonymous row. So if we write:

then PostgreSQL will call the cs_hash_sum function twice, which means the aggregation will be performed twice:
we have to do double the amount of work. I failed to find a way to make PostgreSQL avoid these redundant calls.
But this problem is solvable.

First of all, it is possible to avoid the (...).* construction and access the attributes of the composite type explicitly:

In this case PostgreSQL generally will not perform redundant calls. To guarantee that multiple evaluation won't happen, you can use the OFFSET 0 hack or exploit the fact that PostgreSQL does not optimize across CTE boundaries:

But IMCS also tries to provide a workaround for the (cs_project(...)).* construction: the cs_project and cs_project_agg functions can cache their results, avoiding redundant calculations. Unfortunately there are some
restrictions: for example, it is not possible to use cs_project more than once in one query. You can disable caching for a particular invocation by setting the
optional disable_caching parameter to true, or disable caching completely by setting the imcs.project_caching configuration parameter to false.

And concerning case 2), calling the function in the FROM list: this is possible if the function doesn't depend on other data sources at the same query level.
For example, in IMCS cs_hash_sum accepts timeseries arguments which are provided by Quote_get.
So we can write:

But this works only in PostgreSQL 9.3 and higher, which support lateral joins.
A lateral join enables a subquery in the FROM part of a SELECT to reference columns from preceding items in the FROM list.
Also, function calls in PostgreSQL 9.3 can directly reference columns from preceding FROM items, even without the LATERAL keyword.
This is why the query above works correctly with PostgreSQL 9.3 and higher (the function is called only once) but produces the error
"function expression in FROM cannot refer to other relations of same query level" in previous PostgreSQL versions.

Also, using a projection function in the FROM list allows you to specify an alias and describe the columns:

Timeseries are stored in shared memory as B-Tree pages. This B-Tree provides fast access to timeseries elements by position
(for all types) or by value (only for timestamps). There is a separate B-Tree for each timeseries. A PostgreSQL hash table is used to locate
a timeseries by identifier. The hash key includes the name of the source table, the name of the corresponding field, and optionally the identifier of the timeseries. For example, for the Quote table the identifier of a timeseries may be 'quote-close-IBM'.
The size of B-Tree pages is determined by the "imcs.page_size" configuration parameter. The default value is 4KB.

IMCS uses an RW (read-write) lock to synchronize access to the columnar store. This means that multiple read-only queries can be executed concurrently,
but adding or removing timeseries elements is possible only in exclusive mode. The lock is acquired when a timeseries is accessed for the first time. If the imcs.serializable configuration parameter is true (the default), then the lock is held until the end of the transaction. Such a locking policy provides the serializable isolation level for timeseries.
If imcs.serializable is false, then the lock is released at the end of query execution, which corresponds to the "read committed" isolation level.

Right now IMCS supports RLE compression only for timeseries of character type, and duplicates are eliminated only within B-Tree pages.
When elements are extracted into a tile, they are decompressed. Using RLE at the tile level could significantly increase the speed of some operations.
For example, if we perform an aggregation (say, sum) of a timeseries with a large number of repeated values, then RLE
can significantly reduce the number of performed operations: if a value is repeated 100 times, then with RLE we can just
calculate 100*x instead of performing 100 additions.
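The arithmetic shortcut can be illustrated directly (a sketch of the idea, not IMCS code):

```python
def rle_encode(values):
    # Encode a list into (value, run_length) pairs.
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def rle_sum(runs):
    # One multiplication per run instead of one addition per element.
    return sum(v * n for v, n in runs)

data = [5] * 100 + [7] * 3
assert rle_sum(rle_encode(data)) == sum(data)  # 521, but only 2 multiplications
```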
But IMCS is first of all oriented towards financial data (trading systems), where duplicates are not so frequent, at least for numeric characteristics
(price, volume, date, ...).
Even if the price of some stock is quite stable (varying by only a few cents per year), small fluctuations during the day normally occur. Our experiments show that RLE encoding only degrades performance
for standard queries on securities data.

Most cs_* functions do not actually perform any computations. Instead, they construct a pipe of operators (an expression tree). The unit of data exchange between operators is a tile (vector), so operators perform vector operations, which reduces interpretation cost. The size of a tile should be large enough to minimize the overhead of organizing the work of the pipe,
but it should fit in the L1 CPU cache to keep processing speed high. The default tile size is 128.
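The pipe-of-operators idea can be sketched with Python generators, each stage consuming and producing tiles (the tile size and the operator set here are illustrative only):

```python
TILE_SIZE = 128

def scan(values):
    # Leaf operator: cut the stored timeseries into tiles.
    for i in range(0, len(values), TILE_SIZE):
        yield values[i:i + TILE_SIZE]

def mul(left, right):
    # Binary vector operator: elementwise product, tile by tile.
    for a, b in zip(left, right):
        yield [x * y for x, y in zip(a, b)]

def total(tiles):
    # Aggregate operator: consumes the whole pipe.
    return sum(sum(t) for t in tiles)

close = [10.0] * 300
volume = [2.0] * 300
# Nothing is computed until the aggregate pulls tiles through the pipe.
print(total(mul(scan(close), scan(volume))))  # 6000.0
```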

IMCS is able to execute some operations in parallel. Currently this is done for grand and hash aggregates and top-N functions (all operators where the size of the output is smaller than the size of the input). IMCS maintains a pool of threads. The number of threads in the pool can be specified using the "imcs.n_threads" configuration parameter. By default (a zero value of this parameter), the number of threads is detected automatically based on the number of CPUs (cores) in the system.
IMCS clones the expression subtree and splits the timeseries accessed in the leaf nodes of this tree (the timeseries stored in the columnar store) into segments. The number of segments corresponds to the number of threads, so each thread processes its own part of the timeseries.
Then the results are merged using an operator-specific merge function. Merging requires synchronization, so only one thread can perform a merge at any moment.
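The segment-and-merge scheme can be sketched with a thread pool (an illustrative model; IMCS does this in C inside the server, and for a sum the operator-specific merge is plain addition):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(values, n_threads=4):
    # Split the timeseries into one segment per thread, aggregate each
    # segment independently, then merge the partial results.
    step = (len(values) + n_threads - 1) // n_threads
    segments = [values[i:i + step] for i in range(0, len(values), step)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partial = pool.map(sum, segments)
    return sum(partial)  # operator-specific merge function

print(parallel_sum(list(range(1000))))  # 499500
```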

Please notice that PostgreSQL is not able to parallelize the execution of a single SQL query. Certainly it is possible to manually split a query into several subqueries and execute them concurrently, but this is neither trivial nor convenient. The fact that IMCS can overcome this limitation is very important for OLAP queries.

IMCS was originally designed to hold all data in main memory, and in this case it shows the best performance.
But there are cases when the available data doesn't fit in the server's RAM.
After receiving requests from several customers, I added to IMCS the ability to swap data to disk.
Certainly the performance of the disk version of IMCS is not as high as that of the in-memory version,
but there are two factors which allow us to expect quite good performance from the disk version as well:

IMCS uses a B-Tree to store timeseries data, and a B-Tree is one of the most efficient data structures for disk lookups (it minimizes the number of read operations)

Most IMCS queries perform sequential scans of large timeseries intervals, which allows data to be read from the disk sequentially at disk transfer speed (up to a gigabit per second for modern disks).

To use IMCS in disk mode, you should rebuild it with the USE_DISK=1 make option.
In this case IMCS stores timeseries data in the specified file or raw partition and uses a page pool (disk cache) to optimize access to the disk.
You need to specify the path to the file or raw partition and the size of the disk cache (in pages).
The cache is placed in shared memory so it can be accessed by all PostgreSQL processes.
Please notice that the size of the cache should be smaller than the size of the shared memory reserved for the IMCS extension ("imcs.shmem_size").

Usually the larger the cache, the better the performance. Certainly the cache should fit in main memory: if it causes swapping, a large cache will only degrade performance.
But most IMCS queries perform sequential scans of the data, and if the size of the data is larger than the size of the cache, then it doesn't matter how large the cache is: there will be no cache hits in any case (a page is evicted from the cache by the LRU algorithm before it is accessed a second time).
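The "no cache hits" effect of a plain LRU under repeated large scans is easy to demonstrate with a toy model (illustrative only, not IMCS's actual two-level implementation):

```python
from collections import OrderedDict

def scan_hits(cache_size, n_pages, n_scans):
    # Count LRU cache hits while repeatedly scanning n_pages pages.
    cache, hits = OrderedDict(), 0
    for _ in range(n_scans):
        for page in range(n_pages):
            if page in cache:
                hits += 1
                cache.move_to_end(page)
            else:
                if len(cache) >= cache_size:
                    cache.popitem(last=False)  # evict least recently used
                cache[page] = True
    return hits

# One page more than the cache holds: every access is a miss, forever.
print(scan_hits(cache_size=100, n_pages=101, n_scans=10))  # 0
# One page more of cache: every scan after the first is all hits.
print(scan_hits(cache_size=101, n_pages=101, n_scans=10))  # 909
```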
IMCS uses a two-level LRU replacement algorithm which tries to keep the internal pages of the B-Tree in memory and protect them from being evicted from the cache by leaf pages during large scans.

Also please notice that caching is done at the OS level as well (the file system cache). This means the same page can be stored in memory twice: in IMCS shared memory and in
the OS disk cache, and extra memory copies are needed to move data between the two caches. The IMCS cache provides faster access (it requires no context switches),
but only the OS has precise knowledge about memory availability and so is more flexible in assigning available memory resources.
On the other hand, IMCS knows the specifics of the accessed pages (leaf or internal B-Tree pages) and so can choose a more efficient replacement policy for each of them.
So there is no simple answer to the question of how to split memory between the OS and the internal IMCS cache. You certainly cannot directly control the size of the OS file system cache, but the larger the IMCS cache is, the less memory is left for the OS to use for its own caching.

The disk version of IMCS doesn't provide durability (persistence) of data: after a restart of the server it is still necessary to reload all IMCS data.
There are two main reasons for this:

Some IMCS data is still not persistent, for example the hash table used to locate timeseries.

For performance reasons, IMCS does not use WAL (write-ahead logging) or any other mechanism providing
all ACID properties of transactions. So after some failure (power failure, OS crash, PostgreSQL crash, ...) the IMCS data file can be corrupted,
and there is no way to atomically recover it.

IMCS never shrinks the size of the used data file. If you deallocate some table, then the corresponding pages will be marked as free and can be reused by subsequent allocation requests,
but the size of the file is not decreased. Even after a restart of the server the file is not truncated, because:

IMCS can work not only with a normal OS file but also with raw partitions, which cannot be truncated;

extending the file again would require updates of the file metadata, which adds extra overhead.

Since IMCS uses PostgreSQL shared memory, it should be loaded via shared_preload_libraries.
Please add '$libdir/imcs' to shared_preload_libraries in the postgresql.conf file:

shared_preload_libraries = '$libdir/imcs' # (change requires restart)

The size of shared memory used by IMCS can be specified using the imcs.shmem_size parameter.
On most systems the maximal size of System V shared memory is limited by a quite small constant, so you may also need to alter the system
configuration (please refer to the OS manual for details). PostgreSQL 9.3 uses mmap instead of System V shared memory,
so there should be no problem with system quotas. But there is yet another limitation in Linux: it cannot create a shared memory segment larger
than 256GB with standard 4KB pages. And nowadays servers with 1TB of memory are not something very exotic. To utilize all available memory in this case it is possible to create multiple shared memory segments, but PostgreSQL is not able to do that. Another solution is to increase the page size: Linux supports
huge pages. Unfortunately PostgreSQL still doesn't provide any way of using huge pages, so you need to patch the PostgreSQL sources: add MAP_HUGETLB to the PG_MMAP_FLAGS define in sysv_shmem.c:

The IMCS distribution contains a smarter patch, sysv_shmem.patch, for PostgreSQL 9.3.1, which sets the MAP_HUGETLB flag only if the size of the shared memory segment is larger than 256GB and only if MAP_HUGETLB is defined (it is available since Linux 2.6.32).

Below is a list of all IMCS configuration parameters:

imcs.shmem_size
    Description: Size of shared memory (in megabytes) used by the columnar store.
    Default: 8*1024 (8GB)
    Recommendations: Make it large enough to fit all data requiring vertical representation. It cannot be increased without a restart of the server.

imcs.n_timeseries
    Description: Estimated number of timeseries.
    Default: 10000
    Recommendations: This value is needed by the PostgreSQL hash implementation. Too small a value may cause a large number of collisions.

imcs.n_threads
    Description: Number of threads in the thread pool for concurrent execution of a query.
    Default: 0 (autodetect the number of CPUs)
    Recommendations: Usually the number of threads should be equal to the number of physical execution units in the system. Please notice that with hyperthreading the number of reported CPUs is twice the real number of cores. Set this parameter to 1 to disable concurrent execution.

imcs.page_size
    Description: Size of a B-Tree page in bytes.
    Default: 4096
    Recommendations: Since the B-Tree is stored in memory, using large pages is not so critical, but a small page may increase the per-element storage overhead.

imcs.tile_size
    Description: Size of the tile (vector) used to organize vector operations.
    Default: 128
    Recommendations: The larger the tile, the smaller the influence of interpretation overhead. But the best performance can be achieved only if the tile fits in the CPU L1 cache. Please notice that some operators have two or more parameters, so more than one tile can be calculated at each stage of an operator's pipe processing. Memory may also be needed for other purposes, so to reduce the probability of cache misses keep this value reasonably small.

imcs.dictionary_size
    Description: Size of the dictionary used by IMCS to map strings of unlimited size to integer identifiers.
    Default: 64KB
    Recommendations: If the size of the dictionary is set to zero, then it is not possible to load columns of unlimited-size types (i.e. VARCHAR) into the columnar store. If the size of the dictionary is less than or equal to 64KB, IMCS maps strings to 16-bit integer identifiers; if it is greater than 64KB, IMCS maps strings to 32-bit integer identifiers. The size of the dictionary should be larger than the total cardinality of all varying-size columns of all tables loaded into the columnar store, and it should fit in the memory reserved for IMCS by the imcs.shmem_size parameter.

imcs.substitute_nulls
    Description: Substitute NULLs with 0 while loading data into the columnar store.
    Default: false
    Recommendations: By default an attempt to insert a NULL value causes an error. When this option is set to true, IMCS doesn't report an error and stores zero instead of NULL.

imcs.autoload
    Description: Automatically load data into the columnar store when it is first accessed by any query.
    Default: true
    Recommendations: Loading data from a large table can take a substantial amount of time and so increase the execution time of the query that initiated the load. This can confuse a user who expects the query to complete very fast. In such cases an explicit load of data after server restart may be more desirable (it can be completed before any user query is received).

imcs.serializable
    Description: Hold the lock until the end of the transaction.
    Default: true
    Recommendations: Such a locking policy provides the serializable isolation level for the columnar store. If this parameter is set to false, then the lock is released at the end of query execution, which corresponds to the "read committed" isolation level.

Trying to print the result of a query returning a large timeseries can cause memory overflow, or at least produce many screens of hardly readable text. Setting this limit restricts the size of the printed timeseries: only part of the timeseries elements are printed, and "..." indicates that the timeseries was truncated. Setting this parameter to 0 disables this limitation.

imcs.project_caching
    Recommendations: Caching can cause incorrect behavior in some cases, namely when cs_project is used twice in the same query. In this case disable it: everything should work correctly, perhaps with some performance penalty when the (cs_project(...)).* construction is used. It is also possible to disable caching for each particular cs_project invocation by assigning true to the optional disable_caching parameter. Please read more in the section Projection issues.

imcs.use_rle
    Description: Use RLE encoding for character timeseries.
    Default: false
    Recommendations: RLE allows a significant reduction of the memory used by timeseries with a large fraction of duplicates.

imcs.cache_size(*)
    Description: Size of the IMCS disk cache (in pages).
    Default: 256*1024
    Recommendations: The total size in bytes used by the cache is imcs.cache_size*imcs.page_size; with the default parameter values this is 1GB. It should be smaller than imcs.shmem_size. See more about choosing an optimal setting for this parameter in the section Scaling beyond physical memory.

imcs.flush_file(*)
    Description: Flush changes to the file during commit.
    Default: true
    Recommendations: Write dirty pages to the disk during commit. Pages are written in increasing offset order, so disk writes are more or less sequential, minimizing disk head movements. That is why this can be faster than random writes of dirty pages evicted by the LRU replacement algorithm. But it can increase the number of writes, especially for short transactions (for example if triggers are used to propagate updates to IMCS).

imcs.file_path(*)
    Description: Path to the IMCS disk file or partition.
    Default: "imcs.dbs"
    Recommendations: Location of the IMCS file or raw partition. Please notice that IMCS never tries to truncate this file.

Loading data into the columnar store takes just 15 seconds on my computer.
If we call cs_create prior to loading data into the Quote table, then the
time of importing data from the CSV file increases from 2.5 minutes to 6.5 minutes,
because a trigger is used to propagate inserts to the in-memory columnar store.

Now let's calculate the volume-weighted average price (VWAP) for IBM for the period from 2010 till 2013:

Query execution time is 10 milliseconds.
Now let's do the same thing with standard SQL:

select sum(Close*Volume)/sum(Volume)
from Quote where Symbol='IBM' and Day between date('01-Jan-2010') and date('01-Jan-2013');

It takes 750 milliseconds.

Now let's calculate VWAP for all symbols.
To simplify this, we first create a table containing information about all symbols.
Such data is usually available anyway and contains much more information than just the symbol name,
but here we need only the symbol name:

select Symbol,sum(Close*Volume)/sum(Volume) as VWAP
from Quote group by Symbol;

Result is returned after 2243 milliseconds.

Now let's test filter queries with projection back to the horizontal representation.
The following query finds all dates for the 'ABB' symbol when the close price was more than 1% larger than the open price for the particular quarter:

The load completes twice as fast as for the Quote table: 7.5 seconds.
Now let's compute VWAP for this timeseries:

select Volume//Close as VWAP from Quote_get();

The query completes on my system in 10 milliseconds.
The same query using standard SQL:

select sum(Close*Volume)/sum(Volume) as VWAP from Quote;

One second (one thousand milliseconds), so the IMCS query is 100 times faster.

Now let's run a filter query on a large timeseries:

select cs_count((Close>Open*1.1)?) from Quote_get();
select count(*) from Quote where Close>Open*1.1;

The ratio of the query execution times is once again about 100: 6.274 msec vs. 768.251 msec.

Now consider a real use case with one timeseries and large enough records. There are about 10 million records with ~40 columns.
The database size is about 5GB. Queries perform grouping by various combinations of fields and calculate aggregates over some characteristics.
For example:

select trader,desk,office,sum(score*volenquired)/sum(volenquired)
from DbItem group by trader,desk,office;

takes ... 144 milliseconds. The ratio is more than two thousand. But this is the result with default PostgreSQL parameters (only "shared_buffers" was increased, to hold the whole database in memory). If we also increase "work_mem" from the default 1MB to 1GB, then the time of the standard SQL query is reduced to 33 seconds for the first execution and 7 seconds for subsequent executions.

Summarizing these results: IMCS provides about a 5-10 times performance increase for relatively small timeseries (thousands of elements)
and about 100 times higher speed for large timeseries (millions of elements) on a standard desktop with a quad-core processor. For an SMP server with a larger number of cores this ratio is expected to be even higher.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the Software), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHOR OF THIS SOFTWARE BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.