menu

August 12, 2014

Merge and Join Cursors in Proper Order in esProc (II) Aligned joining

When making the statistics, sometimes, you need to consolidate the data from multiple cursors, which is similar to joining the data from multiple tables.If the data in cursor are required to join a normal TSeq, then you can use cs.switch().

What if the data to join are all from the cursor? As we know, it is usually impossible to retrieve all data in the cursor. How can we join these data then? In esProc, you can use function join@x() to join the data from multiple cursors. For example:

In A5~A8, perform
the aggregate operations over the products in each category, and return the
respective cursor of temporary files. In A9, the
daily sales data for products in each category will be aligned and joined by
date. From A10, retrieve the statistical results of the first 25 days, as shown
below:

Once the
cursor is aligned joined, a cursor will be returned. From which the retrieved
result is similar to the TSeq joining, and all fields are
composed of the records. Thus, when retrieving, you must note that the
joined records take more memory than those in the normal cases. In addition, since data is composed of records while not
values, please note the requirements on writing the
expression, in particular the re-joining with the result cursor, when using the
result cursor for computation.

The result
of aligned joining of cursors can also be used to inherit such as filtering or
generating, for example:

From the joined
cursor, filter out the total amount of food orders which are greater than the
total amount of wine orders, and then generate the TSeq. In A12, return the first 100 rows of results:

In using
the aligned joining with cursor, you must remember that the data in the cursor
cannot be read into and maintain in the memory during retrieving the data.
Instead, they can only be traversed for once from the first to the last. Therefore, regarding the join operation, the data in each cursor must be sorted, which is different
from processing the multi-table join for database, and quite unlike the join() for normal
TSeq. As shown in the above example, the data in A5~A8
are ordered by date, which can ensure the computation is correct when joining.

In order to
explain this problem, we create a cursor using two in-memory TSeq. Let’s have a
look:

The TSeqs in A1and
A2 are shown below:

In
A5, you will see the alignedjoining result:

The data in
cursor is different to the normal TSeq. When looking for the New York state
corresponding to the New York city for joining, the cursor of State data has
already moved to the entry 32, and the previous
records are unfindable for later computations. So, for most cities, the
corresponding state is unfindable like this. Because the option @1 and @a are not used in
function join@x() to specify the left join or full join, only cities finding out corresponding
state are returned, and the data are quite few.

If having
the city information sorted first, then you can have the normal result of
joining: