Main menu

Monthly Archives: August 2014

This demonstrate below show how to visually create a report directly against data stored in MongoDB (with no coding required). The following topics are shown:

Pentaho Data Integration tool is used to create a transformation that does the following:
Connect to and query MongoDB.
Query results are sorted.
Sorted results are grouped.
Pentaho Report Designer is used to visually create a report by using the data from a PDI transformation.

I will show one tricky way of creating a subquery to build a set of rows coming from a string field which includes a list of valuaes separated by comma

Given the example of a string field with the following content ‘A,B,C,D’.Using REGEXP_SUBSTR you can extract only one of the 4 matches (A,B,C,D): the regex [^,]+ matches any character sequence in the string which does not contain a comma.

If you run:

SELECT REGEXP_SUBSTR ('A,B,C,D','[^,]+') as set_of_rows
FROM DUAL

you’ll get A.

and if you’ll try running:

SELECT REGEXP_SUBSTR ('A,B,C,D','[^,]+',1,1) as set_of_rows
FROM DUAL

you’ll also get A only that now we also sent two additional parameters: start looking in position 1 (which is the default), and return the 1st occurrence.

Now lets run:

SELECT REGEXP_SUBSTR ('A,B,C,D','[^,]+',1,2) as set_of_rows
FROM DUAL

this time we’ll get B (2nd occurrence) and using 3 as the last parameter will return C and so on.

The use of recursive connected by along with level makes sure you’ll receive all the relevant results (not necessarily in the original order though!):

SELECT DISTINCT REGEXP_SUBSTR ('A,B,C,D','[^,]+',1,LEVEL) as set_of_rows
FROM DUAL
CONNECT BY REGEXP_SUBSTR ('A,B,C,D','[^,]+',1,LEVEL) IS NOT NULL
order by 1

will return:

set_of_rowsABCD

which not only contains all 4 results, but also breaks it into separate rows in the resultset and will be useful to add it on an IN() sql clause

This query “abuses” the connect by functionality to generate rows in a query on dual. As long as the expression passed to connect by is true, it will generate a new row and increase the value of the pseudo column LEVEL. Then LEVEL is passed to regex_substr to get the nth value when applying the regular expression