Pentaho Data Service SQL Support and Other Development Considerations

Overview

The Pentaho Data Service is designed to support a subset of SQL clauses and literals that are particularly useful for data blending, optimizations, and other scenarios. The following lists what we support. Limitations are noted near the end of this article.

Supported SQL Literals

The Pentaho Data Service supports the following literals.

Strings have single quotes around them. The escape character for a single quote is another single quote, like this ''.

Dates have square brackets around them. The following formats are supported: [yyyy/MM/dd HH:mm:ss.SSS], [yyyy/MM/dd HH:mm:ss] and [yyyy/MM/dd].

For an IN list in a SQL statement, the date formats can have single quotes around them and dashes that replace slashes, like this: SELECT * FROM BUILDS WHERE BuildDate IN ('2015-03-18', '2015-03-22'). Note that you cannot surround a date format with a bracket date in an IN list.

Number and BigNumber should have no grouping symbols. Use a period to represent a decimal, like this: 123.45.

Integers should contain only digits.

Boolean values can be TRUE or FALSE.

Supported SQL Clauses

The Pentaho Data Service supports the following clauses.

Clause

What is Supported

SELECT

COUNT(field)

COUNT(*)

COUNT(DISTINCT field)

DISTINCT <fields>

IIF (condition, true-value or field, false-value or field)

CASE WHEN condition THEN true-value ELSE false-value END

SUM

AVG

MIN

MAX

Aliases with both the AS keyword and with one or more spaces separated. For example: SUM(sales) AS "Total Sales" or SUM(sales) TotalSales

Constant expressions are possible. See the Supported SQL Literals section for more details.

FROM

Only one Pentaho Service Name is permitted.

You can use aliases for the Pentaho Service Name.

You can omit the service name to query from an empty row or you can query from dual, for example SELECT 1 or SELECT 1 FROM dual are the same. (Dual is a special one row, one column table supported by some database vendors.)

WHERE

Nested Brackets

AND, OR, NOT if followed by brackets. For example: NOT (A=5 OR C=3).

Precedence is considered.

Literals (String and Integer)

PARAMETER('parameter-name')='value'. Note that this always evaluates to TRUE in the condition.

=

<

>

<=, =<

>=, =>

<>

LIKE is supported. The standard % and ? wildcards are converted to .* and . regular expressions.

REGEX matches the regular expression.

IS NULL

IS NOT NULL

IN The syntax for multiple values is: value, value, value, ...

You can put a condition on the IIF expression or it's alias if one is used. Use identical string literals for expressions.

GROUP BY

Group on fields are supported, not the IIF() function.

LIMIT

You can retrieve a specific amount of rows from a result set using the LIMIT keyword. For example, this query returns the first 10 rows (#1-10) from the result set:SELECT * FROM data_service LIMIT 10;

You can also specify an offset for returning as specific amount of rows using the LIMIT keyword or LIMIT/OFFSET keywords. This queries returns 10 rows from the result set, but the results are offset by five rows so you see results for rows #6 - 15: SELECT * FROM data_service LIMIT 5, 10;

You can also use the OFFSET keyword to retrieve the same results: SELECT * FROM data_service LIMIT 10 OFFSET 5;

HAVING

Conditions should be placed on the aggregate construct, not the alias.

Please use identical strings for the expressions including spaces. For example, if you use put spaces before and after the asterisk for COUNT( * ) in the SELECT clause, do the same for the HAVING clause.

Put HAVING conditions on aggregations that do not appear in the SELECT clause.

ORDER BY

You can order on any column even if it is not in the result.

You can order on IIF or CASE-WHEN expressions.

Other Development Considerations

There are a few considerations to keep in mind as you design your data service and transformation.

You cannot JOIN one data service virtual table to another.

Pentaho Data Services uses the Memory Group by step to group. This step keeps all the groups in memory to avoid sorts that can slow down the data service. But, if you plan to use many groups, watch your memory consumption on the server to make sure you don't exceed its limits.

You can't specify the same field twice in the same SELECT clause.

Calculations and functions like string concatenation are not supported. But, you can do these things in the data service transformation.