For convenience Table.toString() automatically registers the table under a unique name in its TableEnvironment and returns the name. Hence, Table objects can be directly inlined into SQL queries (by string concatenation) as shown in the examples below.

Note: Flink’s SQL support is not yet feature complete. Queries that include unsupported SQL features cause a TableException. The supported features of SQL on batch and streaming tables are listed in the following sections.

Specifying a Query

The following examples show how to specify a SQL queries on registered and inlined tables.

StreamExecutionEnvironmentenv=StreamExecutionEnvironment.getExecutionEnvironment();StreamTableEnvironmenttableEnv=TableEnvironment.getTableEnvironment(env);// ingest a DataStream from an external sourceDataStream<Tuple3<Long,String,Integer>>ds=env.addSource(...);// SQL query with an inlined (unregistered) tableTabletable=tableEnv.fromDataStream(ds,"user, product, amount");Tableresult=tableEnv.sqlQuery("SELECT SUM(amount) FROM "+table+" WHERE product LIKE '%Rubber%'");// SQL query with a registered table// register the DataStream as table "Orders"tableEnv.registerDataStream("Orders",ds,"user, product, amount");// run a SQL query on the Table and retrieve the result as a new TableTableresult2=tableEnv.sqlQuery("SELECT product, amount FROM Orders WHERE product LIKE '%Rubber%'");// SQL update with a registered table// create and register a TableSinkTableSinkcsvSink=newCsvTableSink("/path/to/file",...);String[]fieldNames={"product","amount"};TypeInformation[]fieldTypes={Types.STRING,Types.INT};tableEnv.registerTableSink("RubberOrders",fieldNames,fieldTypes,csvSink);// run a SQL update query on the Table and emit the result to the TableSinktableEnv.sqlUpdate("INSERT INTO RubberOrders SELECT product, amount FROM Orders WHERE product LIKE '%Rubber%'");

valenv=StreamExecutionEnvironment.getExecutionEnvironmentvaltableEnv=TableEnvironment.getTableEnvironment(env)// read a DataStream from an external source
valds:DataStream[(Long, String, Integer)]=env.addSource(...)// SQL query with an inlined (unregistered) table
valtable=ds.toTable(tableEnv,'user,'product,'amount)valresult=tableEnv.sqlQuery(s"SELECT SUM(amount) FROM $table WHERE product LIKE '%Rubber%'")// SQL query with a registered table
// register the DataStream under the name "Orders"
tableEnv.registerDataStream("Orders",ds,'user,'product,'amount)// run a SQL query on the Table and retrieve the result as a new Table
valresult2=tableEnv.sqlQuery("SELECT product, amount FROM Orders WHERE product LIKE '%Rubber%'")// SQL update with a registered table
// create and register a TableSink
TableSinkcsvSink=newCsvTableSink("/path/to/file",...)valfieldNames:Array[String]=Array("product","amount")valfieldTypes:Array[TypeInformation[_]]=Array(Types.STRING,Types.INT)tableEnv.registerTableSink("RubberOrders",fieldNames,fieldTypes,csvSink)// run a SQL update query on the Table and emit the result to the TableSink
tableEnv.sqlUpdate("INSERT INTO RubberOrders SELECT product, amount FROM Orders WHERE product LIKE '%Rubber%'")

Supported Syntax

The following BNF-grammar describes the superset of supported SQL features in batch and streaming queries. The Operations section shows examples for the supported features and indicates which features are only supported for batch or streaming queries.

Note: All aggregates must be defined over the same window, i.e., same partitioning, sorting, and range. Currently, only windows with PRECEDING (UNBOUNDED and bounded) to CURRENT ROW range are supported. Ranges with FOLLOWING are not supported yet. ORDER BY must be specified on a single time attribute

Note: For streaming queries the required state to compute the query result might grow infinitely depending on the number of distinct fields. Please provide a query configuration with valid retention interval to prevent excessive state size. See Streaming Concepts for details.

Grouping sets, Rollup, CubeBatch

SELECTSUM(amount)FROMOrdersGROUPBYGROUPINGSETS((user),(product))

HavingBatchStreaming

SELECTSUM(amount)FROMOrdersGROUPBYusersHAVINGSUM(amount)>50

User-defined Aggregate Functions (UDAGG)BatchStreaming

UDAGGs must be registered in the TableEnvironment. See the UDF documentation for details on how to specify and register UDAGGs.

Joins

Currently, only equi-joins are supported, i.e., joins that have at least one conjunctive condition with an equality predicate. Arbitrary cross or theta joins are not supported.

Note: The order of joins is not optimized. Tables are joined in the order in which they are specified in the FROM clause. Make sure to specify tables in an order that does not yield a cross join (Cartesian product) which are not supported and would cause a query to fail.

SELECT*FROMOrdersINNERJOINProductONOrders.productId=Product.id

Note: For streaming queries the required state to compute the query result might grow infinitely depending on the number of distinct input rows. Please provide a query configuration with valid retention interval to prevent excessive state size. See Streaming Concepts for details.

Outer Equi-joinBatch

Currently, only equi-joins are supported, i.e., joins that have at least one conjunctive condition with an equality predicate. Arbitrary cross or theta joins are not supported.

Note: The order of joins is not optimized. Tables are joined in the order in which they are specified in the FROM clause. Make sure to specify tables in an order that does not yield a cross join (Cartesian product) which are not supported and would cause a query to fail.

Note: Time-windowed joins are a subset of regular joins that can be processed in a streaming fashion.

A time-windowed join requires at least one equi-join predicate and a join condition that bounds the time on both sides. Such a condition can be defined by two appropriate range predicates (<, <=, >=, >), a BETWEEN predicate, or a single equality predicate that compares time attributes of the same type (i.e., processing time or event time) of both input tables.

For example, the following predicates are valid window join conditions:

ltime = rtime

ltime >= rtime AND ltime < rtime + INTERVAL '10' MINUTE

ltime BETWEEN rtime - INTERVAL '10' SECOND AND rtime + INTERVAL '5' SECOND

Group Windows

Group windows are defined in the GROUP BY clause of a SQL query. Just like queries with regular GROUP BY clauses, queries with a GROUP BY clause that includes a group window function compute a single result row per group. The following group windows functions are supported for SQL on batch and streaming tables.

Group Window Function

Description

TUMBLE(time_attr, interval)

Defines a tumbling time window. A tumbling time window assigns rows to non-overlapping, continuous windows with a fixed duration (interval). For example, a tumbling window of 5 minutes groups rows in 5 minutes intervals. Tumbling windows can be defined on event-time (stream + batch) or processing-time (stream).

HOP(time_attr, interval, interval)

Defines a hopping time window (called sliding window in the Table API). A hopping time window has a fixed duration (second interval parameter) and hops by a specified hop interval (first interval parameter). If the hop interval is smaller than the window size, hopping windows are overlapping. Thus, rows can be assigned to multiple windows. For example, a hopping window of 15 minutes size and 5 minute hop interval assigns each row to 3 different windows of 15 minute size, which are evaluated in an interval of 5 minutes. Hopping windows can be defined on event-time (stream + batch) or processing-time (stream).

SESSION(time_attr, interval)

Defines a session time window. Session time windows do not have a fixed duration but their bounds are defined by a time interval of inactivity, i.e., a session window is closed if no event appears for a defined gap period. For example a session window with a 30 minute gap starts when a row is observed after 30 minutes inactivity (otherwise the row would be added to an existing window) and is closed if no row is added within 30 minutes. Session windows can work on event-time (stream + batch) or processing-time (stream).

Time Attributes

For SQL queries on streaming tables, the time_attr argument of the group window function must refer to a valid time attribute that specifies the processing time or event time of rows. See the documentation of time attributes to learn how to define time attributes.

For SQL on batch tables, the time_attr argument of the group window function must be an attribute of type TIMESTAMP.

Selecting Group Window Start and End Timestamps

The start and end timestamps of group windows as well as time attributes can be selected with the following auxiliary functions:

Data Types

The SQL runtime is built on top of Flink’s DataSet and DataStream APIs. Internally, it also uses Flink’s TypeInformation to define data types. Fully supported types are listed in org.apache.flink.table.api.Types. The following table summarizes the relation between SQL Types, Table API types, and the resulting Java class.

Table API

SQL

Java type

Types.STRING

VARCHAR

java.lang.String

Types.BOOLEAN

BOOLEAN

java.lang.Boolean

Types.BYTE

TINYINT

java.lang.Byte

Types.SHORT

SMALLINT

java.lang.Short

Types.INT

INTEGER, INT

java.lang.Integer

Types.LONG

BIGINT

java.lang.Long

Types.FLOAT

REAL, FLOAT

java.lang.Float

Types.DOUBLE

DOUBLE

java.lang.Double

Types.DECIMAL

DECIMAL

java.math.BigDecimal

Types.SQL_DATE

DATE

java.sql.Date

Types.SQL_TIME

TIME

java.sql.Time

Types.SQL_TIMESTAMP

TIMESTAMP(3)

java.sql.Timestamp

Types.INTERVAL_MONTHS

INTERVAL YEAR TO MONTH

java.lang.Integer

Types.INTERVAL_MILLIS

INTERVAL DAY TO SECOND(3)

java.lang.Long

Types.PRIMITIVE_ARRAY

ARRAY

e.g. int[]

Types.OBJECT_ARRAY

ARRAY

e.g. java.lang.Byte[]

Types.MAP

MAP

java.util.HashMap

Types.MULTISET

MULTISET

e.g. java.util.HashMap<String, Integer> for a multiset of String

Generic types and composite types (e.g., POJOs or Tuples) can be fields of a row as well. Generic types are treated as a black box and can be passed on or processed by user-defined functions. Composite types can be accessed with built-in functions (see Value access functions section).

Built-In Functions

Flink’s SQL support comes with a set of built-in functions for data transformations. This section gives a brief overview of the available functions.

The Flink SQL functions (including their syntax) are a subset of Apache Calcite’s built-in functions. Most of the documentation has been adopted from the Calcite SQL reference.

Comparison functions

Description

value1 = value2

Equals.

value1 <> value2

Not equal.

value1 > value2

Greater than.

value1 >= value2

Greater than or equal.

value1 < value2

Less than.

value1 <= value2

Less than or equal.

value IS NULL

Returns TRUE if value is null.

value IS NOT NULL

Returns TRUE if value is not null.

value1 IS DISTINCT FROM value2

Returns TRUE if two values are not equal, treating null values as the same.

value1 IS NOT DISTINCT FROM value2

Returns TRUE if two values are equal, treating null values as the same.

value1 BETWEEN [ASYMMETRIC | SYMMETRIC] value2 AND value3

Returns TRUE if value1 is greater than or equal to value2 and less than or equal to value3.

value1 NOT BETWEEN value2 AND value3

Returns TRUE if value1 is less than value2 or greater than value3.

string1 LIKE string2 [ ESCAPE string3 ]

Returns TRUE if string1 matches pattern string2. An escape character can be defined if necessary.

string1 NOT LIKE string2 [ ESCAPE string3 ]

Returns TRUE if string1 does not match pattern string2. An escape character can be defined if necessary.

string1 SIMILAR TO string2 [ ESCAPE string3 ]

Returns TRUE if string1 matches regular expression string2. An escape character can be defined if necessary.

string1 NOT SIMILAR TO string2 [ ESCAPE string3 ]

Returns TRUE if string1 does not match regular expression string2. An escape character can be defined if necessary.

value IN (value [, value]* )

Returns TRUE if an expression exists in a given list of expressions. This is a shorthand for multiple OR conditions. If the testing set contains NULL, the result will be NULL if the element can not be found and TRUE if it can be found. If the element is NULL, the result is always NULL. E.g. "42 IN (1, 2, 3)" leads to FALSE.

value NOT IN (value [, value]* )

Returns TRUE if value is not equal to every value in a list.

EXISTS (sub-query)

Returns TRUE if sub-query returns at least one row. Only supported if the operation can be rewritten in a join and group operation.

value IN (sub-query)

Returns TRUE if value is equal to a row returned by sub-query. This operation is not supported in a streaming environment yet.

value NOT IN (sub-query)

Returns TRUE if value is not equal to every row returned by sub-query. This operation is not supported in a streaming environment yet.

Logical functions

Description

boolean1 OR boolean2

Returns TRUE if boolean1 is TRUE or boolean2 is TRUE. Supports three-valued logic.

boolean1 AND boolean2

Returns TRUE if boolean1 and boolean2 are both TRUE. Supports three-valued logic.

NOT boolean

Returns TRUE if boolean is not TRUE; returns UNKNOWN if boolean is UNKNOWN.

boolean IS FALSE

Returns TRUE if boolean is FALSE; returns FALSE if boolean is UNKNOWN.

boolean IS NOT FALSE

Returns TRUE if boolean is not FALSE; returns TRUE if boolean is UNKNOWN.

boolean IS TRUE

Returns TRUE if boolean is TRUE; returns FALSE if boolean is UNKNOWN.

boolean IS NOT TRUE

Returns TRUE if boolean is not TRUE; returns TRUE if boolean is UNKNOWN.

boolean IS UNKNOWN

Returns TRUE if boolean is UNKNOWN.

boolean IS NOT UNKNOWN

Returns TRUE if boolean is not UNKNOWN.

Arithmetic functions

Description

+ numeric

Returns numeric.

- numeric

Returns negative numeric.

numeric1 + numeric2

Returns numeric1 plus numeric2.

numeric1 - numeric2

Returns numeric1 minus numeric2.

numeric1 * numeric2

Returns numeric1 multiplied by numeric2.

numeric1 / numeric2

Returns numeric1 divided by numeric2.

POWER(numeric1, numeric2)

Returns numeric1 raised to the power of numeric2.

ABS(numeric)

Returns the absolute value of numeric.

MOD(numeric1, numeric2)

Returns the remainder (modulus) of numeric1 divided by numeric2. The result is negative only if numeric1 is negative.

SQRT(numeric)

Returns the square root of numeric.

LN(numeric)

Returns the natural logarithm (base e) of numeric.

LOG10(numeric)

Returns the base 10 logarithm of numeric.

EXP(numeric)

Returns e raised to the power of numeric.

CEIL(numeric)
CEILING(numeric)

Rounds numeric up, and returns the smallest number that is greater than or equal to numeric.

FLOOR(numeric)

Rounds numeric down, and returns the largest number that is less than or equal to numeric.

Returns a pseudorandom double value between 0.0 (inclusive) and 1.0 (exclusive) with a initial seed. Two RAND functions will return identical sequences of numbers if they have same initial seed.

RAND_INTEGER(bound integer)

Returns a pseudorandom integer value between 0.0 (inclusive) and the specified value (exclusive).

RAND_INTEGER(seed integer, bound integer)

Returns a pseudorandom integer value between 0.0 (inclusive) and the specified value (exclusive) with a initial seed. Two RAND_INTEGER functions will return identical sequences of numbers if they have same initial seed and same bound.

LOG(x numeric), LOG(base numeric, x numeric)

Returns the natural logarithm of a specified number of a specified base. If called with one parameter, this function returns the natural logarithm of x. If called with two parameters, this function returns the logarithm of x to the base b. x must be greater than 0. b must be greater than 1.

Removes leading and/or trailing characters from string2. By default, whitespaces at both sides are removed.

OVERLAY(string1 PLACING string2 FROM integer [ FOR integer2 ])

Replaces a substring of string1 with string2.

SUBSTRING(string FROM integer)

Returns a substring of a character string starting at a given point.

SUBSTRING(string FROM integer FOR integer)

Returns a substring of a character string starting at a given point with a given length.

INITCAP(string)

Returns string with the first letter of each word converter to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters.

CONCAT(string1, string2,...)

Returns the string that results from concatenating the arguments. Returns NULL if any argument is NULL. E.g. CONCAT("AA", "BB", "CC") returns AABBCC.

CONCAT_WS(separator, string1, string2,...)

Returns the string that results from concatenating the arguments using a separator. The separator is added between the strings to be concatenated. Returns NULL If the separator is NULL. CONCAT_WS() does not skip empty strings. However, it does skip any NULL argument. E.g. CONCAT_WS("~", "AA", "BB", "", "CC") returns AA~BB~~CC

LPAD(text string, len integer, pad string)

Returns the string text left-padded with the string pad to a length of len characters. If text is longer than len, the return value is shortened to len characters. E.g. LPAD('hi',4,'??') returns ??hi, LPAD('hi',1,'??') returns h.

RPAD(text string, len integer, pad string)

Returns the string text right-padded with the string pad to a length of len characters. If text is longer than len, the return value is shortened to len characters. E.g. RPAD('hi',4,'??') returns hi??, RPAD('hi',1,'??') returns h.

Provides a value if the first value is null. For example, COALESCE(NULL, 5) returns 5.

Type conversion functions

Description

CAST(value AS type)

Converts a value to a given type.

Temporal functions

Description

DATE string

Parses a date string in the form "yy-mm-dd" to a SQL date.

TIME string

Parses a time string in the form "hh:mm:ss" to a SQL time.

TIMESTAMP string

Parses a timestamp string in the form "yy-mm-dd hh:mm:ss.fff" to a SQL timestamp.

INTERVAL string range

Parses an interval string in the form "dd hh:mm:ss.fff" for SQL intervals of milliseconds or "yyyy-mm" for SQL intervals of months. An interval range might be e.g. DAY, MINUTE, DAY TO HOUR, or DAY TO SECOND for intervals of milliseconds; YEAR or YEAR TO MONTH for intervals of months. E.g. INTERVAL '10 00:00:00.004' DAY TO SECOND, INTERVAL '10' DAY, or INTERVAL '2-10' YEAR TO MONTH return intervals.

CURRENT_DATE

Returns the current SQL date in UTC time zone.

CURRENT_TIME

Returns the current SQL time in UTC time zone.

CURRENT_TIMESTAMP

Returns the current SQL timestamp in UTC time zone.

LOCALTIME

Returns the current SQL time in local time zone.

LOCALTIMESTAMP

Returns the current SQL timestamp in local time zone.

EXTRACT(timeintervalunit FROM temporal)

Extracts parts of a time point or time interval. Returns the part as a long value. E.g. EXTRACT(DAY FROM DATE '2006-06-05') leads to 5.

FLOOR(timepoint TO timeintervalunit)

Rounds a time point down to the given unit. E.g. FLOOR(TIME '12:44:31' TO MINUTE) leads to 12:44:00.

CEIL(timepoint TO timeintervalunit)

Rounds a time point up to the given unit. E.g. CEIL(TIME '12:44:31' TO MINUTE) leads to 12:45:00.

QUARTER(date)

Returns the quarter of a year from a SQL date. E.g. QUARTER(DATE '1994-09-27') leads to 3.

Formats timestamp as a string using a specified format string. The format must be compatible with MySQL's date formatting syntax as used by the date_parse function. The format specification is given in the Date Format Specifier table below.

Adds a (signed) integer interval to a timestamp. The unit for the interval is given by the unit argument, which should be one of the following values: SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, or YEAR. E.g. TIMESTAMPADD(WEEK, 1, '2003-01-02') leads to 2003-01-09.

Aggregate functions

Description

COUNT(value [, value]* )

Returns the number of input rows for which value is not null.

COUNT(*)

Returns the number of input rows.

AVG(numeric)

Returns the average (arithmetic mean) of numeric across all input values.

SUM(numeric)

Returns the sum of numeric across all input values.

MAX(value)

Returns the maximum value of value across all input values.

MIN(value)

Returns the minimum value of value across all input values.

STDDEV_POP(value)

Returns the population standard deviation of the numeric field across all input values.

STDDEV_SAMP(value)

Returns the sample standard deviation of the numeric field across all input values.

VAR_POP(value)

Returns the population variance (square of the population standard deviation) of the numeric field across all input values.

VAR_SAMP(value)

Returns the sample variance (square of the sample standard deviation) of the numeric field across all input values.

COLLECT(value)

Returns a multiset of the values. null input value will be ignored. Return an empty multiset if only null values are added.

Grouping functions

Description

GROUP_ID()

Returns an integer that uniquely identifies the combination of grouping keys.

GROUPING(expression)

Returns 1 if expression is rolled up in the current row’s grouping set, 0 otherwise.

GROUPING_ID(expression [, expression]* )

Returns a bit vector of the given grouping expressions.

Value access functions

Description

tableName.compositeType.field

Accesses the field of a Flink composite type (such as Tuple, POJO, etc.) by name and returns it's value.

tableName.compositeType.*

Converts a Flink composite type (such as Tuple, POJO, etc.) and all of its direct subtypes into a flat representation where every subtype is a separate field.

Value constructor functions

Description

(value, [, value]*)

Creates a row from a list of values.

ROW(value, [, value]*)

Creates a row from a list of values.

ARRAY ‘[’ value [, value ]* ‘]’

Creates an array from a list of values.

MAP ‘[’ key, value [, key, value ]* ‘]’

Creates a map from a list of key-value pairs.

Array functions

Description

CARDINALITY(ARRAY)

Returns the number of elements of an array.

array ‘[’ index ‘]’

Returns the element at a particular position in an array. The index starts at 1.

ELEMENT(ARRAY)

Returns the sole element of an array with a single element. Returns null if the array is empty. Throws an exception if the array has more than one element.

Map functions

Description

CARDINALITY(MAP)

Returns the number of entries of a map.

map ‘[’ key ‘]’

Returns the value specified by a particular key in a map.

Hash functions

Description

MD5(string)

Returns the MD5 hash of the string argument as a string of 32 hexadecimal digits; null if string is null.

SHA1(string)

Returns the SHA-1 hash of the string argument as a string of 40 hexadecimal digits; null if string is null.

SHA256(string)

Returns the SHA-256 hash of the string argument as a string of 64 hexadecimal digits; null if string is null.

Unsupported Functions

Reserved Keywords

Although not every SQL feature is implemented yet, some string combinations are already reserved as keywords for future use. If you want to use one of the following strings as a field name, make sure to surround them with backticks (e.g. `value`, `count`).