Standard SQL Lexical Structure

A BigQuery statement comprises a series of tokens. Tokens include
identifiers,quoted identifiers, literals, keywords, operators, and
special characters. You can separate tokens with whitespace (space, backspace,
tab, newline) or comments.

Identifiers

Identifiers are names that are associated with columns, tables, and other
database objects.

Identifiers must begin with a letter or an underscore.
Subsequent characters can be letters, numbers, or underscores. Quoted
identifiers are identifiers enclosed by backtick (`) characters and can
contain any character, such as spaces or symbols. However, quoted identifiers
cannot be empty. Reserved Keywords can only be used as
identifiers if enclosed by backticks.

Syntax (defined here as a regular expression):

[A-Za-z_][A-Za-z_0-9]*

Examples:

Customers5
_dataField1
ADGROUP

Invalid examples:

5Customers
_dataField!
GROUP

5Customers begins with a number, not a letter or underscore. _dataField!
contains the special character "!" which is not a letter, number, or underscore.
GROUP is a reserved keyword, and therefore cannot be used as an identifier
without being enclosed by backtick characters.

Both identifiers and quoted identifiers are case insensitive, with some
nuances. See Case Sensitivity for further details.

Quoted identifiers have the same escape sequences as string literals,
defined below.

Literals

A literal represents a constant value of a built-in data type. Some, but not
all, data types can be expressed as literals.

String and Bytes Literals

Both string and bytes literals must be quoted, either with single (') or
double (") quotation marks, or triple-quoted with groups of three single
(''') or three double (""") quotation marks.

Quoted literals:

Literal

Examples

Description

Quoted string

"abc"

"it's"

'it\'s'

'Title: "Boy"'

Quoted strings enclosed by single (') quotes can contain unescaped double (") quotes, as well as the inverse. Backslashes (\) introduce escape sequences. See the Escape Sequences table below.Quoted strings cannot contain newlines, even when preceded by a backslash (\).

Triple-quoted string

"""abc"""

'''it's'''

'''Title:"Boy"'''

'''twolines'''

'''why\?'''

Embedded newlines and quotes are allowed without escaping - see fourth example.Backslashes (\) introduce escape sequences. See Escape Sequences table below.A trailing unescaped backslash (\) at the end of a line is not allowed.End the string with three unescaped quotes in a row that match the starting quotes.

Raw string

R"abc+"

r'''abc+'''

R"""abc+"""

r'f\(abc,(.*),def\)'

Quoted or triple-quoted literals that have the raw string literal prefix (r or R) are interpreted as raw/regex strings.Backslash characters (\) do not act as escape characters. If a backslash followed by another character occurs inside the string literal, both characters are preserved.A raw string cannot end with an odd number of backslashes.Raw strings are useful for constructing regular expressions.

Prefix characters (r, R, b, B) are optional for quoted or triple-quoted strings, and indicate that the string is a raw/regex string or a byte sequence, respectively. For
example, b'abc' and b'''abc''' are both interpreted as type bytes. Prefix characters are case insensitive.

Quoted literals with prefixes:

Literal

Example

Description

Bytes

B"abc"

B'''abc'''

b"""abc"""

Quoted or triple-quoted literals that have the bytes literal prefix (b or B) are interpreted as bytes.

Raw bytes

br'abc+'

RB"abc+"

RB'''abc'''

The r and b prefixes can be combined in any order. For example, rb'abc*' is equivalent to br'abc*'.

The table below lists all valid escape sequences for representing non-alphanumeric characters in string and byte literals.
Any sequence not in this table produces an error.

Unicode escape, with lowercase 'u' and exactly 4 hex digits. Valid only in string literals or identifiers.Note that the range D800-DFFF is not allowed, as these are surrogate unicode values.

\Uhhhhhhhh

Unicode escape, with uppercase 'U' and exactly 8 hex digits. Valid only in string literals or identifiers.The range D800-DFFF is not allowed, as these values are surrogate unicode values. Also, values greater than 10FFFF are not allowed.

Integer Literals

Integer literals are either a sequence of decimal digits (0–9) or a hexadecimal
value that is prefixed with "0x" or "0X". Integers can be prefixed by "+"
or "-" to represent positive and negative values, respectively.
Examples:

123
0xABC
-123

An integer literal is interpreted as an INT64.

NUMERIC Literals

You can construct NUMERIC literals using the
NUMERIC keyword followed by a floating point value in quotes.

Struct Literals

where elem is an element in the struct. elem must be a literal data type, not an expression or column name.

The output type is an anonymous struct type (structs are not named types) with anonymous fields with types
matching the types of the input expressions.

Example

Output Type

(1, 2, 3)

STRUCT<int64,int64,int64>

(1, 'abc')

STRUCT<int64,string>

Date Literals

Syntax:

DATE 'YYYY-M[M]-D[D]'

Date literals contain the DATE keyword followed by a string literal that conforms to the canonical date format, enclosed in single quotation marks. Date literals support a range between the
years 1 and 9999, inclusive. Dates outside of this range are invalid.

For example, the following date literal represents September 27, 2014:

DATE '2014-09-27'

String literals in canonical date format also implicitly coerce to DATE type
when used where a DATE-type expression is expected. For example, in the query

SELECT * FROM foo WHERE date_col = "2014-09-27"

the string literal "2014-09-27" will be coerced to a date literal.

Time Literals

Syntax:

TIME '[H]H:[M]M:[S]S[.DDDDDD]]'

TIME literals contain the TIME keyword and a string literal that conforms to
the canonical time format, enclosed in single quotation marks.

For example, the following time represents 12:30 p.m.:

TIME '12:30:00.45'

DATETIME Literals

Syntax:

DATETIME 'YYYY-[M]M-[D]D [[H]H:[M]M:[S]S[.DDDDDD]]'

DATETIME literals contain the DATETIME keyword and a string literal that
conforms to the canonical DATETIME format, enclosed in single quotation marks.

For example, the following DATETIME represents 12:30 p.m. on September 27,
2014:

DATETIME '2014-09-27 12:30:00.45'

DATETIME literals support a range between the years 1 and 9999, inclusive.
DATETIMEs outside of this range are invalid.

String literals with the canonical DATETIME format implicitly coerce to a
DATETIME literal when used where a DATETIME expression is expected.

For example:

SELECT * FROM foo
WHERE datetime_col = "2014-09-27 12:30:00.45"

In this query, the string literal "2014-09-27 12:30:00.45" is coerced to a
DATETIME literal.

Timestamp literals

Syntax:

TIMESTAMP 'YYYY-[M]M-[D]D [[H]H:[M]M:[S]S[.DDDDDD]] [timezone]'

Timestamp literals contain the TIMESTAMP keyword and a string literal that
conforms to the canonical timestamp format, enclosed in single quotation marks.

Timestamp literals support a range between the years 1 and 9999, inclusive.
Timestamps outside of this range are invalid.

A timestamp literal can include a numerical suffix to indicate the time zone:

TIMESTAMP '2014-09-27 12:30:00.45-08'

If this suffix is absent, the default time zone,
UTC, is used.

For example, the following timestamp represents 12:30 p.m. on September 27,
2014, using the UTC time zone:

String literals with the canonical timestamp format, including those with
time zone names, implicitly coerce to a timestamp literal when used where a
timestamp expression is expected. For example, in the following query, the
string literal "2014-09-27 12:30:00.45 America/Los_Angeles" is coerced
to a timestamp literal.

Time zone

Since timestamp literals must be mapped to a specific point in time, a time zone
is necessary to correctly interpret a literal. If a time zone is not specified
as part of the literal itself, then BigQuery uses the default time zone
value, which the BigQuery implementation sets.

BigQuery represents time zones using strings in the following canonical
format, which represents the offset from Coordinated Universal Time (UTC).

Format:

(+|-)H[H][:M[M]]

Examples:

'-08:00'
'-8:15'
'+3:00'
'+07:30'
'-7'

Time zones can also be expressed using string time zone names from the
tz database. For a less comprehensive but
simpler reference, see the
List of tz database time zones
on Wikipedia. Canonical time zone names have the format
<continent/[region/]city>, such as America/Los_Angeles.

Note: Not all time zone names are interchangeable even if they do happen to
report the same time during a given part of the year. For example, America/Los_Angeles reports the same time as UTC-7:00 during Daylight Savings Time, but reports the same time as UTC-8:00 outside of Daylight Savings Time.

Regular expression matching is case sensitive by default, unless the expression itself specifies that it should be case insensitive.

LIKE matching

Yes

Reserved Keywords

Keywords are a group of tokens that have special meaning in the BigQuery
language, and have the following characteristics:

Keywords cannot be used as identifiers unless enclosed by backtick (`) characters.

Keywords are case insensitive.

BigQuery has the following reserved keywords.

ALL
AND
ANY
ARRAY
AS
ASC
ASSERT_ROWS_MODIFIED
AT
BETWEEN
BY
CASE
CAST
COLLATE
CONTAINS
CREATE
CROSS
CUBE
CURRENT
DEFAULT
DEFINE
DESC
DISTINCT
ELSE
END

ENUM
ESCAPE
EXCEPT
EXCLUDE
EXISTS
EXTRACT
FALSE
FETCH
FOLLOWING
FOR
FROM
FULL
GROUP
GROUPING
GROUPS
HASH
HAVING
IF
IGNORE
IN
INNER
INTERSECT
INTERVAL
INTO

IS
JOIN
LATERAL
LEFT
LIKE
LIMIT
LOOKUP
MERGE
NATURAL
NEW
NO
NOT
NULL
NULLS
OF
ON
OR
ORDER
OUTER
OVER
PARTITION
PRECEDING
PROTO
RANGE

RECURSIVE
RESPECT
RIGHT
ROLLUP
ROWS
SELECT
SET
SOME
STRUCT
TABLESAMPLE
THEN
TO
TREAT
TRUE
UNBOUNDED
UNION
UNNEST
USING
WHEN
WHERE
WINDOW
WITH
WITHIN

Terminating Semicolons

You can optionally use a terminating semicolon (;) when you submit a query
string statement through an Application Programming Interface (API).

In a request containing multiple statements, you must separate statements with
semicolons, but the semicolon is generally optional after the final statement.
Some interactive tools require statements to have a terminating semicolon.

Comments

Comments are sequences of characters that the parser ignores.
BigQuery supports the following types of comments.

Single-line comments

Use a single-line comment if you want the comment to appear on a line by itself.

Examples

# this is a single-line comment
SELECT book FROM library;

-- this is a single-line comment
SELECT book FROM library;

/* this is a single-line comment */
SELECT book FROM library;

SELECT book FROM library
/* this is a single-line comment */
WHERE book = "Ulysses";

Inline comments

Use an inline comment if you want the comment to appear on the same line as
a statement. A comment that is prepended with # or -- must appear to the
right of a statement.

Examples

SELECT book FROM library; # this is an inline comment

SELECT book FROM library; -- this is an inline comment

SELECT book FROM library; /* this is an inline comment */

SELECT book FROM library /* this is an inline comment */ WHERE book = "Ulysses";

Multiline comments

Use a multiline comment if you need the comment to span multiple lines.
Nested multiline comments are not supported.

Examples

SELECT book FROM library
/*
This is a multiline comment
on multiple lines
*/
WHERE book = "Ulysses";

SELECT book FROM library
/* this is a multiline comment
on two lines */
WHERE book = "Ulysses";