Query Reference

BigQuery queries are written using a
variation of the standard SQL SELECT statement. BigQuery supports a wide
variety of functions such as COUNT, arithmetic expressions, and string
functions. This document details BigQuery's query syntax and functions.

SELECT clause

SELECT expr1 [[AS] alias1], expr2 [[AS] alias2], ...

The SELECT clause specifies the set of values to be returned by a
query. Expressions (expr1, etc.) in the SELECT clause can be
field names, literals, or functional expressions that operate on fields or
literals. Expressions must be comma-separated.

The SELECT clause supports an AS section, which defines an alias for the field, literal, or functional expression. You can refer to an alias only in GROUP BY, HAVING, and ORDER BY clauses.

Notes:

If you use an aggregation function on any result, such
as COUNT, you must use the GROUP BY clause to
group all non-aggregated fields. For example:
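A minimal sketch of this rule, assuming the shakespeare public sample table that other examples in this document use; the alias word_rows is illustrative:

```sql
/* COUNT is an aggregate, so the non-aggregated field corpus must be grouped. */
SELECT
  corpus,
  COUNT(word) AS word_rows
FROM
  [publicdata:samples.shakespeare]
GROUP BY
  corpus;
```

Removing the GROUP BY clause from this query would be expected to fail, because corpus would then be neither aggregated nor grouped.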

FROM clause

project_name is an optional name of the project that contains the
dataset. If not specified, the query will assume it is in the current project.
Note: if your project name includes a dash, you must surround
the entire table reference with brackets, as
in [my-dashed-project:dataset1.tableName].

subselect_clause is a nested SELECT clause. The subquery is evaluated and subquery results are treated just like a table. The result of the subquery must contain the appropriate columns and data required by the containing SELECT statement. If the query includes multiple subqueries or a combination of subqueries and tables, all of the tables and subqueries must contain all of the fields in the SELECT clause of the main query. The rules that apply to queries also apply to subqueries, except that subqueries do not end in a semicolon.

table wildcard function refers to a supported table wildcard function, such as TABLE_DATE_RANGE to query only a specific set of daily tables.

alias is primarily used in JOIN statements, where you
may need to provide a name to a subselect clause. You should only refer to a
field as alias.field in order to disambiguate between the two
tables involved in a JOIN. Do not use alias with a table wildcard function.

The table name must be qualified with the dataset and project IDs, unless
you specify default dataset or project IDs as part of the query request.
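The table reference forms described above can be sketched as follows. The project my-dashed-project and the applogs daily tables are hypothetical names reused from this section. The sketch assumes that TABLE_DATE_RANGE takes a bracketed table-name prefix plus two TIMESTAMP bounds and that the daily tables carry a YYYYMMDD suffix:

```sql
/* Fully qualified reference; the brackets are required because of the dash. */
SELECT request_url
FROM [my-dashed-project:applogs.events_20120501]
LIMIT 10;

/* Table wildcard function: daily tables from May 1 through May 3, 2012. */
SELECT request_url
FROM TABLE_DATE_RANGE([applogs.events_],
                      TIMESTAMP('2012-05-01'),
                      TIMESTAMP('2012-05-03'))
LIMIT 10;
```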

Note: Unlike many other SQL-based systems, BigQuery uses
the comma syntax to indicate table unions, not joins. This means you can run a
query over several tables with compatible schemas as follows:

// Find suspicious activity over several days
SELECT FORMAT_UTC_USEC(event.timestamp_in_usec) AS time, request_url
FROM [applogs.events_20120501], [applogs.events_20120502], [applogs.events_20120503]
WHERE event.username = 'root' AND NOT event.source_ip.is_internal;

A query that performs a union over a large number of tables can be expected to run more slowly than the same query over a single table with the same amount of data. The difference in performance can be up to 50 ms per additional table. The maximum number of tables you can perform a union on is 1,000.

FLATTEN clause

FLATTEN converts a repeated field into an optional field. Given one record with many values for a repeated field, FLATTEN unrolls it into many records, one record for each value of the (formerly) repeated field; any non-repeated fields become duplicated to fill out each of the new records formed. FLATTEN removes one level of nesting.

For more information and examples, see FLATTEN in the developer's guide.
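As a rough illustration, assume a hypothetical table [mydataset.persons] in which each record has a fullName field and a repeated children record with name and age fields. The parenthesized FLATTEN form below is an assumption about the accepted syntax:

```sql
/* Unroll the repeated children record: one output row per child. */
SELECT
  fullName,
  children.name,
  children.age
FROM
  (FLATTEN([mydataset.persons], children))
WHERE
  children.age > 4;
```

After flattening, each output row pairs one child with a copy of the parent's fullName, which is what allows the WHERE clause to filter on individual children.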

JOIN clause

BigQuery supports multiple JOIN operations in each SELECT statement. BigQuery executes multiple JOIN operations pairwise, starting with the first pair of inputs after the FROM keyword. Subsequent JOIN operations use the results of the previous JOIN operation as the left JOIN input. Fields from any preceding JOIN input can be used as a JOIN key in the ON clause of a subsequent JOIN operation.

JOIN types

CROSS JOIN clauses must not contain an ON clause. CROSS JOIN operations can return a large amount of data and might result in a slow and inefficient query. When possible, use regular JOIN instead.

EACH modifier

Normal JOIN operations require that the right-side table contains less than 8 MB of compressed data. The EACH modifier is a hint that informs the query execution engine that the JOIN might reference two large tables. The EACH modifier can't be used in CROSS JOIN clauses.

When possible, use JOIN without the EACH modifier for best performance. Use JOIN EACH when table sizes are too large for JOIN.
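The difference between the two forms can be sketched as follows. The tables [mydataset.word_categories] and [mydataset.page_categories] and their category field are hypothetical; the first is assumed to be small and the second large:

```sql
/* Regular JOIN: the right-side table is assumed to be small. */
SELECT a.word, a.word_count, b.category
FROM [publicdata:samples.shakespeare] AS a
JOIN [mydataset.word_categories] AS b
  ON a.word = b.word;

/* JOIN EACH: hints that both inputs might be large. */
SELECT a.title, b.category
FROM [publicdata:samples.wikipedia] AS a
JOIN EACH [mydataset.page_categories] AS b
  ON a.title = b.title;
```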

WHERE clause

... WHERE condition ...

The WHERE clause, sometimes called the predicate, states the qualifying conditions for a query. Multiple conditions can be joined by boolean AND and OR clauses, optionally surrounded by (parentheses) to group them. The fields listed in a WHERE clause do not need to be listed in any SELECT clause.

Note: Aggregate functions cannot be used in the WHERE clause. Use HAVING if you need to use aggregate fields.

Example

The following example includes two clauses joined by an OR; either one must be satisfied for the row to be returned.
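A query of that shape might look like the following sketch, which assumes the shakespeare public sample table and uses the CONTAINS operator that appears elsewhere in this document:

```sql
/* A row is returned when either parenthesized condition is satisfied. */
SELECT word
FROM [publicdata:samples.shakespeare]
WHERE
  (word CONTAINS 'prais' AND word CONTAINS 'ing') OR
  (word CONTAINS 'laugh' AND word CONTAINS 'ed');
```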

GROUP BY clause

... GROUP [EACH] BY [ROLLUP] (field1|alias1), (field2|alias2) ...

The GROUP BY clause allows you to group rows that have the same values for a given field. You can then perform aggregate functions on each of the groups. Grouping occurs after any selection or aggregation in the SELECT clause.

For example, you can group the rows that have the same value for f1 and find the SUM of the f2 values for each group:

SELECT f1, SUM(f2)
FROM ds.Table
GROUP BY f1;

This type of aggregation is called group aggregation. Unlike scoped aggregation, group aggregation is supported by traditional relational databases.

The EACH parameter can be used when your dataset contains a large number of distinct values for the group keys. Use GROUP BY without the EACH parameter when possible to improve query performance.

When the ROLLUP modifier is used, BigQuery adds extra rows to the query result that represent partial aggregations. All fields listed after ROLLUP must be enclosed in a single set of parentheses. In rows added by the ROLLUP modifier, a NULL in a grouping column indicates that the row is a partial aggregation over that column.

Example:

SELECT
year,
is_male,
COUNT(1) as count
FROM
publicdata:samples.natality
WHERE
year >= 2000
AND year <= 2002
GROUP BY ROLLUP(year, is_male)
ORDER BY year, is_male

The GROUPING function lets you distinguish rows that BigQuery added because of the ROLLUP modifier from rows that have an actual NULL value in the grouped field.

Example:

SELECT
year,
GROUPING(year) as rollup_year,
is_male,
GROUPING(is_male) as rollup_gender,
COUNT(1) as count
FROM
publicdata:samples.natality
WHERE
year >= 2000
AND year <= 2002
GROUP BY ROLLUP(year, is_male)
ORDER BY year, is_male

HAVING clause

... HAVING condition ...

The HAVING clause states the qualifying conditions for a query. Multiple conditions can be joined by boolean AND and OR clauses, optionally surrounded by (parentheses) to group them. HAVING is similar to WHERE, but it supports aggregate fields.

Note that the HAVING clause can only refer to fields defined in your SELECT clause (if the field has an alias, you must use it; if it doesn't, use the aggregate field name instead).

Example

SELECT keyword, SUM(clicks)/SUM(impressions) ctr FROM ads
WHERE impressions > 100
GROUP BY keyword
HAVING ctr > 0.1;

SELECT foo, SUM(bar) AS boo FROM myTable GROUP BY foo HAVING boo > 0;

ORDER BY clause

The ORDER BY clause sorts the results of a query in ascending or descending order of one or more fields. Use DESC (descending) or ASC (ascending) to specify the sort direction. ASC is the default.

You can sort by field names or by aliases from the SELECT clause. To sort by multiple fields or aliases, enter them as a comma-separated list. The results are sorted on the fields in the order in which they are listed.

LIMIT clause

Because a query can operate over a very large number of rows, LIMIT is a good way to avoid long-running queries when representative data is sufficient.

Notes:

The LIMIT clause will stop processing and return results when it satisfies your requirements. This can reduce processing time for some queries, but when you specify aggregate functions such as COUNT or ORDER BY clauses, the full result set must still be processed before returning results.

The same LIMIT clause can return different results when it shortcuts processing the full result set. This is because queries are handled by parallel processing, and the order in which parallel jobs return is not guaranteed.

The LIMIT clause cannot contain any functions; it takes only numeric constants.
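A short sketch that combines ORDER BY and LIMIT, assuming the shakespeare public sample table; the alias word_rows is illustrative:

```sql
/* Sort by an alias in descending order, break ties by corpus (ASC is the
   default), and keep only the first five rows. */
SELECT corpus, COUNT(word) AS word_rows
FROM [publicdata:samples.shakespeare]
GROUP BY corpus
ORDER BY word_rows DESC, corpus
LIMIT 5;
```

Because this query aggregates and sorts, the full result set still has to be processed before the five rows are returned, as the notes above describe.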

Supported functions and operators

Most SELECT statement clauses support functions. Fields referenced in a function don't need to be listed in any SELECT clause. Therefore, the following query is valid, even though the clicks field is not displayed directly: SELECT country, SUM(clicks) FROM table GROUP BY country;

Aggregate functions

Aggregate functions return values that represent summaries of larger sets of data, which makes these functions particularly useful for analyzing logs. An aggregate function operates against a collection of values and returns a single value per table, group, or scope:

Table aggregation

Uses an aggregate function to summarize all qualifying rows in the table. For example:

SELECT COUNT(f1) FROM ds.Table;

Group aggregation

Uses an aggregate function and a GROUP BY clause that specifies a non-aggregated field to summarize rows by group. For example:
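A minimal sketch of group aggregation, reusing the placeholder names ds.Table, f1, and f2 from elsewhere in this document:

```sql
/* One output row per distinct f1 value, each with its own count. */
SELECT f1, COUNT(f2) AS f2_count
FROM ds.Table
GROUP BY f1;
```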

Scoped aggregation

This feature applies only to tables that have nested fields.

Uses an aggregate function and the WITHIN keyword to aggregate repeated values within a defined scope. For example:

SELECT COUNT(m1.f2) WITHIN RECORD FROM Table;

The scope can be RECORD, which corresponds to the entire row, or a node (a repeated field in a row). Aggregation functions operate over the values within the scope and return aggregated results for each record or node.

You can apply a restriction to an aggregate function using one of the following options:

An alias in a subselect query. The restriction is specified in the outer WHERE clause.

SELECT corpus, count_corpus_words
FROM
(SELECT corpus, count(word) AS count_corpus_words
FROM publicdata:samples.shakespeare
GROUP BY corpus) AS sub_shakespeare
WHERE count_corpus_words > 4000;

An alias in a HAVING clause.

SELECT corpus, count(word) AS count_corpus_words
FROM publicdata:samples.shakespeare
GROUP BY corpus
HAVING count_corpus_words > 4000;

You can also refer to an alias in the GROUP BY or ORDER BY clauses.

Syntax

Function

Description

AVG(numeric_expr)

Returns the average of the values for a group of rows computed by numeric_expr. Rows with a NULL value are not included in the calculation.

BIT_AND(numeric_expr)

Returns the result of a bitwise AND operation between each instance of numeric_expr across all rows. NULL values are ignored. This function returns NULL if all instances of numeric_expr evaluate to NULL.

BIT_OR(numeric_expr)

Returns the result of a bitwise OR operation between each instance of numeric_expr across all rows. NULL values are ignored. This function returns NULL if all instances of numeric_expr evaluate to NULL.

BIT_XOR(numeric_expr)

Returns the result of a bitwise XOR operation between each instance of numeric_expr across all rows. NULL values are ignored. This function returns NULL if all instances of numeric_expr evaluate to NULL.

COUNT(*)

Returns the total number of values (NULL and non-NULL) in the scope of the function. Unless you are using COUNT(*) with the TOP function, it is better to explicitly specify the field to count.

COUNT([DISTINCT] field [, n])

Returns the total number of non-NULL values in the scope of the function.

If you use the DISTINCT keyword, the function returns the number of distinct values for the specified field. Note that the returned value for DISTINCT is a statistical approximation and is not guaranteed to be exact.

If you require greater accuracy from COUNT(DISTINCT), you can specify a second parameter, n, which gives the threshold below which exact results are guaranteed. By default, n is 1000, but if you give a larger n, you will get exact results for COUNT(DISTINCT) up to that value of n. However, giving larger values of n will reduce scalability of this operator and may substantially increase query execution time or cause the query to fail.

To compute the exact number of distinct values, use EXACT_COUNT_DISTINCT. Or, for a more scalable approach, consider using GROUP EACH BY on the relevant field(s) and then applying COUNT(*). The GROUP EACH BY approach is more scalable but might incur a slight up-front performance penalty.

COVAR_POP(numeric_expr1, numeric_expr2)

Computes the population covariance of the values computed by numeric_expr1 and numeric_expr2.

COVAR_SAMP(numeric_expr1, numeric_expr2)

Computes the sample covariance of the values computed by numeric_expr1 and numeric_expr2.

EXACT_COUNT_DISTINCT(field)

Returns the exact number of non-NULL, distinct values for the specified field. For better scalability and performance, use COUNT(DISTINCT field).

FIRST(expr)

Returns the first sequential value in the scope of the function.

GROUP_CONCAT('str' [, separator])

Concatenates multiple strings into a single string, where each value is separated by the optional separator parameter. If separator is omitted, BigQuery returns a comma-separated string.

If a string in the source data contains a double quote character, GROUP_CONCAT returns the string with double quotes added. For example, the string a"b would return as "a""b". Use GROUP_CONCAT_UNQUOTED if you prefer that these strings do not return with double quotes added.

NEST(expr)

Aggregates all values in the current aggregation scope into a repeated field. For example, the query "SELECT x, NEST(y) FROM ... GROUP BY x" returns one output record for each distinct x value, and contains a repeated field for all y values paired with x in the query input. The NEST function requires a GROUP BY clause.

BigQuery automatically flattens query results, so if you use the NEST function on the top level query, the results won't contain repeated fields. Use the NEST function when using a subselect that produces intermediate results for immediate use by the same query.

NTH(n, field)

Returns the nth sequential value in the scope of the function, where n is a constant. The NTH function starts counting at 1, so there is no zeroth term. If the scope of the function has fewer than n values, the function returns NULL.

QUANTILES(expr[, buckets])

Computes approximate quantiles for the input expression. The number of quantiles computed is controlled with the optional buckets parameter. The default value of buckets is 100. If specified explicitly, buckets must be greater than or equal to 2. The fractional error per quantile is epsilon = 1 / buckets.

STDDEV(numeric_expr)

Returns the standard deviation of the values computed by numeric_expr. Rows with a NULL value are not included in the calculation. The STDDEV function is an alias for STDDEV_SAMP.

STDDEV_POP(numeric_expr)

Computes the population standard deviation of the value computed by numeric_expr. For more information about population versus sample standard deviation, see Standard deviation on Wikipedia.

STDDEV_SAMP(numeric_expr)

Computes the sample standard deviation of the value computed by numeric_expr. For more information about population versus sample standard deviation, see Standard deviation on Wikipedia.

SUM(field)

Returns the sum total of the values in the scope of the function. For use with numerical data types only.

UNIQUE(expr)

Returns the set of unique, non-NULL values in the scope of the function in an undefined order. Similar to a large GROUP BY clause without the EACH keyword, the query will fail with a "Resources Exceeded" error if there are too many distinct values. Unlike GROUP BY, however, the UNIQUE function can be applied with scoped aggregation, allowing efficient operation on nested fields with a limited number of values.

VARIANCE(numeric_expr)

Computes the variance of the values computed by numeric_expr. Rows with a NULL value are not included in the calculation. The VARIANCE function is an alias for VAR_SAMP.

VAR_POP(numeric_expr)

Computes the population variance of the values computed by numeric_expr. For more information about population versus sample standard deviation, see Standard deviation on Wikipedia.

VAR_SAMP(numeric_expr)

Computes the sample variance of the values computed by numeric_expr. For more information about population versus sample standard deviation, see Standard deviation on Wikipedia.
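The three ways of counting distinct values described in the table above can be compared side by side. This is a sketch against the shakespeare public sample table; the threshold 50000 and the aliases are arbitrary illustrations:

```sql
SELECT
  /* Approximate above the default threshold of 1000 distinct values. */
  COUNT(DISTINCT word)        AS approx_distinct_words,
  /* Exact as long as there are no more than 50000 distinct values. */
  COUNT(DISTINCT word, 50000) AS distinct_words_n_50000,
  /* Always exact, at the cost of scalability. */
  EXACT_COUNT_DISTINCT(word)  AS exact_distinct_words
FROM [publicdata:samples.shakespeare];
```

If the three columns disagree, the first one is the approximation; raising n trades query scalability for accuracy.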

TOP() function

TOP is a function that is an alternative to the GROUP BY clause. It is used as simplified syntax for GROUP BY ... ORDER BY ... LIMIT .... Generally, the TOP function performs faster than the full ... GROUP BY ... ORDER BY ... LIMIT ... query, but may only return approximate results. The following is the syntax for the TOP function:

TOP(field|alias[, max_values][,multiplier]) ... COUNT(*)

When using TOP in a SELECT clause, you must include COUNT(*) as one of the fields.

A query that uses the TOP() function can only return two fields: the TOP field, and the COUNT(*) value.

field|alias

The field or alias to return.

max_values

[Optional] The maximum number of results to return. Default is 20.

multiplier

A numeric constant, expression, or field that is multiplied with max_values to specify how many results to return.

The following queries compare using TOP() versus using GROUP BY ... ORDER BY ... LIMIT. Both queries return, in order, the top 10 most frequently used words containing "th", and the number of documents each word was used in. The TOP query generally executes faster, but may return approximate results:

SELECT word, COUNT(*) AS cnt FROM ds.Table
WHERE word CONTAINS 'th' GROUP BY word ORDER BY cnt DESC LIMIT 10;
SELECT TOP(word, 10), COUNT(*) FROM ds.Table WHERE word contains 'th';

Note: You must include COUNT(*) in the SELECT clause to use TOP.

Advanced examples

Scenario

Description

Example

Average and standard deviation grouped by condition

The following query returns the average and standard deviation of birth weights in Ohio in 2003, grouped by mothers who do and do not smoke.

SELECT
cigarette_use,
/* Finds average and standard deviation */
AVG(weight_pounds) baby_weight,
STDDEV(weight_pounds) baby_weight_stdev,
AVG(mother_age) mother_age
FROM
[publicdata:samples.natality]
WHERE
year=2003 AND state='OH'
/* Group the result values by those */
/* who smoked and those who didn't. */
GROUP BY
cigarette_use;

Filter query results using an aggregated value

In order to filter query results using an aggregated value (for example, filtering by the value of a SUM), use the HAVING clause. HAVING compares a value to a result determined by an aggregation function, as opposed to WHERE, which operates on each row prior to aggregation.

SELECT
state,
/* If 'is_male' is True, return 'Male', */
/* otherwise return 'Female' */
IF (is_male, 'Male', 'Female') AS sex,
/* The count value is aliased as 'cnt' */
/* and used in the HAVING clause below. */
COUNT(*) AS cnt
FROM
[publicdata:samples.natality]
WHERE
state != ''
GROUP BY
state, sex
HAVING
cnt > 3000000
ORDER BY
cnt DESC

Arithmetic operators

Arithmetic operators take numeric arguments and return a numeric result. Each argument can be a numeric literal or a numeric value returned by a query. If the arithmetic operation evaluates to an undefined result, the operation returns NULL.
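The operator table is missing from this section, so the following sketch assumes the usual operators +, -, *, /, and % (remainder) applied to numeric literals; the aliases are illustrative:

```sql
SELECT
  6 + (5 - 6) AS added,
  6 * 2       AS multiplied,
  6 / 3       AS divided,
  6 % 5       AS remainder;
```

The same operators can be applied to numeric fields returned by a query instead of literals.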

Comparison functions

Comparison functions return true or false, based on the following types of comparisons:

A comparison of two expressions.

A comparison of an expression or set of expressions to specific criteria, such as being in a specified list, being NULL, or being a non-default optional value.

Some of the functions listed below return values other than true or false, but the values they return are based on comparison operations.

You can use either numeric or string expressions as arguments for comparison functions. (String constants must be enclosed in single or double quotes.) The expressions can be literals or values fetched by a query. Comparison functions are most often used as filtering conditions in WHERE clauses, but they can be used in other clauses.

Syntax

Function

Description

expr1 = expr2

Returns true if the expressions are equal.

expr1 != expr2
expr1 <> expr2

Returns true if the expressions are not equal.

expr1 > expr2

Returns true if expr1 is greater than expr2.

expr1 < expr2

Returns true if expr1 is less than expr2.

expr1 >= expr2

Returns true if expr1 is greater than or equal to expr2.

expr1 <= expr2

Returns true if expr1 is less than or equal to expr2.

expr1 BETWEEN expr2 AND expr3

Returns true if the value of expr1 is greater than or equal to expr2, and less than or equal to expr3.

expr IS NULL

Returns true if expr is NULL.

expr IN(expr1, expr2, ...)

Returns true if expr matches expr1, expr2, or any value in the parentheses. The IN keyword is an efficient shorthand for (expr = expr1 || expr = expr2 || ...). The expressions used with the IN keyword must be constants and they must match the data type of expr.

COALESCE(<expr1>, <expr2>, ...)

Returns the first argument that isn't NULL.

GREATEST(numeric_expr1, numeric_expr2, ...)

Returns the largest numeric_expr parameter. All parameters must be numeric, and all parameters must be the same type. If any parameter is NULL, this function returns NULL.

To ignore NULL values, use the IFNULL function to change NULL values to a value that doesn't affect the comparison. In the following code example, the IFNULL function is used to change NULL values to -1, which doesn't affect the comparison between positive numbers.
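A sketch of that pattern, using a subselect that supplies one non-NULL and one NULL value; the field names a and b and the alias largest are illustrative:

```sql
/* IFNULL replaces the NULL in b with -1, so GREATEST compares 1 and -1
   instead of returning NULL. */
SELECT GREATEST(IFNULL(a, -1), IFNULL(b, -1)) AS largest
FROM (SELECT 1 AS a, NULL AS b);
```

The replacement value only works when it is lower than every real value being compared, which is why -1 is safe for positive numbers.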

Date and time functions

The following functions enable date and time manipulation for UNIX timestamps, date strings and TIMESTAMP data types. For more information about working with the TIMESTAMP data type, see Using TIMESTAMP.

Date and time functions that work with UNIX timestamps operate on UNIX time. Date and time functions return values based upon the UTC time zone.

Syntax

Function

Description

Example

CURRENT_DATE()

Returns a human-readable string of the current date in the format %Y-%m-%d.

SELECT CURRENT_DATE();

Returns: 2013-02-01

CURRENT_TIME()

Returns a human-readable string of the server's current time in the format %H:%M:%S.

SELECT CURRENT_TIME();

Returns: 01:32:56

CURRENT_TIMESTAMP()

Returns a TIMESTAMP data type of the server's current time in the format %Y-%m-%d %H:%M:%S.

SELECT CURRENT_TIMESTAMP();

Returns: 2013-02-01 01:33:35 UTC

DATE(<timestamp>)

Returns a human-readable string of a TIMESTAMP data type in the format %Y-%m-%d.

SELECT DATE(TIMESTAMP('2012-10-01 02:03:04'));

Returns: 2012-10-01

DATE_ADD(<timestamp>,<interval>,<interval_units>)

Adds the specified interval to a TIMESTAMP data type. Possible interval_units values include YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND. If interval is a negative number, the interval is subtracted from the TIMESTAMP data type.
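A sketch of both directions, assuming the interval unit is passed as a quoted string:

```sql
/* Add five years to the timestamp. */
SELECT DATE_ADD(TIMESTAMP("2012-10-01 02:03:04"), 5, "YEAR");

/* A negative interval subtracts five years instead. */
SELECT DATE_ADD(TIMESTAMP("2012-10-01 02:03:04"), -5, "YEAR");
```

Under these assumptions the first query would be expected to return 2017-10-01 02:03:04 UTC and the second 2007-10-01 02:03:04 UTC.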

DAY(<timestamp>)

Returns the day of the month of a TIMESTAMP data type as an integer between 1 and 31, inclusively.

SELECT DAY(TIMESTAMP('2012-10-02 05:23:48'));

Returns: 2

DAYOFWEEK(<timestamp>)

Returns the day of the week of a TIMESTAMP data type as an integer between 1 (Sunday) and 7 (Saturday), inclusively.

SELECT DAYOFWEEK(TIMESTAMP("2012-10-01 02:03:04"));

Returns: 2

DAYOFYEAR(<timestamp>)

Returns the day of the year of a TIMESTAMP data type as an integer between 1 and 366, inclusively. The integer 1 refers to January 1.

SELECT DAYOFYEAR(TIMESTAMP("2012-10-01 02:03:04"));

Returns: 275

FORMAT_UTC_USEC(<unix_timestamp>)

Returns a human-readable string representation of a UNIX timestamp in the format YYYY-MM-DD HH:MM:SS.uuuuuu.

SELECT FORMAT_UTC_USEC(1274259481071200);

Returns: 2010-05-19 08:58:01.071200

HOUR(<timestamp>)

Returns the hour of a TIMESTAMP data type as an integer between 0 and 23, inclusively.

SELECT HOUR(TIMESTAMP('2012-10-02 05:23:48'));

Returns: 5

MINUTE(<timestamp>)

Returns the minutes of a TIMESTAMP data type as an integer between 0 and 59, inclusively.

SELECT MINUTE(TIMESTAMP('2012-10-02 05:23:48'));

Returns: 23

MONTH(<timestamp>)

Returns the month of a TIMESTAMP data type as an integer between 1 and 12, inclusively.

SELECT MONTH(TIMESTAMP('2012-10-02 05:23:48'));

Returns: 10

MSEC_TO_TIMESTAMP(<expr>)

Converts a UNIX timestamp in milliseconds to a TIMESTAMP data type.

SELECT MSEC_TO_TIMESTAMP(1349053323000);

Returns: 2012-10-01 01:02:03 UTC

SELECT MSEC_TO_TIMESTAMP(1349053323000 + 1000);

Returns: 2012-10-01 01:02:04 UTC

NOW()

Returns a UNIX timestamp in microseconds.

SELECT NOW();

Returns: 1359685811687920

PARSE_UTC_USEC(<date_string>)

Converts a date string to a UNIX timestamp in microseconds. date_string must have the format YYYY-MM-DD HH:MM:SS[.uuuuuu]. The fractional part of the second can be up to 6 digits long or can be omitted.

TIMESTAMP_TO_USEC is an equivalent function that converts a TIMESTAMP data type argument instead of a date string.

SELECT PARSE_UTC_USEC("2012-10-01 02:03:04");

Returns: 1349056984000000

QUARTER(<timestamp>)

Returns the quarter of the year of a TIMESTAMP data type as an integer between 1 and 4, inclusively.

SELECT QUARTER(TIMESTAMP("2012-10-01 02:03:04"));

Returns: 4

SEC_TO_TIMESTAMP(<expr>)

Converts a UNIX timestamp in seconds to a TIMESTAMP data type.

SELECT SEC_TO_TIMESTAMP(1355968987);

Returns: 2012-12-20 02:03:07 UTC

SELECT SEC_TO_TIMESTAMP(INTEGER(1355968984 + 3));

Returns: 2012-12-20 02:03:07 UTC

SECOND(<timestamp>)

Returns the seconds of a TIMESTAMP data type as an integer between 0 and 59, inclusively.

During a leap second, the integer range is between 0 and 60, inclusively.

SELECT SECOND(TIMESTAMP('2012-10-02 05:23:48'));

Returns: 48

STRFTIME_UTC_USEC(<unix_timestamp>,<date_format_str>)

Returns a human-readable date string in the format date_format_str. date_format_str can include date-related punctuation characters (such as / and -) and special characters accepted by the strftime function in C++ (such as %d for day of month).

Use the UTC_USEC_TO_<function_name> functions if you plan to group query data by time intervals, such as getting all data for a certain month, because the functions are more efficient.

SELECT STRFTIME_UTC_USEC(1274259481071200, "%Y-%m-%d");

Returns: 2010-05-19

TIME(<timestamp>)

Returns a human-readable string of a TIMESTAMP data type, in the format %H:%M:%S.

SELECT TIME(TIMESTAMP('2012-10-01 02:03:04'));

Returns: 02:03:04

TIMESTAMP(<date_string>)

Converts a date string to a TIMESTAMP data type.

SELECT TIMESTAMP("2012-10-01 01:02:03");

Returns: 2012-10-01 01:02:03 UTC

TIMESTAMP_TO_MSEC(<timestamp>)

Converts a TIMESTAMP data type to a UNIX timestamp in milliseconds.

SELECT TIMESTAMP_TO_MSEC(TIMESTAMP("2012-10-01 01:02:03"));

Returns: 1349053323000

TIMESTAMP_TO_SEC(<timestamp>)

Converts a TIMESTAMP data type to a UNIX timestamp in seconds.

SELECT TIMESTAMP_TO_SEC(TIMESTAMP("2012-10-01 01:02:03"));

Returns: 1349053323

TIMESTAMP_TO_USEC(<timestamp>)

Converts a TIMESTAMP data type to a UNIX timestamp in microseconds.

PARSE_UTC_USEC is an equivalent function that converts a date string argument instead of a TIMESTAMP data type.

SELECT TIMESTAMP_TO_USEC(TIMESTAMP("2012-10-01 01:02:03"));

Returns: 1349053323000000

USEC_TO_TIMESTAMP(<expr>)

Converts a UNIX timestamp in microseconds to a TIMESTAMP data type.

SELECT USEC_TO_TIMESTAMP(1349053323000000);

Returns: 2012-10-01 01:02:03 UTC

SELECT USEC_TO_TIMESTAMP(1349053323000000 + 1000000);

Returns: 2012-10-01 01:02:04 UTC

UTC_USEC_TO_DAY(<unix_timestamp>)

Shifts a UNIX timestamp in microseconds to the beginning of the day it occurs in.

For example, if unix_timestamp occurs on May 19th at 08:58, this function returns a UNIX timestamp for May 19th at 00:00 (midnight).

SELECT UTC_USEC_TO_DAY(1274259481071200);

Returns: 1274227200000000

UTC_USEC_TO_HOUR(<unix_timestamp>)

Shifts a UNIX timestamp in microseconds to the beginning of the hour it occurs in.

For example, if unix_timestamp occurs at 08:58, this function returns a UNIX timestamp for 08:00 on the same day.

SELECT UTC_USEC_TO_HOUR(1274259481071200);

Returns: 1274256000000000

UTC_USEC_TO_MONTH(<unix_timestamp>)

Shifts a UNIX timestamp in microseconds to the beginning of the month it occurs in.

For example, if unix_timestamp occurs on March 19th, this function returns a UNIX timestamp for March 1st of the same year.

SELECT UTC_USEC_TO_MONTH(1274259481071200);

Returns: 1272672000000000

UTC_USEC_TO_WEEK(<unix_timestamp>,<day_of_week>)

Returns a UNIX timestamp in microseconds that represents a day in the week of the unix_timestamp argument. This function takes two arguments: a UNIX timestamp in microseconds, and a day of the week from 0 (Sunday) to 6 (Saturday).

For example, if unix_timestamp occurs on Friday, 2008-04-11, and you set day_of_week to 2 (Tuesday), the function returns a UNIX timestamp for Tuesday, 2008-04-08.

SELECT UTC_USEC_TO_WEEK(1207929480000000, 2) AS tuesday;

Returns: 1207612800000000

UTC_USEC_TO_YEAR(<unix_timestamp>)

Returns a UNIX timestamp in microseconds that represents the year of the unix_timestamp argument.

For example, if unix_timestamp occurs in 2010, the function returns 1262304000000000, the microsecond representation of 2010-01-01 00:00.

SELECT UTC_USEC_TO_YEAR(1274259481071200);

Returns: 1262304000000000

YEAR(<timestamp>)

Returns the year of a TIMESTAMP data type.

SELECT YEAR(TIMESTAMP('2012-10-02 05:23:48'));

Returns: 2012

Advanced examples

Scenario

Description

Example

Convert integer timestamp results into human-readable format

The following query finds the top 5 moments in time in which the most Wikipedia revisions took place. In order to display results in a human-readable
format, use BigQuery's FORMAT_UTC_USEC() function, which takes a timestamp, in microseconds, as an input. This query multiplies the Wikipedia POSIX format timestamps (in seconds) by 1000000 to convert the value into microseconds.

SELECT
/* Multiply timestamp by 1000000 and convert */
/* into a more human-readable format. */
TOP (FORMAT_UTC_USEC(timestamp * 1000000), 5)
AS top_revision_time,
COUNT (*) AS revision_count
FROM
[publicdata:samples.wikipedia];

It's useful to use date and time functions to group query results into buckets corresponding to particular years, months, or days. The following example uses the UTC_USEC_TO_MONTH() function to display how many characters each Wikipedia contributor uses in their revision comments per month.

SELECT
contributor_username,
/* Return the timestamp shifted to the
* start of the month, formatted in
* a human-readable format. Uses the
* 'LEFT()' string function to return only
* the first 7 characters of the formatted timestamp.
*/
LEFT (FORMAT_UTC_USEC(
UTC_USEC_TO_MONTH(timestamp * 1000000)),7)
AS month,
SUM(LENGTH(comment)) as total_chars_used
FROM
[publicdata:samples.wikipedia]
WHERE
(contributor_username != '' AND
contributor_username IS NOT NULL)
AND timestamp > 1133395200
AND timestamp < 1157068800
GROUP BY
contributor_username, month
ORDER BY
total_chars_used DESC;

IP functions

PARSE_IP(readable_ip)

Converts a string representing an IPv4 address to an unsigned integer value. For example, PARSE_IP('0.0.0.1') returns 1. If the string is not a valid IPv4 address, PARSE_IP returns NULL.
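A short sketch of PARSE_IP on a minimal address, a typical address, and an invalid string; the aliases are illustrative:

```sql
SELECT
  PARSE_IP('0.0.0.1')        AS minimal_addr,
  PARSE_IP('10.1.5.23')      AS typical_addr,
  PARSE_IP('not an address') AS invalid_addr;
```

Following the description above, the first value should be 1 and the last NULL. The second should be 167838999, which is 10 * 2^24 + 1 * 2^16 + 5 * 2^8 + 23.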

IPAddress supports writing IPv4 and IPv6 addresses in packed strings, as 4- or 16-byte binary data in network byte order. The functions described below support parsing the addresses to and from human-readable form. These functions work only on string fields with IPs.

Function syntax

Description

FORMAT_PACKED_IP(packed_ip)

Returns a human-readable IP address, in the form
10.1.5.23 or 2620:0:1009:1:216:36ff:feef:3f. Examples:
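As a sketch, the two literals below are ordinary ASCII strings used as 4-byte and 16-byte packed values, so each character's byte value becomes part of the address; the aliases are illustrative:

```sql
SELECT
  /* 4 bytes: the characters '0', '1', '2', '3' are bytes 48, 49, 50, 51. */
  FORMAT_PACKED_IP('0123')             AS ipv4_text,
  /* 16 bytes: formatted as eight colon-separated hexadecimal groups. */
  FORMAT_PACKED_IP('0123456789@ABCDE') AS ipv6_text;
```

Under this reading, ipv4_text would be 48.49.50.51 and ipv6_text would be 3031:3233:3435:3637:3839:4041:4243:4445.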

JSON functions

BigQuery's JSON functions give you the ability to find values within your stored JSON data, by using JSONPath-like expressions.

Storing JSON data can be more flexible than declaring all of your individual fields in your table schema, but can lead to higher costs. When you select data from a JSON string, you are charged for scanning the entire string, which is more expensive than if each field is in a separate column. The query is also slower since the entire string needs to be parsed at query time. But for ad-hoc or rapidly-changing schemas, the flexibility of JSON can be worth the extra cost.

Mathematical functions

Mathematical functions take numeric arguments and return a numeric result. Each argument can be a numeric literal or a numeric value returned by a query. If the mathematical function evaluates to an undefined result, the operation returns NULL.

Syntax

Function

Description

ABS(numeric_expr)

Returns the absolute value of the argument.

ACOS(numeric_expr)

Returns the arc cosine of the argument.

ACOSH(numeric_expr)

Returns the arc hyperbolic cosine of the argument.

ASIN(numeric_expr)

Returns the arc sine of the argument.

ASINH(numeric_expr)

Returns the arc hyperbolic sine of the argument.

ATAN(numeric_expr)

Returns the arc tangent of the argument.

ATANH(numeric_expr)

Returns the arc hyperbolic tangent of the argument.

ATAN2(numeric_expr1, numeric_expr2)

Returns the arc tangent of the two arguments.

CEIL(numeric_expr)

Rounds the argument up to the nearest whole number and returns the rounded value.

COS(numeric_expr)

Returns the cosine of the argument.

COSH(numeric_expr)

Returns the hyperbolic cosine of the argument.

DEGREES(numeric_expr)

Returns numeric_expr, converted from radians to degrees.

EXP(numeric_expr)

Returns the result of raising the constant "e" - the base of the natural logarithm - to the power of numeric_expr.

FLOOR(numeric_expr)

Rounds the argument down to the nearest whole number and returns the rounded value.

LN(numeric_expr), LOG(numeric_expr)

Returns the natural logarithm of the argument.

LOG2(numeric_expr)

Returns the Base-2 logarithm of the argument.

LOG10(numeric_expr)

Returns the Base-10 logarithm of the argument.

PI()

Returns the constant π. The PI() function requires parentheses to signify that it is a function, but takes no arguments in those parentheses. You can use PI() like a constant with mathematical and arithmetic functions.

POW(numeric_expr1, numeric_expr2)

Returns the result of raising numeric_expr1 to the power of numeric_expr2.

RAND([int32_seed])

Returns a random float value in the range 0.0 <= value < 1.0. Each int32_seed value always generates the same sequence of random numbers within a given query, as long as you don't use a LIMIT clause. If int32_seed is not specified, BigQuery uses the current timestamp as the seed value.

ROUND(numeric_expr [, digits])

Rounds the argument either up or down to the nearest whole number (or if specified, to the specified number of digits) and returns the rounded value.

SIN(numeric_expr)

Returns the sine of the argument.

SINH(numeric_expr)

Returns the hyperbolic sine of the argument.

SQRT(numeric_expr)

Returns the square root of the expression.

TAN(numeric_expr)

Returns the tangent of the argument.

TANH(numeric_expr)

Returns the hyperbolic tangent of the argument.

Advanced examples

Scenario

Description

Example

Bounding box query

The following query returns a collection of points within a rectangular bounding box centered around San Francisco (37.46, -122.50).

Return a collection of up to 100 points within an approximated circle, determined by using the Spherical Law of Cosines, centered around Denver, Colorado (39.73, -104.98). This query makes use of BigQuery's mathematical and trigonometric functions, such as PI(), SIN(), and COS().

Because the Earth isn't a perfect sphere, and lines of longitude converge at the poles, this query returns an approximation that can be useful for many types of data.
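The distance test behind the circle query can be sketched in Python. The mean Earth radius of 6371 km, the helper names, and the sample points are assumptions of this sketch, not values from the original query.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch
DENVER = (39.73, -104.98)  # (latitude, longitude) in degrees


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance using the Spherical Law of Cosines."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_lambda = math.radians(lon2 - lon1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(delta_lambda))
    # Clamp to [-1, 1] so rounding error cannot push acos() out of its domain.
    return EARTH_RADIUS_KM * math.acos(max(-1.0, min(1.0, cos_angle)))


def within_radius(points, center, radius_km, limit=100):
    """Return up to `limit` points no farther than radius_km from center."""
    hits = [p for p in points if distance_km(*center, *p) <= radius_km]
    return hits[:limit]


# Hypothetical sample points: one near Denver, one near San Francisco.
sample_points = [(39.74, -104.99), (37.46, -122.50)]
print(within_radius(sample_points, DENVER, radius_km=50))
```

Only the point near Denver falls inside a 50 km radius; the result is an approximation for the reasons given above.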

Regular expression functions

BigQuery provides regular expression support using the re2 library; see that documentation for its regular expression syntax.

Note that regular expressions are not anchored: a pattern can match anywhere in the string. To match only at the beginning of the string, use the ^ character.
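The effect of the ^ character can be seen in a short Python sketch. It uses Python's re module rather than re2, on the assumption that the two agree for simple patterns like these; the helper name regexp_match is hypothetical.

```python
import re


def regexp_match(text: str, pattern: str) -> bool:
    """Return True if the pattern matches anywhere in text (unanchored search)."""
    return re.search(pattern, text) is not None


print(regexp_match("unhappy", r"happy"))   # True: the match may start mid-string
print(regexp_match("unhappy", r"^happy"))  # False: ^ anchors the match at the start
```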

Syntax

Function

Description

Example

REGEXP_MATCH('str', 'reg_exp')

Returns true if str matches the regular expression. For string matching without regular expressions, use CONTAINS instead of REGEXP_MATCH.

SELECT
word,
COUNT(word) AS count
FROM
publicdata:samples.shakespeare
WHERE
(REGEXP_MATCH(word,r'\w\w\'\w\w'))
GROUP BY word
ORDER BY count DESC
LIMIT 3;

Returns:

word

count

ne'er

42

we'll

35

We'll

33

REGEXP_EXTRACT('str', 'reg_exp')

Returns the portion of str that matches the capturing group within the regular expression.

SELECT
REGEXP_EXTRACT(word,r'(\w\w\'\w\w)') AS fragment
FROM
publicdata:samples.shakespeare
GROUP BY fragment
ORDER BY fragment
LIMIT 3;

Returns:

fragment

null

Al'ce

As'es

REGEXP_REPLACE('orig_str', 'reg_exp', 'replace_str')

Returns a string where any substring of orig_str that matches reg_exp is replaced with replace_str. For example, REGEXP_REPLACE ('Hello', 'lo', 'p') returns Help.
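The extract and replace behaviors described above can be approximated with Python's re module. This is a sketch, not re2: the helper names are hypothetical, None stands in for NULL, and regexp_extract assumes the pattern contains one capturing group.

```python
import re
from typing import Optional


def regexp_extract(text: str, pattern: str) -> Optional[str]:
    """Return the text matched by the first capturing group, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


def regexp_replace(orig_str: str, reg_exp: str, replace_str: str) -> str:
    """Replace every substring of orig_str that matches reg_exp."""
    return re.sub(reg_exp, replace_str, orig_str)


print(regexp_extract("ne'er", r"(\w\w'\w\w)"))  # ne'er
print(regexp_extract("never", r"(\w\w'\w\w)"))  # None
print(regexp_replace("Hello", "lo", "p"))       # Help
```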

Advanced examples

Scenario

Description

Example

Filter result set by regular expression match

BigQuery's regular expression functions can be used to filter results in a WHERE clause, as well as to display results in the SELECT. The following example combines both of these regular expression use cases into a single query.

SELECT
/* Replace underscores in the title with spaces */
REGEXP_REPLACE(title, r'_', ' ') AS regexp_title, views
FROM
(SELECT title, SUM(views) as views
FROM [bigquery-samples:wikimedia_pageviews.201201]
WHERE
NOT title CONTAINS ':'
AND wikimedia_project='wp'
AND language='en'
/* Match titles that start with 'G', */
/* end with 'e', and contain two 'o's */
AND REGEXP_MATCH(title, r'^G.*o.*o.*e$')
GROUP BY
title
ORDER BY
views DESC
LIMIT 100)

Using regular expressions on integer or float data

While BigQuery's regular expression functions only work for string data, it's possible to use the STRING() function to cast integer or float data into string format. In this example, STRING() is used to cast the integer value corpus_date to a string, which is then altered by REGEXP_REPLACE.

SELECT
corpus_date,
/* Cast the corpus_date to a string value */
REGEXP_REPLACE(STRING(corpus_date),
'^16',
'Written in the sixteen hundreds, in the year \''
) AS date_string
FROM [publicdata:samples.shakespeare]
/* Cast the corpus_date to string, */
/* match values that begin with '16' */
WHERE
REGEXP_MATCH(STRING(corpus_date), '^16')
GROUP BY
corpus_date, date_string
ORDER BY
date_string DESC
LIMIT 5;

String functions

String functions operate on string data. String constants must be enclosed with single or double quotes. String functions are case-sensitive by default and should use LATIN-1 encoding only (use UTF-8 encoding if necessary). You can append IGNORE CASE to the end of a query to enable case-insensitive matching. IGNORE CASE works only for LATIN-1 strings.

Syntax

CONCAT('str1', 'str2', ...)

Returns the concatenation of two or more strings, or NULL if any of the values are NULL. Example: if str1 is Java and str2 is Script, CONCAT returns JavaScript.

expr CONTAINS 'str'

Returns true if expr contains the specified string argument. This is a case-sensitive comparison.

INSTR('str1', 'str2')

Returns the one-based index of the first occurrence of str2 in str1, or returns 0 if str2 does not occur in str1.

LEFT('str', numeric_expr)

Returns the leftmost numeric_expr characters of str. If numeric_expr is greater than the length of str, the full string is returned. Example: LEFT('seattle', 3) returns sea.

LENGTH('str')

Returns a numerical value for the length of the string. Example: if str is '123456', LENGTH returns 6.

LOWER('str')

Returns the original string with all characters in lower case. Works for LATIN-1 characters only.

LPAD('str1', numeric_expr, 'str2')

Pads str1 on the left with str2, repeating str2 until the result string is exactly numeric_expr characters. Example: LPAD('1', 7, '?') returns ??????1.

LTRIM('str1' [, str2])

Removes characters from the left side of str1. If str2 is omitted, LTRIM removes spaces from the left side of str1. Otherwise, LTRIM removes any characters in str2 from the left side of str1 (case-sensitive).

Examples:

SELECT LTRIM("Say hello", "yaS") returns " hello".

SELECT LTRIM("Say hello", " ySa") returns "hello".

REPLACE('str1', 'str2', 'str3')

Replaces all instances of str2 within str1 with str3.

RIGHT('str', numeric_expr)

Returns the rightmost numeric_expr characters of str. If numeric_expr is greater than the length of str, the full string is returned. Example: RIGHT('kirkland', 4) returns land.

RPAD('str1', numeric_expr, 'str2')

Pads str1 on the right with str2, repeating str2 until the result string is exactly numeric_expr characters. Example: RPAD('1', 7, '?') returns 1??????.

RTRIM('str1' [, str2])

Removes trailing characters from the right side of str1. If str2 is omitted, RTRIM removes trailing spaces from str1. Otherwise, RTRIM removes any characters in str2 from the right side of str1 (case-sensitive).

Examples:

SELECT RTRIM("Say hello", "leo") returns "Say h".

SELECT RTRIM("Say hello ", " hloe") returns "Say".
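The character-set behavior of LTRIM and RTRIM (str2 is treated as a set of characters, not as a prefix or suffix) matches Python's str.lstrip and str.rstrip, which makes the examples easy to check. The wrapper names below are hypothetical.

```python
def ltrim(str1: str, str2: str = " ") -> str:
    """Remove any characters found in str2 from the left side of str1."""
    return str1.lstrip(str2)


def rtrim(str1: str, str2: str = " ") -> str:
    """Remove any characters found in str2 from the right side of str1."""
    return str1.rstrip(str2)


print(repr(ltrim("Say hello", "yaS")))     # ' hello'
print(repr(ltrim("Say hello", " ySa")))    # 'hello'
print(repr(rtrim("Say hello", "leo")))     # 'Say h'
print(repr(rtrim("Say hello ", " hloe")))  # 'Say'
```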

SPLIT('str' [, 'delimiter'])

Returns a set of substrings as a repeated string. If delimiter is specified, the SPLIT function breaks str into substrings, using delimiter as the delimiter.

SUBSTR('str', index [, max_len])

Returns a substring of str, starting at index. If the optional max_len parameter is used, the returned string is a maximum of max_len characters long. Counting starts at 1, so the first character in the string is in position 1 (not zero). If index is 5, the substring begins with the 5th character from the left in str. If index is -4, the substring begins with the 4th character from the right in str. Example: SUBSTR('awesome', -4, 4) returns the substring some.
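The one-based and negative indexing rules can be sketched in Python as follows. The helper name substr is hypothetical, and the handling of an index of 0 is an assumption, because the description above does not cover it.

```python
from typing import Optional


def substr(s: str, index: int, max_len: Optional[int] = None) -> str:
    """Sketch of SUBSTR: one-based index, negative index counts from the right."""
    if index > 0:
        start = index - 1              # position 1 is the first character
    elif index < 0:
        start = max(len(s) + index, 0)  # -4 starts at the 4th character from the right
    else:
        start = 0                      # assumption: treat 0 like position 1
    end = len(s) if max_len is None else start + max_len
    return s[start:end]


print(substr("awesome", -4, 4))  # some
print(substr("awesome", 5))      # ome
```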

UPPER('str')

Returns the original string with all characters in upper case. Works for LATIN-1 characters only.

Escaping special characters in strings

To escape special characters, use one of the following methods:

Use '\xDD' notation, where '\x' is followed by the two-digit hex representation of the character.

Use an escaping backslash (\) in front of backslashes, single quotes, and double quotes.

Table wildcard functions

Table wildcard functions are a cost-effective way to query data from a specific set of tables. When you use a table wildcard function, BigQuery only accesses and charges you for tables that match the wildcard. Table wildcard functions are specified in the query's FROM clause.

If you use table wildcard functions in a query, the functions must be contained in parentheses, as shown in the following examples.

Function

Description

Example

TABLE_DATE_RANGE(prefix, timestamp1, timestamp2)

Queries daily tables that overlap with the time range between <timestamp1> and <timestamp2>.

Table names must have the following format: <prefix><day>, where <day> is in the format YYYYMMDD.

SELECT
name
FROM
(TABLE_DATE_RANGE(mydata.people,
TIMESTAMP('2014-03-25'),
TIMESTAMP('2014-03-27')))
WHERE
age >= 35

Matches the following tables:

mydata.people20140325

mydata.people20140326

mydata.people20140327
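The table names covered by a date range can be enumerated with a short Python sketch. The helper name daily_table_names is hypothetical; it only lists candidate names in the prefix-plus-YYYYMMDD format and does not check which tables actually exist.

```python
from datetime import date, timedelta


def daily_table_names(prefix: str, first_day: date, last_day: date):
    """List daily table names of the form <prefix><YYYYMMDD> for an inclusive range."""
    names = []
    day = first_day
    while day <= last_day:
        names.append(prefix + day.strftime("%Y%m%d"))
        day += timedelta(days=1)
    return names


print(daily_table_names("mydata.people", date(2014, 3, 25), date(2014, 3, 27)))
# ['mydata.people20140325', 'mydata.people20140326', 'mydata.people20140327']
```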

Example: get tables in a two-day range up to "now"

This example assumes the following tables exist:

mydata.people20140323

mydata.people20140324

mydata.people20140325

SELECT
name
FROM
(TABLE_DATE_RANGE(mydata.people,
DATE_ADD(CURRENT_TIMESTAMP(), -2, 'DAY'),
CURRENT_TIMESTAMP()))
WHERE
age >= 35

Matches the following tables:

mydata.people20140323

mydata.people20140324

mydata.people20140325

TABLE_DATE_RANGE_STRICT(prefix, timestamp1, timestamp2)

This function is equivalent to TABLE_DATE_RANGE. The only difference is that if any daily table is missing in the sequence, TABLE_DATE_RANGE_STRICT fails and returns a Not Found: Table <table_name> error.

Example: error on missing table

This example assumes the following tables exist:

people20140325

people20140327

SELECT
name
FROM
(TABLE_DATE_RANGE_STRICT(people,
TIMESTAMP('2014-03-25'),
TIMESTAMP('2014-03-27')))
WHERE age >= 35

The above example returns a "Not Found" error for the table "people20140326".

TABLE_QUERY(dataset, expr)

Queries tables whose names match the supplied expr. The expr parameter must be represented as a string and must contain an expression to evaluate. For example, 'length(table_id) < 3'.

URL functions

Syntax

HOST('url_str')

Given a URL, returns the host name as a string. Example: HOST('http://www.google.com:80/index.html') returns 'www.google.com'

DOMAIN('url_str')

Given a URL, returns the domain as a string. Example: DOMAIN('http://www.google.com:80/index.html') returns 'google.com'

TLD('url_str')

Given a URL, returns the top level domain plus any country domain in the URL. Example: TLD('http://www.google.com:80/index.html') returns '.com'. TLD('http://www.google.co.uk:80/index.html') returns '.co.uk'.
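For comparison, the host name extraction can be reproduced with Python's standard urllib.parse module. The helper name host is hypothetical. DOMAIN and TLD are not sketched here, because splitting off suffixes such as '.co.uk' needs a list of known public suffixes that the standard library does not provide.

```python
from typing import Optional
from urllib.parse import urlparse


def host(url_str: str) -> Optional[str]:
    """Return the host name of a URL, without the scheme, port, or path."""
    return urlparse(url_str).hostname


print(host("http://www.google.com:80/index.html"))  # www.google.com
```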

Advanced examples

Scenario

Description

Example

Parse domain names from URL data

This query uses the DOMAIN() function to return the most popular domains listed as repository homepages on GitHub. Note the use of HAVING to filter records using the result of the DOMAIN() function. This function is useful for determining referrer information from URL data.

SELECT
DOMAIN(repository_homepage) AS user_domain,
COUNT(*) AS activity_count
FROM
[publicdata:samples.github_timeline]
GROUP BY
user_domain
HAVING
user_domain IS NOT NULL AND user_domain != ''
ORDER BY
activity_count DESC
LIMIT 5;

To look specifically at TLD information, use the TLD() function. This example displays the top TLDs that are not in a list of common examples.

SELECT
TLD(repository_homepage) AS user_tld,
COUNT(*) AS activity_count
FROM
[publicdata:samples.github_timeline]
GROUP BY
user_tld
HAVING
/* Only consider TLDs that are NOT NULL */
/* and not in our list of common TLDs */
user_tld IS NOT NULL AND NOT user_tld
IN ('','.com','.net','.org','.info','.edu')
ORDER BY
activity_count DESC
LIMIT 5;

PARTITION BY is always optional. ORDER BY is optional in some cases, but certain window functions, such as rank() or dense_rank(), require the clause.

JOIN EACH and GROUP EACH BY clauses can't be used on the output of window functions. To generate large query results when using window functions, you must use PARTITION BY.

Syntax

Function

Description

Example

CUME_DIST()

Returns a double that indicates the cumulative distribution of a value in a group of values, calculated using the formula <number of rows preceding or tied with the current row> / <total rows>. Tied values return the same cumulative distribution value.

This window function requires ORDER BY in the OVER clause.

SELECT
word,
word_count,
CUME_DIST() OVER (PARTITION BY corpus ORDER BY word_count DESC) cume_dist,
FROM
[publicdata:samples.shakespeare]
WHERE
corpus='othello' and length(word) > 10
LIMIT 5

Returns:

word

word_count

cume_dist

handkerchief

29

0.2

satisfaction

5

0.4

displeasure

4

0.8

instruments

4

0.8

circumstance

3

1.0
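The formula above can be checked against the returned values with a small Python sketch. It assumes a single partition ordered by word_count in descending order, so "preceding or tied" means "greater than or equal to"; the helper name is hypothetical.

```python
def cume_dist(values):
    """Cumulative distribution per value, for one partition ordered descending."""
    total = len(values)
    result = []
    for value in values:
        # Rows that precede or tie with the current row in descending order.
        preceding_or_tied = sum(1 for other in values if other >= value)
        result.append(preceding_or_tied / total)
    return result


word_counts = [29, 5, 4, 4, 3]  # the word_count column from the results above
print(cume_dist(word_counts))   # [0.2, 0.4, 0.8, 0.8, 1.0]
```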

DENSE_RANK()

Returns the integer rank of a value in a group of values. The rank is calculated based on comparisons with other values in the group.

Tied values display as the same rank. The rank of the next value is incremented by 1. For example, if two values tie for rank 2, the next ranked value is 3. If you prefer a gap in the ranking list, use rank().

This window function requires ORDER BY in the OVER clause.

SELECT
word,
word_count,
DENSE_RANK() OVER (PARTITION BY corpus ORDER BY word_count DESC) dense_rank,
FROM
[publicdata:samples.shakespeare]
WHERE
corpus='othello' and length(word) > 10
LIMIT 5

Returns:

word

word_count

dense_rank

handkerchief

29

1

satisfaction

5

2

displeasure

4

3

instruments

4

3

circumstance

3

4

LAG(<expr>[, <offset>[, <default_value>]])

Returns the value of <expr> for the row located <offset> rows before the current row. If the row doesn't exist, <default_value> is returned.
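A minimal Python sketch of this lookup, applied to one already-ordered partition. The helper name lag is hypothetical, the default offset of 1 is an assumption, and None stands in for NULL.

```python
def lag(values, offset=1, default_value=None):
    """For each row, return the value `offset` rows earlier, or default_value."""
    # Assumes offset >= 0.
    return [
        values[i - offset] if i - offset >= 0 else default_value
        for i in range(len(values))
    ]


word_counts = [29, 5, 4, 4, 3]
print(lag(word_counts))        # [None, 29, 5, 4, 4]
print(lag(word_counts, 2, 0))  # [0, 0, 29, 5, 4]
```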

NTH_VALUE(<expr>, <n>)

Returns the value of <expr> at position <n> of the window frame, where <n> is a one-based index.

NTILE(<num_buckets>)

Divides a sequence of rows into <num_buckets> buckets and assigns a corresponding bucket number, as an integer, with each row. The ntile() function assigns the bucket numbers as equally as possible and returns a value from 1 to <num_buckets> for each row.

SELECT
word,
word_count,
NTILE(2) OVER (PARTITION BY corpus ORDER BY word_count DESC) ntile,
FROM
[publicdata:samples.shakespeare]
WHERE
corpus='othello' and length(word) > 10
LIMIT 5

Returns:

word

word_count

ntile

handkerchief

29

1

satisfaction

5

1

displeasure

4

1

instruments

4

2

circumstance

3

2
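The bucket assignment can be sketched in Python. The sketch assumes the common SQL convention that, when rows do not divide evenly, the earlier buckets receive the extra rows; that convention reproduces the results above but is not stated in the description.

```python
def ntile(num_rows: int, num_buckets: int):
    """Assign bucket numbers 1..num_buckets to num_rows ordered rows."""
    base, extra = divmod(num_rows, num_buckets)
    assignments = []
    for bucket in range(1, num_buckets + 1):
        # The first `extra` buckets hold one more row than the rest.
        size = base + (1 if bucket <= extra else 0)
        assignments.extend([bucket] * size)
    return assignments


print(ntile(5, 2))  # [1, 1, 1, 2, 2]
```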

PERCENT_RANK()

Returns the rank of the current row, relative to the other rows in the partition. Returned values range between 0 and 1, inclusive. The first value returned is 0.0.

This window function requires ORDER BY in the OVER clause.

SELECT
word,
word_count,
PERCENT_RANK() OVER (PARTITION BY corpus ORDER BY word_count DESC) p_rank,
FROM
[publicdata:samples.shakespeare]
WHERE
corpus='othello' and length(word) > 10
LIMIT 5

Returns:

word

word_count

p_rank

handkerchief

29

0.0

satisfaction

5

0.25

displeasure

4

0.5

instruments

4

0.5

circumstance

3

1.0
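The values above are consistent with the usual SQL definition (rank - 1) / (rows - 1), which is assumed in the following Python sketch for one partition ordered in descending order. The helper name is hypothetical.

```python
def percent_rank(values):
    """(rank - 1) / (rows - 1) for each value, ranking in descending order."""
    rows = len(values)
    if rows <= 1:
        return [0.0] * rows
    # rank - 1 equals the number of strictly larger values.
    return [sum(1 for other in values if other > value) / (rows - 1)
            for value in values]


print(percent_rank([29, 5, 4, 4, 3]))  # [0.0, 0.25, 0.5, 0.5, 1.0]
```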

PERCENTILE_CONT(<percentile>)

Returns values that are based upon linear interpolation between the values of the group, after ordering them per the ORDER BY clause.

<percentile> must be between 0 and 1.

This window function requires ORDER BY in the OVER clause.

SELECT
word,
word_count,
PERCENTILE_CONT(0.5) OVER (PARTITION BY corpus ORDER BY word_count DESC) p_cont,
FROM
[publicdata:samples.shakespeare]
WHERE
corpus='othello' and length(word) > 10
LIMIT 5

Returns:

word

word_count

p_cont

handkerchief

29

4

satisfaction

5

4

displeasure

4

4

instruments

4

4

circumstance

3

4
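Linear interpolation over an ordered group can be sketched as follows. The position formula percentile * (rows - 1) is an assumption based on the common definition of a continuous percentile; the helper name is hypothetical and the list must be non-empty.

```python
def percentile_cont(ordered_values, percentile):
    """Interpolate linearly between the two values surrounding the percentile."""
    if not 0 <= percentile <= 1:
        raise ValueError("percentile must be between 0 and 1")
    position = percentile * (len(ordered_values) - 1)
    lower = int(position)  # index of the value at or before the position
    upper = min(lower + 1, len(ordered_values) - 1)
    fraction = position - lower
    return ordered_values[lower] + fraction * (ordered_values[upper] - ordered_values[lower])


print(percentile_cont([29, 5, 4, 4, 3], 0.5))  # 4.0
print(percentile_cont([1, 2], 0.5))            # 1.5
```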

PERCENTILE_DISC(<percentile>)

Returns the value with the smallest cumulative distribution that's greater than or equal to <percentile>.

<percentile> must be between 0 and 1.

This window function requires ORDER BY in the OVER clause.

SELECT
word,
word_count,
PERCENTILE_DISC(0.5) OVER (PARTITION BY corpus ORDER BY word_count DESC) p_disc,
FROM
[publicdata:samples.shakespeare]
WHERE
corpus='othello' and length(word) > 10
LIMIT 5

Returns:

word

word_count

p_disc

handkerchief

29

4

satisfaction

5

4

displeasure

4

4

instruments

4

4

circumstance

3

4
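The selection rule can be sketched in Python for one ordered partition: walk the rows in order and return the first value whose cumulative distribution reaches the percentile. The helper name is hypothetical.

```python
def percentile_disc(ordered_values, percentile):
    """Return the first value whose cumulative distribution is >= percentile."""
    total = len(ordered_values)
    for position, value in enumerate(ordered_values, start=1):
        # Tied rows share the cumulative distribution of the last tied row.
        tied_end = position
        while tied_end < total and ordered_values[tied_end] == value:
            tied_end += 1
        if tied_end / total >= percentile:
            return value
    raise ValueError("ordered_values must not be empty")


print(percentile_disc([29, 5, 4, 4, 3], 0.5))  # 4
```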

RANK()

Returns the integer rank of a value in a group of values. The rank is calculated based on comparisons with other values in the group.

Tied values display as the same rank. The rank of the next value is incremented according to how many tied values occurred before it. For example, if two values tie for rank 2, the next ranked value is 4, not 3. If you prefer no gaps in the ranking list, use dense_rank().

This window function requires ORDER BY in the OVER clause.

SELECT
word,
word_count,
RANK() OVER (PARTITION BY corpus ORDER BY word_count DESC) rank,
FROM
[publicdata:samples.shakespeare]
WHERE
corpus='othello' and length(word) > 10
LIMIT 5

Returns:

word

word_count

rank

handkerchief

29

1

satisfaction

5

2

displeasure

4

3

instruments

4

3

circumstance

3

5
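The difference between rank() and dense_rank() can be seen side by side in a short Python sketch over the same word counts, ranked in descending order. The helper names are hypothetical.

```python
def rank(values):
    """RANK(): 1 plus the number of strictly larger values, so ties leave gaps."""
    return [1 + sum(1 for other in values if other > value) for value in values]


def dense_rank(values):
    """DENSE_RANK(): position among the distinct values, so ties leave no gaps."""
    distinct = sorted(set(values), reverse=True)
    return [distinct.index(value) + 1 for value in values]


word_counts = [29, 5, 4, 4, 3]
print(rank(word_counts))        # [1, 2, 3, 3, 5]
print(dense_rank(word_counts))  # [1, 2, 3, 3, 4]
```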

RATIO_TO_REPORT(<column>)

Returns the ratio of each value to the sum of the values, as a double between 0 and 1.

SELECT
word,
word_count,
RATIO_TO_REPORT(word_count) OVER (PARTITION BY corpus ORDER BY word_count DESC) r_to_r,
FROM
[publicdata:samples.shakespeare]
WHERE
corpus='othello' and length(word) > 10
LIMIT 5

Returns:

word

word_count

r_to_r

handkerchief

29

0.6444444444444445

satisfaction

5

0.1111111111111111

displeasure

4

0.08888888888888889

instruments

4

0.08888888888888889

circumstance

3

0.06666666666666667
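The ratios above are each value divided by the partition total (45 for these five rows), which a few lines of Python can confirm. The helper name is hypothetical.

```python
def ratio_to_report(values):
    """Return each value divided by the sum of all values in the partition."""
    total = sum(values)
    return [value / total for value in values]


ratios = ratio_to_report([29, 5, 4, 4, 3])
print(ratios[0])  # 29 / 45, roughly 0.644
```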

ROW_NUMBER()

Returns the current row number of the query result, starting with 1.

SELECT
word,
word_count,
ROW_NUMBER() OVER (PARTITION BY corpus ORDER BY word_count DESC) row_num,
FROM
[publicdata:samples.shakespeare]
WHERE
corpus='othello' and length(word) > 10
LIMIT 5

Other functions

Syntax

CASE WHEN when_expr1 THEN then_expr1
WHEN when_expr2 THEN then_expr2 ...
ELSE else_expr END

Use CASE to choose among two or more alternate expressions in your query. WHEN expressions must be boolean, and all the expressions in the THEN clauses and the ELSE clause must be of compatible types.

CURRENT_USER()

Returns the email address of the user running the query.

EVERY(<condition>)

Returns true if condition is true for all of its inputs. When used with the OMIT IF clause, this function is useful for queries that involve repeated fields.

HASH(expr)

Computes and returns a 64-bit signed hash value of the bytes of expr as defined by the CityHash library (version 1.0.3). Any string or integer expression is supported and the function respects IGNORE CASE for strings, returning case-invariant values.

IF(condition, true_return, false_return)

Returns either true_return or false_return, depending on whether condition is true or false. The return values can be literals or field-derived values, but they must be the same data type. Field-derived values do not need to be included in the SELECT clause.

OMIT <RECORD|field> IF <condition>

In contrast to the WHERE clause, elements are excluded if they satisfy the condition. More important, whereas the WHERE clause filters only the entire top-level record, the OMIT IF clause can exclude an individual element in a repeated field, and its condition can include aggregate functions of fields that appear below the element being omitted. All repeated fields referenced that are scoped below the filtered field must appear within an aggregate function.

POSITION(field)

Returns the one-based, sequential position of field within a set of repeated fields.

SOME(<condition>)

Returns true if condition is true for at least one of its inputs. When used with the OMIT IF clause, this function is useful for queries that involve repeated fields.

Advanced examples

Scenario

Description

Example

Bucketing results into categories using conditionals

The following query uses a CASE/WHEN block to bucket results into "region" categories based on a list of states. If the state does not appear as an option in one of the WHEN statements, the state value will default to "None."

Use conditional statements to organize the results of a subselect query into rows and columns. In the example below, results from a search for the most revised Wikipedia articles that start with the value 'Google' are organized into columns where the revision counts are displayed if they meet various criteria.

Some queries can provide a useful result using random subsampling of the result set. To retrieve a random sampling of values, use the HASH function to return results in which the modulo "n" of the hash equals zero.

For example, the following query finds the HASH() of the "title" value, and then checks whether that value modulo "2" is zero. This should result in about 50% of the values being labeled as "sampled." To sample fewer values, increase the value of the modulo operation from "2" to something larger. The query uses the ABS function in combination with HASH, because HASH can return negative values, and the modulo operator on a negative value yields a negative value.
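The sampling idea can be sketched in Python. The stand-in hash below is derived from SHA-256 and is not CityHash, so it will not reproduce BigQuery's HASH() values; the helper names and sample titles are hypothetical. It only illustrates that a deterministic 64-bit hash taken modulo "n" selects roughly 1/n of the values.

```python
import hashlib


def stable_hash64(text: str) -> int:
    """Signed 64-bit hash derived from SHA-256 (a stand-in, NOT CityHash)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


def is_sampled(title: str, modulus: int = 2) -> bool:
    """Label a title as sampled when abs(hash) modulo `modulus` is zero."""
    # abs() mirrors the ABS(HASH(...)) step; Python's % would already be
    # non-negative here, but the query's modulo operator is not.
    return abs(stable_hash64(title)) % modulus == 0


titles = ["Title %d" % i for i in range(1000)]  # hypothetical sample data
sampled = [t for t in titles if is_sampled(t)]
print(len(sampled) / len(titles))  # expected to be close to 0.5
```

Because the hash is deterministic, the same title is always either sampled or not, which keeps the sample stable across repeated runs.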