1. Introduction to Window Functions

A window function is an SQL function whose input values are taken from
a "window" of one or more rows in the result set of a SELECT statement.

Window functions are distinguished from other SQL functions by the
presence of an OVER clause. If a function has an OVER clause,
then it is a window function. If it lacks an OVER clause, then it is an
ordinary aggregate or scalar function. Window functions might also
have a FILTER clause in between the function and the OVER clause.

Unlike ordinary functions, window functions
cannot use the DISTINCT keyword.
Also, window functions may only appear in the result set and in the
ORDER BY clause of a SELECT statement.
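The rules above can be checked with a short sketch. It uses Python's built-in sqlite3 module and assumes the linked SQLite library is version 3.25.0 or later. The table t0 and its three rows are illustrative assumptions, and fails() is a hypothetical helper.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t0(x INTEGER PRIMARY KEY, y TEXT)")  # assumed sample table
con.executemany("INSERT INTO t0 VALUES (?, ?)", [(1, "aaa"), (2, "ccc"), (3, "bbb")])

# With an OVER clause, count(*) is a window function: one result per row.
windowed = con.execute("SELECT count(*) OVER () FROM t0").fetchall()

# Without an OVER clause, it is an ordinary aggregate: a single result row.
plain = con.execute("SELECT count(*) FROM t0").fetchall()

def fails(sql):
    """Hypothetical helper: True if SQLite rejects the statement."""
    try:
        con.execute(sql)
    except sqlite3.Error:
        return True
    return False

# DISTINCT is not accepted inside a window function invocation.
distinct_rejected = fails("SELECT count(DISTINCT y) OVER () FROM t0")

# Window functions are not accepted in a WHERE clause.
where_rejected = fails("SELECT x FROM t0 WHERE row_number() OVER (ORDER BY y) = 1")
```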

Window functions come in two varieties:
aggregate window functions and
built-in window functions. Every aggregate window function
can also work as an ordinary aggregate function, simply by omitting
the OVER and FILTER clauses. Furthermore, all of the built-in
aggregate functions of SQLite can be used as an
aggregate window function by adding an appropriate OVER clause.
Applications can register new aggregate window functions using
the sqlite3_create_window_function() interface.
The built-in window functions, however, require special-case
handling in the query planner and hence new window functions
that exhibit the exceptional properties found in the built-in
window functions cannot be added by the application.

The row_number() window function
assigns consecutive integers to each
row in order of the "ORDER BY" clause within the
window-defn (in this case "ORDER BY y"). Note that
this does not affect the order in which results are returned from
the overall query. The order of the final output is
still governed by the ORDER BY clause attached to the SELECT
statement (in this case "ORDER BY x").
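A minimal sketch of a query matching this description, run through Python's built-in sqlite3 module. The table t0 and its rows are assumptions chosen so that the "ORDER BY y" order differs from the "ORDER BY x" order.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t0(x INTEGER PRIMARY KEY, y TEXT)")  # assumed sample table
con.executemany("INSERT INTO t0 VALUES (?, ?)", [(1, "aaa"), (2, "ccc"), (3, "bbb")])

# row_number() follows the window's ORDER BY y;
# the output order follows the statement's ORDER BY x.
rows = con.execute("""
    SELECT x, y, row_number() OVER (ORDER BY y) AS row_number
    FROM t0
    ORDER BY x
""").fetchall()

for row in rows:
    print(row)
```

The row with y='bbb' is numbered 2 even though it is returned last, because numbering and output ordering are independent.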

Named window-defn clauses may also be added to a SELECT
statement using a WINDOW clause and then referred to by name within window
function invocations. For example, the following SELECT statement contains
two named window-defn clauses, "win1" and "win2":

SELECT x, y, row_number() OVER win1, rank() OVER win2
FROM t0
WINDOW win1 AS (ORDER BY y RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
win2 AS (PARTITION BY y ORDER BY x)
ORDER BY x;

The WINDOW clause, when one is present, comes after any HAVING clause and
before any ORDER BY.
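The statement above can be tried against a small data set. The sketch below assumes a three-row table t0 (the rows are illustrative, not taken from the text) and uses Python's built-in sqlite3 module.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t0(x INTEGER PRIMARY KEY, y TEXT)")  # assumed sample table
con.executemany("INSERT INTO t0 VALUES (?, ?)", [(1, "aaa"), (2, "ccc"), (3, "bbb")])

# The WINDOW clause sits between FROM (or any HAVING) and ORDER BY.
rows = con.execute("""
    SELECT x, y, row_number() OVER win1, rank() OVER win2
    FROM t0
    WINDOW win1 AS (ORDER BY y RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
           win2 AS (PARTITION BY y ORDER BY x)
    ORDER BY x
""").fetchall()
```

With these rows every value of y is distinct, so each win2 partition holds a single row and rank() is 1 throughout.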

2. Aggregate Window Functions

The examples in this section all assume that the database is populated as
follows:
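A minimal sketch of such a data set, loaded through Python's built-in sqlite3 module. The table name t1 and the columns a, b and c follow the queries and row values quoted later in this section. The exact CREATE TABLE statement is an assumption.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1(a INTEGER PRIMARY KEY, b, c)")  # assumed schema
con.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, "A", "one"), (2, "B", "two"), (3, "C", "three"), (4, "D", "one"),
     (5, "E", "two"), (6, "F", "three"), (7, "G", "one")],
)

rows = con.execute("SELECT a, b, c FROM t1 ORDER BY a").fetchall()
```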

An aggregate window function is similar to an
ordinary aggregate function, except
adding it to a query does not change the number of rows returned. Instead,
for each row the result of the aggregate window function is as if the
corresponding aggregate were run over all rows in the "window frame"
specified by the OVER clause.
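A sketch of an aggregate window function with a three-row sliding frame, using group_concat() so that the contents of each frame are visible. The data set is the assumed seven-row table t1, and the query runs through Python's built-in sqlite3 module.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1(a INTEGER PRIMARY KEY, b, c)")  # assumed schema
con.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, "A", "one"), (2, "B", "two"), (3, "C", "three"), (4, "D", "one"),
     (5, "E", "two"), (6, "F", "three"), (7, "G", "one")],
)

# Seven input rows produce seven output rows; each one aggregates over
# the previous row, the current row and the following row.
rows = con.execute("""
    SELECT a, b, group_concat(b, '.') OVER (
             ORDER BY a ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
           ) AS group_concat
    FROM t1
    ORDER BY a
""").fetchall()
```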

In the example above, the window frame consists of all rows between the
previous row ("1 PRECEDING") and the following row ("1 FOLLOWING"), inclusive,
where rows are sorted according to the ORDER BY clause in the
window-defn (in this case "ORDER BY a").
For example, the frame for the row with (a=3) consists of rows (2, 'B', 'two'),
(3, 'C', 'three') and (4, 'D', 'one'). The result of group_concat(b, '.')
for that row is therefore 'B.C.D'.

2.1. The PARTITION BY Clause

For the purpose of computing window functions, the result set
of a query is divided into one or more "partitions". A partition consists
of all rows that have the same value for all terms of the PARTITION BY clause
in the window-defn. If there is no PARTITION BY clause,
then the entire result set of the query is a single partition.
Window-function processing is performed separately for each partition.
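A sketch of a partitioned window over the assumed table t1, run through Python's built-in sqlite3 module. The frame "RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING" is chosen only to make the partition boundaries visible in the output.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1(a INTEGER PRIMARY KEY, b, c)")  # assumed schema
con.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, "A", "one"), (2, "B", "two"), (3, "C", "three"), (4, "D", "one"),
     (5, "E", "two"), (6, "F", "three"), (7, "G", "one")],
)

# Each frame stops at the end of its own partition: values of b from
# other partitions never appear in the concatenation.
rows = con.execute("""
    SELECT c, a, b, group_concat(b, '.') OVER (
             PARTITION BY c ORDER BY a
             RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
           ) AS group_concat
    FROM t1
    ORDER BY c, a
""").fetchall()
```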

In the query above, the "PARTITION BY c" clause breaks the
result set up into three partitions. The first partition has
three rows with c=='one'. The second partition has two rows with
c=='three' and the third partition has two rows with c=='two'.

In the example above, all the rows for each partition are
grouped together in the final output. This is because the PARTITION BY
clause is a prefix of the ORDER BY clause on the overall query.
But that does not have
to be the case. A partition can be composed of rows scattered
about haphazardly within the result set. For example:
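A sketch of the scattered case, using the same assumed table t1 and the same window as a partitioned query, but with the overall query ordered by a alone:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1(a INTEGER PRIMARY KEY, b, c)")  # assumed schema
con.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, "A", "one"), (2, "B", "two"), (3, "C", "three"), (4, "D", "one"),
     (5, "E", "two"), (6, "F", "three"), (7, "G", "one")],
)

# ORDER BY a interleaves the three partitions in the output, yet each
# window result is still computed within its own partition.
rows = con.execute("""
    SELECT c, a, b, group_concat(b, '.') OVER (
             PARTITION BY c ORDER BY a
             RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
           ) AS group_concat
    FROM t1
    ORDER BY a
""").fetchall()
```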

2.2. Frame Specifications

The ending frame boundary can be omitted (if the
BETWEEN and AND keywords that surround the starting frame boundary
are also omitted),
in which case the ending frame boundary defaults to CURRENT ROW.

If the frame type is RANGE or GROUPS, then rows with the same values for
all ORDER BY expressions are considered "peers". Or, if there are no ORDER BY
terms, all rows are peers. Peers are always within the same frame.

The default frame-spec is:

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE NO OTHERS

The default means that aggregate window functions read all
rows from the beginning of the partition up to and including the
current row and its peers. This implies that rows that have the same values for
all ORDER BY expressions will also have the same value for the result of the
window function (as the window frame is the same). For example:
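A sketch of the default frame over the assumed table t1, run through Python's built-in sqlite3 module. Because the order of peers within a frame is not guaranteed, the sketch compares the members of each frame rather than the exact concatenated string.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1(a INTEGER PRIMARY KEY, b, c)")  # assumed schema
con.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, "A", "one"), (2, "B", "two"), (3, "C", "three"), (4, "D", "one"),
     (5, "E", "two"), (6, "F", "three"), (7, "G", "one")],
)

# No frame-spec is given, so the default applies:
# RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE NO OTHERS
rows = con.execute("""
    SELECT a, b, c, group_concat(b, '.') OVER (ORDER BY c) AS group_concat
    FROM t1
    ORDER BY a
""").fetchall()

# Rows with the same c are peers and therefore share one result.
results_per_c = {(c, g) for _, _, c, g in rows}
members = {c: sorted(g.split(".")) for c, g in results_per_c}
```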

2.2.1. Frame Type

There are three frame types: ROWS, GROUPS, and RANGE.
The frame type determines how the starting and ending boundaries
of the frame are measured.

ROWS:
The ROWS frame type means that the starting and ending boundaries
for the frame are determined by counting individual rows relative
to the current row.

GROUPS:
The GROUPS frame type means that the starting and ending boundaries
are determined by counting "groups" relative to the current group.
A "group" is a set of rows that all have equivalent values for
all terms of the window ORDER BY clause. ("Equivalent" means that
the IS operator is true when comparing the two values.)
In other words, a group consists of all peers of a row.

RANGE:
The RANGE frame type requires that the ORDER BY clause of the
window have exactly one term. Call that term "X". With the
RANGE frame type, the elements of the frame are determined by
computing the value of expression X for all rows in the partition
and framing those rows for which the value of X is within a certain
range of the value of X for the current row. See the description
in the "<expr> PRECEDING" boundary
specification below for details.

The ROWS and GROUPS frame types are similar in that they
both determine the extent of a frame by counting relative to
the current row. The difference is that ROWS counts individual
rows and GROUPS counts peer groups.
The RANGE frame type is different.
The RANGE frame type determines the extent of a frame by
looking for expression values that are within some band of
values relative to the current row.
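The difference between the three frame types can be seen by applying the same "1 PRECEDING AND CURRENT ROW" bounds under each of them. The sketch below uses a hypothetical one-column table nums containing the values 1, 1, 2, 4, 4, and assumes SQLite 3.28.0 or later (needed for GROUPS and for numeric RANGE offsets).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE nums(v INTEGER)")  # hypothetical sample table
con.executemany("INSERT INTO nums VALUES (?)", [(1,), (1,), (2,), (4,), (4,)])

rows = con.execute("""
    SELECT v,
      count(*) OVER (ORDER BY v ROWS   BETWEEN 1 PRECEDING AND CURRENT ROW) AS by_rows,
      count(*) OVER (ORDER BY v GROUPS BETWEEN 1 PRECEDING AND CURRENT ROW) AS by_groups,
      count(*) OVER (ORDER BY v RANGE  BETWEEN 1 PRECEDING AND CURRENT ROW) AS by_range
    FROM nums
    ORDER BY v
""").fetchall()

# Peers (equal v) may come back in either order, so sort before comparing.
rows = sorted(rows)
for row in rows:
    print(row)
```

For v=4, ROWS counts one row back (2 rows), GROUPS counts one peer group back (the group for v=2 plus both 4s, 3 rows), and RANGE looks for values between 3 and 4 (only the two 4s, 2 rows).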

2.2.2. Frame Boundaries

There are five ways to describe starting and ending frame boundaries:

UNBOUNDED PRECEDING
The frame boundary is the first
row in the partition.

<expr> PRECEDING
<expr> must be a non-negative constant numeric expression.
The boundary is a row that is <expr> "units" prior to
the current row. The meaning of "units" here depends on the
frame type:

ROWS →
The frame boundary is the row that is <expr>
rows before the current row, or the first row of the
partition if there are fewer than <expr> rows
before the current row. <expr> must be an integer.

GROUPS →
A "group" is a set of peer rows - rows that all have
the same values for every term in the ORDER BY clause.
The frame boundary is the group that is <expr>
groups before the group containing the current row, or the
first group of the partition if there are fewer
than <expr> groups before the current row.
For the starting boundary of a frame, the first
row of the group is used and for the ending boundary
of a frame, the last row of the group is used.
<expr> must be an integer.

RANGE →
For this form, the ORDER BY clause of the
window-defn must have a single
term. Call that ORDER BY term "X". Let
Xi be the value of the X
expression for the i-th row in the partition and let
Xc be the value of X for the
current row. Informally, a RANGE bound is the first row
for which Xi is within <expr> of Xc.
More precisely:

If either Xi or Xc is non-numeric, then
the boundary is the first row for which the expression
"Xi IS Xc"
is true.

Else if the ORDER BY is ASC then the boundary
is the first row for which
Xi>=Xc-<expr>.

Else if the ORDER BY is DESC then the boundary
is the first row for which
Xi<=Xc+<expr>.

For this form, the <expr> does not have to be an
integer. It can evaluate to a real number as long as
it is constant and non-negative.

The boundary description "0 PRECEDING" always means the same
thing as "CURRENT ROW".

CURRENT ROW
The current row. For RANGE and GROUPS frame types,
peers of the current row are also included in the frame,
unless specifically excluded by the EXCLUDE clause.
This is true regardless of whether CURRENT ROW is used
as the starting or ending frame boundary.

<expr> FOLLOWING
This is the same as "<expr> PRECEDING" except that
the boundary is <expr> units after the current row
rather than before it.

UNBOUNDED FOLLOWING
The frame boundary is the last
row in the partition.

The ending frame boundary may not take a form that appears higher in
the above list than the starting frame boundary.
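A sketch of "<expr> PRECEDING" in a RANGE frame under both sort directions, followed by a frame whose boundaries are given in the wrong order. The table nums and its values are hypothetical, and SQLite 3.28.0 or later is assumed.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE nums(v INTEGER)")  # hypothetical sample table
con.executemany("INSERT INTO nums VALUES (?)", [(1,), (2,), (3,), (5,)])

# ASC:  "1 PRECEDING" reaches back toward smaller values (down to v-1).
# DESC: "1 PRECEDING" reaches back toward larger values (up to v+1).
rows = con.execute("""
    SELECT v,
      sum(v) OVER (ORDER BY v ASC  RANGE BETWEEN 1 PRECEDING AND CURRENT ROW) AS asc_sum,
      sum(v) OVER (ORDER BY v DESC RANGE BETWEEN 1 PRECEDING AND CURRENT ROW) AS desc_sum
    FROM nums
    ORDER BY v
""").fetchall()

# An ending boundary that comes earlier in the list than the starting
# boundary is rejected when the statement is prepared.
try:
    con.execute("SELECT sum(v) OVER (ORDER BY v ROWS BETWEEN CURRENT ROW AND 1 PRECEDING) FROM nums")
    bad_order_rejected = False
except sqlite3.Error:
    bad_order_rejected = True
```

For v=3 the ascending frame holds 2 and 3 (sum 5), while the descending frame holds only 3, because 5 is more than one unit away.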

In the following example, the window frame for each row consists of all
rows from the current row to the end of the set, where rows are sorted
according to "ORDER BY a".
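A sketch of such a query over the assumed table t1, run through Python's built-in sqlite3 module. The overall query is ordered by c and a to show that the frame follows the window's own "ORDER BY a", not the output order.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1(a INTEGER PRIMARY KEY, b, c)")  # assumed schema
con.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, "A", "one"), (2, "B", "two"), (3, "C", "three"), (4, "D", "one"),
     (5, "E", "two"), (6, "F", "three"), (7, "G", "one")],
)

rows = con.execute("""
    SELECT c, a, b, group_concat(b, '.') OVER (
             ORDER BY a ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
           ) AS group_concat
    FROM t1
    ORDER BY c, a
""").fetchall()
```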

2.2.3. The EXCLUDE Clause

The optional EXCLUDE clause takes one of the following four forms:

EXCLUDE NO OTHERS: This is the default. In this case no
rows are excluded from the window frame as defined by its starting and ending
frame boundaries.

EXCLUDE CURRENT ROW: In this case the current row is
excluded from the window frame. Peers of the current row remain in
the frame for the GROUPS and RANGE frame types.

EXCLUDE GROUP: In this case the current row and all other
rows that are peers of the current row are excluded from the frame. When
processing an EXCLUDE clause, all rows with the same ORDER BY values, or all
rows in the partition if there is no ORDER BY clause, are considered peers,
even if the frame type is ROWS.

EXCLUDE TIES: In this case the current row is part of the
frame, but peers of the current row are excluded.

The following example demonstrates the effect of the various
forms of the EXCLUDE clause:
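A sketch comparing the four forms over the assumed table t1, with SQLite 3.28.0 or later assumed. The helper members() is hypothetical. It turns each concatenated result into a set, because the order of peers within a frame is not guaranteed.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1(a INTEGER PRIMARY KEY, b, c)")  # assumed schema
con.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, "A", "one"), (2, "B", "two"), (3, "C", "three"), (4, "D", "one"),
     (5, "E", "two"), (6, "F", "three"), (7, "G", "one")],
)

frame = "ORDER BY c GROUPS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW"
rows = con.execute(f"""
    SELECT b,
      group_concat(b, '.') OVER ({frame} EXCLUDE NO OTHERS)   AS no_others,
      group_concat(b, '.') OVER ({frame} EXCLUDE CURRENT ROW) AS current_row,
      group_concat(b, '.') OVER ({frame} EXCLUDE GROUP)       AS grp,
      group_concat(b, '.') OVER ({frame} EXCLUDE TIES)        AS ties
    FROM t1
    ORDER BY c, a
""").fetchall()

def members(value):
    """Hypothetical helper: frame members as a set (an empty frame yields NULL)."""
    return set() if value is None else set(value.split("."))

result = {b: [members(v) for v in rest] for b, *rest in rows}
print(result["C"])
```

For the row with b='C', whose only peer is 'F', the four frames are {A,D,G,C,F}, {A,D,G,F}, {A,D,G} and {A,D,G,C} respectively.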

2.3. The FILTER Clause

If a FILTER clause is provided, then only rows for which the expr is
true are included in the window frame. The aggregate window function still
returns a value for every row, but those for which the FILTER expression
evaluates to other than true are not included in the window frame for any row.
For example:
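A sketch of a filtered aggregate window function over the assumed table t1, run through Python's built-in sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1(a INTEGER PRIMARY KEY, b, c)")  # assumed schema
con.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, "A", "one"), (2, "B", "two"), (3, "C", "three"), (4, "D", "one"),
     (5, "E", "two"), (6, "F", "three"), (7, "G", "one")],
)

# Rows with c='two' still get a result, but never contribute to a frame.
rows = con.execute("""
    SELECT c, a, b, group_concat(b, '.') FILTER (WHERE c != 'two') OVER (
             ORDER BY a
           ) AS group_concat
    FROM t1
    ORDER BY a
""").fetchall()
```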

3. Built-in Window Functions

Built-in window functions honor any PARTITION BY clause in the same way
as aggregate window functions - each selected row is assigned to a partition
and each partition is processed separately. The way in which any ORDER BY
clause affects each built-in window function is described below. Some of
the window functions (rank(), dense_rank(), percent_rank() and ntile()) use
the concept of "peer groups" (rows within the same partition that have the
same values for all ORDER BY expressions). In these cases, it does not matter
whether the frame-spec specifies ROWS, GROUPS, or RANGE.
For the purposes of built-in window function processing, rows with the same values
for all ORDER BY expressions are considered peers regardless of the frame type.

Most built-in window functions ignore the
frame-spec, the exceptions being first_value(),
last_value() and nth_value(). It is a syntax error to specify a FILTER
clause as part of a built-in window function invocation.

SQLite supports the following 11 built-in window functions:

row_number()

The number of the row within the current partition. Rows are
numbered starting from 1 in the order defined by the ORDER BY clause in
the window definition, or in arbitrary order otherwise.

rank()

The row_number() of the first peer in each group - the rank of the
current row with gaps. If there is no ORDER BY clause, then all rows
are considered peers and this function always returns 1.

dense_rank()

The number of the current row's peer group within its partition - the
rank of the current row without gaps. Peer groups are numbered starting
from 1 in the order defined by the ORDER BY clause in the window
definition. If there is no ORDER BY clause, then all rows are
considered peers and this function always returns 1.

percent_rank()

Despite the name, this function always returns a value between 0.0
and 1.0 equal to (rank - 1)/(partition-rows - 1), where
rank is the value returned by built-in window function rank()
and partition-rows is the total number of rows in the
partition. If the partition contains only one row, this function
returns 0.0.

cume_dist()

The cumulative distribution. Calculated as
row-number/partition-rows, where row-number is
the value returned by row_number() for the last peer in the group
and partition-rows the number of rows in the partition.

ntile(N)

Argument N is handled as an integer. This function divides the
partition into N groups as evenly as possible and assigns an integer
between 1 and N to each group, in the order defined by the ORDER
BY clause, or in arbitrary order otherwise. If necessary, larger groups
occur first. This function returns the integer value assigned to the
group that the current row is a part of.

lag(expr)
lag(expr, offset)
lag(expr, offset, default)

The first form of the lag() function returns the result of evaluating
expression expr against the previous row in the partition. Or, if
there is no previous row (because the current row is the first), NULL.

If the offset argument is provided, then it must be a
non-negative integer. In this case the value returned is the result
of evaluating expr against the row offset rows before the
current row within the partition. If offset is 0, then
expr is evaluated against the current row. If there is no row
offset rows before the current row, NULL is returned.

If default is also provided, then it is returned instead of
NULL if the row identified by offset does not exist.

lead(expr)
lead(expr, offset)
lead(expr, offset, default)

The first form of the lead() function returns the result of evaluating
expression expr against the next row in the partition. Or, if
there is no next row (because the current row is the last), NULL.

If the offset argument is provided, then it must be a
non-negative integer. In this case the value returned is the result
of evaluating expr against the row offset rows after the
current row within the partition. If offset is 0, then
expr is evaluated against the current row. If there is no row
offset rows after the current row, NULL is returned.

If default is also provided, then it is returned instead of
NULL if the row identified by offset does not exist.

first_value(expr)

This built-in window function calculates the window frame for each
row in the same way as an aggregate window function. It returns the
value of expr evaluated against the first row in the window frame
for each row.

last_value(expr)

This built-in window function calculates the window frame for each
row in the same way as an aggregate window function. It returns the
value of expr evaluated against the last row in the window frame
for each row.

nth_value(expr, N)

This built-in window function calculates the window frame for each
row in the same way as an aggregate window function. It returns the
value of expr evaluated against the row N of the window
frame. Rows are numbered within the window frame starting from 1 in
the order defined by the ORDER BY clause if one is present, or in
arbitrary order otherwise. If there is no Nth row in the
window frame, then NULL is returned.
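The five ranking functions described above can be compared side by side. The sketch below assumes a hypothetical six-row table t2 in which three rows share a='a' and two rows share a='c', and runs through Python's built-in sqlite3 module.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t2(a, b)")  # hypothetical sample table
con.executemany(
    "INSERT INTO t2 VALUES (?, ?)",
    [("a", "one"), ("a", "two"), ("a", "three"),
     ("b", "four"), ("c", "five"), ("c", "six")],
)

rows = con.execute("""
    SELECT a,
           row_number()   OVER win AS row_number,
           rank()         OVER win AS rank,
           dense_rank()   OVER win AS dense_rank,
           percent_rank() OVER win AS percent_rank,
           cume_dist()    OVER win AS cume_dist
    FROM t2
    WINDOW win AS (ORDER BY a)
""").fetchall()

# Present the rows in window order regardless of how they were returned.
rows.sort(key=lambda r: r[1])
for row in rows:
    print(row)
```

The row with a='b' has rank 4 (three rows precede it) but dense_rank 2 (one peer group precedes it), percent_rank (4-1)/(6-1) and cume_dist 4/6.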

The example below uses ntile() to divide the six rows into two groups (the
ntile(2) call) and into four groups (the ntile(4) call). For ntile(2), there
are three rows assigned to each group. For ntile(4), there are two groups of
two and two groups of one. The larger groups of two appear first.
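A sketch of that ntile() query, assuming a hypothetical six-row table t2 and Python's built-in sqlite3 module. Only column a is selected, because the order of peers within the window is arbitrary.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t2(a, b)")  # hypothetical sample table
con.executemany(
    "INSERT INTO t2 VALUES (?, ?)",
    [("a", "one"), ("a", "two"), ("a", "three"),
     ("b", "four"), ("c", "five"), ("c", "six")],
)

rows = con.execute("""
    SELECT a,
           ntile(2) OVER win AS ntile_2,
           ntile(4) OVER win AS ntile_4
    FROM t2
    WINDOW win AS (ORDER BY a)
""").fetchall()

rows = sorted(rows)  # window order for this data set
for row in rows:
    print(row)
```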

The next example demonstrates lag(), lead(), first_value(), last_value()
and nth_value(). The frame-spec is ignored by
both lag() and lead(), but respected by first_value(), last_value()
and nth_value().
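A sketch of these five functions sharing one named window over the assumed table t1, run through Python's built-in sqlite3 module. The frame runs from the start of the partition to the current row, so it affects first_value(), last_value() and nth_value() but not lag() or lead().

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1(a INTEGER PRIMARY KEY, b, c)")  # assumed schema
con.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, "A", "one"), (2, "B", "two"), (3, "C", "three"), (4, "D", "one"),
     (5, "E", "two"), (6, "F", "three"), (7, "G", "one")],
)

rows = con.execute("""
    SELECT b,
           lead(b, 2, 'n/a') OVER win,
           lag(b)            OVER win,
           first_value(b)    OVER win,
           last_value(b)     OVER win,
           nth_value(b, 3)   OVER win
    FROM t1
    WINDOW win AS (ORDER BY b ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    ORDER BY b
""").fetchall()
```

lead(b, 2, 'n/a') looks past the end of the frame (row 'A' sees 'C'), whereas nth_value(b, 3) is NULL for the first two rows because their frames hold fewer than three rows.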

4. Window Chaining

Window chaining is a shorthand that allows one window to be defined in terms
of another. Specifically, the shorthand allows the new window to implicitly
copy the PARTITION BY and optionally ORDER BY clauses of the base window. For
example, in the following:

SELECT group_concat(b, '.') OVER (
win ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
FROM t1
WINDOW win AS (PARTITION BY a ORDER BY c)

the window used by the group_concat() function is equivalent
to "PARTITION BY a ORDER BY c ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW". In order to use window
chaining, all of the following must be true:

The new window definition must not include a PARTITION BY clause. The
PARTITION BY clause, if there is one, must be supplied by the base
window specification.

If the base window has an ORDER BY clause, it is copied into the new
window. In this case the new window must not specify an ORDER BY clause.
If the base window has no ORDER BY clause, one may be specified as part
of the new window definition.

The base window may not specify a frame specification. The frame
specification can only be given in the new window specification.
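A sketch that checks a chained window against its fully spelled-out equivalent. It uses the assumed table t1 and partitions by c rather than by a, so that each partition has more than one row. SQLite 3.28.0 or later is assumed.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1(a INTEGER PRIMARY KEY, b, c)")  # assumed schema
con.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, "A", "one"), (2, "B", "two"), (3, "C", "three"), (4, "D", "one"),
     (5, "E", "two"), (6, "F", "three"), (7, "G", "one")],
)

# The new window borrows PARTITION BY and ORDER BY from "win" and adds a frame.
chained = con.execute("""
    SELECT a, group_concat(b, '.') OVER (
             win ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           )
    FROM t1
    WINDOW win AS (PARTITION BY c ORDER BY a)
    ORDER BY a
""").fetchall()

# The same window written out in full.
explicit = con.execute("""
    SELECT a, group_concat(b, '.') OVER (
             PARTITION BY c ORDER BY a
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           )
    FROM t1
    ORDER BY a
""").fetchall()
```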

The two fragments of SQL below are similar, but not entirely equivalent, as
the latter will fail if the definition of window "win" contains a frame
specification.
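The difference can be sketched with a window "win" that does contain a frame specification. "OVER win" uses the window as defined, while "OVER (win)" is a chained window and is rejected. The sketch uses the assumed table t1 and assumes SQLite 3.28.0 or later.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1(a INTEGER PRIMARY KEY, b, c)")  # assumed schema
con.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, "A", "one"), (2, "B", "two"), (3, "C", "three"), (4, "D", "one"),
     (5, "E", "two"), (6, "F", "three"), (7, "G", "one")],
)

tail = """
    FROM t1
    WINDOW win AS (ORDER BY a ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
    ORDER BY a
"""

# Reference by name: the frame specification inside "win" is allowed.
by_name = con.execute("SELECT group_concat(b, '.') OVER win" + tail).fetchall()

# Chaining from "win": not allowed, because the base window has a frame.
try:
    con.execute("SELECT group_concat(b, '.') OVER (win)" + tail)
    chaining_accepted = True
except sqlite3.Error:
    chaining_accepted = False
```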

5. User-Defined Aggregate Window Functions

User-defined aggregate window functions may be created using the
sqlite3_create_window_function() API. Implementing an aggregate window
function is very similar to an ordinary aggregate function. Any user-defined
aggregate window function may also be used as an ordinary aggregate. To
implement a user-defined aggregate window function the application must
supply four callback functions:

Callback

Description

xStep

This method is required by both window aggregate and legacy aggregate
function implementations. It is invoked to add a row to the current
window. The function arguments, if any, corresponding to the row being
added are passed to the implementation of xStep.

xFinal

This method is required by both window aggregate and legacy aggregate
function implementations. It is invoked to return the current value
of the aggregate (determined by the contents of the current window),
and to free any resources allocated by earlier calls to xStep.

xValue

This method is only required by window aggregate functions, not legacy
aggregate function implementations. It is invoked to return the current
value of the aggregate. Unlike xFinal, the implementation should not
delete any context.

xInverse

This method is only required by window aggregate functions, not legacy
aggregate function implementations. It is invoked to remove a row
from the current window. The function arguments, if any, correspond
to the row being removed.

The C code below implements a simple window aggregate function named
sumint(). This works in the same way as the built-in sum() function, except
that it throws an exception if passed an argument that is not an integer
value.

The following example uses the sumint() function implemented by the above
C code. For each row, the window consists of the preceding row (if any), the current row and the following row (again, if any):
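As a stand-in for the C listing, the sketch below implements the same four callbacks through Python's sqlite3.Connection.create_window_function(), which is available from Python 3.11 and wraps the C interface. The class name SumInt, the table t3 and its rows are assumptions made for illustration. The methods step, inverse, value and finalize correspond to xStep, xInverse, xValue and xFinal.

```python
import sqlite3

class SumInt:
    """Window aggregate that sums integers and rejects other values."""

    def __init__(self):
        self.total = 0

    def step(self, value):       # xStep: a row enters the window
        if not isinstance(value, int):
            raise ValueError("invalid argument")
        self.total += value

    def inverse(self, value):    # xInverse: a row leaves the window
        self.total -= value

    def value(self):             # xValue: current result, state is kept
        return self.total

    def finalize(self):          # xFinal: final result, state may be released
        return self.total

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t3(x, y)")  # hypothetical sample table
con.executemany("INSERT INTO t3 VALUES (?, ?)",
                [("a", 4), ("b", 5), ("c", 3), ("d", 8), ("e", 1)])

rows = None
if hasattr(con, "create_window_function"):  # requires Python 3.11 or later
    con.create_window_function("sumint", 1, SumInt)
    rows = con.execute("""
        SELECT x, sumint(y) OVER (
                 ORDER BY x ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
               ) AS sum_y
        FROM t3
        ORDER BY x
    """).fetchall()
print(rows)
```

As the frame slides forward, SQLite calls inverse() for the row that drops out and step() for the row that comes in, so the running total is never recomputed from scratch.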

6. History

Window function support was first added to SQLite with release
version 3.25.0 (2018-09-15). The SQLite developers used
the PostgreSQL window function
documentation as their primary reference for how window functions
ought to behave. Many test cases have been run against PostgreSQL
to ensure that window functions operate the same way in both
SQLite and PostgreSQL.

In SQLite version 3.28.0 (2019-04-16),
window function support was extended to include the EXCLUDE clause,
GROUPS frame types, window chaining, and support for
"<expr> PRECEDING" and "<expr> FOLLOWING" boundaries
in RANGE frames.