Description

Y = discretize(X,edges) returns
the indices of the bins that contain the elements of X.
The jth bin contains element X(i) if edges(j)
<= X(i) < edges(j+1) for 1 <= j < N,
where N is the number of bins and length(edges)
= N+1. The last bin contains both edges such that edges(N)
<= X(i) <= edges(N+1).
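
The binning rule above can be checked with a small sketch. The data and edge values here are illustrative assumptions, chosen so that some elements fall exactly on bin edges.

```matlab
% Illustrative data; the edges define three bins: [1,4), [4,7), [7,10]
X = [1 3 5 7 10];
edges = [1 4 7 10];

% Each element of Y is the index of the bin containing the matching X element
Y = discretize(X,edges);
% 7 lies on the left edge of bin 3; 10 is the right edge of the last bin,
% which is closed on both sides, so both map to bin 3
disp(Y)   % 1 1 2 3 3
```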

[Y,E] = discretize(X,dur),
where X is a datetime or duration array, divides X into
uniform bins that each span a length of time dur. dur can
be a scalar duration or calendarDuration,
or a unit of time. For example, [Y,E] = discretize(X,'hour') divides X into
bins with a uniform duration of 1 hour.
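
A minimal sketch of the 'hour' syntax follows. The timestamps are made-up sample values, and the second output E is requested to inspect the edges that discretize computes.

```matlab
% Made-up sample timestamps on a single day
X = datetime(2016,1,28,[10 10 11 12],[5 40 15 10],0);

% Bin by the hour; E holds the computed bin edges
[Y,E] = discretize(X,'hour');

% Consecutive edges are expected to be one hour apart
binWidths = diff(E);
disp(Y)
disp(binWidths)
```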

[___] = discretize(___,values) returns
the corresponding element in values rather than
the bin number, using any of the previous input or output argument
combinations. For example, if X(1) is in bin 5,
then Y(1) is values(5) rather
than 5. values must be a vector
with length equal to the number of bins.
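
As a sketch of the values syntax, the following maps each bin to a label value instead of a bin number. The data, edges, and values are illustrative assumptions.

```matlab
% Illustrative data with three bins: [1,4), [4,7), [7,10]
X = [1 3 5 7 10];
edges = [1 4 7 10];

% One value per bin, so values has length(edges)-1 elements
values = [10 20 30];

% Y holds values(k) wherever the plain syntax would return bin number k
Y = discretize(X,edges,values);
disp(Y)   % 10 10 20 30 30
```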

[___] = discretize(___,'categorical') creates
a categorical array where each bin is a category. In most cases, the
default category names are of the form “[A,B)”
(or “[A,B]” for the last bin), where A and B are
consecutive bin edges. If you specify dur as a
character vector, then the default category names might have special
formats. See Y for a listing of the display formats.

[___] = discretize(___,'categorical',categoryNames) also
names the categories in Y using the cell array
of character vectors, categoryNames. The length
of categoryNames must be equal to the number of
bins.
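
The two categorical syntaxes can be compared with a short sketch. The data, edges, and category names below are illustrative assumptions.

```matlab
% Illustrative data with three bins
X = [1 3 5 7 10];
edges = [1 4 7 10];

% Default category names are derived from the bin edges
Ydefault = discretize(X,edges,'categorical');
disp(categories(Ydefault))

% Custom names: one name per bin, in bin order
categoryNames = {'low','medium','high'};
Ynamed = discretize(X,edges,'categorical',categoryNames);
disp(Ynamed)
```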

[___] = discretize(___,'IncludedEdge',side),
where side is 'left' or
'right', specifies whether each bin includes its right or
left bin edge. For example, if side is
'right', then each bin includes the right bin edge,
except for the first bin which includes both edges. In this
case, the jth bin contains an element X(i)
if edges(j) < X(i) <= edges(j+1), where 1 <
j <= N and N is the number of bins. The
first bin includes the left edge such that it contains edges(1) <=
X(i) <= edges(2). The default for side is
'left'.
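
The effect of IncludedEdge is easiest to see when the data sits exactly on the bin edges. The following sketch uses illustrative values for that purpose.

```matlab
% Every element of X coincides with a bin edge
X = [1 4 7 10];
edges = [1 4 7 10];

% Default ('left'): bins are [1,4), [4,7), [7,10]
Yleft = discretize(X,edges,'IncludedEdge','left');
disp(Yleft)    % 1 2 3 3

% 'right': bins are [1,4], (4,7], (7,10]
Yright = discretize(X,edges,'IncludedEdge','right');
disp(Yright)   % 1 1 2 3
```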

Input Arguments

edges — Bin edges
numeric vector

Bin edges, specified as a monotonically increasing numeric vector.
Consecutive elements in edges form discrete bins,
which discretize uses to partition the data in X.
By default, each bin includes the left bin edge, except for the last
bin, which includes both bin edges.

edges must have at least two elements, since edges(1) is
the left edge of the first bin and edges(end) is
the right edge of the last bin.

N — Number of bins
scalar integer

Number of bins, specified as a scalar integer. discretize divides the
range of the data into N uniform bins. If the data is
unevenly distributed, then some of the intermediate bins can be empty.
However, the first and last bin always include at least one piece of
data.

Example: [Y,E] = discretize(X,5) distributes the data in
X into 5 bins with a uniform width.
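
A minimal sketch of the N-bin syntax, using illustrative data. The exact edge values are chosen by discretize, so the sketch only inspects their count and the range of bin indices rather than assuming specific edges.

```matlab
% Illustrative, unevenly spaced data
X = [2 3 5 7 11 13 17 19 23 29];

% Ask for 5 uniform bins; E returns the edges that discretize picked
[Y,E] = discretize(X,5);

% Five bins are delimited by six edges
numEdges = numel(E);
disp(E)
disp(Y)
```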

dur — Uniform bin duration

Uniform bin duration, specified as a scalar duration or calendarDuration,
or as one of the values in the table.

If you specify dur, then
discretize can use a maximum of 65,536 bins (or
2^16). If the specified bin duration requires
more bins, then discretize uses a larger bin width
corresponding to the maximum number of bins.

Value       Works with...                 Description
'second'    Datetime or duration values   Each bin is 1 second.
'minute'    Datetime or duration values   Each bin is 1 minute.
'hour'      Datetime or duration values   Each bin is 1 hour.
'day'       Datetime or duration values   For datetime inputs, each bin is 1 calendar day. This
                                          value accounts for Daylight Saving Time shifts.
                                          For duration inputs, each bin is 1 fixed-length day
                                          (24 hours).
'week'      Datetime values               Each bin is 1 calendar week.
'month'     Datetime values               Each bin is 1 calendar month.
'quarter'   Datetime values               Each bin is 1 calendar quarter.
'year'      Datetime or duration values   For datetime inputs, each bin is 1 calendar year.
                                          This value accounts for leap days.
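
Besides the unit names in the table, dur can be a scalar duration. The following sketch bins an illustrative duration array into 2-hour bins. The sample values are assumptions, and the edge placement is left to discretize.

```matlab
% Illustrative elapsed times
X = hours([0.5 1.5 2.5 5]);

% Scalar duration as the bin width
[Y,E] = discretize(X,hours(2));

% Consecutive edges are expected to be 2 hours apart
binWidths = diff(E);
disp(Y)
disp(binWidths)
```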

values — Bin values
vector

Bin values, specified as a vector of any data type. values must
have the same length as the number of bins, length(edges)-1.
The elements in values replace the normal bin index
in the output. That is, if X(1) falls into bin 2,
then discretize returns Y(1) as values(2) rather
than 2.

If values is a cell array, then all the input
data must belong to a bin.

displayFormat — Datetime and duration display format
character vector

Datetime and duration display format, specified as a character
vector. The displayFormat value does not change
the values in Y, only their display. You can specify displayFormat using
any valid display format for datetime and duration arrays. For more
information about the available options, see Set Date and Time Display Format.

Example: discretize(X,'day','categorical','h') specifies
a display format for a duration array.

Example: discretize(X,'day','categorical','yyyy-MM-dd') specifies
a display format for a datetime array.

Output Arguments

Y — Bins

Bins, returned as a numeric vector, matrix, multidimensional
array, or ordinal categorical array. Y is the same
size as X, and each element describes the bin placement
for the corresponding element in X. If values is
specified, then the data type of Y is the same
as values. Out-of-range elements are expressed
differently depending on the data type of the output:

If Y is a categorical array, then
it contains undefined elements for out-of-range or NaN inputs.

If values is a vector of an integer
data type, then Y contains 0 for
out-of-range or NaN inputs.
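
The out-of-range cases listed above can be sketched as follows, using illustrative data where only one element falls inside the edges. The last part also shows the default numeric output, which marks such elements with NaN; that case is standard MATLAB behavior rather than something stated in the list above.

```matlab
% Only the value 5 lies inside the edges; 0 and 11 are out of range
X = [0 5 11 NaN];
edges = [1 4 7 10];

% Categorical output: out-of-range and NaN inputs become <undefined>
Ycat = discretize(X,edges,'categorical');
undefinedMask = isundefined(Ycat);

% Integer values: out-of-range and NaN inputs become 0
Yint = discretize(X,edges,int8([10 20 30]));

% Default numeric output: out-of-range and NaN inputs become NaN
Ynum = discretize(X,edges);
nanMask = isnan(Ynum);
```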

The default category name formats in Y for
the syntax discretize(X,dur,'categorical') are:

Value of dur                 Default Category Name Format                                Format Example
'second', 'minute', 'hour'   global default format                                       28-Jan-2016 10:32:06
'day'                        global default date format                                  28-Jan-2016
'week'                       [global_default_date_format, global_default_date_format)    [24-Jan-2016, 30-Jan-2016)
'month'                      'MMM-uuuu'                                                  Jun-2016
'quarter'                    'QQQ uuuu'                                                  Q4 2015
'year'                       'uuuu'                                                      2016
'decade', 'century'          '[uuuu, uuuu)'                                              [2010, 2020)

E — Bin edges
vector

Bin edges, returned as a vector. Specify this output to see
the bin edges that discretize calculates in cases
where you do not explicitly pass in the bin edges.

Tips

The behavior of discretize is
similar to that of the histcounts function. Use histcounts to
find the number of elements in each bin. On the other hand, use discretize to
find which bin each element belongs to (without counting).
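
The relationship between the two functions can be sketched on the same illustrative data and edges: histcounts reports how many elements land in each bin, while discretize reports the bin of each element.

```matlab
% Illustrative data with three bins: [1,4), [4,7), [7,10]
X = [1 3 5 7 10];
edges = [1 4 7 10];

% Number of elements per bin
counts = histcounts(X,edges);
disp(counts)   % 2 1 2

% Bin index per element
bins = discretize(X,edges);
disp(bins)     % 1 1 2 3 3

% Tallying the bin indices reproduces the counts
tally = accumarray(bins(:),1,[numel(edges)-1 1])';
```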

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

This function fully supports tall arrays. For
more information, see Tall Arrays.