bindata

Syntax

bdata = bindata(sc)
bdata = bindata(sc,data)
bdata = bindata(sc,Name,Value)

Description

bdata = bindata(sc) returns a table of binned predictor variables. The
table has the same size as the data stored in the creditscorecard object.
Only the predictors specified in the object's PredictorVars property are
binned. The remaining variables are unchanged.

bdata = bindata(sc,data) bins the table data using the binning rules of
the creditscorecard object sc and returns a table of binned predictor
variables. The output table has the same size as data. Only the
predictors specified in the creditscorecard object's PredictorVars
property are binned. The remaining variables are unchanged.

bdata = bindata(sc,Name,Value) returns a table of binned predictor
variables, using additional options specified by one or more name-value
pair arguments. The output table has the same size as the data input,
except when 'OutputType' is set to 'WOEModelInput'. Only the predictors
specified in the creditscorecard object's PredictorVars property are
binned. The remaining variables are unchanged.

Examples

Bin creditscorecard Data as Bin Numbers, Categories, or WOE Values

This example shows how to use the bindata function to simply bin or discretize data.

Suppose that the bin ranges '0 to 30', '31 to 50', and '51 and up' have been determined for the age variable, through either manual or automatic binning. Binning a data point with age 41 means placing it in the bin that contains 41. Here that is the second bin, '31 to 50'. Binning is therefore a mapping from the original data into discrete groups, or bins. In this example, a 41-year-old is mapped to bin number 2 or, equivalently, to the '31 to 50' category. If you know the Weight of Evidence (WOE) value for each of the three bins, you can also replace the data point 41 with the WOE value of the second bin. bindata supports these three binning formats:

Bin number (where the 'OutputType' name-value pair argument is set to 'BinNumber'); this is the default option, and in this case, 41 is mapped to bin 2.

Categorical (where the 'OutputType' name-value pair argument is set to 'Categorical'); in this case, 41 is mapped to the '31 to 50' bin.

WOE value (where the 'OutputType' name-value pair argument is set to 'WOE'); in this case, 41 is mapped to the WOE value of bin number 2.
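The following Python sketch illustrates the three output formats for the age example. It is a conceptual illustration, not the creditscorecard implementation. The names UPPER_BOUNDS, LABELS, GOODS, BADS, and bin_value are hypothetical, and the good/bad counts are placeholders. The sketch also assumes the usual definition of WOE: the log of a bin's share of all "good" observations divided by its share of all "bad" observations.

```python
import bisect
import math

# Hypothetical bins for an age predictor: '0 to 30', '31 to 50', '51 and up'.
# UPPER_BOUNDS holds the inclusive upper bound of every bin except the last.
UPPER_BOUNDS = [30, 50]
LABELS = ["0 to 30", "31 to 50", "51 and up"]

# Hypothetical "good" and "bad" counts per bin (placeholder numbers).
GOODS = [100, 300, 200]
BADS = [80, 120, 50]


def woe_per_bin(goods, bads):
    """Assumed WOE definition: log(share of all goods / share of all bads)."""
    total_good, total_bad = sum(goods), sum(bads)
    return [math.log((g / total_good) / (b / total_bad)) for g, b in zip(goods, bads)]


WOE = woe_per_bin(GOODS, BADS)


def bin_value(age, output_type="BinNumber"):
    """Map one age to a bin number, a bin label, or a WOE value."""
    idx = bisect.bisect_left(UPPER_BOUNDS, age)  # 0-based index of the bin
    if output_type == "BinNumber":
        return idx + 1
    if output_type == "Categorical":
        return LABELS[idx]
    if output_type == "WOE":
        return WOE[idx]
    raise ValueError(f"unsupported output type: {output_type}")


print(bin_value(41))                    # 2
print(bin_value(41, "Categorical"))     # 31 to 50
print(round(bin_value(41, "WOE"), 4))   # 0.0408
```

All three formats share the same step, which is locating the bin index. They differ only in what the sketch returns for that index.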

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011). Use the 'IDVar' argument to indicate that 'CustID' contains ID information and should not be included as a predictor variable.

Bin Additional "Test" Data

This example shows how to use the optional data input of bindata. If data is not provided, bindata bins the creditscorecard training data. To bin a different dataset, such as "test" data, pass it to bindata as the data input.

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011). Use the 'IDVar' argument to indicate that 'CustID' contains ID information and should not be included as a predictor variable.

For the 'CustAge' and 'ResStatus' predictors, the training data contains missing values (NaNs and <undefined> values). The binning process estimates a WOE value for the missing data in each of these predictors: -0.15787 for 'CustAge' and 0.026469 for 'ResStatus'.

For the purpose of illustration, take a few rows from the original data as test data and introduce some missing data.

For the 'CustAge' and 'ResStatus' predictors, because there is missing data in the training data, the missing values in the test data get mapped to the WOE value estimated for the <missing> bin. Therefore, a missing value for 'CustAge' is replaced with -0.15787, and a missing value for 'ResStatus' is replaced with 0.026469.

For 'TmAtAddress' and 'EmpStatus', the training data has no missing values. As a result, there is no bin for missing data and no way to estimate a WOE value for it. For these predictors, the WOE transformation leaves missing values as missing (that is, it sets the WOE value to NaN).

These rules apply when 'OutputType' is set to 'WOE' or 'WOEModelInput'. The rationale is that if a data-based WOE value exists for missing data, it should be used for the WOE transformation and for subsequent steps (for example, fitting a logistic model or scoring).
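The missing-data rule for the WOE transformation can be sketched in Python. This is a conceptual illustration only. The function woe_transform and the bin names are hypothetical, and every WOE value is a placeholder except the 0.026469 <missing>-bin estimate for 'ResStatus' quoted above.

```python
import math

MISSING = "<missing>"


def woe_transform(value, woe_by_bin):
    """Return the WOE for one categorical observation.

    A missing observation (None) takes the WOE of the '<missing>' bin if
    the training data produced one; otherwise it stays missing (NaN).
    """
    if value is None:
        return woe_by_bin.get(MISSING, math.nan)
    return woe_by_bin[value]


# Hypothetical WOE tables. Bin names and non-missing values are placeholders;
# the <missing> value is the 'ResStatus' estimate quoted in the text.
with_missing_bin = {"A": 0.10, "B": -0.05, MISSING: 0.026469}
without_missing_bin = {"X": 0.30, "Y": -0.40}

print(woe_transform(None, with_missing_bin))     # 0.026469
print(woe_transform(None, without_missing_bin))  # nan
print(woe_transform("A", with_missing_bin))      # 0.1
```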

On the other hand, when 'OutputType' is set to 'BinNumber' or 'Categorical', bindata leaves missing values as missing, so that you can decide how to treat them in subsequent steps.

For example, when 'OutputType' is set to 'BinNumber', missing values are set to NaN.
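As a contrast with the WOE case, here is a Python sketch of the 'BinNumber' behavior. No <missing> bin is used, and missing inputs simply come back as NaN. The bin bounds and the function name bin_number are hypothetical. The sketch is not the MATLAB implementation.

```python
import bisect
import math

# Hypothetical inclusive upper bounds for three numeric bins (e.g., ages).
UPPER_BOUNDS = [30, 50]


def bin_number(value):
    """Return the 1-based bin number, or NaN for a missing value."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return math.nan  # missing stays missing
    return bisect.bisect_left(UPPER_BOUNDS, value) + 1


test_ages = [25, math.nan, 41, 63]
print([bin_number(a) for a in test_ages])  # [1, nan, 2, 3]
```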

Apply a Weight of Evidence (WOE) Transformation to Data

When the 'OutputType' name-value pair argument is set to 'WOE', bindata applies the WOE transformation to all predictors and leaves the remaining variables in the original data in place and unchanged.

When the 'OutputType' name-value pair argument is set to 'WOEModelInput', bindata returns a table that can be used directly as an input for fitting a logistic regression model for the scorecard. In this case, bindata:

Applies the WOE transformation to all predictors.

Returns only the predictor variables. The IDVar variable and any unused variables are not included in the output.

Includes the mapped response variable as the last column.

The fitmodel function calls bindata internally using the 'WOEModelInput' option to fit the logistic regression model for the creditscorecard model.
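The column layout that 'WOEModelInput' produces can be sketched in Python using rows stored as dictionaries. This is an illustration only. The predictor names, WOE values, and the function woe_model_input are hypothetical. The mapping of "good" to 1 and everything else to 0 is also an assumption made for the sketch, not a description of how creditscorecard maps the response.

```python
# Hypothetical WOE lookup tables per predictor (placeholder values).
WOE = {
    "Age": {"young": -0.30, "old": 0.20},
    "Status": {"A": 0.10, "B": -0.10},
}


def woe_model_input(rows, predictors, response, map_response):
    """Build model-ready rows: WOE-transformed predictors, mapped response last.

    Columns not listed in `predictors` (the ID variable, unused variables)
    are dropped.
    """
    table = []
    for row in rows:
        out = {p: WOE[p][row[p]] for p in predictors}
        out[response] = map_response(row[response])  # inserted last
        table.append(out)
    return table


rows = [
    {"CustID": 1, "Age": "young", "Status": "A", "Notes": "x", "Outcome": "bad"},
    {"CustID": 2, "Age": "old", "Status": "B", "Notes": "y", "Outcome": "good"},
]
# Assumption for illustration: "good" maps to 1 and anything else to 0.
model_rows = woe_model_input(rows, ["Age", "Status"], "Outcome",
                             lambda r: 1 if r == "good" else 0)
print(list(model_rows[0]))  # ['Age', 'Status', 'Outcome']
print(model_rows[1])        # {'Age': 0.2, 'Status': -0.1, 'Outcome': 1}
```

Python dictionaries preserve insertion order, and the sketch relies on that to keep the response as the last column.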

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011). Use the 'IDVar' argument to indicate that 'CustID' contains ID information and should not be included as a predictor variable.

Input Arguments

sc — Credit scorecard model
creditscorecard object

Credit scorecard model, specified as a
creditscorecard object. Use creditscorecard to create
a creditscorecard object.

data — Data to bin given the rules set in creditscorecard object
table

Data to bin given the rules set in the
creditscorecard object, specified as a table.
By default, data is set to the
creditscorecard object's raw data.

Before creating a creditscorecard object, prepare the
data so that it is appropriately structured as input to a
creditscorecard object.

Data Types: table

Name-Value Pair Arguments

Specify optional
comma-separated pairs of Name,Value arguments. Name is
the argument name and Value is the corresponding value.
Name must appear inside quotes. You can specify several name and value
pair arguments in any order as
Name1,Value1,...,NameN,ValueN.

'OutputType' — Output format
'BinNumber' (default) | character vector

Output format, specified as the comma-separated pair consisting of
'OutputType' and a character vector with one of the
following values:

BinNumber — Returns the bin
numbers corresponding to each observation.

Categorical — Returns the bin
label corresponding to each observation.

WOE — Returns the Weight of
Evidence (WOE) corresponding to each observation.

WOEModelInput — Use this option
when fitting a model. This option:

Returns the Weight of Evidence (WOE)
corresponding to each observation.

Returns predictor variables, but no
IDVar or unused variables are
included in the output.

Discards any predictors whose bins have
Inf or NaN
WOE values.

Includes the mapped response variable as the
last column.

Note

When the bindata name-value
pair argument 'OutputType' is set
to 'WOEModelInput', the
bdata output only contains the
columns corresponding to predictors whose bins do
not have Inf or
NaN Weight of Evidence (WOE)
values, and bdata includes the
mapped response as the last column.

Missing data (if any) are included in the
bdata output as missing data as
well, and do not influence the rules to discard
predictors when 'OutputType' is
set to 'WOEModelInput'.
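The predictor-discarding rule for 'WOEModelInput' can be sketched in Python. The sketch keeps a predictor only if every one of its bins has a finite WOE value. It is an illustration, not the toolbox code. The predictor names, bin names, and WOE values are hypothetical, and kept_predictors is a made-up helper name.

```python
import math


def kept_predictors(woe_by_predictor):
    """Return predictors whose bins all have finite WOE values.

    The rule inspects only per-bin WOE values, so missing observations
    in the data being binned play no part in it.
    """
    return [name for name, bins in woe_by_predictor.items()
            if all(math.isfinite(w) for w in bins.values())]


# Hypothetical per-bin WOE values. For example, a bin with "good" but no
# "bad" observations gives log(p / 0) = +Inf, and a bin with neither gives
# an undefined 0/0 ratio, that is, NaN.
woe_by_predictor = {
    "Age": {"young": -0.30, "old": 0.20},
    "Status": {"A": math.inf, "B": -0.10},
    "Region": {"N": math.nan, "S": 0.05},
}
print(kept_predictors(woe_by_predictor))  # ['Age']
```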

Output Arguments

bdata — Binned predictor variables
table

Binned predictor variables, returned as a table. This table has the
same size as the data input (see the exception in the following Note).
Only the predictors specified in the
creditscorecard object's
PredictorVars property are binned. The
remaining ones are unchanged.

Note

When the bindata name-value pair argument
'OutputType' is set to
'WOEModelInput', the
bdata output only contains the columns
corresponding to predictors whose bins do not have
Inf or NaN Weight of
Evidence (WOE) values, and bdata includes the
mapped response as the last column.

Missing data (if any) are included in the
bdata output as missing data as well, and
do not influence the rules to discard predictors when
'OutputType' is set to
'WOEModelInput'.
