Unconditional vs. Conditional Mean

For a random variable y_t, the unconditional mean is simply the expected value, E(y_t). In contrast, the conditional mean of y_t is the expected value of y_t given a conditioning set of variables, Ω_t. A conditional mean model specifies a functional form for E(y_t | Ω_t).

Static vs. Dynamic Conditional Mean Models

For a static conditional mean model, the conditioning set of variables is measured contemporaneously with the dependent variable y_t. An example of a static conditional mean model is the ordinary linear regression model. Given x_t, a row vector of exogenous covariates measured at time t, and β, a column vector of coefficients, the conditional mean of y_t is expressed as the linear combination

E(y_t | x_t) = x_t β

(that is, the conditioning set is Ω_t = x_t).

In time series econometrics, there is often interest in the dynamic behavior of a variable over time. A dynamic conditional mean model specifies the expected value of y_t as a function of historical information. Let H_{t−1} denote the history of the process available at time t. A dynamic conditional mean model specifies the evolution of the conditional mean, E(y_t | H_{t−1}). Examples of historical information are:

- Past observations, y_1, y_2, ..., y_{t−1}
- Vectors of past exogenous variables, x_1, x_2, ..., x_{t−1}
- Past innovations, ε_1, ε_2, ..., ε_{t−1}

Conditional Mean Models for Stationary Processes

By definition, a covariance stationary stochastic process has an unconditional mean that is constant with respect to time. That is, if y_t is a stationary stochastic process, then E(y_t) = μ for all times t.

The constant mean assumption of stationarity does not preclude the possibility of a dynamic conditional expectation process. The serial autocorrelation between lagged observations exhibited by many time series suggests the expected value of y_t depends on historical information. By Wold's decomposition [1], you can write the conditional mean of any stationary process y_t as

E(y_t | H_{t−1}) = μ + Σ_{i=1}^∞ ψ_i ε_{t−i},   (4-1)

where {ε_{t−i}} are past observations of an uncorrelated innovation process with mean zero, and the coefficients ψ_i are absolutely summable. E(y_t) = μ is the constant unconditional mean of the stationary process.

Any model of the general linear form given by Equation 4-1 is a valid specification for the dynamic behavior of a stationary stochastic process. Special cases of stationary stochastic processes are the autoregressive (AR) model, moving average (MA) model, and the autoregressive moving average (ARMA) model.

In either equation, the default innovation distribution is Gaussian with mean zero and constant variance. You can specify a model of this form using the shorthand syntax arima(p,D,q). For the input arguments p, D, and q, enter the number of nonseasonal AR terms (p), the order of nonseasonal integration (D), and the number of nonseasonal MA terms (q), respectively.

When you use this shorthand syntax, arima creates an arima model with these default property values.

Property Name    Property Data Type
AR               Cell vector of NaNs
Beta             Empty vector [] of regression coefficients corresponding to exogenous covariates
Constant         NaN
D                Degree of nonseasonal integration, D
Distribution     'Gaussian'
MA               Cell vector of NaNs
P                Number of AR terms plus degree of integration, p + D
Q                Number of MA terms, q
SAR              Cell vector of NaNs
SMA              Cell vector of NaNs
Variance         NaN

To assign nondefault values to any properties, you can modify the created model object using dot notation.

Notice that the inputs D and q are the values arima assigns to properties D and Q. However, the input argument p is not necessarily the value arima assigns to the model property P. P stores the number of presample observations needed to initialize the AR component of the model. For nonseasonal models, the required number of presample observations is p + D.

To illustrate, consider specifying the ARIMA(2,1,1) model

(1 − φ_1 L − φ_2 L²)(1 − L) y_t = c + (1 + θ_1 L) ε_t,

where the innovation process is Gaussian with (unknown) constant variance.
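A quick way to see the p + D convention in action (a minimal sketch using the arima shorthand syntax discussed above):

```matlab
% Inputs p = 2 and D = 1, so the property P is p + D = 3
model = arima(2,1,1);
model.P   % displays 3, the number of presample observations required
```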

The created model object, model, has NaNs for all parameters. A NaN value signals that a parameter needs to be estimated or otherwise specified by the user. All parameters must be specified to forecast or simulate the model.

To estimate parameters, input the model object (along with data) to estimate. This returns a new fitted arima model object. The fitted model object has parameter estimates for each input NaN value.

Calling arima without any input arguments returns an ARIMA(0,0,0) model specification with default property values:

model = arima

model = 

    ARIMA(0,0,0) Model:
    ------------------
    Distribution: Name = 'Gaussian'
    P: 0
    D: 0
    Q: 0
    Constant: NaN
    AR: {}
    SAR: {}
    MA: {}
    SMA: {}
    Variance: NaN

Specify Nonseasonal Models Using Name-Value Pairs

The best way to specify models to arima is using name-value pair arguments. You do not need, nor are you able, to specify a value for every model object property. arima assigns default values to any properties you do not (or cannot) specify.

In condensed, lag operator notation, nonseasonal ARIMA(p,D,q) models are of the form

φ(L)(1 − L)^D y_t = c + θ(L) ε_t.   (4-2)

You can extend this model to an ARIMAX(p,D,q) model with the linear inclusion of exogenous variables. This model has the form

y_t = c* + x_t β + φ*(L)^{−1} θ(L) ε_t,   (4-3)

where c* = c/(1 − L)^D and φ*(L) = φ(L)(1 − L)^D.

Tip: If you specify a nonzero D, then Econometrics Toolbox differences the response series y_t before the predictors enter the model. You should preprocess the exogenous covariates x_t by testing for stationarity and differencing if any are unit root nonstationary. If any nonstationary exogenous covariate enters the model, then the false negative rate for significance tests of β can increase.

For the distribution of the innovations, ε_t, there are two choices:

- Independent and identically distributed (iid) Gaussian or Student's t with a constant variance, σ_ε².
- Dependent Gaussian or Student's t with a conditional variance process, σ_t². Specify the conditional variance model using a garch, egarch, or gjr model.

The arima default for the innovations is an iid Gaussian process with constant (scalar) variance.

In order to estimate, forecast, or simulate a model, you must specify the parametric form of the model (e.g., which lags correspond to nonzero coefficients, the innovation distribution) and any known parameter values. You can set any unknown parameters equal to NaN, and then input the model to estimate (along with data) to get estimated parameter values.

arima (and estimate) returns a model corresponding to the model specification. You can modify models to change or update the specification. Input models (with no NaN values) to forecast or simulate for forecasting and simulation, respectively. Here are some example specifications using name-value arguments.

Model: y_t = c + φ_1 y_{t−1} + ε_t, where ε_t = σ z_t and z_t is Gaussian
Specification: arima('AR',NaN) or arima(1,0,0)

Name-Value Arguments for Nonseasonal ARIMA Models

Argument: AR
Corresponding Model Term(s) in Equation 4-2: Nonseasonal AR coefficients, φ_1, ..., φ_p
When to Specify: To set equality constraints for the AR coefficients. For example, to specify the AR coefficients in the model

y_t = 0.8 y_{t−1} − 0.2 y_{t−2} + ε_t,

specify 'AR',{0.8,-0.2}. You only need to specify the nonzero elements of AR. If the nonzero coefficients are at nonconsecutive lags, specify the corresponding lags using ARLags. Any coefficients you specify must correspond to a stable AR operator polynomial.

Argument: ARLags
Corresponding Model Term(s) in Equation 4-2: Lags corresponding to nonzero, nonseasonal AR coefficients
When to Specify: ARLags is not a model property. Use this argument as a shortcut for specifying AR when the nonzero AR coefficients correspond to nonconsecutive lags. For example, to specify nonzero AR coefficients at lags 1 and 12, e.g.,

y_t = φ_1 y_{t−1} + φ_12 y_{t−12} + ε_t,

specify 'ARLags',[1,12]. Use AR and ARLags together to specify known nonzero AR coefficients at nonconsecutive lags. For example, if in the given AR(12) model φ_1 = 0.6 and φ_12 = −0.3, specify 'AR',{0.6,-0.3},'ARLags',[1,12].

Argument: Beta
Corresponding Model Term(s) in Equation 4-2: Values of the coefficients of the exogenous covariates
When to Specify: Use this argument to specify the values of the β coefficients of the exogenous variables. For example, use 'Beta',[0.5 7 -2] to specify β = [0.5 7 −2]′. By default, Beta is an empty vector.

Argument: Constant
Corresponding Model Term(s) in Equation 4-2: Constant term, c
When to Specify: To set equality constraints for c. For example, for a model with no constant term, specify 'Constant',0. By default, Constant has value NaN.

Argument: D
Corresponding Model Term(s) in Equation 4-2: Degree of nonseasonal differencing, D
When to Specify: To specify a degree of nonseasonal differencing greater than zero. For example, to specify one degree of differencing, specify 'D',1. By default, D has value 0 (meaning no nonseasonal integration).

Argument: Distribution
Corresponding Model Term(s) in Equation 4-2: Distribution of the innovation process, ε_t
When to Specify: Use this argument to specify a Student's t innovation distribution. By default, the innovation distribution is Gaussian. For example, to specify a t distribution with unknown degrees of freedom, specify 'Distribution','t'. To specify a t innovation distribution with known degrees of freedom, assign Distribution a data structure with fields Name and DoF. For example, for a t distribution with nine degrees of freedom, specify 'Distribution',struct('Name','t','DoF',9).

Argument: MA
Corresponding Model Term(s) in Equation 4-2: Nonseasonal MA coefficients, θ_1, ..., θ_q
When to Specify: To set equality constraints for the MA coefficients. For example, to specify the MA coefficients in the model

y_t = ε_t + 0.5 ε_{t−1} + 0.2 ε_{t−2},

specify 'MA',{0.5,0.2}. You only need to specify the nonzero elements of MA. If the nonzero coefficients are at nonconsecutive lags, specify the corresponding lags using MALags. Any coefficients you specify must correspond to an invertible MA polynomial.

Argument: MALags
Corresponding Model Term(s) in Equation 4-2: Lags corresponding to nonzero, nonseasonal MA coefficients
When to Specify: MALags is not a model property. Use this argument as a shortcut for specifying MA when the nonzero MA coefficients correspond to nonconsecutive lags. For example, to specify nonzero MA coefficients at lags 1 and 4, e.g.,

y_t = ε_t + θ_1 ε_{t−1} + θ_4 ε_{t−4},

specify 'MALags',[1,4]. Use MA and MALags together to specify known nonzero MA coefficients at nonconsecutive lags. For example, if in the given MA(4) model θ_1 = 0.5 and θ_4 = 0.2, specify 'MA',{0.5,0.2},'MALags',[1,4].

The innovation series can be an independent or dependent Gaussian or Student's t process. The arima default for the innovation distribution is an iid Gaussian process with constant (scalar) variance.

In addition to the arguments for specifying nonseasonal models (described in Name-Value Arguments for Nonseasonal ARIMA Models on page 4-11), you can specify these name-value arguments to create a multiplicative arima model. You can extend an ARIMAX model similarly to include seasonal effects.

Name-Value Arguments for Seasonal ARIMA Models

Argument: SAR
Corresponding Model Term(s) in Equation 4-5: Seasonal AR coefficients, Φ_1, ..., Φ_{p_s}
When to Specify: To set equality constraints for the seasonal AR coefficients. When specifying AR coefficients, use the sign opposite to what appears in Equation 4-5 (that is, use the sign of the coefficient as it would appear on the right side of the equation). Use SARLags to specify the lags of the nonzero seasonal AR coefficients. Specify the lags associated with the seasonal polynomials in the periodicity of the observed data (e.g., 4, 8,... for quarterly data, or 12, 24,... for monthly data), and not as multiples of the seasonality (e.g., 1, 2,...). For example, to specify the model

(1 − 0.8L)(1 − 0.2L^12) y_t = ε_t,

specify 'AR',0.8,'SAR',0.2,'SARLags',12. Any coefficient values you enter must correspond to a stable seasonal AR polynomial.

Argument: SARLags
Corresponding Model Term(s) in Equation 4-5: Lags corresponding to nonzero seasonal AR coefficients, in the periodicity of the observed series
When to Specify: SARLags is not a model property. Use this argument when specifying SAR to indicate the lags of the nonzero seasonal AR coefficients. For example, to specify a model with seasonal AR terms at lag 12, specify 'SARLags',12.

Argument: SMA
Corresponding Model Term(s) in Equation 4-5: Seasonal MA coefficients, Θ_1, ..., Θ_{q_s}
When to Specify: To set equality constraints for the seasonal MA coefficients. Use SMALags to specify the lags of the nonzero seasonal MA coefficients. Specify the lags associated with the seasonal polynomials in the periodicity of the observed data (e.g., 4, 8,... for quarterly data, or 12, 24,... for monthly data), and not as multiples of the seasonality (e.g., 1, 2,...). For example, to specify the model

y_t = (1 + 0.6L)(1 + 0.2L^12) ε_t,

specify 'MA',0.6,'SMA',0.2,'SMALags',12. Any coefficient values you enter must correspond to an invertible seasonal MA polynomial.

Argument: SMALags
Corresponding Model Term(s) in Equation 4-5: Lags corresponding to the nonzero seasonal MA coefficients, in the periodicity of the observed series
When to Specify: SMALags is not a model property. Use this argument when specifying SMA to indicate the lags of the nonzero seasonal MA coefficients. For example, to specify the model

y_t = (1 + θ_1 L)(1 + Θ_4 L^4) ε_t,

specify 'MALags',1,'SMALags',4.

Argument: Seasonality
Corresponding Model Term(s) in Equation 4-5: Seasonal periodicity, s

Autoregressive Model

In this section...
AR(p) Model on page 4-18
Stationarity of the AR Model on page 4-18

AR(p) Model

Many observed time series exhibit serial autocorrelation; that is, linear association between lagged observations. This suggests past observations might predict current observations. The autoregressive (AR) process models the conditional mean of y_t as a function of past observations, y_{t−1}, y_{t−2}, ..., y_{t−p}. An AR process that depends on p past observations is called an AR model of degree p, denoted by AR(p).

The form of the AR(p) model in Econometrics Toolbox is

y_t = c + φ_1 y_{t−1} + ... + φ_p y_{t−p} + ε_t,   (4-6)

where ε_t is an uncorrelated innovation process with mean zero.

The signs of the coefficients in the AR lag operator polynomial, φ(L), are opposite to those on the right side of Equation 4-6. When specifying and interpreting AR coefficients in Econometrics Toolbox, use the form in Equation 4-6.

Stationarity of the AR Model

Consider the AR(p) model in lag operator notation,

φ(L) y_t = c + ε_t.

From this expression, you can see that

y_t = μ + φ(L)^{−1} ε_t = μ + ψ(L) ε_t,   (4-8)

where

μ = c / (1 − φ_1 − ... − φ_p)

is the unconditional mean of the process, and ψ(L) is an infinite-degree lag operator polynomial, ψ(L) = 1 + ψ_1 L + ψ_2 L² + ....

Note: The Constant property of an arima model object corresponds to c, and not the unconditional mean μ.

By Wold's decomposition [1], Equation 4-8 corresponds to a stationary stochastic process provided the coefficients ψ_i are absolutely summable. This is the case when the AR polynomial, φ(L), is stable, meaning all its roots lie outside the unit circle.

Econometrics Toolbox enforces stability of the AR polynomial. When you specify an AR model using arima, you get an error if you enter coefficients that do not correspond to a stable polynomial. Similarly, estimate imposes stationarity constraints during estimation.
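As a sketch of the stability check described above (the coefficient value here is illustrative, not from the text):

```matlab
% An AR(1) coefficient of 1.2 puts the root of 1 - 1.2L inside the unit
% circle, so arima refuses the specification with an error.
try
    model = arima('AR',1.2);
catch err
    disp(err.message)
end
```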

AR Model Specifications

In this section...
Default AR Model on page 4-21
AR Model with No Constant Term on page 4-22
AR Model with Nonconsecutive Lags on page 4-23
AR Model with Known Parameter Values on page 4-24
AR Model with a t Innovation Distribution on page 4-25

Default AR Model

This example shows how to use the shorthand arima(p,D,q) syntax to specify the default AR(p) model,

y_t = c + φ_1 y_{t−1} + ... + φ_p y_{t−p} + ε_t.

By default, all parameters in the created model object have unknown values, and the innovation distribution is Gaussian with constant variance.

Specify the default AR(2) model:

model = arima(2,0,0)

model = 

    ARIMA(2,0,0) Model:
    ------------------
    Distribution: Name = 'Gaussian'
    P: 2
    D: 0
    Q: 0
    Constant: NaN
    AR: {NaN NaN} at Lags [1 2]
    SAR: {}
    MA: {}
    SMA: {}
    Variance: NaN

The output shows that the created model object, model, has NaNs for all model parameters: the constant term, the AR coefficients, and the variance. You can modify the created model object using dot notation, or input it (along with data) to estimate.

AR Model with No Constant Term

This example shows how to specify an AR(p) model with constant term equal to zero. Use name-value syntax to specify a model that differs from the default model.

Specify an AR(2) model with no constant term,

y_t = φ_1 y_{t−1} + φ_2 y_{t−2} + ε_t,

where the innovation distribution is Gaussian with constant variance.

model = arima('ARLags',1:2,'Constant',0)

model = 

    ARIMA(2,0,0) Model:
    ------------------
    Distribution: Name = 'Gaussian'
    P: 2
    D: 0
    Q: 0
    Constant: 0
    AR: {NaN NaN} at Lags [1 2]
    SAR: {}
    MA: {}
    SMA: {}
    Variance: NaN

The ARLags name-value argument specifies the lags corresponding to nonzero AR coefficients. The property Constant in the created model object is equal to 0, as specified. The model object has default values for all other properties, including NaNs as placeholders for the unknown parameters: the AR coefficients and the scalar variance.

You can modify the created model object using dot notation, or input it (along with data) to estimate.

AR Model with Nonconsecutive Lags

This example shows how to specify an AR(p) model with nonzero coefficients at nonconsecutive lags.

Specify an AR(4) model with nonzero AR coefficients at lags 1 and 4 (and no constant term),

y_t = φ_1 y_{t−1} + φ_4 y_{t−4} + ε_t,

where the innovation distribution is Gaussian with constant variance.
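The arima call for this specification is missing from the extracted text; by analogy with the other examples, it would presumably be:

```matlab
% Reconstructed by analogy with the surrounding examples, not from the text:
% ARLags picks out the nonzero lags; interim AR coefficients are zero.
model = arima('Constant',0,'ARLags',[1 4])
```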

AR Model with Known Parameter Values

This example shows how to specify an AR(p) model with known parameter values. You can use such a fully specified model as an input to simulate or forecast.

Specify the AR(4) model

y_t = 0.2 + 0.8 y_{t−1} − 0.1 y_{t−4} + ε_t,

where the innovation distribution is Gaussian with constant variance 0.15.

model = arima('Constant',0.2,'AR',{0.8,-0.1},...
    'ARLags',[1,4],'Variance',0.15)

model = 

    ARIMA(4,0,0) Model:
    ------------------
    Distribution: Name = 'Gaussian'
    P: 4
    D: 0
    Q: 0
    Constant: 0.2
    AR: {0.8 -0.1} at Lags [1 4]
    SAR: {}
    MA: {}
    SMA: {}
    Variance: 0.15

Because all parameter values are specified, the created model object has no NaNs. The functions simulate and forecast don't accept input models with NaN values.
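Since this model is fully specified, it can be passed straight to simulate and forecast. A minimal sketch (the sample sizes and seed here are illustrative, not from the text):

```matlab
model = arima('Constant',0.2,'AR',{0.8,-0.1},...
    'ARLags',[1,4],'Variance',0.15);
rng(1);                                 % illustrative seed for reproducibility
Y = simulate(model,100);                % simulate 100 observations
[YF,YMSE] = forecast(model,10,'Y0',Y);  % 10-step-ahead forecasts and MSEs
```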

AR Model with a t Innovation Distribution

This example shows how to specify an AR(p) model with a Student's t innovation distribution.

Specify an AR(2) model with no constant term,

y_t = φ_1 y_{t−1} + φ_2 y_{t−2} + ε_t,

where the innovations follow a Student's t distribution with unknown degrees of freedom.

model = arima('Constant',0,'ARLags',1:2,'Distribution','t')

model =

    ARIMA(2,0,0) Model:
    ------------------
    Distribution: Name = 't', DoF = NaN
    P: 2
    D: 0
    Q: 0
    Constant: 0
    AR: {NaN NaN} at Lags [1 2]
    SAR: {}
    MA: {}
    SMA: {}
    Variance: NaN

The value of Distribution is a struct array with field Name equal to 't' and field DoF equal to NaN. The NaN value indicates the degrees of freedom are unknown, and need to be estimated using estimate or otherwise specified by the user.

Moving Average Model

In this section...
MA(q) Model on page 4-27
Invertibility of the MA Model on page 4-27

MA(q) Model

The moving average (MA) model captures serial autocorrelation in a time series y_t by expressing the conditional mean of y_t as a function of past innovations, ε_{t−1}, ε_{t−2}, ..., ε_{t−q}. An MA model that depends on q past innovations is called an MA model of degree q, denoted by MA(q).

The form of the MA(q) model in Econometrics Toolbox is

y_t = c + ε_t + θ_1 ε_{t−1} + ... + θ_q ε_{t−q},   (4-9)

where ε_t is an uncorrelated innovation process with mean zero. For an MA process, the unconditional mean of y_t is μ = c.

Invertibility of the MA Model

By Wold's decomposition [1], an MA(q) process is always stationary because θ(L) is a finite-degree polynomial.

For a given process, however, there is no unique MA polynomial: there is always both a noninvertible and an invertible solution [2]. For uniqueness, it is conventional to impose invertibility constraints on the MA polynomial. Practically speaking, choosing the invertible solution implies the process is causal. An invertible MA process can be expressed as an infinite-degree AR process, meaning only past events (not future events) predict current events.

The MA operator polynomial θ(L) is invertible if all its roots lie outside the unit circle.

Econometrics Toolbox enforces invertibility of the MA polynomial. When you specify an MA model using arima, you get an error if you enter coefficients that do not correspond to an invertible polynomial. Similarly, estimate imposes invertibility constraints during estimation.
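The root condition can also be checked by hand. A sketch for the illustrative polynomial θ(L) = 1 + 0.5L + 0.2L² (these coefficient values are assumptions, not from the text):

```matlab
% Check invertibility of theta(L) = 1 + 0.5L + 0.2L^2 directly
theta = [0.5 0.2];
rts = roots([fliplr(theta) 1]);  % roots of 0.2L^2 + 0.5L + 1 in the variable L
all(abs(rts) > 1)                % returns 1 (true): all roots outside the unit circle
```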

MA Model Specifications

In this section...
MA Model with No Constant Term on page 4-30
MA Model with Nonconsecutive Lags on page 4-31
MA Model with Known Parameter Values on page 4-32
MA Model with a t Innovation Distribution on page 4-33

Default MA Model

This example shows how to use the shorthand arima(p,D,q) syntax to specify the default MA(q) model,

y_t = c + ε_t + θ_1 ε_{t−1} + ... + θ_q ε_{t−q}.

By default, all parameters in the created model object have unknown values, and the innovation distribution is Gaussian with constant variance.

Specify the default MA(3) model:

model = arima(0,0,3)

model = 

    ARIMA(0,0,3) Model:
    ------------------
    Distribution: Name = 'Gaussian'
    P: 0
    D: 0
    Q: 3
    Constant: NaN
    AR: {}
    SAR: {}
    MA: {NaN NaN NaN} at Lags [1 2 3]
    SMA: {}
    Variance: NaN

The output shows that the created model object, model, has NaNs for all model parameters: the constant term, the MA coefficients, and the variance. You can modify the created model object using dot notation, or input it (along with data) to estimate.

MA Model with No Constant Term

This example shows how to specify an MA(q) model with constant term equal to zero. Use name-value syntax to specify a model that differs from the default model.

Specify an MA(2) model with no constant term,

y_t = ε_t + θ_1 ε_{t−1} + θ_2 ε_{t−2},

where the innovation distribution is Gaussian with constant variance.
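The arima call that produces the output below is missing from the extracted text; by analogy with the AR example, it would presumably be:

```matlab
% Reconstructed by analogy with the AR example, not from the text
model = arima('Constant',0,'MALags',1:2)
```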

    P: 0
    D: 0
    Q: 2
    Constant: 0
    AR: {}
    SAR: {}
    MA: {NaN NaN} at Lags [1 2]
    SMA: {}
    Variance: NaN

The MALags name-value argument specifies the lags corresponding to nonzero MA coefficients. The property Constant in the created model object is equal to 0, as specified. The model object has default values for all other properties, including NaNs as placeholders for the unknown parameters: the MA coefficients and the scalar variance.

You can modify the created model variable, or input it (along with data) to estimate.

MA Model with Nonconsecutive Lags

This example shows how to specify an MA(q) model with nonzero coefficients at nonconsecutive lags.

Specify an MA(4) model with nonzero MA coefficients at lags 1 and 4 (and no constant term),

y_t = ε_t + θ_1 ε_{t−1} + θ_4 ε_{t−4},

where the innovation distribution is Gaussian with constant variance.
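The arima call for this specification is missing from the extracted text; by analogy with the AR example, it would presumably be:

```matlab
% Reconstructed by analogy with the AR example, not from the text
model = arima('Constant',0,'MALags',[1 4])
```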

The MA cell array contains four elements. The first and last elements (corresponding to lags 1 and 4) have value NaN, indicating these coefficients are nonzero and need to be estimated or otherwise specified by the user. arima sets the coefficients at interim lags equal to zero to maintain consistency with MATLAB cell array indexing.

MA Model with Known Parameter Values

This example shows how to specify an MA(q) model with known parameter values. You can use such a fully specified model as an input to simulate or forecast.

Specify the MA(4) model

y_t = 0.1 + ε_t + 0.7 ε_{t−1} + 0.2 ε_{t−4},

where the innovation distribution is Gaussian with constant variance 0.15.

model = arima('Constant',0.1,'MA',{0.7,0.2},...
    'MALags',[1,4],'Variance',0.15)

model = 

    ARIMA(0,0,4) Model:
    ------------------
    Distribution: Name = 'Gaussian'
    P: 0
    D: 0
    Q: 4
    Constant: 0.1
    AR: {}
    SAR: {}
    MA: {0.7 0.2} at Lags [1 4]
    SMA: {}
    Variance: 0.15

Because all parameter values are specified, the created model object has no NaNs. The functions simulate and forecast don't accept input models with NaN values.

MA Model with a t Innovation Distribution

This example shows how to specify an MA(q) model with a Student's t innovation distribution.

Specify an MA(2) model with no constant term,

y_t = ε_t + θ_1 ε_{t−1} + θ_2 ε_{t−2},

where the innovation process follows a Student's t distribution with eight degrees of freedom.

tdist = struct('Name','t','DoF',8);
model = arima('Constant',0,'MALags',1:2,'Distribution',tdist)

model =

    ARIMA(0,0,2) Model:
    ------------------
    Distribution: Name = 't', DoF = 8
    P: 0
    D: 0
    Q: 2
    Constant: 0
    AR: {}
    SAR: {}
    MA: {NaN NaN} at Lags [1 2]
    SMA: {}
    Variance: NaN

The value of Distribution is a struct array with field Name equal to 't' and field DoF equal to 8. When you specify the degrees of freedom, they aren't estimated if you input the model to estimate.

Autoregressive Moving Average Model

In this section...
ARMA(p,q) Model on page 4-35
Stationarity and Invertibility of the ARMA Model on page 4-36

ARMA(p,q) Model

For some observed time series, a very high-order AR or MA model is needed to model the underlying process well. In this case, a combined autoregressive moving average (ARMA) model can sometimes be a more parsimonious choice.

An ARMA model expresses the conditional mean of y_t as a function of both past observations, y_{t−1}, ..., y_{t−p}, and past innovations, ε_{t−1}, ..., ε_{t−q}. The number of past observations that y_t depends on, p, is the AR degree. The number of past innovations that y_t depends on, q, is the MA degree. In general, these models are denoted by ARMA(p,q).

The form of the ARMA(p,q) model in Econometrics Toolbox is

y_t = c + φ_1 y_{t−1} + ... + φ_p y_{t−p} + ε_t + θ_1 ε_{t−1} + ... + θ_q ε_{t−q},   (4-10)

where ε_t is an uncorrelated innovation process with mean zero.

In lag operator polynomial notation, you can write the ARMA(p,q) model as

φ(L) y_t = c + θ(L) ε_t.   (4-11)

The signs of the coefficients in the AR lag operator polynomial, φ(L), are opposite to those on the right side of Equation 4-10. When specifying and interpreting AR coefficients in Econometrics Toolbox, use the form in Equation 4-10.

Stationarity and Invertibility of the ARMA Model

Consider the ARMA(p,q) model in lag operator notation,

φ(L) y_t = c + θ(L) ε_t.

From this expression, you can see that

y_t = μ + φ(L)^{−1} θ(L) ε_t = μ + ψ(L) ε_t,   (4-12)

where

μ = c / (1 − φ_1 − ... − φ_p)

is the unconditional mean of the process, and ψ(L) is a rational, infinite-degree lag operator polynomial, ψ(L) = 1 + ψ_1 L + ψ_2 L² + ....

Note: The Constant property of an arima model object corresponds to c, and not the unconditional mean μ.

By Wold's decomposition [1], Equation 4-12 corresponds to a stationary stochastic process provided the coefficients ψ_i are absolutely summable. This is the case when the AR polynomial, φ(L), is stable, meaning all its roots lie outside the unit circle. Additionally, the process is causal provided the MA polynomial is invertible, meaning all its roots lie outside the unit circle.

Econometrics Toolbox enforces stability and invertibility of ARMA processes. When you specify an ARMA model using arima, you get an error if you enter coefficients that do not correspond to a stable AR polynomial or invertible MA polynomial. Similarly, estimate imposes stationarity and invertibility constraints during estimation.

ARMA Model Specifications

In this section...
Default ARMA Model on page 4-38
ARMA Model with No Constant Term on page 4-39
ARMA Model with Known Parameter Values on page 4-40

Default ARMA Model

This example shows how to use the shorthand arima(p,D,q) syntax to specify the default ARMA(p,q) model,

y_t = c + φ_1 y_{t−1} + ... + φ_p y_{t−p} + ε_t + θ_1 ε_{t−1} + ... + θ_q ε_{t−q}.

By default, all parameters in the created model object have unknown values, and the innovation distribution is Gaussian with constant variance.

Specify the default ARMA(1,1) model:

model = arima(1,0,1)

model = 

    ARIMA(1,0,1) Model:
    ------------------
    Distribution: Name = 'Gaussian'
    P: 1
    D: 0
    Q: 1
    Constant: NaN
    AR: {NaN} at Lags [1]
    SAR: {}
    MA: {NaN} at Lags [1]
    SMA: {}
    Variance: NaN

The output shows that the created model object, model, has NaNs for all model parameters: the constant term, the AR and MA coefficients, and the variance. You can modify the created model object using dot notation, or input it (along with data) to estimate.

ARMA Model with No Constant Term

This example shows how to specify an ARMA(p,q) model with constant term equal to zero. Use name-value syntax to specify a model that differs from the default model.

Specify an ARMA(2,1) model with no constant term,

y_t = φ_1 y_{t−1} + φ_2 y_{t−2} + ε_t + θ_1 ε_{t−1},

where the innovation distribution is Gaussian with constant variance.
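The arima call for this specification is missing from the extracted text; by analogy with the other examples, it would presumably be:

```matlab
% Reconstructed by analogy with the surrounding examples, not from the text
model = arima('Constant',0,'ARLags',1:2,'MALags',1)
```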

The ARLags and MALags name-value arguments specify the lags corresponding to nonzero AR and MA coefficients, respectively. The property Constant in the created model object is equal to 0, as specified. The model object has default values for all other properties, including NaNs as placeholders for the unknown parameters: the AR and MA coefficients, and the scalar variance.

You can modify the created model object using dot notation, or input it (along with data) to estimate.

ARMA Model with Known Parameter Values

This example shows how to specify an ARMA(p,q) model with known parameter values. You can use such a fully specified model as an input to simulate or forecast.

Specify the ARMA(1,1) model

y_t = 0.3 + 0.7 y_{t−1} + ε_t + 0.4 ε_{t−1},

where the innovation distribution is Gaussian with constant variance.
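A call consistent with this model would look like the following sketch. The Variance value here is an illustrative assumption, since the extracted text does not state it:

```matlab
% Variance value is illustrative; replace with the known innovation variance
model = arima('Constant',0.3,'AR',0.7,'MA',0.4,'Variance',0.1)
```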

Concepts: Autoregressive Moving Average Model on page 4-35

ARIMA Model

The autoregressive integrated moving average (ARIMA) process generates nonstationary series that are integrated of order D, denoted I(D). A nonstationary I(D) process is one that can be made stationary by taking D differences. Such processes are often called difference-stationary or unit root processes.

A series that you can model as a stationary ARMA(p,q) process after being differenced D times is denoted by ARIMA(p,D,q). The form of the ARIMA(p,D,q) model in Econometrics Toolbox is

Δ^D y_t = c + φ_1 Δ^D y_{t−1} + ... + φ_p Δ^D y_{t−p} + ε_t + θ_1 ε_{t−1} + ... + θ_q ε_{t−q},   (4-13)

where Δ^D y_t denotes a Dth differenced time series, and ε_t is an uncorrelated innovation process with mean zero.

In lag operator notation, L^i y_t = y_{t−i}. You can write the ARIMA(p,D,q) model as

φ*(L) y_t = φ(L)(1 − L)^D y_t = c + θ(L) ε_t,

where φ(L) = 1 − φ_1 L − ... − φ_p L^p is a stable degree p AR lag operator polynomial (with all roots lying outside the unit circle).

The signs of the coefficients in the AR lag operator polynomial, φ(L), are opposite to those on the right side of Equation 4-13. When specifying and interpreting AR coefficients in Econometrics Toolbox, use the form in Equation 4-13.

Note: In the original Box-Jenkins methodology, you difference an integrated series until it is stationary before modeling. Then, you model the differenced series as a stationary ARMA(p,q) process [1]. Econometrics Toolbox fits and forecasts ARIMA(p,D,q) processes directly, so you do not need to difference data before modeling (or backtransform forecasts).
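The direct workflow described in the note can be sketched as follows (y here is a hypothetical column vector of I(1) data, not from the text):

```matlab
% Sketch: fit an ARIMA(1,1,1) directly to an integrated series y
model = arima(1,1,1);
% fit = estimate(model,y);        % no manual differencing required
% yf  = forecast(fit,12,'Y0',y);  % forecasts on the original, undifferenced scale
```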

ARIMA Model Specifications

In this section...
Default ARIMA Model on page 4-44
ARIMA Model with Known Parameter Values on page 4-45

Default ARIMA Model

This example shows how to use the shorthand arima(p,D,q) syntax to specify the default ARIMA(p,D,q) model,

Δ^D y_t = c + φ_1 Δ^D y_{t−1} + ... + φ_p Δ^D y_{t−p} + ε_t + θ_1 ε_{t−1} + ... + θ_q ε_{t−q},

where Δ^D y_t is a Dth differenced time series. You can write this model in condensed form using Lag Operator Notation on page 1-20:

φ(L)(1 − L)^D y_t = c + θ(L) ε_t.

By default, all parameters in the created model object have unknown values, and the innovation distribution is Gaussian with constant variance.

Specify the default ARIMA(1,1,1) model:

model = arima(1,1,1)

model = 

    ARIMA(1,1,1) Model:
    ------------------
    Distribution: Name = 'Gaussian'

    P: 2
    D: 1
    Q: 1
    Constant: NaN
    AR: {NaN} at Lags [1]
    SAR: {}
    MA: {NaN} at Lags [1]
    SMA: {}
    Variance: NaN

The output shows that the created model object, model, has NaNs for all model parameters: the constant term, the AR and MA coefficients, and the variance. You can modify the created model object using dot notation, or input it (along with data) to estimate.

The property P has value 2 (p + D). This is the number of presample observations needed to initialize the AR model.

ARIMA Model with Known Parameter Values

This example shows how to specify an ARIMA(p,D,q) model with known parameter values. You can use such a fully specified model as an input to simulate or forecast.

Specify the ARIMA(2,1,1) model

Multiplicative ARIMA Model

Many time series collected periodically (e.g., quarterly or monthly) exhibit a seasonal trend, meaning there is a relationship between observations made during the same period in successive years. In addition to this seasonal relationship, there can also be a relationship between observations made during successive periods. The multiplicative ARIMA model is an extension of the ARIMA model that addresses seasonality and potential seasonal unit roots [1].

In lag operator polynomial notation, L^i y_t = y_{t−i}. For a series with periodicity s, the multiplicative ARIMA(p,D,q)×(p_s,D_s,q_s)_s model is given by

φ(L)(1 − L)^D Φ(L)(1 − L^s)^{D_s} y_t = c + θ(L) Θ(L) ε_t.   (4-15)

Here, the stable, degree p AR operator polynomial is φ(L) = 1 − φ_1 L − ... − φ_p L^p, and Φ(L) is a stable, degree p_s AR operator of the same form. Similarly, the invertible, degree q MA operator polynomial is θ(L) = 1 + θ_1 L + ... + θ_q L^q, and Θ(L) is an invertible, degree q_s MA operator of the same form.

When you specify a multiplicative ARIMA model using arima:

- Set the nonseasonal and seasonal AR coefficients with the opposite signs from their respective AR operator polynomials. That is, specify the coefficients as they would appear on the right side of Equation 4-15.
- Set the lags associated with the seasonal polynomials in the periodicity of the observed data (e.g., 4, 8,... for quarterly data, or 12, 24,... for monthly data), and not as multiples of the seasonality (e.g., 1, 2,...). This convention does not conform to standard Box and Jenkins notation, but is a more flexible approach for incorporating multiplicative seasonality.

The nonseasonal differencing operator, (1 − L)^D, accounts for nonstationarity in observations made in successive periods. The seasonal differencing operator, (1 − L^s)^{D_s}, accounts for nonstationarity in observations made in the same period in successive years.
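The conventions above can be sketched with a multiplicative model of the classic "airline" form for monthly data (this particular specification is illustrative, not from the text):

```matlab
% Sketch of (1-L)(1-L^12) y_t = (1 + theta_1 L)(1 + Theta_12 L^12) eps_t:
% nonseasonal differencing via 'D', seasonal differencing via 'Seasonality',
% seasonal MA lag given in the data's periodicity (12 for monthly data).
model = arima('Constant',0,'D',1,'MALags',1,...
    'SMALags',12,'Seasonality',12)
```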

Multiplicative ARIMA Model Specifications

In this section...
Seasonal ARIMA Model with No Constant Term on page 4-49
Seasonal ARIMA Model with Known Parameter Values on page 4-50

Seasonal ARIMA Model with No Constant Term

This example shows how to use arima to specify a multiplicative seasonal ARIMA model (for monthly data) with no constant term.

Specify a multiplicative seasonal ARIMA model with no constant term,

(1 − φ_1 L)(1 − L)(1 − Φ_12 L^12)(1 − L^12) y_t = (1 + θ_1 L)(1 + Θ_12 L^12) ε_t,

where the innovation distribution is Gaussian with constant variance. Here, (1 − L) is the first degree nonseasonal differencing operator and (1 − L^12) is the first degree seasonal differencing operator with periodicity 12.

model = arima('Constant',0,'ARLags',1,'SARLags',12,'D',1,...

Seasonal ARIMA Model with Known Parameter Values

Seasonality: 4
Variance: 0.15

The output specifies the nonseasonal and seasonal AR coefficients with opposite signs compared to the lag polynomials. This is consistent with the difference equation form of the model. The output specifies the lags of the seasonal AR and MA coefficients using SARLags and SMALags, respectively. D specifies the degree of nonseasonal integration. Seasonality = 4 specifies quarterly data with one degree of seasonal integration.

All of the parameters in the model have a value. Therefore, the model does not contain any NaNs. The functions simulate and forecast do not accept input models with NaN values.

Specify Multiplicative ARIMA Model

This example shows how to specify a seasonal ARIMA model using arima. The time series is monthly international airline passenger numbers from 1949 to 1960.

Step 1. Load the airline passenger data.

Copy the data file Data_Airline.mat from the folder

\help\toolbox\econ\examples in your MATLAB root, and paste it into your current working folder (or set \help\toolbox\econ\examples as your current working folder).

Load the data, and then plot the natural log of the monthly passenger totals.

load Data_Airline
Y = log(Dataset.PSSG);
N = length(Y);
mos = get(Dataset,'ObsNames');
figure(1)
plot(Y)
xlim([1,N])
set(gca,'XTick',[1:18:N])
set(gca,'XTickLabel',mos([1:18:N]))
title('Log Airline Passengers')
ylabel('(thousands)')

The data look nonstationary, with a linear trend and seasonal periodicity.

Step 2. Plot the seasonally integrated series.

Calculate the differenced series, (1 − L)(1 − L^12) y_t, where y_t is the original observed series.

figure(3)
autocorr(dY,50)

The sample ACF of the differenced series shows significant autocorrelation at lags that are multiples of 12. There is also potentially significant autocorrelation at smaller lags.
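The double differencing (1 − L)(1 − L^12) applied above can be sketched outside MATLAB as well. A NumPy illustration on toy data (not the Toolbox workflow):

```python
import numpy as np

# Illustrative sketch: apply (1 - L)(1 - L^12) to a series y.
y = np.arange(30, dtype=float) ** 2  # toy data with a deterministic trend
dy = np.diff(y)                      # (1 - L) y_t, nonseasonal difference
ddy = dy[12:] - dy[:-12]             # (1 - L^12) applied to dy, periodicity 12
print(len(ddy))                      # 30 - 1 - 12 = 17 observations remain
```

Each round of differencing shortens the series: one observation for the nonseasonal difference, twelve for the seasonal difference.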

ARIMA Model Including Exogenous Covariates

In this section...
ARIMAX(p,D,q) Model on page 4-58
Conventions and Extensions of the ARIMAX Model on page 4-58

ARIMAX(p,D,q) Model

The autoregressive moving average model including exogenous covariates, ARMAX(p,q), extends the ARMA(p,q) model by including the linear effect that one or more exogenous series has on the stationary response series y_t. The general form of the ARMAX(p,q) model is

y_t = c + Σ_{i=1}^{p} φ_i y_{t−i} + Σ_{k=1}^{r} β_k x_{tk} + ε_t + Σ_{j=1}^{q} θ_j ε_{t−j},  (4-16)

and it has the following condensed form in lag operator notation:

φ(L) y_t = c + x_t′β + θ(L)ε_t.  (4-17)

In Equation 4-17, the vector x_t holds the values of the r exogenous, time-varying predictors at time t, with coefficients denoted β.

You can use this model to check if a set of exogenous variables has an effect on a linear time series. For example, suppose you want to measure how the previous week's average price of oil, x_t, affects this week's United States exchange rate, y_t. The exchange rate and the price of oil are time series, so an ARMAX model can be appropriate to study their relationships.
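As a sketch of the recursion in Equation 4-16, the following simulates an ARMAX(1,1) process with one covariate. This is a NumPy illustration with hypothetical coefficient values, not Toolbox code:

```python
import numpy as np

# Minimal sketch: simulate y_t = c + phi*y_{t-1} + beta*x_t + e_t + theta*e_{t-1}
# (an ARMAX(1,1) process with one exogenous covariate x_t).
rng = np.random.default_rng(0)
T, c, phi, theta, beta = 200, 1.0, 0.5, 0.3, 2.0
x = rng.normal(size=T)  # exogenous covariate series
e = rng.normal(size=T)  # Gaussian innovations
y = np.zeros(T)
for t in range(1, T):
    y[t] = c + phi * y[t - 1] + beta * x[t] + e[t] + theta * e[t - 1]
```

Note that the covariate enters contemporaneously: x_t affects y_t in the same period, unlike a distributed lag model.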

Conventions and Extensions of the ARIMAX Model

ARMAX models have the same stationarity requirements as ARMA models. Specifically, the response series is stable if the roots of the homogeneous characteristic equation of φ(L) = 1 − φ_1 L − φ_2 L^2 − ... − φ_p L^p lie inside the unit circle according to Wold's Decomposition [1].

If the response series y_t is not stable, then you can difference it to form a stationary ARIMA model. Do this by specifying the degrees of integration D. Econometrics Toolbox enforces stability of the AR polynomial. When you specify an AR model using arima, the software displays an error if you enter

coefficients that do not correspond to a stable polynomial. Similarly, estimate imposes stationarity constraints during estimation.

The software differences the response series y_t before including the exogenous covariates if you specify the degree of integration D. In other words, the exogenous covariates enter a model with a stationary response. Therefore, the ARIMAX(p,D,q) model is

φ*(L) y_t = c* + x_t′β + θ(L)ε_t,  (4-18)

where c* = c/(1 − L)^D and φ*(L) = φ(L)/(1 − L)^D. Subsequently, the interpretation of β has changed to the expected effect a unit increase in the predictor has on the difference between current and lagged values of the response (conditional on those lagged values).

You should assess whether the predictor series x_t are stationary. Difference all predictor series that are not stationary with diff during the data preprocessing stage. If x_t is nonstationary, then a test for the significance of β can produce a false negative. The practical interpretation of β changes if you difference the predictor series.

The software uses maximum likelihood estimation for conditional mean models such as ARIMAX models. You can specify either a Gaussian or Student's t distribution for the innovations.

You can include seasonal components in an ARIMAX model (see Multiplicative ARIMA Model on page 4-47), which creates a SARIMAX(p,D,q)×(p_s,D_s,q_s)_s model. Assuming that the response series y_t is stationary, the model has the form

φ(L)Φ(L) y_t = c + x_t′β + θ(L)Θ(L)ε_t,

where Φ(L) and Θ(L) are the seasonal lag polynomials. If y_t is not stationary, then you can specify degrees of nonseasonal or seasonal integration using arima. If you specify Seasonality > 0, then the software applies degree one seasonal differencing (D_s = 1) to the response. Otherwise, D_s = 0. The software includes the exogenous covariates after it differences the response.

The software treats the exogenous covariates as fixed during estimation and inference.

ARIMAX Model Specifications

In this section...
Specify ARIMAX Model Using Name-Value Pairs on page 4-61
Specify ARMAX Model Using Dot Notation on page 4-62

Specify ARIMAX Model Using Name-Value Pairs

This example shows how to specify an ARIMAX model using arima. Specify the ARIMAX(1,1,0) model that includes three exogenous covariates:

(1 − 0.1L)(1 − L) y_t = c + x_t [3 −2 5]′ + ε_t.

model = arima('AR',0.1,'D',1,'Beta',[3 -2 5])

model =

ARIMAX(1,1,0) Model:
-------------------
Distribution: Name = 'Gaussian'
P: 2
D: 1
Q: 0
Constant: NaN
AR: {0.1} at Lags [1]
SAR: {}
MA: {}
SMA: {}
Beta: [3 -2 5]
Variance: NaN

The output shows that the ARIMAX model, model, has the following qualities:

Property P in the output is the sum of the autoregressive lags and the degree of integration, i.e., P = p + D = 2.

Beta contains three coefficients corresponding to the effect that the exogenous covariates have on the response.

The rest of the properties are 0, NaN, or empty cells.

Tip If you specify nonzero D or Seasonality, then Econometrics Toolbox differences the response series y_t before the predictors enter the model. Therefore, the predictors enter a stationary model with respect to the response series y_t. You should preprocess the exogenous covariates x_t by testing for stationarity and differencing if any are unit root nonstationary. If any nonstationary exogenous covariate enters the model, then the false negative rate for significance tests of β can increase.

Specify ARMAX Model Using Dot Notation

This example shows how to specify a stationary ARMAX model using arima.

Specify the ARMAX(2,1) model

y_t = 6 + 0.2 y_{t−1} − 0.3 y_{t−2} + 3 x_t + ε_t + 0.1 ε_{t−1}

by including one stationary exogenous covariate in arima.

model = arima('AR',[0.2 -0.3],'MA',0.1,'Constant',6,'Beta',3)

model =

ARIMAX(2,0,1) Model:
-------------------
Distribution: Name = 'Gaussian'
P: 2
D: 0
Q: 1
Constant: 6
AR: {0.2 -0.3} at Lags [1 2]
SAR: {}
MA: {0.1} at Lags [1]
SMA: {}
Beta: [3]
Variance: NaN

The output shows that the model you created, model, has NaNs or an empty cell ({}) for the Variance, SAR, and SMA properties. You can modify it using dot notation. For example, you can introduce another exogenous, stationary covariate, and specify the variance of the innovations as 0.1.

Concepts: Specify Conditional Mean Models Using arima on page 4-6

Modify Properties of Conditional Mean Model Objects

In this section...
Dot Notation on page 4-65
Nonmodifiable Properties on page 4-69

Dot Notation

A model object created by arima has values assigned to all model object properties. To change any of these property values, you do not need to reconstruct the whole model. You can modify property values of an existing model object using dot notation. That is, type the object name, then the property name, separated by '.' (a period).

For example, start with this model specification:

model = arima(2,0,0)

model =

ARIMA(2,0,0) Model:
------------------
Distribution: Name = 'Gaussian'
P: 2
D: 0
Q: 0
Constant: NaN
AR: {NaN NaN} at Lags [1 2]
SAR: {}
MA: {}
SMA: {}
Variance: NaN

Modify the model to remove the constant term:

model.Constant = 0

model =

ARIMA(2,0,0) Model:
------------------
Distribution: Name = 'Gaussian'
P: 2
D: 0
Q: 0
Constant: 0
AR: {NaN NaN} at Lags [1 2]
SAR: {}
MA: {}
SMA: {}
Variance: NaN

The updated constant term now appears in the model output.

Be aware that every model property has a data type. Any modifications you make to a property value must be consistent with the data type of the property. For example, AR, MA, SAR, and SMA are all cell vectors. This means you must index them using cell array syntax.

For example, start with the following model:

model = arima(2,0,0)

model =

ARIMA(2,0,0) Model:
------------------
Distribution: Name = 'Gaussian'
P: 2
D: 0
Q: 0
Constant: NaN
AR: {NaN NaN} at Lags [1 2]
SAR: {}
MA: {}
SMA: {}
Variance: NaN

To modify the property value of AR, assign AR a cell array. Here, assign known AR coefficient values:

model.AR = {0.8,-0.4}

model =

ARIMA(2,0,0) Model:
------------------
Distribution: Name = 'Gaussian'
P: 2
D: 0
Q: 0
Constant: NaN
AR: {0.8 -0.4} at Lags [1 2]
SAR: {}
MA: {}
SMA: {}
Variance: NaN

The updated model object now has AR coefficients with the specified equality constraints.

Similarly, the data type of Distribution is a data structure. The default data structure has only one field, Name, with value 'Gaussian'.

model.Distribution

ans =

Name: 'Gaussian'

To modify the innovation distribution, assign Distribution a new name or data structure. The data structure can have up to two fields, Name and DoF. The second field corresponds to the degrees of freedom for a Student's t distribution, and is only required if Name has the value 't'.

To specify a Student's t distribution with unknown degrees of freedom, enter:

model.Distribution = 't'

model =

ARIMA(2,0,0) Model:
------------------
Distribution: Name = 't', DoF = NaN
P: 2
D: 0
Q: 0

Constant: NaN
AR: {0.8 -0.4} at Lags [1 2]
SAR: {}
MA: {}
SMA: {}
Variance: NaN

The updated model object has a Student's t distribution with NaN degrees of freedom. To specify a t distribution with eight degrees of freedom, say:

model.Distribution = struct('Name','t','DoF',8)

model =

ARIMA(2,0,0) Model:
------------------
Distribution: Name = 't', DoF = 8
P: 2
D: 0
Q: 0
Constant: NaN
AR: {0.8 -0.4} at Lags [1 2]
SAR: {}
MA: {}
SMA: {}
Variance: NaN

The degrees of freedom in the model object are updated. Note that the DoF field of Distribution is not directly assignable. For example, model.Distribution.DoF = 8 is not a valid assignment. However, you can get the individual fields:

model.Distribution.DoF

ans = 8

You can modify model to include, for example, two coefficients β_1 = 0.2 and β_2 = 4 corresponding to two exogenous covariate time series. Since Beta has not been specified yet, you have not seen it in the output. To include it, enter:

model.Beta = [0.2 4]

model =

ARIMAX(2,0,0) Model:
------------------
Distribution: Name = 't', DoF = 8
P: 2
D: 0
Q: 0
Constant: NaN
AR: {0.8 -0.4} at Lags [1 2]
SAR: {}
MA: {}

SMA: {}
Beta: [0.2 4]
Variance: NaN

Nonmodifiable Properties

Not all model properties are modifiable. You cannot change these properties in an existing model:

P. This property updates automatically when any of p (degree of the nonseasonal AR operator), p_s (degree of the seasonal AR operator), D (degree of nonseasonal differencing), or s (degree of seasonal differencing) changes.

Q. This property updates automatically when either q (degree of the nonseasonal MA operator) or q_s (degree of the seasonal MA operator) changes.

Not all name-value arguments you can use for model creation are properties of the created model object. Specifically, you can specify the arguments ARLags, MALags, SARLags, and SMALags during model creation. These are not, however, properties of arima model objects. This means you cannot retrieve or modify them in an existing model object.

The nonseasonal and seasonal AR and MA lags update automatically if you add any elements to (or remove from) the coefficient cell arrays AR, MA, SAR, or SMA.

For example, specify an AR(2) model:

model = arima(2,0,0)

model =

ARIMA(2,0,0) Model:
------------------
Distribution: Name = 'Gaussian'
P: 2
D: 0
Q: 0
Constant: NaN
AR: {NaN NaN} at Lags [1 2]
SAR: {}
MA: {}
SMA: {}
Variance: NaN

The model output shows nonzero AR coefficients at lags 1 and 2.

Add a new AR term at lag 12:

model.AR{12} = NaN

model =

Specify Conditional Mean Model Innovation Distribution

In this section...
About the Innovation Process on page 4-72
Choices for the Variance Model on page 4-73
Choices for the Innovation Distribution on page 4-73
Specify the Innovation Distribution on page 4-74
Modify the Innovation Distribution on page 4-76

About the Innovation Process

You can express all stationary stochastic processes in the general linear form [1]

y_t = μ + ε_t + Σ_{i=1}^∞ ψ_i ε_{t−i}.

The innovation process, ε_t, is an uncorrelated (but not necessarily independent) mean zero process with a known distribution.

In Econometrics Toolbox, the general form for the innovation process is ε_t = σ_t z_t. Here, z_t is an independent and identically distributed (iid) series with mean 0 and variance 1, and σ_t^2 is the variance of the innovation process at time t. Thus, ε_t is an uncorrelated series with mean 0 and variance σ_t^2.

arima model objects have two properties for storing information about the innovation process:

Variance stores the form of σ_t^2

Distribution stores the parametric form of the distribution of z_t

Choices for the Variance Model

If σ_t^2 = σ_ε^2 for all times t, then ε_t is an independent process with constant variance, σ_ε^2.

The default value for Variance is NaN, meaning constant variance with unknown value. You can alternatively assign Variance any positive scalar value, or estimate it using estimate.

A time series can exhibit volatility clustering, meaning a tendency for large changes to follow large changes, and small changes to follow small changes. You can model this behavior with a conditional variance model: a dynamic model describing the evolution of the process variance, σ_t^2, conditional on past innovations and variances. Set Variance equal to one of the three conditional variance model objects available in Econometrics Toolbox (garch, egarch, or gjr). This creates a composite conditional mean and variance model variable.

Choices for the Innovation Distribution

The available distributions for z_t are:

Standardized Gaussian

Standardized Student's t with ν > 2 degrees of freedom,

z_t = T_ν √((ν − 2)/ν),

where T_ν follows a Student's t distribution with ν > 2 degrees of freedom.

The t distribution is useful for modeling time series with more extreme values than expected under a Gaussian distribution. Series with larger values than expected under normality are said to have excess kurtosis.

Tip It is good practice to assess the distributional properties of model residuals to determine if a Gaussian innovation distribution (the default distribution) is appropriate for your data.
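The scaling that standardizes the t distribution can be checked by simulation. A NumPy sketch (ν = 8 is an arbitrary illustrative choice):

```python
import numpy as np

# A raw Student's t variate with nu > 2 has variance nu/(nu - 2).
# Scaling by sqrt((nu - 2)/nu) therefore yields unit variance.
rng = np.random.default_rng(2)
nu = 8
t_raw = rng.standard_t(nu, size=200000)
z = t_raw * np.sqrt((nu - 2) / nu)
print(round(z.var(), 2))  # close to 1
```

The standardized series has mean 0 and variance 1, as required of z_t, while keeping the heavier tails of the t distribution.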

Specify the Innovation Distribution

The property Distribution in a model object stores the distribution name (and degrees of freedom for the t distribution). The data type of Distribution is a struct array. For a Gaussian innovation distribution, the data structure has only one field: Name. For a Student's t distribution, the data structure must have two fields:

Name, with value 't'

DoF, with a scalar value larger than two (NaN is the default value)

If the innovation distribution is Gaussian, you do not need to assign a value to Distribution. arima creates the required data structure.

To illustrate, consider specifying an MA(2) model with an iid Gaussian innovation process:

model = arima(0,0,2)

model =

ARIMA(0,0,2) Model:
------------------
Distribution: Name = 'Gaussian'
P: 0
D: 0
Q: 2
Constant: NaN
AR: {}
SAR: {}
MA: {NaN NaN} at Lags [1 2]
SMA: {}
Variance: NaN

The model output shows that Distribution is a struct array with one field, Name, with the value 'Gaussian'.

When specifying a Student's t innovation distribution, you can specify the distribution with either unknown or known degrees of freedom. If the degrees of freedom are unknown, you can simply assign Distribution the value 't'. By default, the property Distribution has a data structure with field Name equal to 't', and field DoF equal to NaN. When you input the model object to estimate, the degrees of freedom are estimated along with any other unknown model parameters.

For example, specify an MA(2) model with an iid Student's t innovation distribution, with unknown degrees of freedom:

model = arima('MALags',1:2,'Distribution','t')

model =

ARIMA(0,0,2) Model:
------------------
Distribution: Name = 't', DoF = NaN
P: 0
D: 0
Q: 2
Constant: NaN
AR: {}
SAR: {}
MA: {NaN NaN} at Lags [1 2]
SMA: {}
Variance: NaN

The output shows that Distribution is a data structure with two fields. Field Name has the value 't', and field DoF has the value NaN.

If the degrees of freedom are known, and you want to set an equality constraint, assign a struct array to Distribution with fields Name and DoF. In this case, if the model object is input to estimate, the degrees of freedom won't be estimated (the equality constraint is upheld).

Specify an MA(2) model with an iid Student's t innovation process with eight degrees of freedom:

model = arima('MALags',1:2,'Distribution',struct('Name','t','DoF',8))

Modify the Innovation Distribution

After a model object exists in the workspace, you can modify its Distribution property using dot notation. You cannot modify the fields of the Distribution data structure directly. For example, model.Distribution.DoF = 8 is not a valid assignment. However, you can get the individual fields.

Start with an MA(2) model:

model = arima(0,0,2);

To change the distribution of the innovation process in an existing model object to a Student's t distribution with unknown degrees of freedom, type:

model.Distribution = 't'

To change the distribution to a t distribution with known degrees of freedom, use a data structure:

model.Distribution = struct('Name','t','DoF',8)

You can get the individual Distribution fields:

model.Distribution.DoF

ans = 8

To change the innovation distribution from a Student's t back to a Gaussian distribution, type:

model.Distribution = 'Gaussian'

model =

ARIMA(0,0,2) Model:
------------------
Distribution: Name = 'Gaussian'
P: 0
D: 0
Q: 2
Constant: NaN
AR: {}
SAR: {}
MA: {NaN NaN} at Lags [1 2]
SMA: {}
Variance: NaN

The Name field is updated to 'Gaussian', and there is no longer a DoF field.

References

[1] Wold, H. A Study in the Analysis of Stationary Time Series. Uppsala, Sweden: Almqvist and Wiksell, 1938.

The autocorrelation functions show significant serial dependence.

Conduct an Engle's ARCH test. Test the null hypothesis of no conditional heteroscedasticity against the alternative hypothesis of an ARCH model with two lags (which is locally equivalent to a GARCH(1,1) model).

[h,p] = archtest(r-mean(r),'lags',2)

h = 1

p = 0

The null hypothesis is rejected in favor of the alternative hypothesis (h = 1).

Step 4. Specify a conditional mean and variance model.

Specify an AR(1) model for the conditional mean of the NASDAQ returns, and a GARCH(1,1) model for the conditional variance. This is a model of the form

r_t = c + φ_1 r_{t−1} + ε_t,

where ε_t = σ_t z_t,

σ_t^2 = κ + γ_1 σ_{t−1}^2 + α_1 ε_{t−1}^2,

and z_t is an independent and identically distributed standardized Gaussian process.

Impulse Response Function

y_t = μ + ε_t + Σ_{i=1}^∞ ψ_i ε_{t−i} = μ + ψ(L)ε_t,  (4-19)

where ψ(L) denotes the infinite-degree lag operator polynomial (1 + ψ_1 L + ψ_2 L^2 + ...).

The coefficients ψ_i are sometimes called dynamic multipliers [1]. You can interpret the coefficient ψ_j as the change in y_{t+j} due to a one-unit change in ε_t,

∂y_{t+j}/∂ε_t = ψ_j.

Provided the series {ψ_i} is absolutely summable, Equation 4-19 corresponds to a stationary stochastic process [2]. For a stationary stochastic process, the impact on the process due to a change in ε_t is not permanent, and the effect of the impulse decays to zero. If the series {ψ_i} is explosive, the process y_t is nonstationary. In this case, a one-unit change in ε_t permanently affects the process.

The series {ψ_i} describes the change in future values y_{t+i} due to a one-unit impulse in the innovation ε_t, and is called the impulse response function.
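For a stationary AR(2) process, for example, the dynamic multipliers follow the recursion ψ_j = φ_1 ψ_{j−1} + φ_2 ψ_{j−2} with ψ_0 = 1. A NumPy sketch with hypothetical coefficients:

```python
import numpy as np

# Impulse response (dynamic multipliers) of a stationary AR(2) process,
# computed by passing a unit impulse through the AR recursion:
# psi_0 = 1, psi_j = phi1*psi_{j-1} + phi2*psi_{j-2}.
phi1, phi2 = 0.5, -0.3
n = 20
psi = np.zeros(n)
psi[0] = 1.0
for j in range(1, n):
    psi[j] = phi1 * psi[j - 1] + (phi2 * psi[j - 2] if j >= 2 else 0.0)
print(psi[:4])  # 1, 0.5, -0.05, -0.175, then decaying toward zero
```

Because this AR(2) is stationary, the multipliers decay to zero, so a one-unit shock has no permanent effect, matching the discussion above.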

See Also arima | impulse | LagOp | toCellArray | isStable | cell2mat

Box-Jenkins Differencing vs. ARIMA Estimation

This example shows how to estimate an ARIMA model with nonseasonal integration using estimate. The series is not differenced before estimation. The results are compared to a Box-Jenkins modeling strategy [1], where the data are first differenced, and then modeled as a stationary ARMA model.

The time series is the log quarterly Australian Consumer Price Index (CPI) measured from 1972 to 1991.

Step 1. Load the data.

Load and plot the Australian CPI data.

The series is nonstationary, with a clear upward trend. This suggests differencing the data before using a stationary model (as suggested by the Box-Jenkins methodology), or fitting a nonstationary ARIMA model directly.

Step 2. Estimate an ARIMA model.

Take the first difference of the data. Estimate an AR(2) model using the differenced data.

dY = diff(Y);
modAR = arima(2,0,0);
fitAR = estimate(modAR,dY);

ARIMA(2,0,0) Model:
------------------
Conditional Probability Distribution: Gaussian

Parameter    Value         Standard Error   t Statistic
---------    -----------   --------------   -----------
Constant     0.0104289     0.00380427       2.74137
AR{1}        0.201194      0.101463         1.98293
AR{2}        0.32299       0.118035         2.7364
Variance     9.42421e-05   1.16259e-05      8.10622

The parameter point estimates are very similar to those in Step 2. The standard errors, however, are larger when the data are differenced before estimation.

Forecasts made using the fitted AR model will be on the differenced scale. Forecasts made using the ARIMA model in Step 2 will be on the same scale as the original data.

Maximum Likelihood Estimation for Conditional Mean Models

Innovation Distribution

For conditional mean models in Econometrics Toolbox, the form of the innovation process is ε_t = σ_t z_t, where z_t can be standardized Gaussian or Student's t with ν > 2 degrees of freedom. Specify your distribution choice in the arima model object Distribution property.

The innovation variance, σ_t^2, can be a positive scalar constant, or characterized by a conditional variance model. Specify the form of the conditional variance using the Variance property. If you specify a conditional variance model, the parameters of that model are estimated simultaneously with the conditional mean model parameters.

Given a stationary model,

y_t = μ + ψ(L)ε_t,

applying an inverse filter yields a solution for the innovation ε_t:

ε_t = ψ(L)^{−1}(y_t − μ).

For example, for an AR(p) process,

ε_t = φ(L)y_t − c,

where φ(L) = (1 − φ_1 L − ... − φ_p L^p) is the degree p AR operator polynomial.

estimate uses maximum likelihood to estimate the parameters of an arima model. estimate returns fitted values for any parameters in the input model object equal to NaN. estimate honors any equality constraints in the input model object, and does not return estimates for parameters with equality constraints.
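A NumPy sketch of this inverse filtering for an AR(1) process (hypothetical parameter values; in the Toolbox, infer performs this step for general models):

```python
import numpy as np

# For an AR(1) model y_t = c + phi*y_{t-1} + e_t, applying the inverse
# filter recovers the innovations: e_t = y_t - c - phi*y_{t-1}.
rng = np.random.default_rng(3)
c, phi, T = 1.0, 0.6, 100
e_true = rng.normal(size=T)
y = np.zeros(T)
y[0] = c / (1 - phi)  # start at the unconditional mean
for t in range(1, T):
    y[t] = c + phi * y[t - 1] + e_true[t]

e_hat = y[1:] - c - phi * y[:-1]  # inferred innovations
print(np.allclose(e_hat, e_true[1:]))  # True
```

With the true parameters, the filter recovers the innovations exactly; in estimation, the same filter is applied at each candidate parameter value to evaluate the likelihood.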

Loglikelihood Functions

Given the history of a process, innovations are conditionally independent. Let H_t denote the history of a process available at time t, t = 1,...,N. The likelihood function for the innovation series is given by

f(ε_1, ε_2, ..., ε_N | H_{N−1}) = ∏_{t=1}^{N} f(ε_t | H_{t−1}),

where f is a standardized Gaussian or t density function.

The exact form of the loglikelihood objective function depends on the parametric form of the innovation distribution. If z_t has a standard Gaussian distribution, then the loglikelihood function is

LLF = −(N/2) log(2π) − (1/2) Σ_{t=1}^{N} log σ_t^2 − (1/2) Σ_{t=1}^{N} ε_t^2/σ_t^2.
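With constant variance σ^2, the Gaussian loglikelihood of the innovations reduces to a sum of normal log densities. A NumPy sketch:

```python
import numpy as np

# Gaussian loglikelihood of innovations e with constant variance sigma2:
# LLF = -(N/2)log(2*pi) - (N/2)log(sigma2) - sum(e^2)/(2*sigma2).
rng = np.random.default_rng(4)
e = rng.normal(scale=1.5, size=50)  # stand-in for inferred innovations
sigma2 = 1.5 ** 2
N = len(e)
llf = (-0.5 * N * np.log(2 * np.pi)
       - 0.5 * N * np.log(sigma2)
       - np.sum(e ** 2) / (2 * sigma2))
```

Maximum likelihood estimation searches over the model parameters (which determine the inferred innovations and variance) to maximize this quantity, or equivalently to minimize its negative.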

Conditional Mean Model Estimation with Equality Constraints

For conditional mean model estimation, estimate requires an arima model and a vector of univariate time series data. The model specifies the parametric form of the conditional mean model that estimate estimates. estimate returns fitted values for any parameters in the input model with NaN values. If you pass a T-by-r exogenous covariate matrix in the X argument, then estimate returns r regression estimates β. If you specify non-NaN values for any parameters, estimate views these values as equality constraints and honors them during estimation.

For example, suppose you are estimating a model without a constant term. Specify 'Constant',0 in the model you pass into estimate. estimate views this non-NaN value as an equality constraint, and does not estimate the constant term. estimate also honors all specified equality constraints while estimating parameters without equality constraints. You can set a subset of regression coefficients to a constant and estimate the rest. For example, suppose your model is called model. If your model has three exogenous covariates, and you want to estimate two of them and set the other to 5, then specify model.Beta = [NaN 5 NaN].

estimate optionally returns the variance-covariance matrix for estimated parameters. The parameter order in this matrix is:

Presample Data for Conditional Mean Model Estimation

Presample data comes from time points before the beginning of the observation period. In EconometricsToolbox, you can specify your own presample data or use generated presample data.

In a conditional mean model, the distribution of ε_t is conditional on historical information. Historical information includes past responses, y_1, y_2, ..., y_{t−1}, past innovations, ε_1, ε_2, ..., ε_{t−1}, and, if you include them in the model, past and present exogenous covariates, x_1, x_2, ..., x_{t−1}, x_t.

The number of past responses and innovations that a current innovation depends on is determined by the degree of the AR or MA operators, and any differencing. For example, in an AR(2) model, each innovation depends on the two previous responses,

ε_t = y_t − c − φ_1 y_{t−1} − φ_2 y_{t−2}.

In ARIMAX models, the current innovation also depends on the current value of the exogenous covariate (unlike distributed lag models). For example, in an ARX(2) model with one exogenous covariate, each innovation depends on the previous two responses and the current value of the covariate,

ε_t = y_t − c − φ_1 y_{t−1} − φ_2 y_{t−2} − βx_t.

In general, the likelihood contribution of the first few innovations is conditional on historical information that might not be observable. How do you estimate the parameters without all the data? In the ARX(2) example, ε_2 explicitly depends on y_1, y_0, and x_2, and ε_1 explicitly depends on y_0, y_{−1}, and x_1. Implicitly, ε_2 depends on x_1 and x_0, and ε_1 depends on x_0 and x_{−1}. However, you cannot observe y_0, y_{−1}, x_0, and x_{−1}. The amount of presample data that you need to initialize a model depends on the degree of the model.

The property P of an arima model specifies the number of presample responses and exogenous data that you need to initialize the AR portion of a conditional mean model. For example, P = 2 in an ARX(2) model. Therefore, you need two responses and two data points from each exogenous covariate series to initialize the model.

One option is to use the first P data points from the response and exogenous covariate series as your presample, and then fit your model to the remaining data. This results in some loss of sample size. If you plan to compare multiple potential models, be aware that you can only use likelihood-based measures of fit (including the likelihood ratio test and information criteria) to compare models fit to the same data (of the same sample size). If you specify your own presample data, then you must use the largest required number of presample responses across all models that you want to compare.

The property Q of an arima model specifies the number of presample innovations needed to initialize the MA portion of a conditional mean model. You can get presample innovations by dividing your data into two parts. Fit a model to the first part, and infer the innovations.
Then, use the inferred innovations as presample innovations for estimating the second part of the data.

For a model with both an autoregressive and moving average component, you can specify both presample responses and innovations, one or the other, or neither.

By default, estimate generates automatic presample response and innovation data. The software:

Generates presample responses by backward forecasting.

Sets presample innovations to zero.

Does not generate presample exogenous data. One option is to backward forecast each exogenous series to generate a presample during data preprocessing.
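The bookkeeping for the first option (using the first P observations as presample) can be sketched as follows. Values and coefficients are hypothetical, and the model is a constant-plus-AR(2) recursion:

```python
import numpy as np

# Sketch: an AR(2)-type recursion needs P = 2 presample responses.
# Use the first two observations as the presample, then infer innovations
# only for the remaining observations.
c, phi1, phi2 = 0.5, 0.4, -0.2           # hypothetical parameter values
y = np.array([1.0, 1.2, 1.1, 0.9, 1.3, 1.0])  # toy response series
P = 2
e = y[P:] - (c + phi1 * y[P - 1:-1] + phi2 * y[P - 2:-2])
print(len(e))  # len(y) - P = 4 innovations enter the likelihood
```

Only len(y) − P innovations contribute to the likelihood, which is the loss of sample size mentioned above.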

Optimization Settings for Conditional Mean Model Estimation

Options Data Structure

estimate minimizes the negative loglikelihood function using fmincon from Optimization Toolbox. fmincon has many optimization options, such as choice of optimization algorithm and constraint violation tolerance. Choose optimization option values using optimset.

estimate uses the fmincon optimization options by default, with these exceptions. For details, see fmincon and optimset in Optimization Toolbox.

optimset Field   Description                                       estimate Setting
Algorithm        Algorithm for minimizing the negative             'sqp'
                 loglikelihood function
Display          Level of display for optimization progress        'iter'
Diagnostics      Display for diagnostic information about the      'on'
                 function to be minimized
TolCon           Termination tolerance on constraint violations    1e-7

If you want to use optimization options that differ from the default, then set your own using optimset.

For example, suppose that you do not want estimate to display optimization or estimation information. Define an AR(1) model modY, and simulate data from it.

modY = arima('AR',0.5,'Constant',0,'Variance',1);
simY = simulate(modY,25);

Create default fmincon optimization options in options. Then, use options to set Algorithm and TolCon to their estimate default values. Set Display and Diagnostics to off.

options = optimset('fmincon');
options = optimset(options,'Algorithm','sqp',...
    'TolCon',1e-7,'Display','off','Diagnostics','off');

Create an empty AR(1) model for estimation, and fit the model to the simulated data. Specify the optimization options and set print to false.

estmodY = arima(1,0,0);
fitY = estimate(estmodY,simY,'options',options,'print',false);

Note estimate numerically minimizes the negative loglikelihood function using a specific set of constraints. If you set Algorithm to anything other than sqp, then check that the algorithm uses similar constraints. For example, fmincon sets Algorithm to trust-region-reflective by default. trust-region-reflective uses a different set of constraints than sqp. Therefore, if you do not change the default

fmincon.Algorithm value, then estimate displays a warning. During estimation, fmincon temporarily sets Algorithm to active-set by default to satisfy the constraints required by estimate.

estimate sets a constraint level of TolCon so constraints are not violated. Be aware that an estimate with an active constraint has unreliable standard errors, since variance-covariance estimation assumes the likelihood function is locally quadratic around the maximum likelihood estimate.

Estimate Multiplicative ARIMA Model

This example shows how to estimate a multiplicative seasonal ARIMA model using estimate. The time series is monthly international airline passenger numbers from 1949 to 1960.

Step 1. Load the data and specify the model.

Copy the data file Data_Airline.mat from the folder

\help\toolbox\econ\examples in your MATLAB root, and paste it into your current working folder (or set \help\toolbox\econ\examples as your current working folder).

load Data_Airline
Y = log(Dataset.PSSG);
N = length(Y);
model = arima('Constant',0,'D',1,'Seasonality',12,...
    'MALags',1,'SMALags',12);

For details about model selection and specification for this data, see Specify Multiplicative ARIMA Model on page 4-53.

Step 2. Estimate the model using presample data.

Use the first 13 observations as presample data, and the remaining 131 observations for estimation.

with innovation variance 0.0014.

Notice that the model constant is not estimated, but remains fixed at zero. There is no correspondingstandard error or t statistic for the constant term. The row (and column) in the variance-covariancematrix corresponding to the constant term has all zeros.

Forecast Multiplicative ARIMA Model on page 4-177

Model Seasonal Lag Effects Using Indicator Variables

This example shows how to estimate a seasonal ARIMA model two ways:
- Model the seasonal effects using a multiplicative seasonal model.
- Use exogenous indicator variables as a regression component for the seasonal effects, called seasonal dummies.
A comparison of the subsequent forecasts shows that the two methods produce similar results. The time series is monthly international airline passenger numbers from 1949 to 1960.

Step 1. Load the data.

Navigate to the folder containing sample data, and load the data set Data_Airline.

cd(matlabroot)
cd('help/toolbox/econ/examples')
load Data_Airline
dat = log(Data); % Transform to logarithmic scale
T = size(dat,1);
y = dat(1:103); % estimation sample

y is the part of dat used for estimation, and the rest of dat is the holdout sample used to compare the two models' forecasts. For details about model selection and specification for this data, see Specify Multiplicative ARIMA Model on page 4-53.

Step 2. Define and fit the model specifying seasonal lags.

              Value        Standard Error   t Statistic
MA{1}        -0.357316     0.088031         -4.05899

(1 − L)(1 − L^12)y_t = (1 − 0.357L)(1 + Θ_12 L^12)ε_t,

where Θ_12 is the estimated seasonal MA coefficient, and ε_t is an iid normally distributed series with mean 0 and variance 0.0013.

Step 3. Define and fit the model using seasonal dummies.

Create an ARIMAX(0,1,1) model with period 12 seasonal differencing and a regression component,

(1 − L)(1 − L^12)y_t = x_t′β + (1 + θ_1 L)ε_t.

{x_t; t = 1,...,T} is a series of T column vectors having length 12 that indicate in which month the tth observation was measured. A 1 in the ith row of x_t indicates that the observation was measured in the ith month; the rest of the rows contain 0s.

Beta12        0.000988407   0.0142945       0.0691459
Variance      0.00177152    0.000246566     7.18475

The fitted model is

(1 − L)(1 − L^12)y_t = x_t′β + (1 + θ_1 L)ε_t,

where ε_t is an iid normally distributed series with mean 0 and variance 0.0017, and β is a column vector with the values Beta1 through Beta12. Note that the estimates MA{1} and Variance between model1 and model2 are not equal.

Step 4. Forecast using both models.

Forecast IGD Rate Using ARIMAX Model

This example shows how to forecast an ARIMAX model two ways.

Step 1. Load the data.

Load the Credit Defaults data set. Assign the response IGD to Y and the predictors AGE, CPF, and SPR to the matrix X, and obtain the sample size T. To avoid distraction from the purpose of this example, assume that all exogenous series are stationary.

load Data_CreditDefaults
X = Data(:,[1 3:4]);
T = size(X,1);
Y = Data(:,5);

Step 2. Process response and exogenous data.

Divide the response and exogenous data into estimation and holdout series. Assume that each exogenous covariate series is AR(1), and fit each one to that model. Forecast the exogenous data at a 10-year horizon.

Assume that the response series is ARX(1), and fit it to that model including the exogenous covariate series. Infer the residuals Eest from the fitted response model Yfit.

modelY = arima(1,0,0);
Yfit = estimate(modelY,Yest,'X',Xest,...
    'print',false,'Y0',Y(1));
Eest = infer(Yfit,Yest,'Y0',Y(1),'X',Xest);

Step 5. MMSE forecast responses.

Estimate Conditional Mean and Variance Models

This example shows how to estimate a composite conditional mean and variance model using estimate.

Step 1. Load the data and specify the model.

Load the NASDAQ data included with the toolbox. Convert the daily close composite index series to a return series. Specify an AR(1) and GARCH(1,1) composite model. This is a model of the form

r_t = c + φ_1 r_{t−1} + ε_t,

where ε_t = σ_t z_t,

σ_t² = κ + γ_1 σ_{t−1}² + α_1 ε_{t−1}²,

and z_t is an independent and identically distributed standardized Gaussian process.

The conditional variances increase after observation 2000. This corresponds to the increased volatility seen in the original return series.

The standardized residuals have more large values (larger than 2 or 3 in absolute value) than expected under a standard normal distribution. This suggests a Student's t distribution might be more appropriate for the innovation distribution.

Step 4. Fit a model with a t innovation distribution.

Compare the two model fits (Gaussian and t innovation distribution) using the Akaike information criterion (AIC) and Bayesian information criterion (BIC). First, obtain the loglikelihood objective function value for the second fit.

[resT,VT,LogLT] = infer(fitT,r);
[aic,bic] = aicbic([LogL,LogLT],[5,6],N)

aic =
   1.0e+04 *
   -1.8387   -1.8498
bic =
   1.0e+04 *
   -1.8357   -1.8462

The second model has six parameters compared to five in the first model (because of the t distribution degrees of freedom). Despite this, both information criteria favor the model with the Student's t distribution. The AIC and BIC values are smaller (more negative) for the t innovation distribution.

Choose ARMA Lags Using BIC

This example shows how to use the Bayesian information criterion (BIC) to select the degrees p and q of an ARMA model. Estimate several models with different p and q values. For each estimated model, output the loglikelihood objective function value. Input the loglikelihood value to aicbic to calculate the BIC measure of fit (which penalizes for complexity).

Step 1. Simulate an ARMA time series.

Plot the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) for the simulated data.

figure(2)
subplot(2,1,1)
autocorr(Y)
subplot(2,1,2)
parcorr(Y)

Both the sample ACF and PACF decay relatively slowly. This is consistent with an ARMA model. The ARMA lags cannot be selected solely by looking at the ACF and PACF, but it seems no more than four AR or MA terms are needed.

Step 3. Fit ARMA(p,q) models.

To identify the best lags, fit several models with different lag choices. Here, fit all combinations of p = 1,...,4 and q = 1,...,4 (a total of 16 models). Store the loglikelihood objective function and number of coefficients for each fitted model.

LOGL = zeros(4,4); % Initialize
PQ = zeros(4,4);
for p = 1:4
    for q = 1:4
        mod = arima(p,0,q);
        [fit,~,logL] = estimate(mod,Y,'print',false);
        LOGL(p,q) = logL;
        PQ(p,q) = p + q;
    end
end

Calculate the BIC for each fitted model. The number of parameters in a model is p + q + 1 (for the AR and MA coefficients, and constant term). The number of observations in the data set is 100.

LOGL = reshape(LOGL,16,1);
PQ = reshape(PQ,16,1);
[~,bic] = aicbic(LOGL,PQ+1,100);
reshape(bic,4,4)

ans =
  108.6241  105.9489  109.4164  113.8443
   99.1639  101.5886  105.5203  109.4348
  102.9094  106.0305  107.6489   99.6794
  107.4045  100.7072  102.5746  102.0209

In the output BIC matrix, the rows correspond to the AR degree (p) and the columns correspond to the MA degree (q). The smallest value is best.

The smallest BIC value is 99.1639, in the (2,1) position. This corresponds to an ARMA(2,1) model, matching the model that generated the data.
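The criterion aicbic computes follows the standard definition BIC = −2 logL + k ln(N). As a language-agnostic sketch (written here in Python rather than MATLAB, with hypothetical loglikelihood values), the complexity penalty works like this:

```python
import math

def bic(log_likelihood, num_params, num_obs):
    # BIC = -2*logL + k*ln(N); smaller values indicate a better fit
    return -2.0 * log_likelihood + num_params * math.log(num_obs)

# Hypothetical fits on N = 100 observations: the larger model has a slightly
# better loglikelihood, but the complexity penalty outweighs the gain.
bic_arma21 = bic(-46.0, 4, 100)   # ARMA(2,1): p + q + 1 = 4 parameters
bic_arma32 = bic(-45.5, 6, 100)   # ARMA(3,2): 6 parameters
assert bic_arma21 < bic_arma32
```

The loglikelihood values here are invented for illustration; in the example above they come from estimate.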

Infer Residuals for Diagnostic Checking

This example shows how to infer residuals from a fitted ARIMA model. Diagnostic checks are performed on the residuals to assess model fit.

The time series is the log quarterly Australian Consumer Price Index (CPI) measured from 1972 to 1991.

Step 1: Load the data.

Load the Australian CPI data. Take first differences, then plot the series.

load Data_JAustralian
Y = Dataset.PAU;
N = length(Y);
dY = diff(Y);

figure(1)
plot(2:N,dY)
xlim([0,N])
title('Differenced Australian CPI')

The differenced series looks relatively stationary.

Step 2. Plot the sample ACF and PACF.

Specify, and then estimate, an ARIMA(2,1,0) model. Infer the residuals for diagnostic checking.

model = arima(2,1,0);
fit = estimate(model,Y);
[res,~,LogL] = infer(fit,Y);

Notice that the model is fit to the original series, and not the differenced series. The model object to be fit, model, has property D equal to 1. This accounts for the one degree of differencing.

This specification assumes a Gaussian innovation distribution. infer returns the value of the loglikelihood objective function (LogL) along with the residuals (res).

Step 4. Perform residual diagnostic checks.

The residuals appear uncorrelated and approximately normally distributed. There is some indication of an excess of large residuals.

Step 5. Modify the innovation distribution.

To explore possible excess kurtosis in the innovation process, fit an ARIMA(2,1,0) model with a Student's t distribution to the original series. Return the value of the loglikelihood objective function so you can use the Bayesian information criterion (BIC) to compare the fit of the two models.

model.Distribution = 't';
[fitT,~,LogLT] = estimate(model,Y);
[~,bic] = aicbic([LogLT,LogL],[5,4],N)

bic =
 -492.5317 -479.4691

The model with the t innovation distribution has one extra parameter (the degrees of freedom of the t distribution). According to the BIC, the ARIMA(2,1,0) model with a Student's t innovation distribution is the better choice because it has a smaller (more negative) BIC value.

Monte Carlo Simulation of Conditional Mean Models

What Is Monte Carlo Simulation?

Monte Carlo simulation is the process of generating independent, random draws from a specified probabilistic model. When simulating time series models, one draw (or realization) is an entire sample path of specified length N: y_1, y_2,...,y_N. When you generate a large number of draws, say M, you generate M sample paths, each of length N.

Monte Carlo Error

Because Monte Carlo estimation is based on a finite number of simulations, Monte Carlo estimates are subject to some amount of error. You can reduce the amount of Monte Carlo error in your simulation study by increasing the number of sample paths, M, that you generate from your model.

For example, to estimate the probability of a future event:

1. Generate M sample paths from your model.
2. Estimate the probability of the future event using the sample proportion of the event occurrence across the M simulations,

   p̂ = (number of times the event occurs in M draws) / M.

3. Calculate the Monte Carlo standard error for the estimate,

   se(p̂) = sqrt( p̂(1 − p̂) / M ).

You can reduce the Monte Carlo error of the probability estimate by increasing the number of realizations. If you know the desired precision of your estimate, you can solve for the number of realizations needed to achieve that level of precision.
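Both calculations can be sketched directly. The following is illustrative Python, not toolbox code; the values p̂ = 0.39 and M = 1000 match the airline passenger example later in this chapter, and the 0.005 target is hypothetical:

```python
import math

def mc_standard_error(p_hat, num_paths):
    # se(p_hat) = sqrt(p_hat*(1 - p_hat)/M)
    return math.sqrt(p_hat * (1.0 - p_hat) / num_paths)

def paths_needed(p_hat, target_se):
    # Solve se = sqrt(p(1-p)/M) for M, rounding up to a whole number of paths
    return math.ceil(p_hat * (1.0 - p_hat) / target_se ** 2)

se = mc_standard_error(0.39, 1000)   # about 0.015 with 1000 paths
M = paths_needed(0.39, 0.005)        # paths required for a standard error of 0.005
```

Quadrupling the precision (halving the standard error twice) requires roughly sixteen times as many paths, because the error shrinks like 1/sqrt(M).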

Presample Data for Conditional Mean Model Simulation

When simulating realizations from ARIMA processes, you need presample responses and presample innovations to initialize the conditional mean model. The number of presample responses needed to initialize a simulation is stored in the arima model property P. The number of presample innovations needed to initialize a simulation is stored in the property Q. You can specify your own presample data, or let simulate generate presample data.

If you let simulate generate default presample data, then:
- For stationary processes, simulate sets presample responses to the unconditional mean of the process.
- For nonstationary processes, simulate sets presample responses to 0.
- Presample innovations are set to 0.
If you specify a matrix of exogenous covariates, then simulate sets the presample responses to 0.

Transient Effects in Conditional Mean Model Simulations

When you use automatically generated presample data, you often see some transient effects at the beginning of the simulation. This is sometimes called a burn-in period. For stationary processes, the impulse response function decays to zero over time. This means the starting point of the simulation is eventually forgotten. To reduce transient effects, you can:
- Oversample: generate sample paths longer than needed, and discard the beginning samples that show transient effects.
- Recycle: use a first simulation to generate presample data for a second simulation.
For nonstationary processes, the starting point is never forgotten. By default, all realizations of nonstationary processes begin at zero. For a nonzero starting point, you need to specify your own presample data.
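The burn-in effect can be seen without any random simulation by propagating the variance recursion of an AR(1) started from a fixed (zero) presample value. This is an illustrative Python sketch with hypothetical parameter values, not toolbox code:

```python
# For an AR(1) y_t = phi*y_{t-1} + e_t started at a known constant, the variance
# obeys Var(y_t) = phi^2*Var(y_{t-1}) + sigma2 and climbs toward the
# unconditional variance sigma2/(1 - phi^2); the climb is the burn-in period.
phi, sigma2 = 0.9, 1.0
target = sigma2 / (1.0 - phi ** 2)
var, history = 0.0, []
for _ in range(100):
    var = phi ** 2 * var + sigma2
    history.append(var)

assert history[0] == sigma2               # far below the target at first
assert abs(history[-1] - target) < 1e-6   # effectively converged by t = 100
```

The more persistent the process (phi close to 1), the longer the transient, and the more observations you should discard as burn-in.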

The simulation mean is constant over time. This is consistent with the definition of a stationary process. The process variance is not constant over time, however. There are transient effects at the beginning of the simulation due to the absence of presample data.

To reduce transient effects, one option is to oversample the process. For example, to sample 50 observations, you can generate paths with more than 50 observations, and discard all but the last 50 observations as burn-in. Here, simulate paths of length 150, and discard the first 100 observations.

rng('default')
Y = simulate(model,150,'numPaths',1000);
Y = Y(101:end,:);

figure(3)
subplot(2,1,1)
plot(Y,'Color',[.85,.85,.85])
title('Simulated AR(2) Process')
hold on
h = plot(mean(Y,2),'k','LineWidth',2);
legend(h,'Simulation Mean','Location','NorthWest')
hold off
subplot(2,1,2)
plot(var(Y,0,2),'r','LineWidth',2)
xlim([0,50])
title('Process Variance')
hold on
plot(1:50,.83*ones(50,1),'k--','LineWidth',1.5)
legend('Simulation','Theoretical',...
    'Location','SouthEast')
hold off

The realizations now look like draws from a stationary stochastic process. The simulation variance fluctuates (due to Monte Carlo error) around the theoretical variance.

Simulate an MA Process

This example shows how to simulate sample paths from a stationary MA(12) process without specifying presample observations.

Step 1. Specify a model.

For an MA process, the constant term is the unconditional mean. The simulation mean is around 0.5, as expected.

Step 3. Plot the simulation variance.

The unconditional variance for the model is

σ_y² = (1 + θ_1² + θ_12²)σ_ε² = (1 + 0.8² + 0.2²)(0.2) = 0.336.

Because the model is stationary, the unconditional variance should be constant across all times. Plot the simulation variance, and compare it to the theoretical variance.

figure(2)
plot(var(Y,0,2),'Color',[.75,.75,.75],'LineWidth',1.5)
xlim([0,60])
title('Unconditional Variance')
hold on
plot(1:60,.336*ones(60,1),'k--','LineWidth',2)
legend('Simulation','Theoretical',...
    'Location','SouthEast')
hold off

There appears to be a short burn-in period at the beginning of the simulation. During this time, the simulation variance is lower than expected. Afterwards, the simulation variance fluctuates around the theoretical variance.

Step 4. Generate more sample paths.

Simulate 10,000 paths from the model, each with length 1000. Look at the simulation variance.

rng('default')
YM = simulate(model,1000,'numPaths',10000);

figure(3)
plot(var(YM,0,2),'Color',[.75,.75,.75],'LineWidth',1.5)
ylim([0.3,0.36])
title('Unconditional Variance')
hold on
plot(1:1000,.336*ones(1000,1),'k--','LineWidth',2)
legend('Simulation','Theoretical',...
    'Location','SouthEast')
hold off

The sample paths fluctuate around the theoretical trend line with constant variance. The simulation mean is close to the true trend line.

Step 2. Generate realizations from a difference-stationary process.

Specify the difference-stationary model

Δy_t = 0.5 + ε_t + 1.4ε_{t−1} + 0.8ε_{t−2},

where the innovation distribution is Gaussian with variance 8. After specifying the model, simulate 50 sample paths of length 200. No burn-in is needed because all sample paths should begin at zero. This is the simulate default starting point for nonstationary processes with no presample data.

model = arima('Constant',0.5,'D',1,'MA',{1.4,0.8},...
    'Variance',8);
rng('default')
Yd = simulate(model,200,'numPaths',50);

figure(2)
plot(Yd,'Color',[.85,.85,.85])
hold on
h1 = plot(t,trend,'r','LineWidth',5);
xlim([0,200])
title('Difference-Stationary Process')
h2 = plot(mean(Yd,2),'k--','LineWidth',2);
legend([h1,h2],'Trend','Simulation Mean',...
    'Location','NorthWest')
hold off

The simulation average is close to the trend line with slope 0.5. The variance of the sample paths grows over time.

Step 3. Difference the sample paths.

A difference-stationary process is stationary when differenced appropriately. Take the first differences of the sample paths from the difference-stationary process, and plot the differenced series. One observation is lost as a result of the differencing.

diffY = diff(Yd,1,1);

figure(3)
plot(2:200,diffY,'Color',[.85,.85,.85])
xlim([0,200])
title('Differenced Series')
hold on
h = plot(2:200,mean(diffY,2),'k--','LineWidth',2);
legend(h,'Simulation Mean','Location','NorthWest')
hold off

The differenced series looks stationary, with the simulation mean fluctuating around zero.

Simulate Multiplicative ARIMA Models

This example shows how to simulate sample paths from a multiplicative seasonal ARIMA model using simulate. The time series is monthly international airline passenger numbers from 1949 to 1960.

Step 1. Load the data and estimate a model.

Copy the data file Data_Airline.mat from the folder

\help\toolbox\econ\examples in your MATLAB root, and paste it into your current working folder (or set \help\toolbox\econ\examples as your current working folder).

The simulated forecasts show growth and seasonal periodicity similar to the observed series.

Step 3. Estimate the probability of a future event.

Use simulations to estimate the probability that log airline passenger counts will meet or exceed the value 7 sometime during the next 5 years. Calculate the Monte Carlo error associated with the estimated probability.

rng('default')
Ysim = simulate(fit,60,'numPaths',1000,'Y0',Y,'E0',res);
g7 = sum(Ysim >= 7) > 0;
phat = mean(g7)
err = sqrt(phat*(1-phat)/1000)

phat =
    0.3910
err =
    0.0154

There is approximately a 39% chance that the (log) number of airline passengers will meet or exceed 7 in the next 5 years. The Monte Carlo standard error of the estimate is about 0.015.

Step 4. Plot the distribution of passengers at a future time.

Use the simulations to plot the distribution of (log) airline passenger counts 60 months into the future.

figure(2)
hist(Ysim(60,:))
title('Distribution of Passenger Counts in 60 Months')

Standardize the innovations using the square root of the conditional variance process. Plot the standardized innovations over the forecast horizon.

figure(3)
plot(E./sqrt(V))
xlim([0,1000])
title('Simulated Standardized Innovations')

The fitted model assumes the standardized innovations follow a standardized Student's t distribution. Thus, the simulated innovations have more large values than would be expected from a Gaussian innovation distribution.

Monte Carlo Forecasting of Conditional Mean Models

Monte Carlo Forecasts

You can use Monte Carlo simulation to forecast a process over a future time horizon. This is an alternative to minimum mean square error (MMSE) forecasting, which provides an analytical forecast solution. You can calculate MMSE forecasts using forecast.

To forecast a process using Monte Carlo simulations:

1. Fit a model to your observed series using estimate.
2. Use the observed series and any inferred residuals and conditional variances (calculated using infer) for presample data.
3. Generate many sample paths over the desired forecast horizon using simulate.

Advantage of Monte Carlo Forecasting

An advantage of Monte Carlo forecasting is that you obtain a complete distribution for future events, not just a point estimate and standard error. The simulation mean approximates the MMSE forecast. Use the 2.5th and 97.5th percentiles of the simulation realizations as endpoints for approximate 95% forecast intervals.
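As a sketch of this workflow, the following illustrative Python (not toolbox code, using a toy AR(1) with hypothetical parameter values) reads the point forecast and the 95% interval straight off the simulated paths:

```python
import random

random.seed(1)
# Toy AR(1): y_t = c + phi*y_{t-1} + e_t, started from the last observed value.
c, phi, sigma, y_last = 1.0, 0.5, 1.0, 2.0
M, horizon = 5000, 10

final_values = []
for _ in range(M):
    y = y_last
    for _ in range(horizon):
        y = c + phi * y + random.gauss(0.0, sigma)
    final_values.append(y)           # value at the end of the horizon

final_values.sort()
point = sum(final_values) / M        # simulation mean ~ MMSE forecast
lower = final_values[int(0.025 * M)] # 2.5th percentile
upper = final_values[int(0.975 * M)] # 97.5th percentile
assert lower < point < upper
```

Unlike the analytical MMSE interval, these percentile endpoints need no normality assumption; they come directly from the simulated distribution.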

MMSE Forecasting of Conditional Mean Models

In this section...

What are MMSE Forecasts? on page 4-167
How forecast Generates MMSE Forecasts on page 4-167
Forecast Error on page 4-169

What are MMSE Forecasts?

A common objective of time series modeling is generating forecasts for a process over a future time horizon. That is, given an observed series y_1, y_2,...,y_N and a forecast horizon h, generate predictions for y_{N+1}, y_{N+2},...,y_{N+h}.

Let ŷ_{t+1} denote a forecast for the process at time t + 1, conditional on the history of the process up to time t, H_t, and the exogenous covariate series up to time t + 1, X_{t+1}, if a regression component is included in the model. The minimum mean square error (MMSE) forecast is the forecast ŷ_{t+1} that minimizes the expected square loss,

E[(y_{t+1} − ŷ_{t+1})² | H_t, X_{t+1}].

Minimizing this loss function yields the MMSE forecast,

ŷ_{t+1} = E(y_{t+1} | H_t, X_{t+1}).

How forecast Generates MMSE Forecasts

The forecast method generates MMSE forecasts recursively. When you call forecast, you can specify presample observations (Y0), innovations (E0), conditional variances (V0), and exogenous covariate data (X0) using name-value arguments. If you include presample exogenous covariate data, then you must also specify exogenous covariate forecasts (XF). To begin forecasting from the end of an observed series, say Y, use the last few observations of Y as presample responses Y0 to initialize the forecast. There are several points to keep in mind when you specify presample data:
- The minimum number of responses needed to initialize forecasting is stored in the property P of an arima model. If you provide too few presample observations, forecast returns an error.
- If you do not provide any presample responses, then forecast assigns default values:
  - For models that are stationary and do not contain a regression component, all presample observations are set to the unconditional mean of the process.
  - For nonstationary models or models with a regression component, all presample observations are set to zero.
- If you forecast a model with an MA component, then forecast requires presample innovations. The number of innovations needed is stored in the property Q of an arima model. If you also have a conditional variance model, you must additionally account for any presample innovations it requires. If you specify presample innovations, but not enough, forecast returns an error.
- If you forecast a model with a regression component, then forecast requires presample exogenous covariate data. The number of presample exogenous covariate data needed is at least the number of presample responses minus P. If you provide presample exogenous covariate data, but not enough, then forecast returns an error.
- If you do not specify any presample innovations, but specify sufficient presample responses (at least P + Q) and exogenous covariate data (at least the number of presample responses minus P), then forecast infers the needed presample innovations.

Given presample observations y_{N−1} and y_N, forecasts are recursively generated as follows:

ŷ_{N+1} = c + φ_1 y_N + φ_2 y_{N−1}
ŷ_{N+2} = c + φ_1 ŷ_{N+1} + φ_2 y_N
ŷ_{N+3} = c + φ_1 ŷ_{N+2} + φ_2 ŷ_{N+1}
...

For a stationary AR process, this recursion converges to the unconditional mean of the process,

μ = c / (1 − φ_1 − φ_2).
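The recursion and its limit are easy to verify numerically. This is an illustrative Python sketch with hypothetical coefficient and presample values, not toolbox code:

```python
# MMSE forecast recursion for an AR(2): y_t = c + phi1*y_{t-1} + phi2*y_{t-2} + e_t.
# Future innovations are replaced by their expectation, zero.
c, phi1, phi2 = 3.0, 0.5, 0.2
y_nm1, y_n = 9.0, 11.0        # presample observations y_{N-1} and y_N
forecasts = []
for _ in range(200):
    y_next = c + phi1 * y_n + phi2 * y_nm1
    forecasts.append(y_next)
    y_nm1, y_n = y_n, y_next  # shift the window forward one step

unconditional_mean = c / (1.0 - phi1 - phi2)   # = 10 for these values
assert abs(forecasts[-1] - unconditional_mean) < 1e-9
```

However far the presample observations sit from the mean, the forecast path decays geometrically toward c / (1 − φ_1 − φ_2) because the process is stationary.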

For an MA(12) process, e.g.,

y_t = μ + ε_t + θ_1 ε_{t−1} + ... + θ_12 ε_{t−12},

you need 12 presample innovations to initialize the forecasts. All innovations from time N + 1 and greater are set to their expectation, zero. Thus, for an MA(12) process, the forecast for any time more than 12 steps in the future is the unconditional mean, μ.

Consider a conditional mean model given by

y_t = μ + x_t′β + ψ(L)ε_t,

where ψ(L) = 1 + ψ_1 L + ψ_2 L² + ... is an infinite-degree lag operator polynomial. Sum the variances of the lagged innovations to get the s-step MSE,

MSE_s = (1 + ψ_1² + ψ_2² + ... + ψ_{s−1}²)σ_ε²,

where σ_ε² denotes the innovation variance.

For stationary processes, the coefficients of the infinite lag operator polynomial are absolutely summable, and the MSE converges to the unconditional variance of the process. For nonstationary processes, the series does not converge, and the forecast error grows over time.
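These quantities can be computed directly from the ψ weights. The Python sketch below (illustrative, not toolbox code) uses the AR polynomial 1 − 0.7L − 0.25L^12 and innovation variance 2 from the next example, mirroring what the LagOp/toCellArray code there computes:

```python
# Psi weights from inverting the AR polynomial: (1 - 0.7L - 0.25L^12)*psi(L) = 1
# gives the recursion psi_i = 0.7*psi_{i-1} + 0.25*psi_{i-12}, with psi_0 = 1.
ar = {1: 0.7, 12: 0.25}   # lag -> coefficient
sigma2 = 2.0
n = 2000
psi = [1.0] + [0.0] * (n - 1)
for i in range(1, n):
    psi[i] = sum(coef * psi[i - lag] for lag, coef in ar.items() if i >= lag)

mse_150 = sigma2 * sum(w * w for w in psi[:150])  # 150-step forecast MSE
variance = sigma2 * sum(w * w for w in psi)       # approximate unconditional variance
assert mse_150 < variance                         # MSE climbs toward the variance
```

Because the weights are absolutely summable for this stationary model, the cumulative sum converges; the truncated impulse response computation in the next example gives roughly the same limit (about 8).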

Convergence of AR Forecasts

This example shows how to forecast a stationary AR(12) process using forecast. Evaluate the asymptotic convergence of the forecasts, and compare forecasts made with and without using presample data.

Step 1. Specify an AR(12) model.

Specify the model

y_t = 3 + 0.7y_{t−1} + 0.25y_{t−12} + ε_t,

where the innovations are Gaussian with variance 2. Generate a realization of length 300 from the process. Discard the first 250 observations as burn-in.

model = arima('Constant',3,'AR',{0.7,0.25},'ARLags',[1,12],...
    'Variance',2);
rng('default')
Y = simulate(model,300);
Y = Y(251:300);

Step 3. Calculate the asymptotic variance.

The MSE of the process converges to the unconditional variance of the process, σ_y². You can calculate the variance using the impulse response function. The impulse response function is based on the infinite-degree MA representation of the AR process.

The last few values of YMSE show the convergence toward the unconditional variance.

ARpol = LagOp({1,-.7,-.25},'Lags',[0,1,12]);
IRF = cell2mat(toCellArray(1/ARpol));
sig2e = 2;
variance = sum(IRF.^2)*sig2e

variance =
    7.9938

YMSE(145:end)

ans =
    7.8870
    7.8899
    7.8926
    7.8954
    7.8980
    7.9006

Convergence is not reached within 150 steps, but the forecast MSE is approaching the theoretical unconditional variance.

Step 4. Forecast without using presample data.

Repeat the forecasting without using any presample data.

figure(3)
plot(Y,'Color',[.75,.75,.75])
hold on
plot(51:200,Yf2,'r','LineWidth',2)
plot(51:200,[upper2,lower2],'k--','LineWidth',1.5)
xlim([0,200])
hold off

The convergence of the forecast MSE is the same without using presample data. However, all MMSE forecasts are the unconditional mean. This is because forecast initializes the AR model with the unconditional mean when you do not provide presample data.

See Also: arima | forecast | LagOp | simulate | toCellArray

Related Examples: Simulate Stationary Processes on page 4-144 | Forecast Multiplicative ARIMA Model on page 4-177

Concepts: MMSE Forecasting of Conditional Mean Models on page 4-167

Autoregressive Model on page 4-18

Forecast Multiplicative ARIMA Model

This example shows how to forecast a multiplicative seasonal ARIMA model using forecast. The time series is monthly international airline passenger numbers from 1949 to 1960.

Step 1. Load the data and estimate a model.

Copy the data file Data_Airline.mat from the folder

\help\toolbox\econ\examples in your MATLAB root, and paste it into your current working folder (or set \help\toolbox\econ\examples as your current working folder).

The MMSE forecast shows airline passenger counts continuing to grow over the forecast horizon. The confidence bounds show that a decline in passenger counts is plausible, however. Because this is a nonstationary process, the width of the forecast intervals grows over time.

Step 3. Compare MMSE and Monte Carlo forecasts.

The conditional variance forecasts converge to the asymptotic variance of the GARCH conditional variance model. The forecasted returns converge to the estimated model constant (the unconditional mean of the AR conditional mean model).

Parametric Regression Analysis

In this section...

What Is Parametric Regression? on page 9-2
Choose a Regression Function on page 9-2
Update Legacy Code with New Fitting Methods on page 9-3

What Is Parametric Regression?

Regression is the process of fitting models to data. The models must have numerical responses. For models with categorical responses, see Parametric Classification on page 14-2 or Supervised Learning (Machine Learning) Workflow and Algorithms on page 15-2. The regression process depends on the model. If a model is parametric, regression estimates the parameters from the data. If a model is linear in the parameters, estimation is based on methods from linear algebra that minimize the norm of a residual vector. If a model is nonlinear in the parameters, estimation is based on search methods from optimization that minimize the norm of a residual vector.

Update Legacy Code with New Fitting Methods

There are several Statistics Toolbox functions for performing regression. The following sections describe how to replace calls to older functions with the new versions:
- regress into LinearModel.fit on page 9-4
- regstats into LinearModel.fit on page 9-4
- robustfit into LinearModel.fit on page 9-5
- stepwisefit into LinearModel.stepwise on page 9-5
- glmfit into GeneralizedLinearModel.fit on page 9-6
- nlinfit into NonLinearModel.fit on page 9-6

regress into LinearModel.fit

Previous Syntax.

[b,bint,r,rint,stats] = regress(y,X)

where X contains a column of ones.

Current Syntax.

mdl = LinearModel.fit(X,y)

where you do not add a column of ones to X.

Equivalent values of the previous outputs:
- b: mdl.Coefficients.Estimate
- bint: coefCI(mdl)
- r: mdl.Residuals.Raw
- rint: There is no exact equivalent. Try examining mdl.Residuals.Studentized to find outliers.
- stats: mdl contains various properties that replace components of stats.

regstats into LinearModel.fit

Previous Syntax.

stats = regstats(y,X,model,whichstats)

Current Syntax.

mdl = LinearModel.fit(X,y,model)

Obtain statistics from the properties and methods of mdl. For example, see the mdl.Diagnostics and mdl.Residuals properties.

robustfit into LinearModel.fit

Previous Syntax.

[b,stats] = robustfit(X,y,wfun,tune,const)

Current Syntax.

mdl = LinearModel.fit(X,y,'robust','on') % bisquare

Or to use the wfun weight function and the tune tuning parameter:

opt.RobustWgtFun = 'wfun';
opt.Tune = tune; % optional
mdl = LinearModel.fit(X,y,'robust',opt)

Obtain statistics from the properties and methods of mdl. For example, see the mdl.Diagnostics and mdl.Residuals properties.

stepwisefit into LinearModel.stepwise

Previous Syntax.

[b,se,pval,inmodel,stats,nextstep,history] = stepwisefit(X,y,Name,Value)

Current Syntax.

mdl = LinearModel.stepwise(ds,modelspec,Name,Value)

or

mdl = LinearModel.stepwise(X,y,modelspec,Name,Value)

Obtain statistics from the properties and methods of mdl. For example, see the mdl.Diagnostics and mdl.Residuals properties.

glmfit into GeneralizedLinearModel.fit

Previous Syntax.

[b,dev,stats] = glmfit(X,y,distr,param1,val1,...)

Current Syntax.

mdl = GeneralizedLinearModel.fit(X,y,distr,...)

Obtain statistics from the properties and methods of mdl. For example, the deviance is mdl.Deviance, and to compare mdl against a constant model, use devianceTest(mdl).

nlinfit into NonLinearModel.fit

Previous Syntax.

[beta,r,J,COVB,mse] = nlinfit(X,y,fun,beta0,options)

Current Syntax.

mdl = NonLinearModel.fit(X,y,fun,beta0,'Options',options)

Equivalent values of the previous outputs:
- beta: mdl.Coefficients.Estimate
- r: mdl.Residuals.Raw
- COVB: mdl.CoefficientCovariance
- mse: mdl.MSE

mdl does not provide the Jacobian (J) output. The primary purpose of J was to pass it into nlparci or nlpredci to obtain confidence intervals for the estimated coefficients (parameters) or predictions. Obtain those confidence intervals as:

parci = coefCI(mdl)
[pred,predci] = predict(mdl)

What Are Linear Regression Models?

Regression models describe the relationship between a dependent variable, y, and an independent variable or variables, X. The dependent variable is also called the response variable. Independent variables are also called explanatory or predictor variables. Continuous predictor variables might be called covariates, whereas categorical predictor variables might also be referred to as factors. The matrix, X, of observations on predictor variables is usually called the design matrix.

A multiple linear regression model is

y_i = β_0 + β_1 X_{i1} + β_2 X_{i2} + ... + β_p X_{ip} + ε_i,  i = 1,...,n,

where
- y_i is the ith response.
- β_k is the kth coefficient, where β_0 is the constant term in the model. Sometimes, design matrices might include information about the constant term. However, LinearModel.fit or LinearModel.stepwise by default includes a constant term in the model, so you must not enter a column of 1s into your design matrix X.
- X_{ij} is the ith observation on the jth predictor variable, j = 1, ..., p.
- ε_i is the ith noise term, that is, random error.

In general, a linear regression model can be a model of the form

y_i = β_0 + Σ_{k=1}^{K} β_k f_k(X_{i1}, X_{i2}, ..., X_{ip}) + ε_i,  i = 1,...,n,

where f_k(.) is a scalar-valued function of the independent variables, X_{ij}. The functions, f_k(X), might be in any form including nonlinear functions or polynomials. The linearity, in the linear regression models, refers to the linearity of the coefficients β_k. That is, the response variable, y, is a linear function of the coefficients, β_k.

Some examples of linear models are:

y_i = β_0 + β_1 X_{1i} + β_2 X_{2i} + ε_i
y_i = β_0 + β_1 X_{1i} + β_2 X_{2i} + β_3 X_{1i}³ + β_4 X_{2i}² + ε_i
y_i = β_0 + β_1 X_{1i} + β_2 X_{2i} + β_3 X_{1i} X_{2i} + ε_i

The following, however, are not linear models, because they are not linear in the unknown coefficients, β_k. Models in which a coefficient enters nonlinearly, for example as an exponent, are of this kind:

y_i = β_0 + β_1 X_{1i}^{β_2} + ε_i
y_i = β_0 + β_1 e^{β_2 X_{1i}} + ε_i

The usual assumptions for linear regression models are:
- The noise terms, ε_i, are uncorrelated.
- The noise terms, ε_i, have independent and identical normal distributions with mean zero and constant variance, σ². Thus,

  E(y_i) = E( Σ_{k=0}^{K} β_k f_k(X_{i1}, X_{i2}, ..., X_{ip}) + ε_i )
         = Σ_{k=0}^{K} β_k f_k(X_{i1}, X_{i2}, ..., X_{ip}) + E(ε_i)
         = Σ_{k=0}^{K} β_k f_k(X_{i1}, X_{i2}, ..., X_{ip}),

  and

  V(y_i) = V( Σ_{k=0}^{K} β_k f_k(X_{i1}, X_{i2}, ..., X_{ip}) + ε_i ) = V(ε_i) = σ².

  So the variance of y_i is the same for all levels of X_{ij}.
- The responses y_i are uncorrelated.

The fitted linear function is

ŷ_i = b_0 + Σ_{k=1}^{K} b_k f_k(X_{i1}, X_{i2}, ..., X_{ip}),  i = 1,...,n,

where ŷ_i is the estimated response and the b_k are the fitted coefficients. The coefficients are estimated so as to minimize the mean squared difference between the prediction vector and the true response vector, that is, to minimize Σ_i (y_i − ŷ_i)². This method is called the method of least squares. Under the assumptions on the noise terms, these coefficients also maximize the likelihood of the prediction vector.

In a linear regression model of the form y = β_1 X_1 + β_2 X_2 + ... + β_p X_p, the coefficient β_k expresses the impact of a one-unit change in the predictor variable, X_k, on the mean of the response, E(y), provided that all other variables are held constant. The sign of the coefficient gives the direction of the effect. For example, if the linear model is E(y) = 1.8 − 2.35X_1 + X_2, then −2.35 indicates a 2.35-unit decrease in the mean response with a one-unit increase in X_1, given X_2 is held constant. If the model is E(y) = 1.1 + 1.5X_1² + X_2, the coefficient of X_1² indicates a 1.5-unit increase in the mean of y with a one-unit increase in X_1², given all else is held constant. However, in the case of E(y) = 1.1 + 2.1X_1 + 1.5X_1², it is difficult to interpret the coefficients similarly, because it is not possible to hold X_1 constant when X_1² changes, or vice versa.
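For the simple one-predictor case, the least-squares coefficients have a closed form. This illustrative Python sketch (not toolbox code) recovers the coefficients of the 1.8 − 2.35X_1 example from the text, dropping the X_2 term and using hypothetical noiseless data:

```python
# Least-squares fit of y = b0 + b1*x: b1 = Sxy/Sxx, b0 = ybar - b1*xbar.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.8 - 2.35 * x for x in xs]   # exact responses from E(y) = 1.8 - 2.35*x
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b1 = sxy / sxx                      # slope: change in mean response per unit x
b0 = ybar - b1 * xbar               # intercept
assert abs(b1 - (-2.35)) < 1e-9
assert abs(b0 - 1.8) < 1e-9
```

With more than one predictor, the same least-squares principle leads to the normal equations that LinearModel.fit solves internally (with numerically safer factorizations).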

Linear Regression

In this section...
- Prepare Data on page 9-11
- Choose a Fitting Method on page 9-13
- Choose a Model or Range of Models on page 9-14
- Fit Model to Data on page 9-20
- Examine Quality and Adjust the Fitted Model on page 9-20
- Predict or Simulate Responses to New Data on page 9-39
- Share Fitted Models on page 9-42
- Linear Regression Workflow on page 9-43

Prepare Data

To begin fitting a regression, put your data into a form that fitting functions expect. All regression techniques begin with input data in an array X and response data in a separate vector y, or input data in a dataset array ds and response data as a column in ds. Each row of the input data represents one observation. Each column represents one predictor (variable).

For a dataset array ds, indicate the response variable with the 'ResponseVar' name-value pair:

mdl = LinearModel.fit(ds,'ResponseVar','BloodPressure');
% or
mdl = GeneralizedLinearModel.fit(ds,'ResponseVar','BloodPressure');

The response variable is the last column by default.

You can use numeric categorical predictors. A categorical predictor is one that takes values from a fixed set of possibilities.

For a numeric array X, indicate the categorical predictors using the 'Categorical' name-value pair. For example, to indicate that predictors 2 and 3 out of six are categorical:

mdl = LinearModel.fit(X,y,'Categorical',[2,3]);
% or
mdl = GeneralizedLinearModel.fit(X,y,'Categorical',[2,3]);
% or equivalently
mdl = LinearModel.fit(X,y,'Categorical',logical([0 1 1 0 0 0]));

For a dataset array ds, fitting functions assume that these data types are categorical:
- Logical
- Categorical (nominal or ordinal)
- String or character array

If you want to indicate that a numeric predictor is categorical, use the 'Categorical' name-value pair.

Represent missing numeric data as NaN. To represent missing data for other data types, see Missing Group Values on page 2-53.

Dataset Array for Input and Response Data

For example, to create a dataset array from an Excel spreadsheet:

ds = dataset('XLSFile','hospital.xls',...
    'ReadObsNames',true);

To create a dataset array from workspace variables:

load carsmall
ds = dataset(MPG,Weight);
ds.Year = ordinal(Model_Year);

Numeric Matrix for Input Data, Numeric Vector for Response

For example, to create numeric arrays from workspace variables:

load carsmall
X = [Weight Horsepower Cylinders Model_Year];
y = MPG;

To create numeric arrays from an Excel spreadsheet:

[X Xnames] = xlsread('hospital.xls');
y = X(:,4);   % response y is systolic pressure
X(:,4) = [];  % remove y from the X matrix

Notice that the nonnumeric entries, such as sex, do not appear in X.

Choose a Fitting Method

There are three ways to fit a model to data:
- Least-Squares Fit on page 9-13
- Robust Fit on page 9-13
- Stepwise Fit on page 9-13

Least-Squares Fit

Use LinearModel.fit to construct a least-squares fit of a model to the data. This method is best when you are reasonably certain of the model's form, and mainly need to find its parameters. It is also useful when you want to explore a few models. The method requires you to examine the data manually to discard outliers, though there are techniques to help (see Residuals Model Quality for Training Data on page 9-24).

Robust Fit

Use LinearModel.fit with the RobustOpts name-value pair to create a model that is little affected by outliers. Robust fitting saves you the trouble of manually discarding outliers. However, step does not work with robust fitting, so when you use robust fitting, you cannot search stepwise for a good model.

Stepwise Fit

Use LinearModel.stepwise to find a model and fit parameters to it. LinearModel.stepwise starts from one model, such as a constant, and adds or subtracts terms one at a time, choosing an optimal term each time in a greedy fashion, until it cannot improve further. Use stepwise fitting to find a good model, which is one that has only relevant terms.

The result depends on the starting model. Usually, starting with a constant model leads to a small model. Starting with more terms can lead to a more complex model, but one that has lower mean squared error. See Compare large and small stepwise models on page 9-111.

You cannot use robust options along with stepwise fitting. So after a stepwise fit, examine your model for outliers (see Residuals Model Quality for Training Data on page 9-24).
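The greedy add-one-term-at-a-time search that LinearModel.stepwise performs can be sketched in miniature. The following is an illustrative Python/NumPy sketch of forward selection only (the toolbox's actual algorithm also removes terms and uses statistical criteria rather than a raw RSS tolerance; the `tol` threshold here is an assumption for the sketch):

```python
import numpy as np

def rss(X, y):
    # Residual sum of squares of a least-squares fit of y on X.
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return float(r @ r)

def forward_stepwise(Xfull, y, tol=1.0):
    """Greedy forward selection: start from a constant model and repeatedly
    add the column of Xfull that most reduces RSS, stopping when the best
    improvement falls below tol. Returns the chosen column indices."""
    n, p = Xfull.shape
    chosen = []
    current = np.ones((n, 1))          # constant (intercept-only) model
    best_rss = rss(current, y)
    while True:
        candidates = [j for j in range(p) if j not in chosen]
        if not candidates:
            break
        trials = [(rss(np.column_stack([current, Xfull[:, j]]), y), j)
                  for j in candidates]
        new_rss, j = min(trials)
        if best_rss - new_rss < tol:   # no worthwhile improvement left
            break
        chosen.append(j)
        current = np.column_stack([current, Xfull[:, j]])
        best_rss = new_rss
    return chosen

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 0.0, 3.0, 0.0, -1.0]) + 0.1 * rng.normal(size=100)
print(sorted(forward_stepwise(X, y)))  # the relevant predictors: [0, 2, 4]
```

As the text notes, the result of such a greedy search depends on the starting model; this sketch always starts from the constant model.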

Choose a Model or Range of Models

There are several ways of specifying a model for linear regression. Use whichever you find most convenient.
- Brief String on page 9-15
- Terms Matrix on page 9-15
- Formula on page 9-19

For LinearModel.fit, the model specification you give is the model that is fit. If you do not give a model specification, the default is 'linear'.

For LinearModel.stepwise, the model specification you give is the starting model, which the stepwise procedure tries to improve. If you do not give a model specification, the default starting model is 'constant', and the default upper bounding model is 'interactions'. Change the upper bounding model using the Upper name-value pair.

Note: There are other ways of selecting models, such as using lasso, lassoglm, sequentialfs, or plsregress.

Brief String

'constant': Model contains only a constant (intercept) term.
'linear': Model contains an intercept and linear terms for each predictor.
'interactions': Model contains an intercept, linear terms, and all products of pairs of distinct predictors (no squared terms).
'purequadratic': Model contains an intercept, linear terms, and squared terms.
'quadratic': Model contains an intercept, linear terms, interactions, and squared terms.
'polyijk': Model is a polynomial with all terms up to degree i in the first predictor, degree j in the second predictor, and so on. Use numerals 0 through 9. For example, 'poly2111' has a constant plus all linear and product terms, and also contains terms with predictor 1 squared.

For example, to specify an interaction model using LinearModel.fit with matrix predictors:

mdl = LinearModel.fit(X,y,'interactions');

To specify a model using LinearModel.stepwise and a dataset array ds of predictors, suppose you want to start from a constant and have a linear model upper bound. Assume the response variable in ds is in the third column.

mdl2 = LinearModel.stepwise(ds,'constant',...
    'Upper','linear','ResponseVar',3);

Terms Matrix

A terms matrix is a T-by-(P+1) matrix specifying terms in a model, where T is the number of terms, P is the number of predictor variables, and the plus one is for the response variable. The value of T(i,j) is the exponent of variable j in term i.

For example, if there are three predictor variables A, B, and C:

[0 0 0 0]  % constant term or intercept
[0 1 0 0]  % B; equivalently, A^0 * B^1 * C^0
[1 0 1 0]  % A*C
[2 0 0 0]  % A^2
[0 1 2 0]  % B*(C^2)

The 0 at the end of each term represents the response variable. In general:

- If you have the variables in a dataset array, then a 0 must represent the response variable, in the column corresponding to the position of the response variable in the dataset array. For example:

Load sample data and define the dataset array.

load hospital
ds = dataset(hospital.Sex,hospital.BloodPressure(:,1),hospital.Age,...
    hospital.Smoker,'VarNames',{'Sex','BloodPressure','Age','Smoker'});

Represent the linear model 'BloodPressure ~ 1 + Sex + Age + Smoker' in a terms matrix. The response variable is in the second column of the dataset array, so there must be a column of zeros for the response variable in the second column of the terms matrix.

T = [0 0 0 0;1 0 0 0;0 0 1 0;0 0 0 1]

T =
     0     0     0     0
     1     0     0     0
     0     0     1     0
     0     0     0     1

Redefine the dataset array.

ds = dataset(hospital.BloodPressure(:,1),hospital.Sex,hospital.Age,...
    hospital.Smoker,'VarNames',{'BloodPressure','Sex','Age','Smoker'});

Now, the response variable is the first term in the dataset array. Specify the same linear model, 'BloodPressure ~ 1 + Sex + Age + Smoker', using a terms matrix.

T = [0 0 0 0;0 1 0 0;0 0 1 0;0 0 0 1]

T =
     0     0     0     0
     0     1     0     0
     0     0     1     0
     0     0     0     1

- If you have the predictor and response variables in a matrix and column vector, then you must include a 0 for the response variable at the end of each term. For example:

Load sample data and define the matrix of predictors.

load carsmall
X = [Acceleration,Weight];

Specify the model 'MPG ~ Acceleration + Weight + Acceleration:Weight + Weight^2' using a terms matrix and fit the model to the data. This model includes the main effects for Acceleration and Weight, their interaction, and a squared term for Weight.
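A terms matrix is just a table of exponents, so expanding it into design-matrix columns is one product per term. A small illustrative sketch (Python/NumPy rather than the toolbox's MATLAB, with the response column of the T-by-(P+1) matrix assumed already dropped):

```python
import numpy as np

def design_from_terms(X, T):
    """Build a design matrix from predictor data X (n-by-P) and a terms
    matrix T (T-by-P) whose entry T[t, j] is the exponent of variable j
    in term t. Each design column is the product of the predictors
    raised to that term's exponents."""
    cols = [np.prod(X ** row, axis=1) for row in T]
    return np.column_stack(cols)

# Three predictors A, B, C, matching the example terms above:
X = np.array([[2.0, 3.0, 4.0],
              [1.0, 5.0, 2.0]])
T = np.array([[0, 0, 0],   # constant
              [0, 1, 0],   # B
              [1, 0, 1],   # A*C
              [2, 0, 0],   # A^2
              [0, 1, 2]])  # B*(C^2)
D = design_from_terms(X, T)
print(D[0])  # term values for A=2, B=3, C=4: [1, 3, 8, 4, 48]
```

For the first row (A=2, B=3, C=4) the columns evaluate to 1, 3, 2·4=8, 2²=4, and 3·4²=48, which is exactly the term-by-term reading of the matrix.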

A formula for a model specification is a string of the form 'Y ~ terms', where Y is the response name and terms contains:

- Variable names
- + to include the next variable
- - to exclude the next variable
- : to define an interaction, a product of terms
- * to define an interaction and all lower-order terms
- ^ to raise the predictor to a power, exactly as in * repeated, so ^ includes lower-order terms as well
- () to group terms

Tip: Formulas include a constant (intercept) term by default. To exclude a constant term from the model, include -1 in the formula.

Examples:

'Y ~ A + B + C' is a three-variable linear model with intercept.
'Y ~ A + B + C - 1' is a three-variable linear model without intercept.
'Y ~ A + B + C + B^2' is a three-variable model with intercept and a B^2 term.
'Y ~ A + B^2 + C' is the same as the previous example, since B^2 includes a B term.
'Y ~ A + B + C + A:B' includes an A*B term.
'Y ~ A*B + C' is the same as the previous example, since A*B = A + B + A:B.
'Y ~ A*B*C - A:B:C' has all interactions among A, B, and C, except the three-way interaction.
'Y ~ A*(B + C + D)' has all linear terms, plus products of A with each of the other variables.

For example, to specify an interaction model using LinearModel.fit with matrix predictors:

mdl = LinearModel.fit(X,y,'y ~ x1*x2*x3 - x1:x2:x3');

To specify a model using LinearModel.stepwise and a dataset array ds of predictors, suppose you want to start from a constant and have a linear model upper bound. Assume the response variable in ds is named 'y', and the predictor variables are named 'x1', 'x2', and 'x3'.

mdl2 = LinearModel.stepwise(ds,'y ~ 1','Upper','y ~ x1 + x2 + x3');

Fit Model to Data

The most common optional arguments for fitting:

- For robust regression in LinearModel.fit, set the 'RobustOpts' name-value pair to 'on'.
- Specify an appropriate upper bound model in LinearModel.stepwise, such as setting 'Upper' to 'linear'.
- Indicate which variables are categorical using the 'CategoricalVars' name-value pair. Provide a vector with column numbers, such as [1 6] to specify that predictors 1 and 6 are categorical. Alternatively, give a logical vector the same length as the data columns, with a 1 entry indicating that the variable is categorical. If there are seven predictors, and predictors 1 and 6 are categorical, specify logical([1,0,0,0,0,1,0]).
- For a dataset array, specify the response variable using the 'ResponseVar' name-value pair. The default is the last column in the array.

For example,

mdl = LinearModel.fit(X,y,'linear',...
    'RobustOpts','on','CategoricalVars',3);
mdl2 = LinearModel.stepwise(ds,'constant',...
    'ResponseVar','MPG','Upper','quadratic');

There is a standard error column for the coefficient estimates.

The reported pValue entries (which are derived from the t statistics under the assumption of normal errors) for predictors 1, 3, and 5 are extremely small. These are the three predictors that were used to create the response data y.

The pValue entries for (Intercept), x2, and x4 are much larger than 0.01. These three predictors were not used to create the response data y.

The display contains R², adjusted R², and F statistics.

ANOVA

To examine the quality of the fitted model, consult an ANOVA table. For example, use anova on a linear model with five predictors:

X = randn(100,5);
y = X*[1;0;3;0;-1]+randn(100,1);
mdl = LinearModel.fit(X,y);
tbl = anova(mdl)

tbl =
             SumSq       DF    MeanSq      F          pValue
    x1        106.62      1     106.62     112.96     8.5494e-18
    x2       0.53464      1    0.53464    0.56646        0.45355
    x3        793.74      1     793.74     840.98     1.1117e-48
    x4       0.16515      1    0.16515    0.17498        0.67667
    x5        67.398      1     67.398      71.41      3.593e-13
    Error     88.719     94    0.94382

This table gives somewhat different results than the default display (see Model Display on page 9-21). The table clearly shows that the effects of x2 and x4 are not significant. Depending on your goals, consider removing x2 and x4 from the model.

Diagnostic Plots

Diagnostic plots help you identify outliers, and see other problems in your model or fit. For example, load the carsmall data, and make a model of MPG as a function of Cylinders (nominal) and Weight:

load carsmall
ds = dataset(Weight,MPG,Cylinders);
ds.Cylinders = ordinal(ds.Cylinders);
mdl = LinearModel.fit(ds,'MPG ~ Cylinders*Weight + Weight^2');

Make a leverage plot of the data and model.

plotDiagnostics(mdl)
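The significance pattern in the ANOVA example above can be reproduced outside the toolbox. This illustrative Python/NumPy sketch (not the toolbox's code) simulates the same kind of data, fits by least squares, and forms per-coefficient t statistics from s²·diag((XᵀX)⁻¹); the predictors that actually generated y stand out:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
X = rng.normal(size=(n, 5))
# Same construction as the example: only x1, x3, x5 drive the response.
y = X @ np.array([1.0, 0.0, 3.0, 0.0, -1.0]) + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])       # add an intercept column
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
resid = y - Xd @ b
dof = n - Xd.shape[1]                       # 100 - 6 = 94, as in the table
s2 = resid @ resid / dof                    # mean squared error
se = np.sqrt(s2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
tstat = b / se
# x1, x3, x5 (entries 1, 3, 5 of tstat) should have |t| far above 2;
# the intercept, x2, and x4 should not.
print(np.round(np.abs(tstat), 1))
```

The F values in the anova table are the squares of these t statistics for single-degree-of-freedom terms, which is why both displays flag the same predictors.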

There are a few points with high leverage. But this plot does not reveal whether the high-leverage points are outliers.

Look for points with large Cook's distance.

plotDiagnostics(mdl,'cookd')

There is one point with large Cook's distance. Identify it and remove it from the model. You can use the Data Cursor to click the outlier and identify it, or identify it programmatically:

[~,larg] = max(mdl.Diagnostics.CooksDistance);
mdl2 = LinearModel.fit(ds,'MPG ~ Cylinders*Weight + Weight^2',...
    'Exclude',larg);

Residuals Model Quality for Training Data

There are several residual plots to help you discover errors, outliers, or correlations in the model or data. The simplest residual plots are the default histogram plot, which shows the range of the residuals and their frequencies, and the probability plot, which shows how the distribution of the residuals compares to a normal distribution with matched variance.

Load the carsmall data, and make a model of MPG as a function of Cylinders (nominal) and Weight:

load carsmall
ds = dataset(Weight,MPG,Cylinders);
ds.Cylinders = ordinal(ds.Cylinders);
mdl = LinearModel.fit(ds,'MPG ~ Cylinders*Weight + Weight^2');

Examine the residuals:

plotResiduals(mdl)

The observations with residuals above 12 are potential outliers.

plotResiduals(mdl,'probability')

The two potential outliers appear on this plot as well. Otherwise, the probability plot seems reasonably straight, meaning a reasonable fit to normally distributed residuals.

You can identify the two outliers and remove them from the data:

outl = find(mdl.Residuals.Raw > 12)

outl =
    90
    97

To remove the outliers, use the Exclude name-value pair:

mdl2 = LinearModel.fit(ds,'MPG ~ Cylinders*Weight + Weight^2',...
    'Exclude',outl);

Examine a residuals plot of mdl2:

plotResiduals(mdl2)

The new residuals plot looks fairly symmetric, without obvious problems. However, there might be some serial correlation among the residuals. Create a new plot to see if such an effect exists.

plotResiduals(mdl2,'lagged')
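The lagged-residual plot can be complemented by a single number: the lag-1 sample autocorrelation of the residual vector. An illustrative Python/NumPy sketch (not toolbox code) showing the statistic that the plot visualizes, on synthetic uncorrelated and positively correlated residuals:

```python
import numpy as np

def lag1_autocorr(r):
    """Sample lag-1 autocorrelation of a residual vector r."""
    r = np.asarray(r, dtype=float)
    r = r - r.mean()
    return float((r[:-1] @ r[1:]) / (r @ r))

rng = np.random.default_rng(7)
white = rng.normal(size=500)              # uncorrelated residuals
ar = np.zeros(500)                        # positively correlated residuals
for t in range(1, 500):
    ar[t] = 0.7 * ar[t - 1] + rng.normal()

print(round(lag1_autocorr(white), 2))     # near 0
print(round(lag1_autocorr(ar), 2))        # near 0.7
```

A clearly positive value corresponds to the "more points in the upper-right and lower-left quadrants" reading of the lagged plot.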

The scatter plot shows many more crosses in the upper-right and lower-left quadrants than in the other two quadrants, indicating positive serial correlation among the residuals.

Another potential issue is when residuals are large for large observations. See if the current model has this issue.

plotResiduals(mdl2,'fitted')

There is some tendency for larger fitted values to have larger residuals. Perhaps the model errors are proportional to the measured values.

Plots to Understand Predictor Effects

This example shows how to understand the effect each predictor has on a regression model using a variety of available plots.

You can drag the individual predictor values, which are represented by dashed blue vertical lines. You can also choose between simultaneous and non-simultaneous confidence bounds, which are represented by dashed red curves.

3. Use an effects plot to show another view of the effect of predictors on the response.

plotEffects(mdl)

This plot shows that changing Weight from about 2500 to 4732 lowers MPG by about 30 (the location of the upper blue circle). It also shows that changing the number of cylinders from 8 to 4 raises MPG by about 10 (the lower blue circle). The horizontal blue lines represent confidence intervals for these predictions. The predictions come from averaging over one predictor as the other is changed. In cases such as this, where the two predictors are correlated, be careful when interpreting the results.

Instead of viewing the effect of averaging over a predictor as the other is changed, examine the joint interaction in an interaction plot.

4. plotInteraction(mdl,'Weight','Cylinders')

The interaction plot shows the effect of changing one predictor with the other held fixed. In this case, the plot is much more informative. It shows, for example, that lowering the number of cylinders in a relatively light car (Weight = 1795) leads to an increase in mileage, but lowering the number of cylinders in a relatively heavy car (Weight = 4732) leads to a decrease in mileage.

For an even more detailed look at the interactions, look at an interaction plot with predictions. This plot holds one predictor fixed while varying the other, and plots the effect as a curve. Look at the interactions for various fixed numbers of cylinders.

5. plotInteraction(mdl,'Cylinders','Weight','predictions')

Now look at the interactions with various fixed levels of weight.

plotInteraction(mdl,'Weight','Cylinders','predictions')

Plots to Understand Terms Effects

This example shows how to understand the effect of each term in a regression model using a variety of available plots.

This plot shows the results of fitting both Weight^2 and MPG to the terms other than Weight^2. The reason to use plotAdded is to understand what additional improvement in the model you get by adding Weight^2.

The coefficient of a line fit to these points is the coefficient of Weight^2 in the full model. The Weight^2 predictor is just over the edge of significance (pValue < 0.05), as you can see in the coefficients table display. You can see that in the plot as well. The confidence bounds look like they could not contain a horizontal line (constant y), so a zero-slope model is not consistent with the data.

3. Create an added variable plot for the model as a whole.

plotAdded(mdl)

mdl2 uses just Displacement and Horsepower, and has nearly as good a fit to the data as mdl1 in the Adjusted R-Squared metric.

Predict or Simulate Responses to New Data

There are three ways to use a linear model to predict or simulate the response to new data:
- predict on page 9-39
- feval on page 9-40
- random on page 9-41

predict

This example shows how to predict and obtain confidence intervals on the predictions using the predict method.

1. Load the carbig data and make a default linear model of the response MPG to the Acceleration, Displacement, Horsepower, and Weight predictors.

load carbig
X = [Acceleration,Displacement,Horsepower,Weight];
mdl = LinearModel.fit(X,MPG);

2. Create a three-row array of predictors from the minimal, mean, and maximal values. There are some NaN values, so use functions that ignore NaN values.

Xnew = [nanmin(X);nanmean(X);nanmax(X)]; % new data

3. Find the predicted model responses and confidence intervals on the predictions.

[NewMPG NewMPGCI] = predict(mdl,Xnew)

NewMPG =
   34.1345
   23.4078

NewMPGCI =
   31.6115   36.6575
   22.9859   23.8298
    0.6134    8.9367

The confidence bound on the mean response is narrower than those for the minimum or maximum responses, which is quite sensible.

feval

When you construct a model from a dataset array, feval is often more convenient for predicting mean responses than predict. However, feval does not provide confidence bounds.
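The behavior noted above, tighter bounds near the center of the data, follows from the standard error of a mean prediction, s·sqrt(x0ᵀ(XᵀX)⁻¹x0). An illustrative Python/NumPy sketch (not the toolbox's predict; the simulated one-predictor data are an assumption for the sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 150
x = rng.uniform(1500, 5000, size=n)              # a weight-like predictor
y = 45 - 0.007 * x + rng.normal(scale=2, size=n)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
s2 = resid @ resid / (n - 2)                     # error variance estimate
XtX_inv = np.linalg.inv(X.T @ X)

def mean_pred_se(x0):
    """Standard error of the estimated mean response at predictor value x0."""
    v = np.array([1.0, x0])
    return float(np.sqrt(s2 * v @ XtX_inv @ v))

# SE at the minimum, mean, and maximum of the data, as in the Xnew example.
ses = [mean_pred_se(v) for v in (x.min(), x.mean(), x.max())]
print([round(s, 3) for s in ses])
```

The middle value is the smallest, so the interval at the mean predictor row is the narrowest, matching the NewMPGCI pattern above.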

This example shows how to predict mean responses using the feval method.

1. Load the carbig data and make a default linear model of the response MPG to the Acceleration, Displacement, Horsepower, and Weight predictors.

load carbig
ds = dataset(Acceleration,Displacement,Horsepower,Weight,MPG);
mdl = LinearModel.fit(ds,'linear','ResponseVar','MPG');

2. Create a three-row array of predictors from the minimal, mean, and maximal values. There are some NaN values, so use functions that ignore NaN values.

Xnew = [nanmin(X);nanmean(X);nanmax(X)]; % new data

The Xnew array is not a dataset array. It has the right number of columns for prediction, so feval can use it for predictions.

3. Find the predicted model responses.

NewMPG = feval(mdl,Xnew)

NewMPG =
   34.1345
   23.4078

random

The random method simulates new random response values, equal to the mean prediction plus a random disturbance with the same variance as the training data.

This example shows how to simulate responses using the random method.

1. Load the carbig data and make a default linear model of the response MPG to the Acceleration, Displacement, Horsepower, and Weight predictors.

load carbig
X = [Acceleration,Displacement,Horsepower,Weight];
mdl = LinearModel.fit(X,MPG);

2. Create a three-row array of predictors from the minimal, mean, and maximal values. There are some NaN values, so use functions that ignore NaN values.
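The random method's recipe, mean prediction plus Gaussian noise with the training error variance, can be sketched directly. An illustrative Python/NumPy version (not the toolbox's random; the one-predictor data and function name are assumptions for the sketch):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 300
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
sigma = np.sqrt(resid @ resid / (n - 2))   # estimated error std deviation

def simulate_response(Xnew, rng):
    """Mean prediction plus Gaussian noise with the training variance,
    mimicking the random method described above."""
    mean = Xnew @ b
    return mean + rng.normal(scale=sigma, size=len(mean))

Xnew = np.column_stack([np.ones(3), np.array([-1.0, 0.0, 1.0])])
sims = np.array([simulate_response(Xnew, rng) for _ in range(2000)])
print(np.round(sims.mean(axis=0), 1))  # close to the mean predictions Xnew @ b
```

Averaged over many simulations, the values converge to the mean predictions, while each individual draw scatters with the training noise level.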

step took two steps. This means it could not improve the model further by adding or subtracting a single term.

Plot the effectiveness of the simpler model on the training data.

plotResiduals(mdl1)

The residuals look about as small as those of the original model.

Step 5. Predict responses to new data.

Suppose you have four new people, aged 25, 30, 40, and 65, and the first and third smoke. Predict their systolic pressure using mdl1.

ages = [25;30;40;65];
smoker = {'Yes';'No';'Yes';'No'};
systolicnew = feval(mdl1,ages,smoker)

systolicnew =
  127.8561
  118.3412
  129.4734
  122.1149

To make predictions, you need only the variables that mdl1 uses.

Step 6. Share the model.

You might want others to be able to use your model for prediction. Access the terms in the linear model.

coefnames = mdl1.CoefficientNames

coefnames =
    '(Intercept)'    'age'    'smoke_Yes'

View the model formula.

mdl1.Formula

ans =

This time, put the response variable in the first column of the dataset array.

ds = dataset(X(:,15),X(:,7),X(:,8),X(:,9),'Varnames',...
    {'price','curb_weight','engine_size','bore'});

When the response variable is in the first column of ds, define its location. For example, LinearModel.fit, by default, assumes that bore is the response variable. You can define the response variable in the model using either:

where P is price, C is curb weight, E is engine size, B is bore, βi is the coefficient for the corresponding term in the model, and ε is the error term. The final model includes all three main effects, the interaction effects for curb weight and engine size and for engine size and bore, and the second-order term for curb weight.

BP is the blood pressure
βi are the coefficients
ISm is the indicator variable for smoking; ISm = 1 indicates a smoking patient, whereas ISm = 0 indicates a nonsmoking patient
IS is the indicator variable for sex; IS = 1 indicates a male patient, whereas IS = 0 indicates a female patient
XA is the Age variable
XW is the Weight variable
ε is the error term

The following table shows the fitted linear model for each gender and smoking combination.

ISm             IS           Linear Model
1 (Smoker)      1 (Male)     BP = β0 + βSm + βS + βA·XA + (βW + βSW)·XW
0 (Nonsmoker)   1 (Male)     BP = β0 + βS + βA·XA + (βW + βSW)·XW
1 (Smoker)      0 (Female)   BP = β0 + βSm + βA·XA + βW·XW
0 (Nonsmoker)   0 (Female)   BP = β0 + βA·XA + βW·XW

As seen from these models, βSm and βS show how much the intercept of the response function changes when the corresponding indicator variable takes the value 1 compared to when it takes the value 0. βSW, however, shows the effect of the Weight variable on the response variable when the indicator variable for sex takes the value 1 compared to when it takes the value 0. You can explore the main and interaction effects in the final model using the methods of the LinearModel class, as follows.

Plot prediction slice plots.

figure()
plotSlice(mdl)

This plot shows the main effects for all predictor variables. The green line in each panel shows the change in the response variable as a function of the predictor variable when all other predictor variables are held constant. For example, for a smoking male patient aged 37.5, the expected blood pressure increases as the weight of the patient increases, given all else the same.

The dashed red curves in each panel show the 95% confidence bounds for the predicted response values. The horizontal dashed blue line in each panel shows the predicted response for the specific value of the predictor variable corresponding to the vertical dashed blue line. You can drag these lines to get the predicted response values at other predictor values, as shown next.

For example, the predicted value of the response variable is 118.3497 when a patient is female, nonsmoking, age 40.3788, and weighs 139.9545 pounds. The values in the square brackets, [114.621, 122.079], show the lower and upper limits of a 95% confidence interval for the estimated response. Note that, for a nonsmoking female patient, the expected blood pressure decreases as the weight increases, given all else is held constant.

Plot main effects.

figure()

plotEffects(mdl)

This plot displays the main effects. The circles show the magnitude of the effect and the blue lines show the upper and lower confidence limits for the main effect. For example, being a smoker increases the expected blood pressure by 10 units, compared to being a nonsmoker, given all else is held constant. Expected blood pressure increases about two units for males compared to females, again given other predictors are held constant. An increase in age from 25 to 50 causes an expected increase of 4 units, whereas a change in weight from 111 to 202 causes about a 4-unit decrease in the expected blood pressure, given all else held constant.

Plot interaction effects.

figure()

plotInteraction(mdl,'Sex','Weight')

This plot displays the impact of a change in one factor given the other factor is fixed at a value.

Note: Be cautious while interpreting the interaction effects. When there is not enough data on all factor combinations, or the data is highly correlated, it might be difficult to determine the interaction effect of changing one factor while keeping the other fixed. In such cases, the estimated interaction effect is an extrapolation from the data.

The blue circles show the main effect of a specific term, as in the main effects plot. The red circles show the impact of a change in one term for fixed values of the other term. For example, in the bottom half of this plot, the red circles show the impact of a weight change in female and male patients, separately. You can see that an increase in a female's weight from 111 to 202 pounds causes about a 14-unit decrease in the expected blood pressure, while an increase of the same amount in the weight of a male patient causes about a 5-unit increase in the expected blood pressure, again given other predictors are held constant.

Plot prediction effects.

figure()

plotInteraction(mdl,'Sex','Weight','predictions')

This plot shows the effect of changing one variable as the other predictor variable is held constant. In this example, the last figure shows the response variable, blood pressure, as a function of weight, with the variable sex fixed at male and at female. The lines for males and females cross, which indicates a strong interaction between weight and sex. You can see that the expected blood pressure increases as the weight of a male patient increases, but decreases as the weight of a female patient increases.

Estimated Coefficients:

                   Estimate      SE           tStat        pValue
    (Intercept)    47.977        3.8785       12.37        4.8957e-21
    x1             -0.0065416    0.0011274    -5.8023      9.8742e-08
    x2             -0.042943     0.024313     -1.7663      0.08078
    x3             -0.011583     0.19333      -0.059913    0.95236

Number of observations: 93, Error degrees of freedom: 89
Root Mean Squared Error: 4.09
R-squared: 0.752, Adjusted R-Squared 0.744
F-statistic vs. constant model: 90, p-value = 7.38e-27

This linear regression output display shows the following.

y ~ 1 + x1 + x2 + x3: Linear regression model in formula form, using Wilkinson notation. Here it corresponds to y = β0 + β1x1 + β2x2 + β3x3 + ε.

First column (under Estimated Coefficients): Terms included in the model.

Estimate: Coefficient estimates for each corresponding term in the model. For example, the estimate for the constant term (intercept) is 47.977.

SE: Standard error of the coefficients.

tStat: t-statistic for each coefficient to test the null hypothesis that the corresponding coefficient is zero against the alternative that it is different from zero, given the other predictors in the model. Note that tStat = Estimate/SE. For example, the t-statistic for the intercept is 47.977/3.8785 = 12.37.

pValue: p-value for the t-statistic of the hypothesis test that the corresponding coefficient is equal to zero or not. For example, the p-value of the t-statistic for x2 is greater than 0.05, so this term is not significant at the 5% significance level given the other terms in the model.

Number of observations: Number of rows without any NaN values. For example, Number of observations is 93 because the MPG data vector has 6 NaN values and one of the data vectors, Horsepower, has one NaN value for a different observation.

Error degrees of freedom: n - p, where n is the number of observations and p is the number of coefficients in the model, including the intercept. For example, the model has four predictors, so the Error degrees of freedom is 93 - 4 = 89.

Root mean squared error: Square root of the mean squared error, which estimates the standard deviation of the error distribution.
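The arithmetic identities quoted in the display, tStat = Estimate/SE and Error degrees of freedom = n - p, can be checked directly on the reported numbers (a trivial Python check, using only values shown in the display):

```python
# Values taken from the coefficient display above.
estimate_intercept = 47.977
se_intercept = 3.8785
n_obs = 93
n_coeffs = 4   # intercept plus x1, x2, x3

tstat = estimate_intercept / se_intercept
print(round(tstat, 2))        # 12.37, matching the reported tStat
print(n_obs - n_coeffs)       # 89, the reported error degrees of freedom
```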

R-squared and Adjusted R-squared: Coefficient of determination and adjusted coefficient of determination, respectively. For example, the R-squared value suggests that the model explains approximately 75% of the variability in the response variable MPG.

F-statistic vs. constant model: Test statistic for the F-test on the regression model. It tests for a significant linear regression relationship between the response variable and the predictor variables.

p-value: p-value for the F-test on the model. For example, the model is significant with a p-value of 7.3816e-27.

You can request this display by using disp. For example, if you name your model lm, then you can display the outputs using disp(lm).

Perform analysis of variance (ANOVA) for the model.

anova(lm,'summary')

ans =
                SumSq     DF    MeanSq    F         pValue
    Total      6004.8     92    65.269
    Model        4516      3    1505.3    89.987    7.3816e-27
    Residual   1488.8     89    16.728

This ANOVA display shows the following.

SumSq: Sum of squares for the regression model, Model, the error term, Residual, and the total, Total.

DF: Degrees of freedom for each term. Degrees of freedom is n - 1 for the total, p - 1 for the model, and n - p for the error term, where n is the number of observations and p is the number of coefficients in the model, including the intercept. For example, the MPG data vector has six NaN values and one of the data vectors, Horsepower, has one NaN value for a different observation, so the total degrees of freedom is 93 - 1 = 92. There are four coefficients in the model, so the model DF is 4 - 1 = 3, and the DF for the error term is 93 - 4 = 89.

MeanSq: Mean squared error for each term. Note that MeanSq = SumSq/DF. For example, the mean squared error for the error term is 1488.8/89 = 16.728. The square root of this value is the root mean squared error in the linear regression display, or 4.09.

F: F-statistic value, which is the same as F-statistic vs. constant model in the linear regression display. In this example, it is 89.987, and in the linear regression display this F-statistic value is rounded up to 90.

pValue: p-value for the F-test on the model. In this example, it is 7.3816e-27.

Note: If there are higher-order terms in the regression model, anova partitions the model SumSq into the part explained by the higher-order terms and the rest. The corresponding F-statistics are for testing the significance of the linear terms and higher-order terms as separate groups.

If the data includes replicates, or multiple measurements at the same predictor values, then anova partitions the error SumSq into the part for the replicates and the rest. The corresponding F-statistic is for testing the lack-of-fit by comparing the model residuals with the model-free variance estimate computed on the replicates.

See the anova method for details.

Decompose the ANOVA table for model terms.

anova(lm)

ans =
             SumSq        DF    MeanSq       F            pValue
    x1        563.18       1     563.18      33.667       9.8742e-08
    x2        52.187       1     52.187      3.1197          0.08078
    x3      0.060046       1   0.060046      0.0035895      0.95236
    Error     1488.8      89     16.728

This anova display shows the following:

First column: Terms included in the model.

SumSq: Sum of squared error for each term except for the constant.

DF: Degrees of freedom. In this example, DF is 1 for each term in the model and n - p for the error term, where n is the number of observations and p is the number of coefficients in the model, including the intercept. For example, the DF for the error term in this model is 93 - 4 = 89. If any of the variables in the model is a categorical variable, the DF for that variable is the number of indicator variables created for its categories (number of categories - 1).

MeanSq: Mean squared error for each term. Note that MeanSq = SumSq/DF. For example, the mean squared error for the error term is 1488.8/89 = 16.728.

F: F-values for each coefficient. The F-value is the ratio of the mean square of each term and the mean squared error, that is, F = MeanSq(xi)/MeanSq(Error). Each F-statistic has an F distribution, with numerator degrees of freedom equal to the DF value for the corresponding term, and denominator degrees of freedom n - p, where n is the number of observations and p is the number of coefficients in the model. In this example, each F-statistic has an F(1, 89) distribution.

pValue: p-value for each hypothesis test on the coefficient of the corresponding term in the linear model. For example, the p-value for the F-statistic for x2 is 0.08078, which is not significant at the 5% significance level given the other terms in the model.

Display coefficient confidence intervals.

coefCI(lm)

ans = 

   40.2702   55.6833
   -0.0088   -0.0043
   -0.0913    0.0054
   -0.3957    0.3726

The values in each row are the lower and upper confidence limits, respectively, for the default 95% confidence intervals for the coefficients. For example, the first row shows the lower and upper limits, 40.2702 and 55.6833, for the intercept, β0. Likewise, the second row shows the limits for β1, and so on. Confidence intervals provide a measure of precision for linear regression coefficient estimates. A 100(1−α)% confidence interval gives the range that the corresponding regression coefficient will be in with 100(1−α)% confidence.

You can also change the confidence level. Find the 99% confidence intervals for the coefficients.

coefCI(lm,0.01)

ans = 

   37.7677   58.1858
   -0.0095   -0.0036
   -0.1069    0.0211
   -0.5205    0.4973

Perform hypothesis tests on the coefficients.

Test the null hypothesis that all predictor variable coefficients are equal to zero versus the alternate hypothesis that at least one of them is different from zero.

[p,F,d] = coefTest(lm)

p = 
   7.3816e-27
F = 
   89.9874
d = 
    3

Here, coefTest performs an F-test for the hypothesis that all regression coefficients (except for the intercept) are zero versus at least one differs from zero, which essentially is the hypothesis on the model. It returns p, the p-value, F, the F-statistic, and d, the numerator degrees of freedom. The F-statistic and p-value are the same as the ones in the linear regression display and ANOVA for the model. The degrees of freedom is 4 − 1 = 3 because there are four coefficients (including the intercept) in the model.

Now, perform a hypothesis test on the coefficients of the first and second predictor variables.

H = [0 1 0 0; 0 0 1 0];
[p,F,d] = coefTest(lm,H)

p = 
   5.1702e-23
F = 
   96.4873
d = 
    2

The numerator degrees of freedom is the number of coefficients tested, which is 2 in this example. The results indicate that at least one of the coefficients of the first and second predictors differs from zero.

Cook's Distance

Purpose

Cook's distance is useful for identifying outliers in the X values (observations for predictor variables). It also shows the influence of each observation on the fitted response values. An observation with Cook's distance larger than three times the mean Cook's distance might be an outlier.

Definition

Cook's distance is the scaled change in fitted values. Each element in CooksDistance is the normalized change in the vector of coefficients due to the deletion of an observation. The Cook's distance, D_i, of observation i is

    D_i = Σ_{j=1}^{n} (ŷ_j − ŷ_{j(i)})² / (p·MSE),

where

- ŷ_j is the jth fitted response value.
- ŷ_{j(i)} is the jth fitted response value, where the fit does not include observation i.
- MSE is the mean squared error.
- p is the number of coefficients in the regression model.

Cook's distance is algebraically equivalent to the following expression:

    D_i = (r_i² / (p·MSE)) · (h_ii / (1 − h_ii)²),

where r_i is the ith residual, and h_ii is the ith leverage value. CooksDistance is an n-by-1 column vector in the Diagnostics dataset array of the LinearModel object.

How To

After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can:

- Display the Cook's distance values by indexing into the property using dot notation,
  mdl.Diagnostics.CooksDistance
- Plot the Cook's distance values using
  plotDiagnostics(mdl,'cookd')

For details, see the plotDiagnostics method of the LinearModel class.

Example

Load the sample data and define the independent and response variables.

load hospital
X = double(hospital(:,2:5));
y = hospital.BloodPressure(:,1);

Fit the linear regression model.

mdl = LinearModel.fit(X,y);

Plot the Cook's distance values.

plotDiagnostics(mdl,'cookd')
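The two expressions for D_i above are algebraically equivalent, which can be verified numerically. A sketch on hypothetical data (NumPy rather than the toolbox, purely illustrative):

```python
import numpy as np

# hypothetical data: intercept plus two predictors
rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ b
r = y - yhat                                    # raw residuals
mse = (r ** 2).sum() / (n - p)                  # mean squared error
h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))  # leverages h_ii

# Definition: scaled change in all fitted values when observation i is deleted
D_def = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    D_def[i] = ((yhat - X @ b_i) ** 2).sum() / (p * mse)

# Algebraic form: D_i = (r_i^2 / (p*MSE)) * h_ii / (1 - h_ii)^2
D_alg = r ** 2 * h / (p * mse * (1 - h) ** 2)

assert np.allclose(D_def, D_alg)
```

The loop refits the model n times; the algebraic form needs only one fit, which is why it is the one used in practice.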

The dashed line in the figure corresponds to the recommended threshold value, 3*mean(mdl.Diagnostics.CooksDistance). The plot has some observations with Cook's distance values greater than the threshold value, which for this example is 3*(0.0108) = 0.0324. In particular, there are two Cook's distance values that are relatively higher than the others, which exceed the threshold value. You might want to find and omit these from your data and rebuild your model.

Find the observations with Cook's distance values that exceed the threshold value.

find((mdl.Diagnostics.CooksDistance)>3*mean(mdl.Diagnostics.CooksDistance))

ans = 
     2
    13
    28
    44
    58
    70
    71
    84
    93
    95

Find the observations with Cook's distance values that are relatively larger than the other observations with Cook's distances exceeding the threshold value.

find((mdl.Diagnostics.CooksDistance)>5*mean(mdl.Diagnostics.CooksDistance))

ans = 
     2
    84

Return to Summary of Measures.

Coefficient Confidence Intervals

Purpose

The coefficient confidence intervals provide a measure of precision for linear regression coefficient estimates. A 100(1−α)% confidence interval gives the range that the corresponding regression coefficient will be in with 100(1−α)% confidence.

Definition

The 100(1−α)% confidence intervals for linear regression coefficients are

    b_i ± t_(1−α/2, n−p) · SE(b_i),

where b_i is the coefficient estimate, SE(b_i) is the standard error of the coefficient estimate, and t_(1−α/2, n−p) is the 100(1−α/2) percentile of the t-distribution with n − p degrees of freedom. n is the number of observations and p is the number of regression coefficients.
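The interval formula translates directly into code. A sketch with hypothetical coefficient estimates and standard errors (NumPy; the critical value is hard-coded from a t-table to stay dependency-free, and all numbers here are assumptions, not toolbox output):

```python
import numpy as np

b = np.array([40.0, -0.006])   # hypothetical coefficient estimates b_i
se = np.array([3.0, 0.001])    # hypothetical standard errors SE(b_i)

# t_(1-alpha/2, n-p) for alpha = 0.05 and n - p = 89 degrees of freedom,
# roughly 1.987 (textbook-table value, hard-coded for illustration)
tcrit = 1.987

# each row: [lower limit, upper limit] for one coefficient
ci = np.column_stack([b - tcrit * se, b + tcrit * se])
print(ci)
```

A smaller alpha (higher confidence level) gives a larger tcrit and therefore wider intervals, matching the coefCI behavior shown below.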

How To

After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can obtain the default 95% confidence intervals for the coefficients using

coefCI(mdl)

You can also change the confidence level using

coefCI(mdl,alpha)

For details, see the coefCI and coefTest methods of the LinearModel class.

Example

Load the sample data and fit a linear regression model.

load hald
mdl = LinearModel.fit(ingredients,heat);

Display the 95% coefficient confidence intervals.

coefCI(mdl)

ans = 

  -99.1786  223.9893
   -0.1663    3.2685
   -1.1589    2.1792
   -1.6385    1.8423
   -1.7791    1.4910

The values in each row are the lower and upper confidence limits, respectively, for the default 95% confidence intervals for the coefficients. For example, the first row shows the lower and upper limits, −99.1786 and 223.9893, for the intercept, β0. Likewise, the second row shows the limits for β1, and so on. Display the 90% confidence intervals for the coefficients (α = 0.1).

coefCI(mdl,0.1)

ans = 

  -67.8949  192.7057
    0.1662    2.9360
   -0.8358    1.8561
   -1.3015    1.5053
   -1.4626    1.1745

The confidence interval limits become narrower as the confidence level decreases.

Return to Summary of Measures.

Coefficient of Determination (R-Squared)

Purpose

The coefficient of determination (R-squared) indicates the proportionate amount of variation in the response variable y explained by the independent variables X in the linear regression model. The larger the R-squared, the more variability is explained by the linear regression model.

Definition

R-squared is the proportion of the total sum of squares explained by the model. Rsquared, a property of the fitted model, is a structure with two fields:

- Ordinary - Ordinary (unadjusted) R-squared:

    R² = SSR/SST = 1 − SSE/SST.

- Adjusted - R-squared adjusted for the number of coefficients:

    R²_adj = 1 − ((n − 1)/(n − p)) · (SSE/SST).

SSE is the sum of squared error, SSR is the sum of squared regression, SST is the sum of squared total, n is the number of observations, and p is the number of regression coefficients (including the intercept). Because R-squared increases with added predictor variables in the regression model, the adjusted R-squared adjusts for the number of predictor variables in the model. This makes it more useful for comparing models with a different number of predictors.

How To

After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can obtain either R-squared value as a scalar by indexing into the property using dot notation, for example,

mdl.Rsquared.Ordinary
mdl.Rsquared.Adjusted

You can also obtain the SSE, SSR, and SST using the properties with the same name.

mdl.SSE
mdl.SSR
mdl.SST

Example

Load the sample data and define the response and independent variables.

load hospital
y = hospital.BloodPressure(:,1);
X = double(hospital(:,2:5));

Fit a linear regression model.

mdl = LinearModel.fit(X,y)

mdl = 

Linear regression model:
    y ~ 1 + x1 + x2 + x3 + x4

Estimated Coefficients:
                   Estimate     SE          tStat       pValue    
    (Intercept)       117.4     5.2451       22.383     1.1667e-39
    x1              0.88162     2.9473      0.29913        0.76549
    x2              0.08602     0.06731       1.278        0.20438
    x3            -0.016685    0.055714    -0.29947        0.76524
    x4                9.884      1.0406        9.498     1.9546e-15

Number of observations: 100, Error degrees of freedom: 95
Root Mean Squared Error: 4.81
R-squared: 0.508,  Adjusted R-Squared 0.487
F-statistic vs. constant model: 24.5, p-value = 5.99e-14

The R-squared and adjusted R-squared values are 0.508 and 0.487, respectively. The model explains about 50% of the variability in the response variable.

Access the R-squared and adjusted R-squared values using the properties of the fitted LinearModel object.

mdl.Rsquared.Ordinary

ans = 
    0.5078

mdl.Rsquared.Adjusted
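Both R-squared forms follow directly from the sums of squares. A NumPy sketch on synthetic data (illustrative only, not the hospital example):

```python
import numpy as np

# synthetic regression problem with an intercept and four predictors
rng = np.random.default_rng(1)
n, p = 100, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ rng.normal(size=p) + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
sse = ((y - X @ b) ** 2).sum()       # sum of squared error
sst = ((y - y.mean()) ** 2).sum()    # total sum of squares
ssr = sst - sse                      # regression sum of squares

r2 = 1 - sse / sst                           # ordinary R-squared
r2_adj = 1 - (n - 1) / (n - p) * sse / sst   # adjusted R-squared
```

Because (n − 1)/(n − p) ≥ 1, the adjusted value never exceeds the ordinary one, which is the penalty for extra coefficients described above.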

Delete-1 Change in Covariance (covratio)

Purpose

Delete-1 change in covariance (CovRatio) identifies the observations that are influential in the regression fit. An influential observation is one whose exclusion from the model might significantly alter the regression function. Values of CovRatio larger than 1 + 3*p/n or smaller than 1 − 3*p/n indicate influential points, where p is the number of regression coefficients, and n is the number of observations.

Definition

The CovRatio statistic is the ratio of the determinant of the coefficient covariance matrix with observation i deleted to the determinant of the covariance matrix for the full model:

    CovRatio_i = det( MSE_(i) · (X_(i)'X_(i))⁻¹ ) / det( MSE · (X'X)⁻¹ ),

where X_(i) and MSE_(i) are the design matrix and mean squared error computed with observation i deleted. CovRatio is an n-by-1 vector in the Diagnostics dataset array of the fitted LinearModel object. Each element is the ratio of the generalized variance of the estimated coefficients when the corresponding observation is deleted to the generalized variance of the coefficients using all the data.

How To

After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can:

- Display the CovRatio by indexing into the property using dot notation
  mdl.Diagnostics.CovRatio
- Plot the delete-1 change in covariance using
  plotDiagnostics(mdl,'CovRatio')

For details, see the plotDiagnostics method of the LinearModel class.

Example

Load the sample data and define the response and predictor variables.

load hospital
y = hospital.BloodPressure(:,1);
X = double(hospital(:,2:5));

Fit a linear regression model and plot the CovRatio values.

mdl = LinearModel.fit(X,y);
plotDiagnostics(mdl,'CovRatio')

For this example, the threshold limits are 1 + 3*5/100 = 1.15 and 1 − 3*5/100 = 0.85. There are a few points beyond the limits, which might be influential points.

Find the observations that are beyond the limits.

find((mdl.Diagnostics.CovRatio)>1.15|(mdl.Diagnostics.CovRatio)<0.85)

ans = 
     2
    14
    84
    93
    96

Return to Summary of Measures.

Delete-1 Scaled Difference in Coefficient Estimates (Dfbetas)

Purpose

The sign of a delete-1 scaled difference in coefficient estimates (Dfbetas) for coefficient j and observation i indicates whether that observation causes an increase or decrease in the estimate of the regression coefficient. The absolute value of a Dfbetas indicates the magnitude of the difference relative to the estimated standard deviation of the regression coefficient. A Dfbetas value larger than 3/sqrt(n) in absolute value indicates that the observation has a large influence on the corresponding coefficient.

Definition

Dfbetas for coefficient j and observation i is the ratio of the difference between the estimate of coefficient j using all observations and the one obtained by removing observation i, to the standard error of the coefficient estimate obtained by removing observation i. The Dfbetas for coefficient j and observation i is

    Dfbetas_ij = (b_j − b_j(i)) / sqrt( MSE_(i) · (1 − h_ii) ),

where b_j is the estimate for coefficient j, b_j(i) is the estimate for coefficient j obtained by removing observation i, MSE_(i) is the mean squared error of the regression fit obtained by removing observation i, and h_ii is the leverage value for observation i. Dfbetas is an n-by-p matrix in the Diagnostics dataset array of the fitted LinearModel object. Each cell of Dfbetas corresponds to the Dfbetas value for the corresponding coefficient obtained by removing the corresponding observation.

How To

After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can obtain the Dfbetas values as an n-by-p matrix by indexing into the property using dot notation,

mdl.Diagnostics.Dfbetas

Example

Load the sample data and define the response and independent variables.

load hospital
y = hospital.BloodPressure(:,1);
X = double(hospital(:,2:5));

Fit a linear regression model.

mdl = LinearModel.fit(X,y);

Find the Dfbetas values that are high in absolute value.

[row,col] = find(abs(mdl.Diagnostics.Dfbetas)>3/sqrt(100));
disp([row col])

     2     1
    28     1
    84     1
    93     1
     2     2
    13     3
    84     3
     2     4
    84     4

Return to Summary of Measures.

Delete-1 Scaled Change in Fitted Values (Dffits)

Purpose

The delete-1 scaled change in fitted values (Dffits) shows the influence of each observation on the fitted response values. Dffits values with an absolute value larger than 2*sqrt(p/n) might be influential.

Definition

Dffits for observation i is

    Dffits_i = sr_i · sqrt( h_ii / (1 − h_ii) ),

where sr_i is the studentized residual, and h_ii is the leverage value of the fitted LinearModel object. Dffits is an n-by-1 column vector in the Diagnostics dataset array of the fitted LinearModel object. Each element in Dffits is the change in the fitted value caused by deleting the corresponding observation, scaled by the standard error.

How To

After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can:

- Display the Dffits values by indexing into the property using dot notation
  mdl.Diagnostics.Dffits
- Plot the delete-1 scaled change in fitted values using
  plotDiagnostics(mdl,'Dffits')

For details, see the plotDiagnostics method of the LinearModel class.

Example

Load the sample data and define the response and independent variables.

load hospital
y = hospital.BloodPressure(:,1);
X = double(hospital(:,2:5));

Fit a linear regression model.

mdl = LinearModel.fit(X,y);

Plot the Dffits values.

plotDiagnostics(mdl,'Dffits')

The influential threshold limit for the absolute value of Dffits in this example is 2*sqrt(5/100) = 0.45. Again, there are some observations with Dffits values beyond the recommended limits.

Find the Dffits values that are large in absolute value.

find(abs(mdl.Diagnostics.Dffits)>2*sqrt(4/100))

ans = 
     2
    13
    28
    44
    58
    70
    71
    84
    93
    95

Return to Summary of Measures.
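The studentized-residual form of Dffits above and the description "change in the fitted value, scaled by the standard error" are the same quantity, which can be confirmed numerically. A NumPy sketch on hypothetical data:

```python
import numpy as np

# hypothetical simple regression data
rng = np.random.default_rng(2)
n, p = 15, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 0.5 * X[:, 1] + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ b
h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))  # leverages h_ii

dffits = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    s2_i = ((y[keep] - X[keep] @ b_i) ** 2).sum() / (n - p - 1)  # delete-1 variance
    sr_i = r[i] / np.sqrt(s2_i * (1 - h[i]))    # studentized residual
    dffits[i] = sr_i * np.sqrt(h[i] / (1 - h[i]))
    # same number as the delete-1 change in the ith fitted value, scaled
    change_i = X[i] @ (b - b_i)
    assert np.isclose(dffits[i], change_i / np.sqrt(s2_i * h[i]))
```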

Delete-1 Variance (S2_i)

Purpose

The delete-1 variance (S2_i) shows how the mean squared error changes when an observation is removed from the data set. You can compare the S2_i values with the value of the mean squared error.

Definition

S2_i is a set of residual variance estimates obtained by deleting each observation in turn. The S2_i value for observation i is

    S2_i = MSE_(i) = Σ_{j≠i} (y_j − ŷ_{j(i)})² / (n − p − 1),

where y_j is the jth observed response value, and ŷ_{j(i)} is the jth fitted response value computed without observation i. S2_i is an n-by-1 vector in the Diagnostics dataset array of the fitted LinearModel object. Each element in S2_i is the mean squared error of the regression obtained by deleting that observation.

How To

After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can:

- Display the S2_i vector by indexing into the property using dot notation
  mdl.Diagnostics.S2_i
- Plot the delete-1 variance values using
  plotDiagnostics(mdl,'S2_i')

For details, see the plotDiagnostics method of the LinearModel class.

Example

Load the sample data and define the response and independent variables.

load hospital
y = hospital.BloodPressure(:,1);
X = double(hospital(:,2:5));

Fit a linear regression model.

mdl = LinearModel.fit(X,y);

Display the MSE value for the model.

mdl.MSE

ans = 
   23.1140

Plot the S2_i values.

plotDiagnostics(mdl,'S2_i')

This plot makes it easy to compare the S2_i values to the MSE value of 23.114, indicated by the horizontal dashed line. You can see how deleting one observation changes the error variance.

Return to Summary of Measures.
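The delete-1 variance is straightforward to compute by refitting without each observation in turn. A NumPy sketch on hypothetical data:

```python
import numpy as np

# hypothetical simple regression data
rng = np.random.default_rng(3)
n, p = 12, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 2.0 - 1.0 * X[:, 1] + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
mse = ((y - X @ b) ** 2).sum() / (n - p)   # full-model MSE for comparison

s2 = np.empty(n)                           # S2_i, one value per observation
for i in range(n):
    keep = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    s2[i] = ((y[keep] - X[keep] @ b_i) ** 2).sum() / (n - p - 1)
```

Deleting an observation with a large residual gives an S2_i noticeably below the full-model MSE, which is the pattern the plot above makes visible.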

The Durbin-Watson test statistic is

    DW = Σ_{i=1}^{n−1} (r_{i+1} − r_i)² / Σ_{i=1}^{n} r_i².

Here, r_i is the ith raw residual, and n is the number of observations.

How To

After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can perform the Durbin-Watson test using

dwtest(mdl)

For details, see the dwtest method of the LinearModel class.

Example

Load the sample data and fit a linear regression model.

load hald
mdl = LinearModel.fit(ingredients,heat);

Perform a two-sided Durbin-Watson test to determine if there is any autocorrelation among the residuals of the linear model, mdl.

[p,DW] = dwtest(mdl,'exact','both')

p = 
    0.6285
DW = 
    2.0526

The value of the Durbin-Watson test statistic is 2.0526. The p-value of 0.6285 suggests that the residuals are not autocorrelated.
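The statistic itself is simple to compute from the raw residuals. A small sketch of the formula in plain Python, with hypothetical residual sequences:

```python
import numpy as np

def durbin_watson(r):
    """DW = sum((r[i+1] - r[i])^2) / sum(r[i]^2). Values near 2 suggest no
    autocorrelation; values toward 0 or 4 suggest positive or negative
    serial correlation, respectively."""
    r = np.asarray(r, dtype=float)
    return (np.diff(r) ** 2).sum() / (r ** 2).sum()

# residuals with alternating signs: strong negative serial correlation
print(durbin_watson([1.0, -1.0, 1.0, -1.0, 1.0, -1.0]))  # prints 3.33..., well above 2
```

This helper only reproduces the statistic; the p-value reported by dwtest additionally requires the distribution of DW under the null, which depends on the design matrix.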

F-statistic

Purpose

In linear regression, the F-statistic is the test statistic for the analysis of variance (ANOVA) approach to testing the significance of the model or the components in the model.

Definition

The F-statistic in the linear model output display is the test statistic for testing the statistical significance of the model. The F-statistic values in the anova display are for assessing the significance of the terms or components in the model.

How To

After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can:

- Find the F-statistic vs. constant model in the output display or by using
  disp(mdl)
- Display the ANOVA for the model using
  anova(mdl,'summary')
- Obtain the F-statistic values for the components, except for the constant term, using
  anova(mdl)

For details, see the anova method of the LinearModel class.

Example

Load the sample data.

load carbig
ds = dataset(Acceleration,Cylinders,Weight,MPG);
ds.Cylinders = ordinal(Cylinders);

Fit a linear regression model.

mdl = LinearModel.fit(ds,'MPG~Acceleration*Weight+Cylinders+Weight^2')

mdl = 

This display decomposes the ANOVA table into the model terms. The corresponding F-statistics in the F column are for assessing the statistical significance of each term. The F-test for Cylinders tests whether at least one of the coefficients of the indicator variables for the cylinder categories is different from zero; that is, whether different numbers of cylinders have a significant effect on MPG. The degrees of freedom for each model term is the numerator degrees of freedom for the corresponding F-test. Most of the terms have 1 degree of freedom, but the degrees of freedom for Cylinders is 4, because there are four indicator variables for this term.

Return to Summary of Measures.

Hat Matrix

Purpose

The hat matrix provides a measure of leverage. It is useful for investigating whether one or more observations are outlying with regard to their X values, and therefore might be excessively influencing the regression results.

Definition

The hat matrix is also known as the projection matrix because it projects the vector of observations, y, onto the vector of predictions, ŷ, thus putting the "hat" on y. The hat matrix H is defined in terms of the data matrix X:

    H = X(X'X)⁻¹X',

and determines the fitted or predicted values, since

    ŷ = Hy = Xb.

The diagonal elements of H, h_ii, are called leverages and satisfy

    0 ≤ h_ii ≤ 1  and  Σ_{i=1}^{n} h_ii = p,

where p is the number of coefficients, and n is the number of observations (rows of X) in the regression model. HatMatrix is an n-by-n matrix in the Diagnostics dataset array.

How To

After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can:

- Display the HatMatrix by indexing into the property using dot notation
  mdl.Diagnostics.HatMatrix

When n is large, HatMatrix might be computationally expensive. In those cases, you can obtain the diagonal values directly, using

mdl.Diagnostics.Leverage

Return to Summary of Measures.
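The projection properties stated above are easy to confirm numerically. A NumPy sketch with a hypothetical design matrix:

```python
import numpy as np

# hypothetical design matrix: intercept plus two predictors
rng = np.random.default_rng(4)
n, p = 10, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

H = X @ np.linalg.solve(X.T @ X, X.T)   # H = X (X'X)^(-1) X'
h = np.diag(H)                          # leverages h_ii

assert np.allclose(H, H.T)              # H is symmetric
assert np.allclose(H @ H, H)            # H is idempotent (a projection)
assert np.isclose(h.sum(), p)           # leverages sum to p (trace of H)
assert np.all(h >= 1 / n - 1e-12)       # at least 1/n with a constant term
assert np.all(h <= 1 + 1e-12)
```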

Leverage

Purpose

Leverage is a measure of the effect of a particular observation on the regression predictions due to the position of that observation in the space of the inputs. In general, the farther a point is from the center of the input space, the more leverage it has. Because the sum of the leverage values is p, an observation i can be considered an outlier if its leverage substantially exceeds the mean leverage value, p/n, for example, a value larger than 2*p/n.

Definition

The leverage of observation i is the value of the ith diagonal term, h_ii, of the hat matrix, H, where

    H = X(X'X)⁻¹X'.

The diagonal terms satisfy

    0 ≤ h_ii ≤ 1  and  Σ_{i=1}^{n} h_ii = p,

where p is the number of coefficients in the regression model, and n is the number of observations. The minimum value of h_ii is 1/n for a model with a constant term. If the fitted model goes through the origin, then the minimum leverage value is 0 for an observation at x = 0.

It is possible to express the fitted values, ŷ, by the observed values, y, since

    ŷ = Hy = Xb.

Hence, h_ii expresses how much the observation y_i has impact on ŷ_i. A large value of h_ii indicates that the ith case is distant from the center of all X values for all n cases and has more leverage. Leverage is an n-by-1 column vector in the Diagnostics dataset array.

How To

After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can:

- Display the Leverage vector by indexing into the property using dot notation
  mdl.Diagnostics.Leverage
- Plot the leverage for the values fitted by your model using
  plotDiagnostics(mdl)

See the plotDiagnostics method of the LinearModel class for details.

Example

Load the sample data and define the response and independent variables.

load hospital
y = hospital.BloodPressure(:,1);
X = double(hospital(:,2:5));

Fit a linear regression model.

mdl = LinearModel.fit(X,y);

Plot the leverage values.

plotDiagnostics(mdl)

For this example, the recommended threshold value is 2*5/100 = 0.1. There is no indication of high-leverage observations.

Return to Summary of Measures.

Residuals

Purpose

Residuals are useful for detecting outlying y values and checking the linear regression assumptions with respect to the error term in the regression model. High-leverage observations have smaller residuals because they often shift the regression line or surface closer to them. You can also use residuals to detect some forms of heteroscedasticity and autocorrelation.

Definition

The Residuals matrix is an n-by-4 dataset array containing a table of four types of residuals, with one row for each observation.

Raw Residuals. Observed minus fitted values, that is,

    r_i = y_i − ŷ_i.

Pearson Residuals. Raw residuals divided by the root mean squared error, that is,

    pr_i = r_i / sqrt(MSE),

where r_i is the raw residual and MSE is the mean squared error.

Standardized Residuals. Standardized residuals are raw residuals divided by their estimated standard deviation. The standardized residual for observation i is

    st_i = r_i / sqrt( MSE · (1 − h_ii) ),

where MSE is the mean squared error and h_ii is the leverage value for observation i.

Studentized Residuals. Studentized residuals are the raw residuals divided by an independent estimate of the residual standard deviation. The residual for observation i is divided by an estimate of the error standard deviation based on all observations except for observation i:

    sr_i = r_i / sqrt( MSE_(i) · (1 − h_ii) ),

where MSE_(i) is the mean squared error of the regression fit computed without observation i.
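The four residual types can be computed side by side. A NumPy sketch on hypothetical data (illustrative, not the toolbox's Residuals table):

```python
import numpy as np

# hypothetical simple regression data
rng = np.random.default_rng(5)
n, p = 14, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 0.5 + 2.0 * X[:, 1] + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))
raw = y - X @ b                                  # r_i = y_i - yhat_i
mse = (raw ** 2).sum() / (n - p)

pearson = raw / np.sqrt(mse)                     # r_i / sqrt(MSE)
standardized = raw / np.sqrt(mse * (1 - h))      # r_i / sqrt(MSE*(1 - h_ii))

studentized = np.empty(n)                        # uses delete-1 variance MSE_(i)
for i in range(n):
    keep = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    s2_i = ((y[keep] - X[keep] @ b_i) ** 2).sum() / (n - p - 1)
    studentized[i] = raw[i] / np.sqrt(s2_i * (1 - h[i]))
```

Since 1 − h_ii ≤ 1, each standardized residual is at least as large in magnitude as the corresponding Pearson residual.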

The histogram shows that the residuals are slightly right skewed.

Plot the box plot of all four types of residuals.

Res = double(mdl.Residuals);
boxplot(Res)

You can see the right-skewed structure of the residuals in the box plot as well.

Plot the normal probability plot of the raw residuals.

plotResiduals(mdl,'probability')

This normal probability plot also shows the deviation from normality and the skewness on the right tail of the distribution of residuals.

Plot the residuals versus lagged residuals.

plotResiduals(mdl,'lagged')

This graph shows a trend, which indicates a possible correlation among the residuals. You can further check this using dwtest(mdl). Serial correlation among residuals usually means that the model can be improved.

Plot the symmetry plot of residuals.

plotResiduals(mdl,'symmetry')

This plot also suggests that the residuals are not distributed equally around their median, as would be expected for a normal distribution.

Plot the residuals versus the fitted values.

plotResiduals(mdl,'fitted')

The increase in the variance as the fitted values increase suggests possible heteroscedasticity.

Return to Summary of Measures.

t-statistic

Purpose

In linear regression, the t-statistic is useful for making inferences about the regression coefficients. The hypothesis test on coefficient i tests the null hypothesis that it is equal to zero, meaning the corresponding term is not significant, versus the alternate hypothesis that the coefficient is different from zero.

Definition

For a hypothesis test on coefficient i, with

    H0: βi = 0
    H1: βi ≠ 0,

the t-statistic is

    t = b_i / SE(b_i),

where SE(b_i) is the standard error of the estimated coefficient b_i.

How To

After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can:

Stepwise Regression

In this section...
Stepwise Regression to Select Appropriate Models on page 9-111
Compare large and small stepwise models on page 9-111

Stepwise Regression to Select Appropriate Models

LinearModel.stepwise creates a linear model and automatically adds to or trims the model. To create a small model, start from a constant model. To create a large model, start with a model containing many terms. A large model usually has lower error as measured by the fit to the original data, but might not have any advantage in predicting new data.

LinearModel.stepwise can use all the name-value options from LinearModel.fit, with additional options relating to the starting and bounding models. In particular:

- For a small model, start with the default lower bounding model: 'constant' (a model that has no predictor terms).
- The default upper bounding model has linear terms and interaction terms (products of pairs of predictors). For an upper bounding model that also includes squared terms, set the Upper name-value pair to 'quadratic'.

Notice that:

- mdl1 has four coefficients (the Estimate column), and mdl2 has six coefficients.
- The adjusted R-squared of mdl1 is 0.746, which is slightly less (worse) than that of mdl2, 0.758.

Create a mileage model stepwise with a full quadratic model as the upper bound, starting from the full quadratic model:

mdl3 = LinearModel.stepwise(ds,'quadratic',...
    'ResponseVar','MPG','Upper','quadratic');

Compare the three model complexities by examining their formulas.

mdl1.Formula

ans = 
MPG ~ 1 + Horsepower*Weight

mdl2.Formula

ans = 

The models have similar residuals. It is not clear which fits the data better. Interestingly, the more complex models have larger maximum deviations of the residuals:

Rrange1 = [min(mdl1.Residuals.Raw),max(mdl1.Residuals.Raw)];
Rrange2 = [min(mdl2.Residuals.Raw),max(mdl2.Residuals.Raw)];
Rrange3 = [min(mdl3.Residuals.Raw),max(mdl3.Residuals.Raw)];
Rranges = [Rrange1;Rrange2;Rrange3]

Rranges = 
  -10.7725   14.7314
  -11.4407   16.7562
  -12.2723   16.7927

Robust Regression Reduce Outlier Effects

What Is Robust Regression?

The models described in What Are Linear Regression Models? on page 9-7 are based on certain assumptions, such as a normal distribution of errors in the observed responses. If the distribution of errors is asymmetric or prone to outliers, model assumptions are invalidated, and parameter estimates, confidence intervals, and other computed statistics become unreliable. Use LinearModel.fit with the RobustOpts name-value pair to create a model that is not much affected by outliers. The robust fitting method is less sensitive than ordinary least squares to large changes in small parts of the data.

Robust regression works by assigning a weight to each data point. Weighting is done automatically and iteratively using a process called iteratively reweighted least squares. In the first iteration, each point is assigned equal weight and model coefficients are estimated using ordinary least squares. At subsequent iterations, weights are recomputed so that points farther from model predictions in the previous iteration are given lower weight. Model coefficients are then recomputed using weighted least squares. The process continues until the values of the coefficient estimates converge within a specified tolerance.
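The iteration described above can be sketched numerically. This is a minimal illustration of iteratively reweighted least squares with a bisquare weight function on synthetic data, not the toolbox's implementation; the tuning constant 4.685 and the MAD-based scale estimate are common conventions assumed here:

```python
import numpy as np

def irls_bisquare(X, y, tune=4.685, tol=1e-8, max_iter=50):
    """Iteratively reweighted least squares with bisquare weights."""
    w = np.ones(len(y))
    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        sw = np.sqrt(w)
        # weighted least squares with the current weights
        b_new, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        if np.allclose(b_new, b, atol=tol):
            break
        b = b_new
        r = y - X @ b
        # robust scale estimate from the median absolute deviation (MAD)
        s = np.median(np.abs(r - np.median(r))) / 0.6745
        u = r / (tune * max(s, np.finfo(float).tiny))
        w = np.where(np.abs(u) < 1, (1 - u ** 2) ** 2, 0.0)  # bisquare weights
    return b

# synthetic line y = 1 + 3x with one gross outlier at a known x position
rng = np.random.default_rng(6)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
X[0, 1] = 2.0
y = 1.0 + 3.0 * X[:, 1] + 0.1 * rng.normal(size=n)
y[0] += 50.0

b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
b_irls = irls_bisquare(X, y)
# the robust slope stays near the true value 3; OLS is pulled by the outlier
```

The outlier's residual far exceeds the bisquare cutoff after the first pass, so its weight drops to zero and the remaining points determine the fit, mirroring the small weight reported for the outlier in the toolbox example below.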

Robust Regression versus Standard Least-Squares Fit

This example shows how to use robust regression. It compares the results of a robust fit to a standard least-squares fit.

Step 1. Prepare data.

Load the moore data. The data is in the first five columns, and the response in the sixth.

load moore
X = [moore(:,1:5)];
y = moore(:,6);

Step 2. Fit robust and nonrobust models.

Step 3. Examine the residuals of the two models.

The residuals from the robust fit (right half of the plot) are nearly all closer to the straight line, except for the one obvious outlier.

Step 4. Remove the outlier from the standard model.

Find the index of the outlier. Examine the weight of the outlier in the robust fit.

[~,outlier] = max(mdlr.Residuals.Raw);
mdlr.Robust.Weights(outlier)

ans = 
    0.0246

This weight is much less than a typical weight of an observation:

median(mdlr.Robust.Weights)

ans = 
    0.9718

Introduction to Ridge Regression

Coefficient estimates for the models described in Linear Regression on page 9-11 rely on the independence of the model terms. When terms are correlated and the columns of the design matrix X have an approximate linear dependence, the matrix (X'X)⁻¹ becomes close to singular. As a result, the least-squares estimate

    β̂ = (X'X)⁻¹X'y

becomes highly sensitive to random errors in the observed response y, producing a large variance. This situation of multicollinearity can arise, for example, when data are collected without an experimental design.

Ridge regression addresses the problem by estimating regression coefficients using

    β̂ = (X'X + kI)⁻¹X'y,

where k is the ridge parameter and I is the identity matrix. Small positive values of k improve the conditioning of the problem and reduce the variance of the estimates. While biased, the reduced variance of ridge estimates often results in a smaller mean squared error when compared to least-squares estimates.

The Statistics Toolbox function ridge carries out ridge regression.
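The effect of the ridge term on a nearly collinear design can be sketched numerically. The function below implements the estimator formula above on synthetic data (NumPy; it is an illustration, not the toolbox's ridge function, which also centers and scales the predictors):

```python
import numpy as np

def ridge_coefficients(X, y, k):
    """Ridge estimator: (X'X + k*I)^(-1) X'y."""
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

# two nearly collinear predictors: x2 is x1 plus tiny noise
rng = np.random.default_rng(7)
n = 30
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=n)

b_ls = ridge_coefficients(X, y, 0.0)    # ordinary least squares (k = 0)
b_rr = ridge_coefficients(X, y, 1.0)    # small ridge penalty

# the ridge solution has no larger norm and stays close to the true (1, 1)
print(np.linalg.norm(b_ls), np.linalg.norm(b_rr))
```

The ridge solution norm is nonincreasing in k, so the penalized coefficients are always at most as large in norm as the least-squares ones; with collinearity they are typically far smaller and far more stable.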

The estimates stabilize to the right of the plot. Note that the coefficient of the x2x3 interaction term changes sign at a ridge parameter value of about 5×10⁻⁴.

Lasso and Elastic Net

In this section...
What Are Lasso and Elastic Net? on page 9-123
Lasso Regularization on page 9-123
Lasso and Elastic Net with Cross Validation on page 9-126
Wide Data via Lasso and Parallel Computing on page 9-129
Lasso and Elastic Net Details on page 9-134
References on page 9-136

What Are Lasso and Elastic Net?

Lasso is a regularization technique. Use lasso to:

- Reduce the number of predictors in a regression model.
- Identify important predictors.
- Select among redundant predictors.
- Produce shrinkage estimates with potentially lower predictive errors than ordinary least squares.

Elastic net is a related technique. Use elastic net when you have several highly correlated variables.lasso provides elastic net regularization when you set the Alpha name-value pair to a number strictlybetween 0 and 1.See Lasso and Elastic Net Details on page 9-134.For lasso regularization of regression ensembles, see regularize.

The plot shows the nonzero coefficients in the regression for various values of the Lambdaregularization parameter. Larger values of Lambda appear on the left side of the graph, meaning moreregularization, resulting in fewer nonzero regression coefficients.

The dashed vertical lines represent the Lambda value with minimal mean squared error (on the right), and the Lambda value with minimal mean squared error plus one standard deviation. This latter value is a recommended setting for Lambda. These lines appear only when you perform cross validation. Cross validate by setting the 'CV' name-value pair. This example uses 10-fold cross validation.

The upper part of the plot shows the degrees of freedom (df), meaning the number of nonzero coefficients in the regression, as a function of Lambda. On the left, the large value of Lambda causes all but one coefficient to be 0. On the right, all five coefficients are nonzero, though the plot shows only two clearly. The other three coefficients are so small that you cannot visually distinguish them from 0.

For small values of Lambda (toward the right in the plot), the coefficient values are close to the least-squares estimate. See step 5 on page 9-126.

Find the Lambda value of the minimal cross-validated mean squared error plus one standard deviation. Examine the MSE and coefficients of the fit at that Lambda:

Lasso and Elastic Net with Cross Validation

Consider predicting the mileage (MPG) of a car based on its weight, displacement, horsepower, and acceleration. The carbig data contains these measurements. The data seem likely to be correlated, making elastic net an attractive choice.

pnames = {'Acceleration','Displacement',...
    'Horsepower','Weight'};

When you activate the data cursor and click the plot, you see the name of the predictor, the coefficient, the value of Lambda, and the index of that point, meaning the column in b associated with that fit.

Here, the elastic net and lasso results are not very similar. Also, the elastic net plot reflects a notable qualitative property of the elastic net technique. The elastic net retains three nonzero coefficients as Lambda increases (toward the left of the plot), and these three coefficients reach 0 at about the same Lambda value. In contrast, the lasso plot shows two of the three coefficients becoming 0 at the same value of Lambda, while another coefficient remains nonzero for higher values of Lambda.

This behavior exemplifies a general pattern. In general, elastic net tends to retain or drop groups of highly correlated predictors as Lambda increases. In contrast, lasso tends to drop smaller groups, or even individual predictors.

Wide Data via Lasso and Parallel Computing

Lasso and elastic net are especially well suited to wide data, meaning data with more predictors than observations. Obviously, there are redundant predictors in this type of data. Use lasso along with cross validation to identify important predictors.

Cross validation can be slow. If you have a Parallel Computing Toolbox license, speed the computation using parallel computing.

1 Load the spectra data:

load spectra
Description

Description =
== Spectral and octane data of gasoline ==
NIR spectra and octane numbers of 60 gasoline samples
NIR:     NIR spectra, measured in 2 nm intervals from 900 nm to 1700 nm
octane:  octane numbers
spectra: a dataset array containing variables for NIR and octane

2 Compute the default lasso fit:

[b fitinfo] = lasso(NIR,octane);

3 Plot the number of predictors in the fitted lasso regularization as a function of Lambda, using a logarithmic x-axis:

lassoPlot(b,fitinfo,'PlotType','Lambda','XScale','log');

4 It is difficult to tell which value of Lambda is appropriate. To determine a good value, try fitting with cross validation:

tic
[b fitinfo] = lasso(NIR,octane,'CV',10); % A time-consuming operation
toc

Elapsed time is 226.876926 seconds.

5 Plot the result:

lassoPlot(b,fitinfo,'PlotType','Lambda','XScale','log');

You can see the suggested value of Lambda is over 1e-2, and the Lambda with minimal MSE is under 1e-2. These values are in the fitinfo structure:

fitinfo.LambdaMinMSE
ans =
    0.0057
fitinfo.Lambda1SE
ans =
    0.0190

6 Examine the quality of the fit for the suggested value of Lambda:

lambdaindex = fitinfo.Index1SE;
fitinfo.MSE(lambdaindex)
ans =
    0.0532
fitinfo.DF(lambdaindex)
ans =
    11

The fit uses just 11 of the 401 predictors, and achieves a cross-validated MSE of 0.0532.

7 Examine the plot of cross-validated MSE:

lassoPlot(b,fitinfo,'PlotType','CV');
% Use a log scale for MSE to see small MSE values better
set(gca,'YScale','log');
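The Lambda1SE suggestion in steps 5 and 6 follows the common "one standard error" rule: among all Lambda values whose cross-validated MSE is within one standard error of the minimum, take the largest (most regularized). As an illustration only (not the toolbox implementation), here is a Python sketch with hypothetical lambdas/mse/se inputs chosen to mirror the numbers above:

```python
# Illustrative sketch of the "one standard error" rule; the input
# arrays are hypothetical, not output from lasso.
def one_se_lambda(lambdas, mse, se):
    # Index of the minimal cross-validated MSE.
    i_min = min(range(len(mse)), key=lambda i: mse[i])
    threshold = mse[i_min] + se[i_min]
    # Largest Lambda (most regularization) whose CV MSE is still
    # within one standard error of the minimum.
    i_1se = max((i for i in range(len(lambdas)) if mse[i] <= threshold),
                key=lambda i: lambdas[i])
    return lambdas[i_min], lambdas[i_1se]

lam_min, lam_1se = one_se_lambda(
    [0.001, 0.0057, 0.019, 0.05],   # candidate Lambda values
    [0.060, 0.050, 0.053, 0.090],   # cross-validated MSE at each Lambda
    [0.004, 0.004, 0.004, 0.004])   # standard error of each MSE estimate
# lam_min is 0.0057 and lam_1se is 0.019, mirroring LambdaMinMSE
# and Lambda1SE in the example output.
```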

As Lambda increases (toward the left), MSE increases rapidly. The coefficients are reduced too much and they do not adequately fit the responses.

As Lambda decreases, the models are larger (have more nonzero coefficients). The increasing MSE suggests that the models are overfitted.

The default set of Lambda values does not include values small enough to include all predictors. In this case, there does not appear to be a reason to look at smaller values. However, if you want smaller values than the default, use the LambdaRatio parameter, or supply a sequence of Lambda values using the Lambda parameter. For details, see the lasso reference page.

8 To compute the cross-validated lasso estimate faster, use parallel computing (available with a Parallel Computing Toolbox license):

matlabpool open
Starting matlabpool using the 'local' configuration ... connected to 4 labs.

opts = statset('UseParallel',true);
tic;
[b fitinfo] = lasso(NIR,octane,'CV',10,'Options',opts);
toc

Elapsed time is 107.539719 seconds.

Computing in parallel is more than twice as fast on this problem using a quad-core processor.

Lasso and Elastic Net Details

Overview of Lasso and Elastic Net

Lasso is a regularization technique for performing linear regression. Lasso includes a penalty term that constrains the size of the estimated coefficients. Therefore, it resembles ridge regression. Lasso is a shrinkage estimator: it generates coefficient estimates that are biased to be small. Nevertheless, a lasso estimator can have smaller mean squared error than an ordinary least-squares estimator when you apply it to new data.

Unlike ridge regression, as the penalty term increases, lasso sets more coefficients to zero. This means that the lasso estimator is a smaller model, with fewer predictors. As such, lasso is an alternative to stepwise regression and other model selection and dimensionality reduction techniques.

Elastic net is a related technique. Elastic net is a hybrid of ridge regression and lasso regularization. Like lasso, elastic net can generate reduced models by generating zero-valued coefficients. Empirical studies have suggested that the elastic net technique can outperform lasso on data with highly correlated predictors.

Definition of Lasso

The lasso technique solves this regularization problem. For a given value of λ, a nonnegative parameter, lasso solves the problem

\min_{\beta_0,\beta}\left(\frac{1}{2N}\sum_{i=1}^{N}\left(y_i-\beta_0-x_i\beta\right)^2+\lambda\sum_{j=1}^{p}\left|\beta_j\right|\right)

where

- N is the number of observations.
- y_i is the response at observation i.
- x_i is data, a vector of p values at observation i.
- λ is a positive regularization parameter corresponding to one value of Lambda.
- The parameters β_0 and β are a scalar and a p-vector respectively.

As λ increases, the number of nonzero components of β decreases. The lasso problem involves the L1 norm of β, as contrasted with the elastic net algorithm.

Definition of Elastic Net

The elastic net technique solves this regularization problem. For an α strictly between 0 and 1, and a nonnegative λ, elastic net solves the problem

\min_{\beta_0,\beta}\left(\frac{1}{2N}\sum_{i=1}^{N}\left(y_i-\beta_0-x_i\beta\right)^2+\lambda P_\alpha(\beta)\right)

where

P_\alpha(\beta)=\frac{(1-\alpha)}{2}\left\|\beta\right\|_2^2+\alpha\left\|\beta\right\|_1=\sum_{j=1}^{p}\left(\frac{(1-\alpha)}{2}\beta_j^2+\alpha\left|\beta_j\right|\right)

Elastic net is the same as lasso when α = 1. As α shrinks toward 0, elastic net approaches ridge regression. For other values of α, the penalty term P_α(β) interpolates between the L1 norm of β and the squared L2 norm of β.
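The interpolation can be checked numerically. A small Python sketch (illustrative only, not toolbox code) of the elastic net penalty P_α(β):

```python
def elastic_net_penalty(beta, alpha):
    # P_alpha(beta) = sum_j ((1 - alpha)/2 * beta_j^2 + alpha * |beta_j|)
    return sum((1 - alpha) / 2 * b * b + alpha * abs(b) for b in beta)

beta = [1.0, -2.0]
p_lasso = elastic_net_penalty(beta, 1.0)  # alpha = 1: the L1 norm, 3.0
p_ridge = elastic_net_penalty(beta, 0.0)  # alpha -> 0: half the squared L2 norm, 2.5
```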

Introduction to Partial Least Squares

Partial least-squares (PLS) regression is a technique used with data that contain correlated predictor variables. This technique constructs new predictor variables, known as components, as linear combinations of the original predictor variables. PLS constructs these components while considering the observed response values, leading to a parsimonious model with reliable predictive power.

The technique is something of a cross between multiple linear regression and principal component analysis:

- Multiple linear regression finds a combination of the predictors that best fit a response.
- Principal component analysis finds combinations of the predictors with large variance, reducing correlations. The technique makes no use of response values.
- PLS finds combinations of the predictors that have a large covariance with the response values.

PLS therefore combines information about the variances of both the predictors and the responses, while also considering the correlations among them.

PLS shares characteristics with other regression and feature transformation techniques. It is similar to ridge regression in that it is used in situations with correlated predictors. It is similar to stepwise regression (or more general feature selection techniques) in that it can be used to select a smaller set of model terms. PLS differs from these methods, however, by transforming the original predictor space into the new component space.

The Statistics Toolbox function plsregress carries out PLS regression. For example, consider the data on biochemical oxygen demand in moore.mat, padded with noisy versions of the predictors to introduce correlations:

load moore
y = moore(:,6);              % Response
X0 = moore(:,1:5);           % Original predictors
X1 = X0+10*randn(size(X0));  % Correlated predictors
X = [X0,X1];

Use plsregress to perform PLS regression with the same number of components as predictors, then plot the percentage variance explained in the response as a function of the number of components:

[XL,yl,XS,YS,beta,PCTVAR] = plsregress(X,y,10);
plot(1:10,cumsum(100*PCTVAR(2,:)),'-bo');
xlabel('Number of PLS components');
ylabel('Percent Variance Explained in y');

Choosing the number of components in a PLS model is a critical step. The plot gives a rough indication, showing nearly 80% of the variance in y explained by the first component, with as many as five additional components making significant contributions.

The following computes the six-component model:

[XL,yl,XS,YS,beta,PCTVAR,MSE,stats] = plsregress(X,y,6);
yfit = [ones(size(X,1),1) X]*beta;
plot(y,yfit,'o')

The scatter shows a reasonable correlation between fitted and observed responses, and this is confirmed by the R^2 statistic:

TSS = sum((y-mean(y)).^2);
RSS = sum((y-yfit).^2);
Rsquared = 1 - RSS/TSS
Rsquared =
    0.8421

A plot of the weights of the ten predictors in each of the six components shows that two of the components (the last two computed) explain the majority of the variance in X:

plot(1:10,stats.W,'o-');
legend({'c1','c2','c3','c4','c5','c6'},'Location','NW')
xlabel('Predictor');
ylabel('Weight');
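The R^2 computation above is just 1 - RSS/TSS, comparing the model's residuals against those of a constant-mean model. For reference, the same calculation as a small Python sketch (illustrative only):

```python
def r_squared(y, yfit):
    # R^2 = 1 - RSS/TSS: fraction of response variance explained
    # relative to a constant-mean model.
    ybar = sum(y) / len(y)
    tss = sum((yi - ybar) ** 2 for yi in y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, yfit))
    return 1 - rss / tss
```

A perfect fit gives 1, and predicting the mean everywhere gives 0.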

A plot of the mean-squared errors suggests that as few as two components may provide an adequate model:

[axes,h1,h2] = plotyy(0:6,MSE(1,:),0:6,MSE(2,:));
set(h1,'Marker','o')
set(h2,'Marker','o')
legend('MSE Predictors','MSE Response')
xlabel('Number of Components')

The calculation of mean-squared errors by plsregress is controlled by optional parameter name/value pairs specifying cross-validation type and the number of Monte Carlo repetitions.

What Are Generalized Linear Models?

Linear regression models describe a linear relationship between a response and one or more predictive terms. Many times, however, a nonlinear relationship exists. Nonlinear Regression on page 9-198 describes general nonlinear models. A special class of nonlinear models, called generalized linear models, uses linear methods.

Recall that linear models have these characteristics:

- At each set of values for the predictors, the response has a normal distribution with mean μ.
- A coefficient vector b defines a linear combination Xb of the predictors X.
- The model is μ = Xb.

In generalized linear models, these characteristics are generalized as follows:

- At each set of values for the predictors, the response has a distribution that can be normal, binomial, Poisson, gamma, or inverse Gaussian, with parameters including a mean μ.
- A coefficient vector b defines a linear combination Xb of the predictors X.
- A link function f defines the model as f(μ) = Xb.
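In other words, predictions come from applying the inverse of the link function to the linear combination. A Python sketch (illustrative; glm_predict is a hypothetical helper, not a toolbox function):

```python
import math

def glm_predict(X, b, inverse_link):
    # mu_i = f^{-1}(x_i * b): invert the link at each linear combination.
    return [inverse_link(sum(x * w for x, w in zip(row, b))) for row in X]

logistic = lambda eta: 1 / (1 + math.exp(-eta))  # inverse of the logit link
identity = lambda eta: eta                       # inverse link when mu = Xb

mu = glm_predict([[1.0, 0.0], [1.0, 1.0]], [0.0, 2.0], logistic)
```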

Prepare Data

To begin fitting a regression, put your data into a form that fitting functions expect. All regression techniques begin with input data in an array X and response data in a separate vector y, or input data in a dataset array ds and response data as a column in ds. Each row of the input data represents one observation. Each column represents one predictor (variable).

For a dataset array ds, indicate the response variable with the 'ResponseVar' name-value pair:

mdl = LinearModel.fit(ds,'ResponseVar','BloodPressure');
% or
mdl = GeneralizedLinearModel.fit(ds,'ResponseVar','BloodPressure');

The response variable is the last column by default.

You can use numeric categorical predictors. A categorical predictor is one that takes values from a fixed set of possibilities.

For a numeric array X, indicate the categorical predictors using the 'Categorical' name-value pair. For example, to indicate that predictors 2 and 3 out of six are categorical:

mdl = LinearModel.fit(X,y,'Categorical',[2,3]);
% or
mdl = GeneralizedLinearModel.fit(X,y,'Categorical',[2,3]);
% or equivalently
mdl = LinearModel.fit(X,y,'Categorical',logical([0 1 1 0 0 0]));

For a dataset array ds, fitting functions assume that these data types are categorical:
- Logical
- Categorical (nominal or ordinal)
- String or character array

If you want to indicate that a numeric predictor is categorical, use the 'Categorical' name-value pair.

Represent missing numeric data as NaN. To represent missing data for other data types, see Missing Group Values on page 2-53.

For a 'binomial' model with data matrix X, the response y can be:
- Binary column vector. Each entry represents success (1) or failure (0).
- Two-column matrix of integers. The first column is the number of successes in each observation, the second column is the number of trials in that observation.

For a 'binomial' model with dataset ds:
- Use the ResponseVar name-value pair to specify the column of ds that gives the number of successes in each observation.
- Use the BinomialSize name-value pair to specify the column of ds that gives the number of trials in each observation.

Dataset Array for Input and Response Data. For example, to create a dataset array from an Excel spreadsheet:

ds = dataset('XLSFile','hospital.xls',...
    'ReadObsNames',true);

To create a dataset array from workspace variables:

load carsmall
ds = dataset(MPG,Weight);
ds.Year = ordinal(Model_Year);

Numeric Matrix for Input Data, Numeric Vector for Response. For example, to create numeric arrays from workspace variables:

Choose Fitting Method and Model

There are two ways to create a fitted model.

- Use GeneralizedLinearModel.fit when you have a good idea of your generalized linear model, or when you want to adjust your model later to include or exclude certain terms.
- Use GeneralizedLinearModel.stepwise when you want to fit your model using stepwise regression. GeneralizedLinearModel.stepwise starts from one model, such as a constant, and adds or subtracts terms one at a time, choosing an optimal term each time in a greedy fashion, until it cannot improve further. Use stepwise fitting to find a good model, one that has only relevant terms.

The result depends on the starting model. Usually, starting with a constant model leads to a small model. Starting with more terms can lead to a more complex model, but one that has lower mean squared error.

In either case, provide a model to the fitting function (which is the starting model for GeneralizedLinearModel.stepwise). Specify a model using one of these methods:

- Brief String on page 9-150
- Terms Matrix on page 9-151
- Formula on page 9-154

Brief String

String            Model Type
'constant'        Model contains only a constant (intercept) term.
'linear'          Model contains an intercept and linear terms for each predictor.
'interactions'    Model contains an intercept, linear terms, and all products of pairs of distinct predictors (no squared terms).
'purequadratic'   Model contains an intercept, linear terms, and squared terms.
'quadratic'       Model contains an intercept, linear terms, interactions, and squared terms.
'polyijk'         Model is a polynomial with all terms up to degree i in the first predictor, degree j in the second predictor, etc. Use numerals 0 through 9. For example, 'poly2111' has a constant plus all linear and product terms, and also contains terms with predictor 1 squared.

Terms Matrix

A terms matrix is a T-by-(P+1) matrix specifying terms in a model, where T is the number of terms, P is the number of predictor variables, and plus one is for the response variable. The value of T(i,j) is the exponent of variable j in term i. For example, if there are three predictor variables A, B, and C:

[0 0 0 0] % constant term or intercept
[0 1 0 0] % B; equivalently, A^0 * B^1 * C^0
[1 0 1 0] % A*C
[2 0 0 0] % A^2
[0 1 2 0] % B*(C^2)

The 0 at the end of each term represents the response variable. In general, if you have the variables in a dataset array, then a 0 must represent the response variable depending on the position of the response variable in the dataset array. For example:

Load sample data and define the dataset array.

load hospital
ds = dataset(hospital.Sex,hospital.BloodPressure(:,1),hospital.Age,...
    hospital.Smoker,'VarNames',{'Sex','BloodPressure','Age','Smoker'});

Represent the linear model 'BloodPressure ~ 1 + Sex + Age + Smoker' in a terms matrix. The response variable is in the second column of the dataset array, so there must be a column of zeros for the response variable in the second column of the terms matrix.

T = [0 0 0 0;1 0 0 0;0 0 1 0;0 0 0 1]

T =
     0     0     0     0
     1     0     0     0
     0     0     1     0
     0     0     0     1

Redefine the dataset array.

ds = dataset(hospital.BloodPressure(:,1),hospital.Sex,hospital.Age,...
    hospital.Smoker,'VarNames',{'BloodPressure','Sex','Age','Smoker'});

Now, the response variable is the first term in the dataset array. Specify the same linear model, 'BloodPressure ~ 1 + Sex + Age + Smoker', using a terms matrix.

T = [0 0 0 0;0 1 0 0;0 0 1 0;0 0 0 1]

T =
     0     0     0     0
     0     1     0     0
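Each row of a terms matrix maps to one model term by exponentiating each predictor and multiplying. A Python sketch of that mapping (illustrative only; term_value is a hypothetical helper):

```python
def term_value(predictors, exponents):
    # One model term: the product of each predictor raised to its
    # exponent from a terms-matrix row (response column dropped).
    v = 1.0
    for x, e in zip(predictors, exponents):
        v *= x ** e
    return v

# With A = 2, B = 3, C = 4:
abc = [2.0, 3.0, 4.0]
intercept = term_value(abc, [0, 0, 0])  # constant term: 1
ac = term_value(abc, [1, 0, 1])         # A*C: 8
a2 = term_value(abc, [2, 0, 0])         # A^2: 4
```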

Estimated Coefficients:
                 Estimate      SE           tStat      pValue
   (Intercept)   49.238        1.6411       30.002     2.7015e-49
   x2            -0.0086119    0.0005348    -16.103    1.6434e-28

Number of observations: 94, Error degrees of freedom: 92
Root Mean Squared Error: 4.13
R-squared: 0.738, Adjusted R-Squared 0.735
F-statistic vs. constant model: 259, p-value = 1.64e-28

The results of the stepwise regression are consistent with the results of LinearModel.fit in the previous step.

Formula

A formula for a model specification is a string of the form 'Y ~ terms', where

- Y is the response name.
- terms contains
  - Variable names
  - + to include the next variable
  - - to exclude the next variable
  - : to define an interaction, a product of terms
  - * to define an interaction and all lower-order terms
  - ^ to raise the predictor to a power, exactly as in * repeated, so ^ includes lower-order terms as well
  - () to group terms

Tip: Formulas include a constant (intercept) term by default. To exclude a constant term from the model, include -1 in the formula.

Examples:

'Y ~ A + B + C' is a three-variable linear model with intercept.
'Y ~ A + B + C - 1' is a three-variable linear model without intercept.
'Y ~ A + B + C + B^2' is a three-variable model with intercept and a B^2 term.
'Y ~ A + B^2 + C' is the same as the previous example, since B^2 includes a B term.
'Y ~ A + B + C + A:B' includes an A*B term.
'Y ~ A*B + C' is the same as the previous example, since A*B = A + B + A:B.
'Y ~ A*B*C - A:B:C' has all interactions among A, B, and C, except the three-way interaction.
'Y ~ A*(B + C + D)' has all linear terms, plus products of A with each of the other variables.
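The * operator expansion can be made concrete. A Python sketch (illustrative only) of expanding a two-variable product term into its lower-order terms:

```python
def expand_star(a, b):
    # 'A*B' is shorthand for the main effects plus the interaction:
    # A + B + A:B
    return [a, b, a + ":" + b]

terms = expand_star("A", "B")
# So 'Y ~ A*B + C' expands to 'Y ~ A + B + A:B + C'.
```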

Fit Model to Data

Create a fitted model using GeneralizedLinearModel.fit or GeneralizedLinearModel.stepwise. Choose between them as in Choose Fitting Method and Model on page 9-150. For generalized linear models other than those with a normal distribution, give a Distribution name-value pair as in Choose Generalized Linear Model and Link Function on page 9-146. For example,

mdl = GeneralizedLinearModel.fit(X,y,'linear','Distribution','poisson')
% or
mdl = GeneralizedLinearModel.fit(X,y,'quadratic',...
    'Distribution','binomial')

The pValue for (Intercept), x2 and x3 are larger than 0.01. These three predictors were not used to create the response data y. The pValue for x3 is just over .05, so might be regarded as possibly significant. The display contains the Chi-square statistic.

Diagnostic Plots

Diagnostic plots help you identify outliers, and see other problems in your model or fit. To illustrate these plots, consider binomial regression with a logistic link function.

The logistic model is useful for proportion data. It defines the relationship between the proportion p and the weight w by:

log[p/(1 - p)] = b1 + b2*w

This example fits a binomial model to data. The data are derived from carbig.mat, which contains measurements of large cars of various weights. Each weight in w has a corresponding number of cars in total and a corresponding number of poor-mileage cars in poor.

It is reasonable to assume that the values of poor follow binomial distributions, with the number of trials given by total and the percentage of successes depending on w. This distribution can be accounted for in the context of a logistic model by using a generalized linear model with link function log(μ/(1 - μ)) = Xb. This link function is called 'logit'.

w = [2100 2300 2500 2700 2900 3100 ...
     3300 3500 3700 3900 4100 4300]';
total = [48 42 31 34 31 21 23 23 21 16 17 21]';
poor = [1 2 0 3 8 8 14 17 19 15 17 21]';
mdl = GeneralizedLinearModel.fit(w,[poor total],...
    'linear','Distribution','binomial','link','logit')

mdl =
Generalized Linear regression model:
    logit(y) ~ 1 + x1
    Distribution = Binomial

Estimated Coefficients:
                 Estimate     SE            tStat      pValue
   (Intercept)   -13.38       1.394         -9.5986    8.1019e-22
   x1            0.0041812    0.00044258    9.4474     3.4739e-21

12 observations, 10 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 242, p-value = 1.3e-54

See how well the model fits the data.

plotSlice(mdl)
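As a sanity check on the fitted curve, the printed estimates can be plugged back into the logistic relationship log[p/(1 - p)] = b1 + b2*w. A Python sketch (illustrative; it reuses the coefficient values shown above, and poor_mileage_prob is a hypothetical helper):

```python
import math

b1, b2 = -13.38, 0.0041812  # estimates from the fitted model above

def poor_mileage_prob(w):
    # Invert the logit link: p = 1 / (1 + exp(-(b1 + b2*w)))
    return 1 / (1 + math.exp(-(b1 + b2 * w)))

p_light = poor_mileage_prob(2100)  # lightest cars: low probability
p_heavy = poor_mileage_prob(4300)  # heaviest cars: high probability
```

The curve crosses p = 0.5 at roughly w = -b1/b2, around 3200 lb, consistent with the raw poor/total counts in the data.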

The fit looks reasonably good.

This is typical of a regression with points ordered by the predictor variable. The leverage of each point on the fit is higher for points with relatively extreme predictor values (in either direction) and low for points with average predictor values. In examples with multiple predictors and with points not ordered by predictor value, this plot can help you identify which observations have high leverage because they are outliers as measured by their predictor values.

Residuals: Model Quality for Training Data

There are several residual plots to help you discover errors, outliers, or correlations in the model or data. The simplest residual plots are the default histogram plot, which shows the range of the residuals and their frequencies, and the probability plot, which shows how the distribution of the residuals compares to a normal distribution with matched variance.

This example shows residual plots for a fitted Poisson model. The data construction has two out of five predictors not affecting the response, and no intercept term:

rng('default') % for reproducibility
X = randn(100,5);
mu = exp(X(:,[1 4 5])*[2;1;.5]);
y = poissrnd(mu);
mdl = GeneralizedLinearModel.fit(X,y,...
    'linear','Distribution','poisson');

Examine the residuals:

plotResiduals(mdl)

While most residuals cluster near 0, there are several near 18. So examine a different residuals plot.

plotResiduals(mdl,'fitted')

The large residuals don't seem to have much to do with the sizes of the fitted values. Perhaps a probability plot is more informative.

plotResiduals(mdl,'probability')

Now it is clear. The residuals do not follow a normal distribution. Instead, they have fatter tails, much as an underlying Poisson distribution would.

Plots to Understand Predictor Effects and How to Modify a Model

This example shows how to understand the effect each predictor has on a regression model, and how to modify the model to remove unnecessary terms.

1 Create a model from some predictors in artificial data. The data do not use the second and third columns in X. So you expect the model not to show much dependence on those predictors.

The scale of the first predictor is overwhelming the plot. Disable it using the Predictors menu.

Now it is clear that predictors 2 and 3 have little to no effect.

You can drag the individual predictor values, which are represented by dashed blue vertical lines. You can also choose between simultaneous and non-simultaneous confidence bounds, which are represented by dashed red curves. Dragging the predictor lines confirms that predictors 2 and 3 have little to no effect.

3 Remove the unnecessary predictors using either removeTerms or step. Using step can be safer, in case there is an unexpected importance to a term that becomes apparent after removing another term. However, sometimes removeTerms can be effective when step does not proceed. In this case, the two give identical results.

Predict or Simulate Responses to New Data

There are three ways to use a linear model to predict the response to new data:

- predict on page 9-168
- feval on page 9-169
- random on page 9-170

predict

The predict method gives a prediction of the mean responses and, if requested, confidence bounds. This example shows how to predict and obtain confidence intervals on the predictions using the predict method.

1 Create a model from some predictors in artificial data. The data do not use the second and third columns in X. So you expect the model not to show much dependence on these predictors. Construct the model stepwise to include the relevant predictors automatically.

    0.1130
    1.7375
    3.7471

ynewci =
   1.0e+04 *
    0.0821    0.1555
    1.2167    2.4811
    2.8419    4.9407

feval

When you construct a model from a dataset array, feval is often more convenient for predicting mean responses than predict. However, feval does not provide confidence bounds. This example shows how to predict mean responses using the feval method.

1 Create a model from some predictors in artificial data. The data do not use the second and third columns in X. So you expect the model not to show much dependence on these predictors. Construct the model stepwise to include the relevant predictors automatically.

    0.1130
    1.7375
    3.7471

random

The random method generates new random response values for specified predictor values. The distribution of the response values is the distribution used in the model. random calculates the mean of the distribution from the predictors, estimated coefficients, and link function. For distributions such as normal, the model also provides an estimate of the variance of the response. For the binomial and Poisson distributions, the variance of the response is determined by the mean; random does not use a separate dispersion estimate.

This example shows how to simulate responses using the random method.

1 Create a model from some predictors in artificial data. The data do not use the second and third columns in X. So you expect the model not to show much dependence on these predictors. Construct the model stepwise to include the relevant predictors automatically.
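For the Poisson case, simulation amounts to drawing from a Poisson distribution whose mean comes from the linear combination and the link. A self-contained Python sketch (illustrative; it uses Knuth's sampler, not the random method itself):

```python
import math
import random

def poisson_sample(mu, rng):
    # Knuth's method: count uniform draws until their running product
    # falls below exp(-mu).
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(0)
mu = 3.0  # mean computed from predictors, coefficients, and link
samples = [poisson_sample(mu, rng) for _ in range(20000)]
# The sample mean approaches mu; the variance is determined by the
# mean, so no separate dispersion estimate is involved.
```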

Generalized Linear Model Workflow

This example shows a typical workflow: import data, fit a generalized linear model, test its quality, modify it to improve the quality, and make predictions based on the model. It computes the probability that a flower is in one of two classes, based on the Fisher iris data.

Step 1. Load the data.

Load the Fisher iris data. Extract the rows that have classification versicolor or virginica. These are rows 51 to 150. Create logical response variables that are true for versicolor flowers.

load fisheriris
X = meas(51:end,:); % versicolor and virginica
y = strcmp('versicolor',species(51:end));

disp([oldCoeffs newCoeffs])
   50.5268   50.5268
    8.3761    8.3761
   -7.8745   -7.8745
  -21.4296  -21.4296

The model coefficients do not change, suggesting that the response at the high-leverage point is consistent with the predicted value from the reduced model.

Step 5. Predict the probability that a new flower is versicolor.

Use mdl2 to predict the probability that a flower with average measurements is versicolor. Generate confidence intervals for your prediction.

[newf newc] = predict(mdl2,mean(X))

newf =
    0.5086
newc =
    0.1863    0.8239

The model gives almost a 50% probability that the average flower is versicolor, with a wide confidence interval about this estimate.

What is Generalized Linear Model Lasso Regularization?

Lasso is a regularization technique. Use lassoglm to:
- Reduce the number of predictors in a generalized linear model.
- Identify important predictors.
- Select among redundant predictors.
- Produce shrinkage estimates with potentially lower predictive errors than ordinary least squares.

Elastic net is a related technique. Use it when you have several highly correlated variables. lassoglm provides elastic net regularization when you set the Alpha name-value pair to a number strictly between 0 and 1.

For details about lasso and elastic net computations and algorithms, see Generalized Linear Model Lasso and Elastic Net on page 9-195. For a discussion of generalized linear models, see What Are Generalized Linear Models? on page 9-143.

Regularize Poisson Regression

This example shows how to identify and remove redundant predictors from a generalized linear model.

Create data with 20 predictors, and Poisson responses using just three of the predictors, plus a constant.

rng('default') % for reproducibility
X = randn(100,20);
mu = exp(X(:,[5 10 15])*[.4;.2;.3] + 1);
y = poissrnd(mu);

Construct a cross-validated lasso regularization of a Poisson regression model of the data.

[B FitInfo] = lassoglm(X,y,'poisson','CV',10);

Examine the cross-validation plot to see the effect of the Lambda regularization parameter.

lassoPlot(B,FitInfo,'plottype','CV');

The green circle and dashed line locate the Lambda with minimal cross-validation error. The blue circle and dashed line locate the point with minimal cross-validation error plus one standard deviation.

Find the nonzero model coefficients corresponding to the two identified points.

minpts = find(B(:,FitInfo.IndexMinDeviance))

minpts =
     3
     5
     6
    10
    11
    15
    16

min1pts = find(B(:,FitInfo.Index1SE))

min1pts =
     5
    10
    15

The coefficients from the minimal plus one standard error point are exactly those coefficients used to create the data.

Find the values of the model coefficients at the minimal plus one standard error point.

B(min1pts,FitInfo.Index1SE)

ans =
    0.2903
    0.0789
    0.2081

The values of the coefficients are, as expected, smaller than the original [0.4,0.2,0.3]. Lasso works by shrinkage, which biases predictor coefficients toward zero. See Lasso and Elastic Net Details on page 9-134.

The constant term is in the FitInfo.Intercept vector.

FitInfo.Intercept(FitInfo.Index1SE)

ans =
    1.0879

The constant term is near 1, which is the value used to generate the data.
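The shrinkage seen here is the effect of soft thresholding, the elementwise operation at the core of lasso coordinate-descent updates. A Python sketch (illustrative only, not the lassoglm algorithm):

```python
def soft_threshold(z, gamma):
    # Pull z toward zero by gamma; set it exactly to zero when
    # |z| <= gamma. This is why lasso both shrinks coefficients
    # and removes predictors outright.
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

shrunk = [soft_threshold(z, 2.0) for z in (5.0, -5.0, 1.0)]
```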

Regularize Logistic Regression

This example shows a workflow for regularizing binomial regression. The default (canonical) link function for binomial regression is the logistic function.

Step 1. Prepare the data.

Load the ionosphere data. The response Y is a cell array of 'g' or 'b' strings. Convert the cells to logical values, with true representing 'g'. Remove the first two columns of X because they have some awkward statistical properties, which are beyond the scope of this discussion.

load ionosphere
Ybool = strcmp(Y,'g');
X = X(:,3:end);

lassoPlot can give both a standard trace plot and a cross-validated deviance plot. Examine both plots.

lassoPlot(B,FitInfo,'PlotType','CV');

The plot identifies the minimum-deviance point with a green circle and dashed line as a function of the regularization parameter Lambda. The blue circled point has minimum deviance plus no more than one standard deviation.

lassoPlot(B,FitInfo,'PlotType','Lambda','XScale','log');

The trace plot shows nonzero model coefficients as a function of the regularization parameter Lambda. Because there are 32 predictors and a linear model, there are 32 curves. As Lambda increases to the left, lassoglm sets various coefficients to zero, removing them from the model.

The trace plot is somewhat compressed. Zoom in to see more detail.

xlim([.01 .1])
ylim([-3 3])

As Lambda increases toward the left side of the plot, fewer nonzero coefficients remain.

Find the number of nonzero model coefficients at the Lambda value with minimum deviance plus one standard deviation point. The regularized model coefficients are in column FitInfo.Index1SE of the B matrix.

indx = FitInfo.Index1SE;
B0 = B(:,indx);

nonzeros = sum(B0 ~= 0)

nonzeros =
    14

When you set Lambda to FitInfo.Index1SE, lassoglm removes over half of the 32 original predictors.

Step 4. Create a regularized model.

351 observations, 336 error degrees of freedom
Dispersion: 1

Chi^2-statistic vs. constant model: 262, p-value = 1e-47

Plot the residuals of the model.

plotResiduals(mdl)

As expected, residuals from the least-squares model are slightly smaller than those of the regularized model. However, this does not mean that mdl is a better predictor for new data.

Regularize Wide Data in Parallel

This example shows how to regularize a model with many more predictors than observations. Wide data is data with more predictors than observations. Typically, with wide data you want to identify important predictors. Use lassoglm as an exploratory or screening tool to select a smaller set of variables to prioritize your modeling and research. Use parallel computing to speed up cross validation.

Load the ovariancancer data. This data has 216 observations and 4000 predictors in the obs workspace variable. The responses are binary, either 'Cancer' or 'Normal', in the grp workspace variable. Convert the responses to binary for use in lassoglm.

load ovariancancer
y = strcmp(grp,'Cancer');

Set options to use parallel computing. Prepare to compute in parallel using matlabpool.

opt = statset('UseParallel',true);
matlabpool open 4
Starting matlabpool using the 'local' profile ... connected to 4 labs.

Fit a cross-validated set of regularized models. Use the Alpha parameter to favor retaining groups of highly correlated predictors, as opposed to eliminating all but one member of the group. Commonly, you use a relatively large value of Alpha.

rng('default') % for reproducibility
tic
[B,S] = lassoglm(obs,y,'binomial','NumLambda',100, ...
    'Alpha',0.9,'LambdaRatio',1e-4,'CV',10,'Options',opt);
toc

Elapsed time is 480.451777 seconds.

Examine trace and cross-validation plots.

lassoPlot(B,S,'PlotType','CV');

lassoPlot(B,S,'PlotType','Lambda','XScale','log')

The right (green) vertical dashed line represents the Lambda providing the smallest cross-validated deviance. The left (blue) dashed line has the minimal deviance plus no more than one standard deviation. This blue line has many fewer predictors:

[S.DF(S.Index1SE) S.DF(S.IndexMinDeviance)]

ans =

    50    86

You asked lassoglm to fit using 100 different Lambda values. How many did it use?

size(B)

ans =
    4000    84

lassoglm stopped after 84 values because the deviance was too small for small Lambda values. To avoid overfitting, lassoglm halts when the deviance of the fitted model is too small compared to the deviance in the binary responses, ignoring the predictor variables.

You can force lassoglm to include more terms by explicitly providing a set of Lambda values.

minLambda = min(S.Lambda);
explicitLambda = [minLambda*[.1 .01 .001] S.Lambda];
[B2,S2] = lassoglm(obs,y,'binomial','Lambda',explicitLambda,...
    'LambdaRatio',1e-4,'CV',10,'Options',opt);
length(S2.Lambda)

ans =
    87

lassoglm used the three smaller values in fitting.

To save time, you can use:
 Fewer Lambda, meaning fewer fits
 Fewer cross-validation folds
 A larger value for LambdaRatio

Use serial computation and all three of these time-saving methods:

tic
[Bquick,Squick] = lassoglm(obs,y,'binomial','NumLambda',25,...
    'LambdaRatio',1e-2,'CV',5);
toc

Elapsed time is 51.708074 seconds.

Graphically compare the new results to the first results.

lassoPlot(Bquick,Squick,'PlotType','CV');

lassoPlot(Bquick,Squick,'PlotType','Lambda','XScale','log')

The number of nonzero coefficients in the minimum deviance plus one standard deviation model is around 50, similar to the first computation.

Generalized Linear Model Lasso and Elastic Net

Overview of Lasso and Elastic Net

Lasso is a regularization technique for estimating generalized linear models. Lasso includes a penalty term that constrains the size of the estimated coefficients. Therefore, it resembles ridge regression. Lasso is a shrinkage estimator: it generates coefficient estimates that are biased to be small. Nevertheless, a lasso estimator can have smaller error than an ordinary maximum likelihood estimator when you apply it to new data.

Unlike ridge regression, as the penalty term increases, the lasso technique sets more coefficients to zero. This means that the lasso estimator is a smaller model, with fewer predictors. As such, lasso is an alternative to stepwise regression and other model selection and dimensionality reduction techniques.

Elastic net is a related technique, akin to a hybrid of ridge regression and lasso regularization. Like lasso, elastic net can generate reduced models by generating zero-valued coefficients. Empirical studies suggest that the elastic net technique can outperform lasso on data with highly correlated predictors.

Definition of Lasso for Generalized Linear Models

For a nonnegative value of λ, lasso solves the problem

min over β0, β of  (1/N) Deviance(β0, β) + λ Σ_{j=1..p} |βj|,

where

 Deviance is the deviance of the model fit to the responses using intercept β0 and predictor coefficients β. The formula for Deviance depends on the distr parameter you supply to lassoglm. Minimizing the λ-penalized deviance is equivalent to maximizing the λ-penalized log likelihood.
 N is the number of observations.
 λ is a nonnegative regularization parameter corresponding to one value of Lambda.
 The parameters β0 and β are a scalar and a p-vector, respectively.

As λ increases, the number of nonzero components of β decreases. The lasso problem involves the L1 norm of β, as contrasted with the elastic net algorithm.

Definition of Elastic Net for Generalized Linear Models

For an α strictly between 0 and 1, and a nonnegative λ, elastic net solves the problem

min over β0, β of  (1/N) Deviance(β0, β) + λ Pα(β),

where

Pα(β) = (1 − α)/2 ||β||₂² + α ||β||₁ = Σ_{j=1..p} ( (1 − α)/2 βj² + α |βj| ).

Elastic net is the same as lasso when α = 1. For other values of α, the penalty term Pα(β) interpolates between the L1 norm of β and the squared L2 norm of β. As α shrinks toward 0, elastic net approaches ridge regression.
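The interpolation between the L1 and ridge penalties is easy to verify numerically. The following is a minimal Python/NumPy sketch of the penalty Pα(β) defined above (the function name and test vector are illustrative, not part of any toolbox):

```python
import numpy as np

def elastic_net_penalty(beta, alpha):
    """P_alpha(beta) = sum_j ((1-alpha)/2 * beta_j^2 + alpha * |beta_j|)."""
    beta = np.asarray(beta, dtype=float)
    return np.sum((1 - alpha) / 2 * beta**2 + alpha * np.abs(beta))

beta = [3.0, -1.0, 0.0, 2.0]
# alpha = 1 recovers the lasso (L1) penalty: |3| + |-1| + |0| + |2| = 6
print(elastic_net_penalty(beta, 1.0))   # 6.0
# alpha = 0 gives the ridge-style penalty (1/2)*||beta||_2^2: (9 + 1 + 0 + 4)/2 = 7
print(elastic_net_penalty(beta, 0.0))   # 7.0
```

Intermediate α values mix the two terms, which is what lets elastic net keep groups of correlated predictors that pure lasso would thin out.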

What Are Parametric Nonlinear Regression Models?

Parametric nonlinear models represent the relationship between a continuous response variable and one or more continuous predictor variables in the form

y = f(X, β) + ε,

where
 y is an n-by-1 vector of observations of the response variable.
 f is any function of X and β that evaluates each row of X along with the vector β to compute the prediction for the corresponding row of y.
 X is an n-by-p matrix of predictors, with one row for each observation, and one column for each predictor.
 β is a p-by-1 vector of unknown parameters to be estimated.
 ε is an n-by-1 vector of independent, identically distributed random disturbances.

In contrast, nonparametric models do not attempt to characterize the relationship between predictors and response with model parameters. Descriptions are often graphical, as in the case of Classification Trees and Regression Trees on page 15-30.

NonLinearModel.fit attempts to find values of the parameters β that minimize the mean squared differences between the observed responses y and the predictions of the model f(X, β). To do so, it needs a starting value beta0 before iteratively modifying the vector β to a vector with minimal mean squared error.
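The iterative idea behind this kind of fit can be sketched with a plain Gauss-Newton loop in Python/NumPy. This is a generic illustration under an assumed toy model f(x, β) = β1·exp(β2·x), not the algorithm NonLinearModel.fit actually uses internally:

```python
import numpy as np

def model(beta, x):
    # assumed toy model: f(x, beta) = beta1 * exp(beta2 * x)
    return beta[0] * np.exp(beta[1] * x)

def jacobian(beta, x):
    # partial derivatives of the model with respect to beta1 and beta2
    e = np.exp(beta[1] * x)
    return np.column_stack([e, beta[0] * x * e])

def gauss_newton(x, y, beta0, n_iter=50):
    beta = np.asarray(beta0, dtype=float)
    for _ in range(n_iter):
        r = y - model(beta, x)
        step, *_ = np.linalg.lstsq(jacobian(beta, x), r, rcond=None)
        t = 1.0
        # simple backtracking so each iteration reduces the residual norm
        while np.linalg.norm(y - model(beta + t * step, x)) > np.linalg.norm(r) and t > 1e-8:
            t /= 2
        beta = beta + t * step
    return beta

x = np.linspace(0, 5, 20)
y = 2.0 * np.exp(-0.5 * x)            # noise-free data, true beta = [2, -0.5]
beta_hat = gauss_newton(x, y, [1.0, -1.0])
print(beta_hat)                        # approximately [2, -0.5]
```

Note how the quality of the starting vector matters: a start far from the truth can stall or diverge, which is exactly why beta0 is a required input to the fitting function.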

Prepare Data

To begin fitting a regression, put your data into a form that fitting functions expect. All regression techniques begin with input data in an array X and response data in a separate vector y, or input data in a dataset array ds and response data as a column in ds. Each row of the input data represents one observation. Each column represents one predictor (variable).

For a dataset array ds, indicate the response variable with the 'ResponseVar' name-value pair:

mdl = LinearModel.fit(ds,'ResponseVar','BloodPressure');

The response variable is the last column by default.

You cannot use numeric categorical predictors for nonlinear regression. A categorical predictor is one that takes values from a fixed set of possibilities.

Represent missing data as NaN for both input data and response data.

Dataset Array for Input and Response Data

For example, to create a dataset array from an Excel spreadsheet:

ds = dataset('XLSFile','hospital.xls',...
    'ReadObsNames',true);

To create a dataset array from workspace variables:

load carsmall
ds = dataset(Weight,Model_Year,MPG);

Numeric Matrix for Input Data and Numeric Vector for Response

For example, to create numeric arrays from workspace variables:

load carsmall
X = [Weight Horsepower Cylinders Model_Year];
y = MPG;

To create numeric arrays from an Excel spreadsheet:

[X Xnames] = xlsread('hospital.xls');
y = X(:,4);  % response y is systolic pressure
X(:,4) = []; % remove y from the X matrix

Notice that the nonnumeric entries, such as sex, do not appear in X.

Represent the Nonlinear Model

There are several ways to represent a nonlinear model. Use whichever is most convenient.

The nonlinear model is a required input to NonLinearModel.fit, in the modelfun input. NonLinearModel.fit assumes that the response function f(X, β) is smooth in the parameters β. If your function is not smooth, NonLinearModel.fit can fail to provide optimal parameter estimates.

Choose Initial Vector beta0

The initial vector for the fitting iterations, beta0, can greatly influence the quality of the resulting fitted model. beta0 gives the dimensionality of the problem, meaning it needs the correct length. A good choice of beta0 leads to a quick, reliable model, while a poor choice can lead to a long computation, or to an inadequate model.

It is difficult to give advice on choosing a good beta0. If you believe certain components of the vector should be positive or negative, set your beta0 to have those characteristics. If you know the approximate value of other components, include them in beta0. However, if you don't know good values, try a random vector, such as

beta0 = randn(nVars,1);
% or
beta0 = 10*rand(nVars,1);

Fit Nonlinear Model to Data

The syntax for fitting a nonlinear regression model using a dataset array ds is

mdl = NonLinearModel.fit(ds,modelfun,beta0)

The syntax for fitting a nonlinear regression model using a numeric array X and numeric response vector y is

mdl = NonLinearModel.fit(X,y,modelfun,beta0)

For information on representing the input parameters, see Prepare Data on page 9-199, Represent the Nonlinear Model on page 9-200, and Choose Initial Vector beta0 on page 9-203.

NonLinearModel.fit assumes that the response variable in a dataset array ds is the last column. To change this, use the ResponseVar name-value pair to name the response column.

Examine Quality and Adjust the Fitted Model

There are diagnostic plots to help you examine the quality of a model.
 plotDiagnostics(mdl) gives a variety of plots, including leverage and Cook's distance plots.
 plotResiduals(mdl) gives the difference between the fitted model and the data.

There are also properties of mdl that relate to the model quality.
 mdl.RMSE gives the root mean square error between the data and the fitted model.
 mdl.Residuals.Raw gives the raw residuals.
 mdl.Diagnostics contains several fields, such as Leverage and CooksDistance, that can help you identify particularly interesting observations.

This example shows how to examine a fitted nonlinear model using diagnostic, residual, and slice plots.

1 Load the reaction data.

load reaction

2 Create a nonlinear model of rate as a function of reactants using the hougen.m function, starting from beta0 = ones(5,1);.

beta0 = ones(5,1);
mdl = NonLinearModel.fit(reactants,...
    rate,@hougen,beta0);

3 Make a leverage plot of the data and model.

plotDiagnostics(mdl)

There is one point that has high leverage. Locate the point.

Nothing stands out as an outlier.

6 Use a slice plot to show the effect of each predictor on the model.

plotSlice(mdl)

You can drag the vertical dashed blue lines to see the effect of a change in one predictor on the response. For example, drag the X2 line to the right, and notice that the slope of the X3 line changes.

Predict or Simulate Responses to New Data

There are three ways to use a linear model to predict or simulate the response to new data:
 predict on page 9-210
 feval on page 9-211
 random on page 9-211

The prediction method examples use the following synthetic data and model. The problem is to fit a nonlinear curve of known form, with unknown center and scale parameters.

Nonlinear Regression Workflow

This example shows a typical workflow: import data, fit a nonlinear regression, test its quality, modify it to improve the quality, and make predictions based on the model.

Step 1. Prepare the data.

Load the reaction data.

load reaction

Examine the data in the workspace. reactants is a matrix with 13 rows and 3 columns. Each row corresponds to one observation, and each column corresponds to one variable. The variable names are in xn:

xn

xn =
    Hydrogen
    n-Pentane
    Isopentane

Similarly, rate is a vector of 13 responses, with the variable name in yn:

yn

yn =
    Reaction Rate

The hougen.m file contains a nonlinear model of reaction rate as a function of the three predictor variables. For a 5-D vector b and 3-D vector x,

hougen(b, x) = (b(1)*x(2) − x(3)/b(5)) / (1 + b(2)*x(1) + b(3)*x(2) + b(4)*x(3)).

As a start point for the solution, take b as a vector of ones.

beta0 = ones(5,1);

Step 2. Fit a nonlinear model to the data.
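For a quick sanity check of the model form, here is a minimal Python transcription of the Hougen-Watson rate equation as reconstructed above (the input values are made up for illustration):

```python
import numpy as np

def hougen(b, x):
    """Hougen-Watson reaction-rate model, as reconstructed above:
    rate = (b1*x2 - x3/b5) / (1 + b2*x1 + b3*x2 + b4*x3)."""
    b1, b2, b3, b4, b5 = b
    x1, x2, x3 = x
    return (b1 * x2 - x3 / b5) / (1 + b2 * x1 + b3 * x2 + b4 * x3)

b0 = np.ones(5)                       # the all-ones starting point from the example
x = np.array([470.0, 300.0, 10.0])    # hypothetical reactant levels
print(hougen(b0, x))                  # (300 - 10) / 781, about 0.3713
```

Evaluating the model at the starting vector like this is a cheap way to confirm the function is well defined before handing it to an iterative fitter.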

To see the effect of each predictor on the response, make a slice plot using plotSlice.

plotSlice(mdl)

plotSlice(mdl1)

The plots look very similar, with slightly wider confidence bounds for mdl1. This difference is understandable, since there is one less data point in the fit, representing over 7% fewer observations.

Step 6. Predict for new data.

Create some new data and predict the response from both models.

Xnew = [200,200,200;100,200,100;500,50,5];
[ypred yci] = predict(mdl,Xnew)

ypred =

Introduction to Mixed-Effects Models

In statistics, an effect is anything that influences the value of a response variable at a particular setting of the predictor variables. Effects are translated into model parameters. In linear models, effects become coefficients, representing the proportional contributions of model terms. In nonlinear models, effects often have specific physical interpretations, and appear in more general nonlinear combinations.

Fixed effects represent population parameters, assumed to be the same each time data is collected. Estimating fixed effects is the traditional domain of regression modeling. Random effects, by comparison, are sample-dependent random variables. In modeling, random effects act like additional error terms, and their distributions and covariances must be specified.

For example, consider a model of the elimination of a drug from the bloodstream. The model uses time t as a predictor and the concentration of the drug C as the response. The nonlinear model term C0e^(−rt) combines parameters C0 and r, representing, respectively, an initial concentration and an elimination rate. If data is collected across multiple individuals, it is reasonable to assume that the elimination rate is a random variable ri depending on individual i, varying around a population mean r̄. The term C0e^(−rt) becomes

C0e^(−[r̄ + (ri − r̄)]t) = C0e^(−(β + bi)t),

where β = r̄ is a fixed effect and bi = ri − r̄ is a random effect.

Random effects are useful when data falls into natural groups. In the drug elimination model, the groups are simply the individuals under study. More sophisticated models might group data by an individual's age, weight, diet, etc. Although the groups are not the focus of the study, adding random effects to a model extends the reliability of inferences beyond the specific sample of individuals.

Mixed-effects models account for both fixed and random effects. As with all regression models, their purpose is to describe a response variable as a function of the predictor variables. Mixed-effects models, however, recognize correlations within sample subgroups. In this way, they provide a compromise between ignoring data groups entirely and fitting each group with a separate model.
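The fixed-plus-random decomposition in the drug elimination example can be simulated in a few lines. This is an illustrative Python/NumPy sketch with made-up numbers, just to show how a shared population rate and per-individual deviations combine:

```python
import numpy as np

# Drug elimination sketch: concentration C(t) = C0 * exp(-r_i * t), where the
# individual rate r_i = rbar + b_i combines a fixed population rate rbar with
# a per-individual random effect b_i. All parameter values are invented.
rng = np.random.default_rng(0)
C0, rbar = 10.0, 0.8                 # fixed effects (population parameters)
b = rng.normal(0.0, 0.1, size=6)     # random effects, one per individual
t = np.linspace(0, 8, 5)             # sampling times in hours

conc = C0 * np.exp(-np.outer(rbar + b, t))   # rows: individuals, cols: times
print(conc.shape)                            # (6, 5)
# every individual starts at C0 = 10 and decays at its own rate rbar + b_i
```

Fitting the mixed-effects model is the inverse problem: recover C0, rbar, and the variance of b from curves like these.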

Mixed-Effects Model Hierarchy

Suppose data for a nonlinear regression model falls into one of m distinct groups i = 1, ..., m. To account for the groups in a model, write response j in group i as:

y_ij = f(φ, x_ij) + ε_ij

y_ij is the response, x_ij is a vector of predictors, φ is a vector of model parameters, and ε_ij is the measurement or process error. The index j ranges from 1 to ni, where ni is the number of observations in group i. The function f specifies the form of the model. Often, x_ij is simply an observation time t_ij. The errors are usually assumed to be independent and identically, normally distributed, with constant variance.

Estimates of the parameters in φ describe the population, assuming those estimates are the same for all groups. If, however, the estimates vary by group, the model becomes

y_ij = f(φ_i, x_ij) + ε_ij

In a mixed-effects model, φ_i may be a combination of a fixed and a random effect:

φ_i = β + b_i

The random effects b_i are usually described as multivariate normally distributed, with mean zero and covariance Ψ. Estimating the fixed effects β and the covariance of the random effects Ψ provides a description of the population that does not assume the parameters φ_i are the same across groups. Estimating the random effects b_i also gives a description of specific groups within the data.

Model parameters do not have to be identified with individual effects. In general, design matrices A and B are used to identify parameters with linear combinations of fixed and random effects:

φ_i = Aβ + Bb_i

If the design matrices differ among groups, the model becomes

φ_i = A_iβ + B_ib_i

If the design matrices also differ among observations, the model becomes

φ_ij = A_ijβ + B_ijb_i

y_ij = f(φ_ij, x_ij) + ε_ij

Some of the group-specific predictors in x_ij may not change with observation j. Calling those v_i, the model becomes

y_ij = f(φ_ij, x_ij, v_i) + ε_ij

Specifying Mixed-Effects Models

Suppose data for a nonlinear regression model falls into one of m distinct groups i = 1, ..., m.(Specifically, suppose that the groups are not nested.) To specify a general nonlinear mixed-effects

 A covariance matrix Ψ for the random effects
 The error variance σ², assumed to be constant across observations

For example, consider a model of the elimination of a drug from the bloodstream. The model incorporates two overlapping phases:
 An initial phase p during which drug concentrations reach equilibrium with surrounding tissues
 A second phase q during which the drug is eliminated from the bloodstream

For data on multiple individuals i, the model is

y_ij = C_pi e^(−r_pi t_ij) + C_qi e^(−r_qi t_ij) + ε_ij,

where y_ij is the observed concentration in individual i at time t_ij. The model allows for different sampling times and different numbers of observations for different individuals.

The elimination rates r_pi and r_qi must be positive to be physically meaningful. Enforce this by introducing the log rates R_pi = log(r_pi) and R_qi = log(r_qi) and reparametrizing the model:

y_ij = C_pi e^(−exp(R_pi) t_ij) + C_qi e^(−exp(R_qi) t_ij) + ε_ij
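The point of the log-rate reparametrization is that any real value of R maps to a positive rate exp(R). A short Python sketch of the two-phase model above, with invented parameter values:

```python
import numpy as np

def two_phase(phi, t):
    """Two-phase elimination with log-rate reparametrization, as above:
    y = C_p * exp(-exp(R_p)*t) + C_q * exp(-exp(R_q)*t)."""
    Cp, Rp, Cq, Rq = phi
    return Cp * np.exp(-np.exp(Rp) * t) + Cq * np.exp(-np.exp(Rq) * t)

t = np.linspace(0, 8, 9)
phi = [2.0, 0.5, 1.0, -1.5]          # illustrative values of C_p, R_p, C_q, R_q
y = two_phase(phi, t)
print(np.exp([0.5, -1.5]))           # the implied rates; both strictly positive
```

An optimizer can now search over R_p and R_q without any positivity constraint, since the constraint is built into the parametrization.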

Choosing which parameters to model with random effects is an important consideration when building a mixed-effects model. One technique is to add random effects to all parameters, and use estimates of their variances to determine their significance in the model. An alternative is to fit the model separately to each group, without random effects, and look at the variation of the parameter estimates. If an estimate varies widely across groups, or if confidence intervals for each group have minimal overlap, the parameter is a good candidate for a random effect.

To introduce fixed effects β and random effects b_i for all model parameters, reexpress the model as follows:

y_ij = [C̄_p + (C_pi − C̄_p)] e^(−exp[R̄_p + (R_pi − R̄_p)] t_ij) + [C̄_q + (C_qi − C̄_q)] e^(−exp[R̄_q + (R_qi − R̄_q)] t_ij) + ε_ij
     = (β1 + b_1i) e^(−exp(β2 + b_2i) t_ij) + (β3 + b_3i) e^(−exp(β4 + b_4i) t_ij) + ε_ij

In the notation of the general model:

y_i = [y_i1, ..., y_in_i]',  x_i = [t_i1, ..., t_in_i]',  β = [β1, β2, β3, β4]',  b_i = [b_1i, b_2i, b_3i, b_4i]',

where ni is the number of observations of individual i. In this case, the design matrices A_i and B_i are, at least initially, 4-by-4 identity matrices. Design matrices may be altered, as necessary, to introduce weighting of individual effects, or time dependency.

Fitting the model and estimating the covariance matrix Ψ often leads to further refinements. A relatively small estimate for the variance of a random effect suggests that it can be removed from the model. Likewise, relatively small estimates for covariances among certain random effects suggests that a full covariance matrix is unnecessary. Since random effects are unobserved, Ψ must be estimated indirectly. Specifying a diagonal or block-diagonal covariance pattern for Ψ can improve convergence and efficiency of the fitting algorithm.

Statistics Toolbox functions nlmefit and nlmefitsa fit the general nonlinear mixed-effects model to data, estimating the fixed and random effects. The functions also estimate the covariance matrix Ψ for the random effects. Additional diagnostic outputs allow you to assess tradeoffs between the number of model parameters and the goodness of fit.

Choosing nlmefit or nlmefitsa

Statistics Toolbox provides two functions, nlmefit and nlmefitsa, for fitting nonlinear mixed-effects models. Each function provides different capabilities, which may help you decide which to use.
 Approximation Methods on page 9-226
 Parameters Specific to nlmefitsa on page 9-227
 Model and Data Requirements on page 9-228

Approximation Methods

nlmefit provides the following four approximation methods for fitting nonlinear mixed-effects models:
 'LME' Use the likelihood for the linear mixed-effects model at the current conditional estimates of beta and B. This is the default.
 'RELME' Use the restricted likelihood for the linear mixed-effects model at the current conditional estimates of beta and B.
 'FO' First-order Laplacian approximation without random effects.
 'FOCE' First-order Laplacian approximation at the conditional estimates of B.

nlmefitsa provides an additional approximation method, Stochastic Approximation Expectation-

Maximization (SAEM) [24], with three steps:

1 Simulation: Generate simulated values of the random effects b from the posterior density p(b|) given the current parameter estimates.

2 Stochastic approximation: Update the expected value of the log likelihood function by taking its value from the previous step, and moving part way toward the average value of the log likelihood calculated from the simulated random effects.

3 Maximization: Choose new parameter estimates to maximize the log likelihood function given the simulated values of the random effects.
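The "move part way toward the new value" idea in the stochastic approximation step can be illustrated with a toy Robbins-Monro style update in Python. This is a deliberately simplified stand-in (estimating a mean from noisy draws), not the SAEM algorithm itself:

```python
import numpy as np

# Toy illustration of the stochastic-approximation idea: keep a running
# estimate Q and move it part way toward each new simulated value, with a
# step size that shrinks over iterations.
rng = np.random.default_rng(42)
true_mean = 3.0
Q = 0.0
for k in range(1, 2001):
    draw = true_mean + rng.normal()      # "simulation" step: one noisy draw
    gamma = 1.0 / k                      # decreasing step size
    Q = Q + gamma * (draw - Q)           # move part way toward the draw
print(Q)                                 # close to 3.0
```

With step sizes gamma = 1/k this update is exactly the running average of the draws, which is why it settles near the true value even though each individual draw is noisy; SAEM applies the same averaging idea to the log likelihood.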

Both nlmefit and nlmefitsa attempt to find parameter estimates to maximize a likelihood function, which is difficult to compute. nlmefit deals with the problem by approximating the likelihood function in various ways, and maximizing the approximate function. It uses traditional optimization techniques that depend on things like convergence criteria and iteration limits.

nlmefitsa, on the other hand, simulates random values of the parameters in such a way that in the long run they converge to the values that maximize the exact likelihood function. The results are random, and traditional convergence tests don't apply. Therefore nlmefitsa provides options to plot the results as the simulation progresses, and to restart the simulation multiple times. You can use these features to judge whether the results have converged to the accuracy you desire.

Parameters Specific to nlmefitsa

The following parameters are specific to nlmefitsa. Most control the stochastic algorithm.
 Cov0 Initial value for the covariance matrix PSI. Must be an r-by-r positive definite matrix. If empty, the default value depends on the values of BETA0.
 ComputeStdErrors true to compute standard errors for the coefficient estimates and store them in the output STATS structure, or false (default) to omit this computation.
 LogLikMethod Specifies the method for approximating the log likelihood.
 NBurnIn Number of initial burn-in iterations during which the parameter estimates are not recomputed. Default is 5.
 NIterations Controls how many iterations are performed for each of three phases of the algorithm.
 NMCMCIterations Number of Markov Chain Monte Carlo (MCMC) iterations.

Model and Data Requirements

There are some differences in the capabilities of nlmefit and nlmefitsa. Therefore some data and models are usable with either function, but some may require you to choose just one of them.
 Error models nlmefitsa supports a variety of error models.
For example, the standard deviation of the response can be constant, proportional to the function value, or a combination of the two. nlmefit fits models under the assumption that the standard deviation of the response is constant. One of the error models, 'exponential', specifies that the log of the response has a constant standard deviation. You can fit such models using nlmefit by providing the log response as input, and by rewriting the model function to produce the log of the nonlinear function value.

 Random effects Both functions fit data to a nonlinear function with parameters, and the parameters may be simple scalar values or linear functions of covariates. nlmefit allows any coefficients of the linear functions to have both fixed and random effects. nlmefitsa supports random effects only for the constant (intercept) coefficient of the linear functions, but not for slope coefficients. So in the example in Specifying Covariate Models on page 9-224, nlmefitsa can treat only the first three beta values as random effects.
 Model form nlmefit supports a very general model specification, with few restrictions on the design matrices that relate the fixed coefficients and the random effects to the model parameters. nlmefitsa is more restrictive:
- The fixed effect design must be constant in every group (for every individual), so an observation-dependent design is not supported.
- The random effect design must be constant for the entire data set, so neither an observation-dependent design nor a group-dependent design is supported.
- As mentioned under Random effects, the random effect design must not specify random effects for slope coefficients. This implies that the design must consist of zeros and ones.
- The random effect design must not use the same random effect for multiple coefficients, and cannot use more than one random effect for any single coefficient.
- The fixed effect design must not use the same coefficient for multiple parameters. This implies that it can have at most one nonzero value in each column.

If you want to use nlmefitsa for data in which the covariate effects are random, include the covariates directly in the nonlinear model expression. Don't include the covariates in the fixed or random effect design matrices.
 Convergence As described under Model form, nlmefit and nlmefitsa have different approaches to measuring convergence.
nlmefit uses traditional optimization measures, and nlmefitsa provides diagnostics to help you judge the convergence of a random simulation.

In practice, nlmefitsa tends to be more robust, and less likely to fail on difficult problems. However, nlmefit may converge faster on problems where it converges at all. Some problems may benefit from a combined strategy, for example by running nlmefitsa for a while to get reasonable parameter estimates, and using those as a starting point for additional iterations using nlmefit.

Using Output Functions with Mixed-Effects Models

The Outputfcn field of the options structure specifies one or more functions that the solver calls after each iteration. Typically, you might use an output function to plot points at each iteration or to display optimization quantities from the algorithm. To set up an output function:

1 Write the output function as a MATLAB file function or local function.

2 Use statset to set the value of Outputfcn to be a function handle, that is, the name of the function preceded by the @ sign. For example, if the output function is outfun.m, the command

options = statset('OutputFcn', @outfun);

specifies OutputFcn to be the handle to outfun. To specify multiple output functions, use the syntax:

options = statset('OutputFcn',{@outfun, @outfun2});

3 Call the optimization function with options as an input argument.

For an example of an output function, see Sample Output Function on page 9-234.

Structure of the Output Function

The function definition line of the output function has the following form:

stop = outfun(beta,status,state)

where
 beta is the current fixed effects.
 status is a structure containing data from the current iteration. Fields in status on page 9-230 describes the structure in detail.
 state is the current state of the algorithm. States of the Algorithm on page 9-231 lists the possible values.
 stop is a flag that is true or false depending on whether the optimization routine should quit or continue. See Stop Flag on page 9-232 for more information.

The solver passes the values of the input arguments to outfun at each iteration.

Fields in status

The following table lists the fields of the status structure:

 procedure One of:
- 'ALT' alternating algorithm for the optimization of the linear mixed-effects or restricted linear mixed-effects approximations
- 'LAP' optimization of the Laplacian approximation for first order or first order conditional estimation
 iteration An integer starting from 0.
 inner A structure describing the status of the inner iterations within the ALT and LAP procedures, with the fields:
 procedure When procedure is 'ALT':
- 'PNLS' (penalized nonlinear least squares)
- 'LME' (linear mixed-effects estimation)
- 'none'
When procedure is 'LAP':
- 'PNLS' (penalized nonlinear least squares)
- 'PLM' (profiled likelihood maximization)
- 'none'

 state One of the following:
- 'init'
- 'iter'
- 'done'
- 'none'
 iteration An integer starting from 0, or NaN. For nlmefitsa with burn-in iterations, the output function is called after each of those iterations with a negative value for STATUS.iteration.
 fval The current log likelihood
 Psi The current random-effects covariance matrix
 theta The current parameterization of Psi
 mse The current error variance

States of the Algorithm

The following table lists the possible values for state:

 'init' The algorithm is in the initial state before the first iteration.
 'iter' The algorithm is at the end of an iteration.
 'done' The algorithm is in the final state after the last iteration.

The following code illustrates how the output function might use the value of state to decide which tasks to perform at the current iteration:

switch state
    case 'iter'
        % Make updates to plot or guis as needed
    case 'init'
        % Setup for plots or guis
    case 'done'
        % Cleanup of plots, guis, or final plot
    otherwise
end

Stop Flag

The output argument stop is a flag that is true or false. The flag tells the solver whether it should quit or continue. The following examples show typical ways to use the stop flag.

Stopping an Optimization Based on Intermediate Results. The output function can stop the estimation at any iteration based on the values of arguments passed into it. For example, the following code sets stop to true based on the value of the log likelihood stored in the 'fval' field of the status structure:

function stop = outfun(beta,status,state)
stop = false;
% Check if loglikelihood is more than -132.
if status.fval > -132
    stop = true;
end
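The same callback-with-stop-flag pattern is easy to prototype in any language. Here is a hedged Python sketch of the control flow (the solver, function names, and threshold are all invented for illustration; this is not the MATLAB API):

```python
# Sketch of the stop-flag pattern: the solver calls the output function after
# each iteration and quits as soon as it returns True.
def outfun(beta, status, state):
    # stop once the log likelihood (status['fval']) exceeds -132
    return status['fval'] > -132

def run_solver(output_fcn):
    """Hypothetical solver loop; each 'step' just improves fval by 2."""
    fval = -150.0
    for it in range(100):
        fval += 2.0                       # stand-in for one optimization step
        if output_fcn(None, {'fval': fval}, 'iter'):
            return it, fval               # early exit requested by callback
    return 99, fval

it, fval = run_solver(outfun)
print(it, fval)                           # stops at iteration 9, fval = -130.0
```

The key design point is the same as in the MATLAB version: the solver owns the loop, and the callback only observes state and votes to stop.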

or updates a plot with the fixed-effects (BETA) and variance of the random effects (diag(STATUS.Psi)). For nlmefit, the plot also includes the log-likelihood (STATUS.fval).

nlmefitoutputfcn is the default output function for nlmefitsa. To use it with nlmefit, specify a function handle for it in the options structure:

opt = statset('OutputFcn', @nlmefitoutputfcn, )
beta = nlmefit( , 'Options', opt, )

To prevent nlmefitsa from using this function, specify an empty value for the output function:

opt = statset('OutputFcn', [], )
beta = nlmefitsa( , 'Options', opt, )

nlmefitoutputfcn stops nlmefit or nlmefitsa if you close the figure that it produces.

Mixed-Effects Models Using nlmefit and nlmefitsa

The following example also works with nlmefitsa in place of nlmefit.

The data in indomethacin.mat records concentrations of the drug indomethacin in the bloodstream of six subjects over eight hours:

load indomethacin
gscatter(time,concentration,subject)
xlabel('Time (hours)')
ylabel('Concentration (mcg/ml)')
title('{\bf Indomethacin Elimination}')
hold on

Specifying Mixed-Effects Models on page 9-221 discusses a useful model for this type of data. Construct the model via an anonymous function as follows:

model = @(phi,t)(phi(1)*exp(-exp(phi(2))*t) + ...
    phi(3)*exp(-exp(phi(4))*t));

Use the nlinfit function to fit the model to all of the data, ignoring subject-specific effects:

title('{\bf Indomethacin Elimination}')

PHI gives estimates of the four model parameters for each of the six subjects. The estimates vary considerably, but taken as a 24-parameter model of the data, the mean-squared error of 0.0057 is a significant reduction from 0.0304 in the original four-parameter model.

A box plot of residuals by subject shows that the larger model accounts for most of the subject-specific effects:

h = boxplot(RES,'colors',colors,'symbol','o');
set(h(~isnan(h)),'LineWidth',2)
hold on
boxplot(RES,'colors','k','symbol','ko')
grid on
xlabel('Subject')
ylabel('Residual')
hold off

The spread of the residuals (the vertical scale of the box plot) is much smaller than in the previous box plot, and the boxes are now mostly centered on zero.

While the 24-parameter model successfully accounts for variations due to the specific subjects in the study, it does not consider the subjects as representatives of a larger population. The sampling distribution from which the subjects are drawn is likely more interesting than the sample itself. The purpose of mixed-effects models is to account for subject-specific variations more broadly, as random effects varying around population means.

Use the nlmefit function to fit a mixed-effects model to the data.

The following anonymous function, nlme_model, adapts the four-parameter model used by nlinfit to the calling syntax of nlmefit by allowing separate parameters for each individual. By default, nlmefit assigns random effects to all the model parameters. Also by default, nlmefit assumes a diagonal covariance matrix (no covariance among the random effects) to avoid overparametrization and related convergence issues.

nlme_model = @(PHI,t)(PHI(:,1).*exp(-exp(PHI(:,2)).*t) + ...
    PHI(:,3).*exp(-exp(PHI(:,4)).*t));
phi0 = [1 2 1 1];
[phi,PSI,stats] = nlmefit(time,concentration,subject, ...
    [],nlme_model,phi0)

phi =
    2.8277
    0.7729
    0.4606
   -1.3459

Plot the residuals using the plotResiduals function:

The upper probability plots look straight, meaning the residuals are normally distributed. The bottom histogram plots match the superimposed normal density plot. So you can conclude that the error model matches the data.

For comparison, fit the data using a constant error model, instead of the proportional model that created the data:
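The comparison just described can be sketched as follows. This assumes a fitnlm-style workflow in which X, y, modelFun, and beta0 are the predictor data, response, model function, and starting values from the surrounding example (those variable names are illustrative; the 'ErrorModel' option and plotResiduals plot types are standard fitnlm usage):

```matlab
% Refit with a constant (additive) error model; the data were generated
% with a proportional error model, so this is a deliberate mismatch.
mdlConstant = fitnlm(X,y,modelFun,beta0,'ErrorModel','constant');
plotResiduals(mdlConstant,'probability')   % normal probability plot
plotResiduals(mdlConstant,'histogram')     % histogram vs. normal density
```
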

Not only are the upper probability plots not straight, but the histogram plot is quite skewed compared to the superimposed normal density. These residuals are not normally distributed, and do not match the model.

But the nonlinear model can also be transformed to a linear one by taking the log of both sides, to get log(y) = log(p1) + p2*x. That's tempting, because we can fit that linear model by ordinary linear least squares. The coefficients we'd get from a linear least squares fit would be log(p1) and p2.

paramEstsLin = [ones(size(x)), x] \ log(y);
paramEstsLin(1) = exp(paramEstsLin(1))
paramEstsLin =
   11.9312
   -0.4462

How did we do? We can superimpose the fit on the data to find out.

xx = linspace(min(x), max(x));
yyLin = modelFun(paramEstsLin, xx);
plot(x,y,'o', xx,yyLin,'-');
xlabel('x'); ylabel('y');
legend({'Raw data','Linear fit on the log scale'},'location','NE');

Something seems to have gone wrong, because the fit doesn't really follow the trend that we can see in the raw data. What kind of fit would we get if we used nlinfit to do nonlinear least squares instead? We'll use the previous fit as a rough starting point, even though it's not a great fit.

paramEsts = nlinfit(x, y, modelFun, paramEstsLin)
paramEsts =
    8.8145
   -0.2885

The fit using nlinfit more or less passes through the center of the data point scatter. A residual plot shows something approximately like an even scatter about zero.

r = y - modelFun(paramEsts,x);
plot(x,r,'+', [min(x) max(x)],[0 0],'k:');
xlabel('x'); ylabel('residuals');

So what went wrong with the linear fit? The problem is in the log transform. If we plot the data and the two fits on the log scale, we can see that there's an extreme outlier.

yy = modelFun(paramEsts,xx);
plot(x,log(y),'o', xx,log(yyLin),'-', xx,log(yy),'-');
xlabel('x'); ylabel('log(y)');
ylim([-5,3]);
legend({'Raw data', 'Linear fit on the log scale', ...
    'Nonlinear fit on the original scale'},'location','SW');

That observation is not an outlier in the original data, so what happened to make it one on the log scale? The log transform is exactly the right thing to straighten out the trend line. But the log is a very

nonlinear transform, and so symmetric measurement errors on the original scale have become asymmetric on the log scale. Notice that the outlier had the smallest y value on the original scale -- close to zero. The log transform has "stretched out" that smallest y value more than its neighbors. We made the linear fit on the log scale, and so it is very much affected by that outlier.

Had the measurement at that one point been slightly different, the two fits might have been much more similar. For example,

y(11) = 1;
paramEsts = nlinfit(x, y, modelFun, [10;-.3])
paramEsts =
    8.7618
   -0.2833
paramEstsLin = [ones(size(x)), x] \ log(y);
paramEstsLin(1) = exp(paramEstsLin(1))
paramEstsLin =
    9.6357
   -0.3394
yy = modelFun(paramEsts,xx);
yyLin = modelFun(paramEstsLin, xx);
plot(x,y,'o', xx,yyLin,'-', xx,yy,'-');
xlabel('x'); ylabel('y');
legend({'Raw data', 'Linear fit on the log scale', ...
    'Nonlinear fit on the original scale'},'location','NE');

Still, the two fits are different. Which one is "right"? To answer that, suppose that instead of additive measurement errors, measurements of y were affected by multiplicative errors. These errors would not be symmetric, and least squares on the original scale would not be appropriate. On the other hand, the log transform would make the errors symmetric on the log scale, and the linear least squares fit on that scale would be appropriate.

So, which method is "right" depends on what assumptions you are willing to make about your data. In practice, when the noise term is small relative to the trend, the log transform is "locally linear" in the sense that y values near the same x value will not be stretched out too asymmetrically. In that case, the two methods lead to essentially the same fit. But when the noise term is not small, you should consider what assumptions are realistic, and choose an appropriate fitting method.
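The multiplicative-error case described above can be illustrated directly. In this sketch the data are simulated (not the values used earlier, and the parameter and noise values are chosen purely for illustration): lognormal multiplicative errors on the original scale become additive, symmetric normal errors after the log transform, so linear least squares on the log scale is the natural fit.

```matlab
rng default                          % for reproducibility
x = linspace(0,10,50)';
p = [10; -0.3];                      % "true" parameters for this sketch
% Multiplicative lognormal noise: y = p1*exp(p2*x) .* errors
y = p(1)*exp(p(2)*x) .* exp(0.2*randn(size(x)));
% On the log scale the model is linear with additive normal errors:
% log(y) = log(p1) + p2*x + noise
c = [ones(size(x)), x] \ log(y);
pHat = [exp(c(1)); c(2)]             % back-transform the intercept
```

Here the backslash fit on the log scale is the appropriate estimator, and pHat should land close to the true p; with additive errors instead, the nlinfit approach on the original scale would be the better choice.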