Effect Of Alpha On Lasso Regression

20 Dec 2017

Often we want to conduct a process called regularization, wherein we penalize the number of features in a model in order to keep only the most important features. This can be particularly important when you have a dataset with 100,000+ features.

Lasso regression is a common modeling technique for regularization. The math behind it is pretty interesting, but practically, what you need to know is that Lasso regression comes with a parameter, alpha, and the higher the alpha, the more feature coefficients are zero.
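For readers who want the math in one line, here is the objective that is minimized, assuming the Lasso used in this tutorial is scikit-learn's sklearn.linear_model.Lasso. The symbols are introduced here for illustration: n is the number of observations, X the feature matrix, y the target, and w the coefficient vector.

```latex
\min_{w} \; \frac{1}{2n} \, \lVert y - Xw \rVert_2^2 \; + \; \alpha \, \lVert w \rVert_1
```

The first term is the usual least-squares fit. The second term charges a price of alpha per unit of absolute coefficient size, which is why a larger alpha pushes more coefficients to exactly zero.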

When alpha is 0, Lasso regression produces the same coefficients as an ordinary linear regression. When alpha is very large, all coefficients are zero.
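To get a feel for these two extremes, it can help to look at a simplified special case. If the features are uncorrelated and scaled to unit variance (a simplifying assumption, not something this tutorial requires), each lasso coefficient is just the least-squares coefficient shrunk toward zero by alpha, and set to exactly zero once its absolute value is no larger than alpha. The sketch below illustrates that rule only. The helper names and the example coefficients are hypothetical and are not part of the tutorial's data.

```python
def soft_threshold(value, alpha):
    """Shrink value toward zero by alpha; values within alpha of zero become 0.0."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def shrink_coefficients(coefficients, alpha):
    """Apply soft-thresholding to every coefficient in a list."""
    return [soft_threshold(c, alpha) for c in coefficients]


# Hypothetical least-squares coefficients, chosen only for illustration
ols_coefficients = [3.0, -1.5, 0.25, -0.125]

# Try no penalty, a moderate penalty, and a very large penalty
for alpha in [0.0, 0.5, 10.0]:
    print(alpha, shrink_coefficients(ols_coefficients, alpha))
```

With alpha at 0.0 the coefficients come back unchanged, with 0.5 the two small ones drop to zero, and with 10.0 every coefficient is zero.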

In this tutorial, I run three lasso regressions, with varying levels of alpha, and show the resulting effect on the coefficients.

Load Data

Run Three Lasso Regressions, Varying Alpha Levels

# Load libraries
import pandas as pd
from sklearn.linear_model import Lasso

# Note: X (feature matrix), Y (target vector) and names (feature names)
# are assumed to have been created in the Load Data step

# Create a function called lasso
def lasso(alphas):
    '''
    Takes in a list of alphas. Outputs a dataframe containing the coefficients of lasso regressions from each alpha.
    '''
    # Create an empty data frame
    df = pd.DataFrame()

    # Create a column of feature names
    df['Feature Name'] = names

    # For each alpha value in the list of alpha values,
    for alpha in alphas:
        # Create a lasso regression with that alpha value,
        model = Lasso(alpha=alpha)

        # Fit the lasso regression
        model.fit(X, Y)

        # Create a column name for that alpha value
        column_name = 'Alpha = %f' % alpha

        # Create a column of coefficient values
        df[column_name] = model.coef_

    # Return the dataframe
    return df

# Run the function called lasso
lasso([.0001, .5, 10])

    Feature Name  Alpha = 0.000100  Alpha = 0.500000  Alpha = 10.000000
 0          CRIM         -0.920130         -0.106977               -0.0
 1            ZN          1.080498          0.000000                0.0
 2         INDUS          0.142027         -0.000000               -0.0
 3          CHAS          0.682235          0.397399                0.0
 4           NOX         -2.059250         -0.000000               -0.0
 5            RM          2.670814          2.973323                0.0
 6           AGE          0.020680         -0.000000               -0.0
 7           DIS         -3.104070         -0.169378                0.0
 8           RAD          2.656950         -0.000000               -0.0
 9           TAX         -2.074110         -0.000000               -0.0
10       PTRATIO         -2.061921         -1.599574               -0.0
11             B          0.856553          0.545715                0.0
12         LSTAT         -3.748470         -3.668884               -0.0

Notice that as the alpha value increases, more features have a coefficient of 0.
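To put a number on that observation, here is a small sketch that counts the zero coefficients in each column. The values are copied by hand from the output above, in feature order CRIM through LSTAT, so the counts reflect the displayed (rounded) values. The helper name count_zeros is just an illustrative choice.

```python
# Coefficients copied from the output above, in feature order CRIM ... LSTAT
coefficients = {
    "Alpha = 0.000100": [-0.920130, 1.080498, 0.142027, 0.682235, -2.059250,
                         2.670814, 0.020680, -3.104070, 2.656950, -2.074110,
                         -2.061921, 0.856553, -3.748470],
    "Alpha = 0.500000": [-0.106977, 0.0, -0.0, 0.397399, -0.0,
                         2.973323, -0.0, -0.169378, -0.0, -0.0,
                         -1.599574, 0.545715, -3.668884],
    "Alpha = 10.000000": [-0.0, 0.0, -0.0, 0.0, -0.0,
                          0.0, -0.0, 0.0, -0.0, -0.0,
                          -0.0, 0.0, -0.0],
}


def count_zeros(values):
    """Count the entries equal to zero (in Python, -0.0 == 0 is True)."""
    return sum(1 for v in values if v == 0)


# Number of zero coefficients for each alpha
zero_counts = {name: count_zeros(values) for name, values in coefficients.items()}
print(zero_counts)
```

Of the 13 features, none has a zero coefficient at alpha = 0.0001, 6 do at alpha = 0.5, and all 13 do at alpha = 10.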