- [Instructor] In this movie, I will explain the different choices we have as we approach our regression modeling. First, I have to give you a bunch of information before we dive into modeling. I need to explain to you that we essentially have a choice of three philosophical approaches: forward stepwise, backward stepwise and ambidirectional. Even though what I'm going to demonstrate to you in this course is forward stepwise, I'll explain what the others mean and why I didn't choose them for this course.

So in regression approaches, there are three main ways people model: forward stepwise, backward stepwise and ambidirectional stepwise. So what does that literally mean? Remember your challenge: you have an exposure, you have an outcome and then you have a grab bag of confounders. How do you decide which confounders fit in there and which ones should not be included? If you do forward stepwise, you do that by running models one at a time and each time either adding a new variable or taking out a variable.

For example, if you try a variable and it does not fit, you throw it away and try a different variable next model. If it fits, you keep it and add a new variable in the next model. It's like you are going forward building a model. Then there is backward stepwise, which I don't like and we are not doing. In that situation, your first model would have every single covariate in it. Then each model, you'd remove the one that fit the worst until you get to a model where they all fit.
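The forward process just described can be sketched in a few lines. This is only an illustration under stated assumptions, not the course's own code: the course works in R, while this sketch uses plain Python with no libraries; the transcript does not define what "fits" means, so the sketch assumes a covariate fits when adding it lowers the model's AIC; and every name here (`ols_rss`, `aic`, `forward_stepwise`, and the synthetic `exposure`, `age`, `noise`, `y` data) is hypothetical.

```python
import math

def ols_rss(columns, y):
    """Residual sum of squares for a least-squares fit of y on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2 for i in range(n))

def aic(columns, y):
    """AIC of a Gaussian linear model, up to an additive constant."""
    n = len(y)
    return n * math.log(ols_rss(columns, y) / n) + 2 * (len(columns) + 1)

def forward_stepwise(exposure, candidates, y):
    """Try candidates one model at a time; keep each one that lowers AIC (assumed rule)."""
    kept, log = [], []
    best = aic([exposure], y)  # first model: outcome on exposure only
    for name, column in candidates.items():
        trial = aic([exposure] + [candidates[k] for k in kept] + [column], y)
        if trial < best:
            kept.append(name)
            best = trial
            log.append((name, "kept"))
        else:
            log.append((name, "dropped"))
    return kept, log

# Synthetic, deterministic data: the outcome depends on exposure and age, not on noise.
n = 60
exposure = [float(i % 2) for i in range(n)]
age = [20.0 + (i * 7) % 41 for i in range(n)]
noise = [float((i * 13) % 17) for i in range(n)]
y = [1 + 2 * exposure[i] + 0.3 * age[i] + ((i * 37) % 11 - 5) / 10 for i in range(n)]

kept, log = forward_stepwise(exposure, {"age": age, "noise": noise}, y)
print(kept)
print(log)
```

The `log` list records the decision made at each model, which is the kind of trail that makes the forward process easy to document. Automated forward selection routines usually pick the best remaining candidate at every round; this sketch instead follows the one-variable-per-model description above.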

Then finally we have the watery, vague, loosey-goosey idea of ambidirectional stepwise, meaning both directions. You put some in one iteration, you take some out the next iteration and it's art. So how popular are these approaches? I am a forward stepwise fan and I really have no one on my side, I'll admit. Everyone seems to want backward stepwise and I can't figure out why. There are a lot of issues with backward stepwise. First, if you try to put everything in the first model, it can break the software due to small cells.

Also, it's much harder to decide which of the variables to take out. In forward stepwise, you add one and if you don't like it, you take it out. But if the model starts with all this clutter, which one do you want to eliminate before the next round? It's hard to decide. At the end of forward stepwise modeling, I really have a feel for the data, but I don't really get a feel for the data from backward stepwise. Theoretically, the data are the data. Whether you start with forward or with backward, you should meet the same model in the middle given the covariates you have.

So it doesn't matter to your final product, it just matters for the process. And I find the forward stepwise process both easier to use and easier to document. Now for the dirty secret about ambidirectional. Actually, both forward and backward stepwise are a little ambidirectional. That's because in forward stepwise, once I get to what I think is my final model, I try to run models to put back the covariates I took out to see if they fit now. It's kind of like saying you can't fit a scarf in your suitcase so don't pack it, but then after you pack everything else, you think you can shove it in the side, so you try to shove the scarf back into the suitcase.

It's like that. I think I have a good fitting model, but I try to shove the covariates that I just kicked out back in, just to make sure they don't fit, because it would be nice to have the scarf with you on vacation, right? In backward stepwise, we technically get ambidirectional too because those people also try to do the same thing I do. Shove the covariates that didn't survive the modeling process back in one at a time just to make sure they don't fit. So this is the dirty secret. At the end of the day, to finalize a model, the forward stepwise and the backward stepwise get a little ambidirectional, but we really aren't allowed to say that.
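The "shove the scarf back in" check can be sketched as a short loop. This is a hypothetical illustration, not the course's R code: `recheck_dropped` and `toy_score` are invented names, the score is assumed to be a lower-is-better fit measure such as AIC, and the scoring rule below is made up purely so the example runs without any data.

```python
def recheck_dropped(kept, dropped, score):
    """Retry each dropped covariate, one at a time, against the final model.

    `score` is any callable that returns a lower-is-better fit measure
    for a list of variable names (an assumption of this sketch).
    """
    final = list(kept)
    best = score(final)
    readmitted = []
    for name in dropped:
        trial = score(final + [name])
        if trial < best:  # the scarf fits after all
            final.append(name)
            best = trial
            readmitted.append(name)
    return final, readmitted

def toy_score(model):
    """Invented rule: every variable costs 2 points; "age" and "sex" each
    earn 10; "smoker" earns 5, but only when "age" is already in the model."""
    total = 2 * len(model)
    if "age" in model:
        total -= 10
    if "sex" in model:
        total -= 10
    if "smoker" in model and "age" in model:
        total -= 5
    return total

final, readmitted = recheck_dropped(["age", "sex"], ["smoker", "income"], toy_score)
print(final)       # ['age', 'sex', 'smoker']
print(readmitted)  # ['smoker']
```

In this toy setup, "smoker" would have been rejected early because it only helps once "age" is in the model, which is the situation the final re-check is meant to catch.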

In the method section, we either did forward stepwise or we did backward stepwise. We don't need to admit we got ambidirectional at the end. So this movie talked about our three main approaches: forward stepwise, backward stepwise and ambidirectional. I talked to you about the problems with choosing backward stepwise in my experience, even though it's more popularly talked about in the scientific press. There are just logistical problems with starting with a model loaded up with all your covariates and trying to Jenga them out, as I say, in reference to the popular party game.

Instead, I prefer forward stepwise because it helps me understand how I arrived at my model because I built it up from a few variables and kept adding more variables. I'm just more psychologically comfortable with that approach. Theoretically, all analysts should be able to meet in the middle with a well-defended model whether they go backwards or forwards. And as I said, people say ambidirectional, but all that alludes to is the fact that at the end of modeling, sometimes forward people do backward steps and backward people do forward steps. It's like dancing.

You're supposed to have fun and not get hung up on these details. So let's get modeling.

Released: 1/19/2017

Linear and logistic regression models can be created using R, the open-source statistical computing software. In this course, biotech expert and epidemiologist Monika Wahi uses the publicly available Behavioral Risk Factor Surveillance System (BRFSS) dataset to show you how to perform a forward stepwise modeling process. Monika shows you how to design your research by considering scientific plausibility and selecting a hypothesis. Then, she takes you through the steps of preparing, developing, and finalizing both a linear regression model and a logistic regression model. She also shares techniques for how to interpret diagnostic plots, improve model fit, compare models, and more.