Strategy for building a “good” predictive model

By Ian Morton. Ian worked in credit risk for big banks for a number of years. He learnt about how to (and how not to) build “good” statistical models in the form of scorecards using the SAS Language.

Read the original post and similar articles here. I think Ian's list below is a good starting point. I would add a few steps, such as deployment and maintenance at the end, and gathering requirements and understanding the goal and success metrics at the top.

Initial investigations

1. Look at the data dictionary to see which data is available

2. What is the outcome? Is it yes/no? Is it continuous?

3. Decide upon the model required (logistic regression for a yes/no outcome)
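Steps 2 and 3 above can be sketched as a simple decision rule. This is an illustrative Python helper, not something from Ian's original SAS workflow; the function name and the fallback message are my own assumptions:

```python
def choose_model(outcome):
    """Suggest a model family from the outcome variable.

    Illustrative rule of thumb only: exactly two distinct values
    suggests logistic regression (the scorecard case); a numeric
    outcome with many values suggests a continuous-outcome model.
    """
    levels = set(outcome)
    if len(levels) == 2:
        return "logistic"  # yes/no outcome -> logistic regression
    if all(isinstance(v, (int, float)) for v in levels):
        return "linear"    # continuous outcome -> linear regression
    return "review outcome coding"  # multi-level categorical, etc.

# A yes/no default flag points to a logistic (scorecard-style) model.
model = choose_model(["yes", "no", "no", "yes"])
```

In a credit-risk scorecard the outcome is almost always a binary good/bad flag, which is why logistic regression dominates that setting.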

Getting the data ready

4. Cross-tabulations on categorical variables to understand the coding and volumes
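A minimal sketch of step 4, assuming records held as Python dictionaries (the original workflow would use SAS, e.g. PROC FREQ; the field names below are invented for illustration):

```python
from collections import Counter

def crosstab(rows, row_key, col_key):
    """Count record volumes for each (row category, column category) pair."""
    counts = Counter((r[row_key], r[col_key]) for r in rows)
    row_levels = sorted({r[row_key] for r in rows})
    col_levels = sorted({r[col_key] for r in rows})
    return {rl: {cl: counts.get((rl, cl), 0) for cl in col_levels}
            for rl in row_levels}

# Hypothetical records: a categorical predictor vs. a yes/no outcome.
records = [
    {"employment": "full-time", "default": "no"},
    {"employment": "full-time", "default": "no"},
    {"employment": "part-time", "default": "yes"},
    {"employment": "part-time", "default": "no"},
]
table = crosstab(records, "employment", "default")
```

Reading the table row by row shows both how each category is coded and whether any level is too thin (low volume) to model reliably.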


Thanks. An excellent post for those of us who did not know about Ian Morton's work.

We see in LinkedIn / Data Science Group posts debates on causation vs. correlation: i) whether any cause-effect relationships matter at all, and ii) whether the preponderance of observed association data, at terabyte/petabyte scale, indeed explains the process away and is therefore sufficient.

Thanks for your post. Regarding your additions ("deployment, maintenance at the end, and gathering requirements, understanding goal and success metrics at the top"): yes, you are of course correct. I see your additions as the important wraparound to my suggestions.