- [Instructor] The next element, and something you can count on happening, is that your models are going to degrade. It's inevitable. If the model remains static, but the world is changing around it, the model's ability to make accurate predictions is going to degrade. But what can you do about it? Well, this is a potentially large subject, but there are a number of things that we can take as basic principles. First, a brief mention.

The Cross-Industry Standard Process for Data Mining, which we're going to discuss in the last chapter, has six phases that end with deployment. But over the years, many have proposed that it would be useful to have a potential seventh phase called monitoring, and in a sense, this is really the same topic. So one thing you have to concern yourself with is that on a routine basis, possibly even nightly (some models even do this in near real time), you have to double-check to make sure that the weights (or the coefficients, as we call them) are kept accurate.
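To make that routine check concrete, here is a minimal, hypothetical sketch of a nightly monitoring job. The model, the drift in the data, and the 0.9 accuracy threshold are all illustrative assumptions, not anything from the course; the idea is simply to flag a deployed model for recalibration once its accuracy on fresh labeled records slips.

```python
def accuracy(predict, records):
    """Fraction of records whose label matches the model's prediction."""
    correct = sum(1 for features, label in records if predict(features) == label)
    return correct / len(records)

def needs_recalibration(predict, recent_records, threshold=0.9):
    """Flag the model when accuracy on recent data falls below the threshold."""
    return accuracy(predict, recent_records) < threshold

# A toy deployed model: predict survival (1) when age <= 13, else 0.
deployed = lambda features: 1 if features["age"] <= 13 else 0

# Fresh records in which the effective cutoff has drifted upward to 15.
recent = [({"age": a}, 1 if a <= 15 else 0) for a in range(10, 21)]

print(needs_recalibration(deployed, recent))  # prints True
```

A real nightly job would pull yesterday's scored records and their eventual outcomes; the shape of the check, though, stays this simple.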

Let me give you a tangible example. This is some data from the famous Titanic accident, where zero, shown in blue, represents the passengers who died, and one, shown in red, the passengers who survived. Notice down in what's labeled as Node 14 in the lower left: the survival rate for those that were less than or equal to 13 years of age is 63%.

Well, that cutoff of less than or equal to 13 years of age could change over time. It could get older and become 14 or 15, or it could become younger and be 11. This is the notion of recalibrating, and most statistical software and predictive analytics software will actually allow you to recalibrate (it's not always the term they use) these models on a regular basis. Over time, our models are going to require more attention than just that.
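A hypothetical sketch of what recalibrating that one split might look like: keep the same variable (age) but rescan candidate cutoffs against fresh data to see whether the "less than or equal to 13" boundary has drifted. The data here is made up for illustration, and real tree software would use a purity measure rather than raw accuracy.

```python
def best_cutoff(records, candidates):
    """Pick the age cutoff that best separates survivors (1) from non-survivors (0)."""
    def accuracy(cut):
        correct = sum(1 for age, survived in records
                      if (1 if age <= cut else 0) == survived)
        return correct / len(records)
    return max(candidates, key=accuracy)

# Fresh observations in which survival now extends up to age 15.
fresh = [(age, 1 if age <= 15 else 0) for age in range(1, 31)]

print(best_cutoff(fresh, candidates=range(10, 20)))  # prints 15
```

Notice that nothing else about the model changes: same variable, same split structure, just an updated boundary.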

Eventually, we'll want to remodel more thoroughly. Perhaps we want to consider new variables, different algorithms, or different modeling techniques. So essentially, what we're doing is keeping the data preparation steps that we've put in place before, but allowing the algorithms to do their work again, not simply recalibrating those values. Again, this could include using variables that we didn't consider the last time, and so on.
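The difference between recalibrating and remodeling can be sketched as letting several candidate builders refit from scratch and keeping whichever validates best. Everything below is hypothetical: the two rules stand in for real model builders (which would actually fit on the training data), and the fare variable stands in for a variable the previous model ignored.

```python
def score(predict, holdout):
    """Accuracy of a fitted model on held-out records."""
    return sum(predict(row) == label for row, label in holdout) / len(holdout)

def remodel(candidates, train, holdout):
    """Refit every candidate builder on train; return the best validator's name."""
    fitted = {name: build(train) for name, build in candidates.items()}
    return max(fitted, key=lambda name: score(fitted[name], holdout))

# Two toy "builders": the old age rule, and a new one using a fare
# variable the previous model never considered.
candidates = {
    "age_rule": lambda train: (lambda row: 1 if row["age"] <= 13 else 0),
    "fare_rule": lambda train: (lambda row: 1 if row["fare"] >= 50 else 0),
}

train = []  # unused by these fixed toy rules
holdout = [({"age": 10, "fare": 15}, 0), ({"age": 50, "fare": 70}, 1),
           ({"age": 3, "fare": 60}, 1), ({"age": 35, "fare": 40}, 0)]

print(remodel(candidates, train, holdout))  # prints fare_rule
```

The data preparation pipeline feeding `train` and `holdout` stays as it was; only the model-building step is rerun.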

Eventually, what's going to happen is your sources of data are going to change. You're going to transition to a new data warehouse. You're going to open up new lines of business. You're going to have competitors that you didn't have before. You're going to have sources of data, perhaps unstructured data or social media data, that somehow are being incorporated. Things have changed so much that you really have to go through all the phases of CRISP-DM again. That's a very different operation.

One hopes that this only happens about once every three to five years. But eventually you'll even have to do that.

Released: 7/10/2017

A proper predictive analytics and data-mining project can involve many people and many weeks. There are also many potential errors to avoid. A "big picture" perspective is necessary to keep the project on track. This course provides that perspective through the lens of a veteran practitioner who has completed dozens of real-world projects. Keith McCormick is an independent data miner and author who specializes in predictive models and segmentation analysis, including classification trees, cluster analysis, and association rules. Here he shares his knowledge with you. Walk through each step of a typical project, from defining the problem and gathering the data and resources, to putting the solution into practice. Keith also provides an overview of CRISP-DM (the de facto data-mining methodology) and the nine laws of data mining, which will keep you focused on strategy and business value.