A Blog for Risk Analysts and Programmers

Category: Topics in Data Science

Artificial Intelligence (AI) and Data Science continue their progression towards becoming mainstream and ubiquitous. This is a very exciting time for scientists, model developers, programmers, and a lot of other technically inclined professionals. But to be honest, it can be confusing and overwhelming at times. We all hear terms like “AI”, “Data Science”, “Big Data”, “Machine Learning”, “Statistical Learning”, “Data Mining”, “Deep Learning”, etc., and it’s often hard to make sense of it all, even for those of us who have been writing code to implement statistical models for decades. Yet these terms are now used by people in every field and every industry. How do remote sensing professionals use data from a satellite to create land cover maps? How do certain streaming services determine what shows or movies to recommend based on your watching habits? How did Cambridge Analytica determine the poor schmucks Donald Trump should focus on? The answers to all these questions lie in machine learning algorithms. (If interested, you can find more information on the definitions of, and differences between, the terms mentioned above in discussion threads on sites like Quora, StackExchange, LinkedIn, and KDNuggets, among others.)

This article is a little more focused. It asks: how can we use machine learning in areas of credit risk where statistics has traditionally been employed?

This post introduces my R package, RTransprob. The package contains a set of functions that automate commonly used methods for estimating the migration matrices used in credit risk analysis. These include estimating migration and default rates with the duration and cohort methods, bootstrapping default rates, and forecasting or stress testing credit exposure migrations using econometric models and a couple of machine learning algorithms.

So, you’ve written code in R which contains somewhat complicated loops. The execution time is not quite as fast as you hoped for. You turn to the profvis package in RStudio (or Rprof) to profile the program, in the hopes of finding the places in your code that are causing the bottleneck. The profiler points to a few areas, and you focus on making them more efficient, but unfortunately no matter how many ‘loops’ you jump through, you can’t seem to reduce the execution time.

Next, you spend at least a couple of frustrating hours trying to figure out how to vectorize (think: higher-level programming to improve efficiency) the loops creating the bottleneck, to no avail. It’s okay to admit it; we’ve all been there.

STOP!!!! The solution may be to rewrite some of your key functions in C++.

Model risk quantification can be a tricky concept to grasp. But when we consider that models are nothing more than abstractions of real-life situations, it’s easier to see how there are risks associated with models, even when they perform exceptionally well in recreating said real-life scenarios.