The World Bank provides open data for many indicators across most countries, spanning the last few decades.

This data is available online with searches available by country codes (iso2c and iso3c), indicator names, and by dates. The indicators can be viewed here. It can also be accessed via an application programming interface (API). The WDI library in R provides access through this API, allowing for easy search and retrieval of data.

In this post, written as an R-markdown file, and available on RPubs andGitHub, I showcase the WDI library by looking at maternal mortality rates for the United States, Brazil, and South Africa.

Logistic regression is a statistical test that uses independent variables (categorical or numerical) to predict a categorical dependent variable. It is based on the principles of linear regression. As the outcome (dependent) variable is categorical, though, logistic regression computes the probability of this variable.

There are many methods of creating and testing the validity of a logistic regression model. In the link is a web page with an explanation of binomial logistic regression and how to use the R programming language to construct and understand your model.

I note more and more published papers on machine learning. As a clinician, I find it a fascinating way of looking at patient data. In case you are not familiar with machine learning, the definition given over at Wikipedia is: Machine learning is the subfield of computer science that gives computers the ability to learn without being explicitly programmed. …machine learning explores the study and construction of algorithms that can learn from and make predictions on data.

That is exactly what machine learning is used for in medicine as well. In a particular branch of machine learning, called supervised learning, a dataset of predictor variables together with a known outcome variable can be passed to the machine, which in turns constructs a model from the data. A selection of the data is usually kept separately and is used to test the model. Given that the outcomes are know, it is trivial to calculate the accuracy of the model. Once a model is generated, data without a known outcome can be passed to the model, which will predict the outcome. This can indeed be very useful in medicine.

There are many tools available to do machine learning. I use both Python and Mathematica. It is really easy to do. I have put together a short video on YouTube for those familiar with Mathematica, just to show how easy it is.