Looking to dive deeper into the more cutting edge machine learning use cases in Apache Spark? To successfully use Spark’s advanced analytics capabilities including large scale machine learning and graph analysis, check out The Data Scientist’s Guide to Apache Spark, from Databricks.

Interested in learning the concepts behind Logistic Regression (LogR)? Looking for a concise introduction to LogR? This article is for you. Includes a Python implementation and links to an R script as well.

During the recent RStudio Conference, an attendee asked the panel about the lack of support provided by the tidyverse in relation to time series data. As someone who has spent the majority of their career on time series problems, this was somewhat surprising because R already has a great suite of tools for visualizing, manipulating, and modeling time series data. I can understand the desire for a ‘tidyverse approved’ tool for time series analysis, but it seemed like perhaps the issue was a lack of familiarity with the available toolage. Therefore, I wanted to put together a list of the packages and tools that I use most frequently in my work. For those unfamiliar with time series analysis, this could a good place to start investigating R’s current capabilities.

Black-box models, like random forest model or gradient boosting model, are commonly used in predictive modelling due to their elasticity and high accuracy. The problem is, that it is hard to understand how a single variable affects model predictions. As a remedy one can use excellent tools like pdp package (Brandon Greenwell, pdp: An R Package for Constructing Partial Dependence Plots, The R Journal 9(2017)) or ALEPlot package (Apley, Dan. Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models (2016)). OR Now one can use the DALEX package to not only plot a conditional model response but also superimpose responses from different models to better understand differences between models.