Accessing these slides

--

View them online:

???

This talk will focus on introducing the new sl3 R package, which provides a
modern implementation of the Super Learner algorithm [@vdl2007super], a method
for performing stacked regressions [@breiman1996stacked], combined with
covariate screening and cross-validation.

class: inverse, center, middle

Core sl3 Design Principles

sl3 Architecture

All of the classes defined in sl3 are based on the R6 framework, which brings
a class-based object-oriented paradigm, with mutable objects and reference
semantics, to the R language.
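To make the R6 style concrete, here is a minimal sketch of a base class and a subclass that overrides private fitting and prediction hooks. The classes `ToyLearner` and `ToyMean` are hypothetical and only loosely mirror the idea behind `Lrnr_base`. They are not sl3's implementation. The toy `train()` modifies the object in place to illustrate R6 reference semantics.

```r
library(R6)

# Hypothetical base class (NOT sl3's Lrnr_base): stores a fit, refuses to predict untrained
ToyLearner <- R6Class("ToyLearner",
  public = list(
    fit_object = NULL,
    train = function(x, y) {
      # reference semantics: the object itself is modified, no copy is made
      self$fit_object <- private$.train(x, y)
      invisible(self)
    },
    predict = function(x) {
      if (is.null(self$fit_object)) stop("learner has not been trained")
      private$.predict(x)
    }
  ),
  private = list(
    # hooks that concrete learners are expected to override
    .train = function(x, y) stop("not implemented"),
    .predict = function(x) stop("not implemented")
  )
)

# Hypothetical subclass: predicts the training mean of the outcome
ToyMean <- R6Class("ToyMean",
  inherit = ToyLearner,
  private = list(
    .train = function(x, y) list(mean = mean(y)),
    .predict = function(x) rep(self$fit_object$mean, NROW(x))
  )
)

lrnr <- ToyMean$new()
lrnr$train(x = mtcars[, c("wt", "hp")], y = mtcars$mpg)
preds <- lrnr$predict(mtcars[1:3, c("wt", "hp")])
preds
```

The public `train()`/`predict()` logic lives once in the parent, and each subclass supplies only the algorithm-specific pieces. This is the same division of labor the slides describe for `Lrnr_base` and its descendants.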

Core classes

sl3_Task: Defines the machine learning problem (task), keeping track of the data as well as the role each variable plays (covariates, outcome, and so on). Created by make_sl3_Task().
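A minimal sketch of creating a task, assuming sl3 is installed and using R's built-in `mtcars` data as a stand-in regression problem (predict `mpg` from three covariates):

```r
library(sl3)

# Regression task: outcome mpg, three covariates from the built-in mtcars data
task <- make_sl3_Task(
  data = mtcars,
  covariates = c("wt", "hp", "cyl"),
  outcome = "mpg"
)

task$nodes$covariates  # covariate names recorded by the task
head(task$X)           # covariate data
head(task$Y)           # outcome vector
```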

--

Lrnr_base: Base class for defining ML algorithms. Also stores the fit
obtained by training a learner on a particular sl3_Task. Individual learning
algorithms are defined in classes that inherit from this class.
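A short sketch of a concrete learner, assuming sl3 is installed and reusing the hypothetical `mtcars` task from above. `Lrnr_glm` inherits from `Lrnr_base`, and `train()` returns a fitted learner that carries the underlying model fit:

```r
library(sl3)

task <- make_sl3_Task(
  data = mtcars,
  covariates = c("wt", "hp", "cyl"),
  outcome = "mpg"
)

lrnr_glm <- Lrnr_glm$new()
class(lrnr_glm)  # class chain includes "Lrnr_base"

# train() returns a fitted learner; the model fit is available as fit_object
lrnr_glm_fit <- lrnr_glm$train(task)
lrnr_glm_fit$fit_object
```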

--

Pipeline: Defines a sequential pipeline of learners, in which the output of each fitted learner is passed as input to the next one.
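A minimal sketch of a pipeline that screens covariates and then fits a GLM on the survivors. This assumes sl3 and the SuperLearner package are installed, and it uses `Lrnr_pkg_SuperLearner_screener` to wrap SuperLearner's `screen.corP` correlation screener. The `mtcars` task is only an illustration.

```r
library(sl3)

task <- make_sl3_Task(
  data = mtcars,
  covariates = c("wt", "hp", "cyl", "disp", "qsec"),
  outcome = "mpg"
)

# Step 1: keep covariates correlated with the outcome (SuperLearner's screen.corP)
screen_cor <- Lrnr_pkg_SuperLearner_screener$new("screen.corP")
# Step 2: fit a GLM on the covariates that survive screening
lrnr_glm <- Lrnr_glm$new()

cor_glm <- Pipeline$new(screen_cor, lrnr_glm)
cor_glm_fit <- cor_glm$train(task)
preds <- cor_glm_fit$predict()  # predictions come from the last learner
head(preds)
```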

Generate predictions on the training task by calling the predict() method without arguments:

# assumes lrnr_glm_fit <- Lrnr_glm$new()$train(task) for an existing sl3_Task `task`
preds <- lrnr_glm_fit$predict()
head(preds)

--

Generate predictions for a given new task:

preds <- lrnr_glm_fit$predict(task)
head(preds)
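To predict on genuinely new observations, one option is to build a second task over the new rows with the same covariates. The sketch below assumes sl3 is installed and splits `mtcars` into hypothetical training and new rows purely for illustration.

```r
library(sl3)

covars <- c("wt", "hp", "cyl")
train_task <- make_sl3_Task(data = mtcars[1:24, ], covariates = covars, outcome = "mpg")
new_task <- make_sl3_Task(data = mtcars[25:32, ], covariates = covars, outcome = "mpg")

lrnr_glm_fit <- Lrnr_glm$new()$train(train_task)

# predictions for the 8 rows in new_task, not for the training rows
new_preds <- lrnr_glm_fit$predict(new_task)
new_preds
```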

Learners IV: Properties

Learners have properties that indicate what features they support. Use sl3_list_properties() to get a list of all properties supported by at least one learner.

sl3_list_properties()

--

Use sl3_list_learners() to find the learners that support all of a given set of properties:

sl3_list_learners(c("binomial", "offset"))

Learners V: Tuning Parameters

Learners can be instantiated without providing any additional parameters. We tried to provide sensible defaults for each learner.

You can modify the learners' behavior by instantiating learners with different parameters.

--

sl3 learners support several common parameters (where applicable):

covariates: subsets covariates before fitting. Allows
learners to be fit to the same task with different covariate subsets.

outcome_type: overrides the task outcome_type. Allows
learners to be fit to the same task with different outcome_types.

...: arbitrary parameters can be passed directly to the learner
method. See documentation for each learner.
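A sketch of both kinds of parameters, assuming sl3 and glmnet are installed and using the illustrative `mtcars` task. `covariates` is one of the common parameters listed above, while `alpha` and `nfolds` are learner-specific arguments of `Lrnr_glmnet`.

```r
library(sl3)

task <- make_sl3_Task(
  data = mtcars,
  covariates = c("wt", "hp", "cyl"),
  outcome = "mpg"
)

# Default instantiation: sensible defaults, no extra parameters
lrnr_glm <- Lrnr_glm$new()

# Common parameter: fit this learner on a subset of the task's covariates
lrnr_glm_wt <- Lrnr_glm$new(covariates = c("wt"))

# Learner-specific parameters: ridge penalty (alpha = 0), 5-fold internal CV
lrnr_ridge <- Lrnr_glmnet$new(alpha = 0, nfolds = 5)

fit_glm <- lrnr_glm$train(task)
fit_wt <- lrnr_glm_wt$train(task)
fit_ridge <- lrnr_ridge$train(task)
```

All three learners are trained on the same task, which is the point of letting parameters such as `covariates` vary per learner.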

Compatibility with SuperLearner Package

Defining an sl3 learner that uses the SL.glmnet wrapper from SuperLearner:

lrnr_sl_glmnet <- make_learner(Lrnr_pkg_SuperLearner, "SL.glmnet")
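Once defined, the wrapped learner is trained and used like any native sl3 learner. A minimal sketch, assuming sl3, SuperLearner and glmnet are installed and using the illustrative `mtcars` task:

```r
library(sl3)

task <- make_sl3_Task(
  data = mtcars,
  covariates = c("wt", "hp", "cyl"),
  outcome = "mpg"
)

lrnr_sl_glmnet <- make_learner(Lrnr_pkg_SuperLearner, "SL.glmnet")

# Same train()/predict() interface as native sl3 learners
fit_sl_glmnet <- lrnr_sl_glmnet$train(task)
preds <- fit_sl_glmnet$predict()
head(preds)
```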

???

In most cases, using wrappers from SuperLearner will not be as efficient as
their native sl3 counterparts. If your favorite learner is missing from
sl3, please consider adding it by following the "Defining New Learners"
vignette.

Dependent Data / Time-series

sl3 supports univariate and multivariate time series.

Using the bsds example dataset, we can make forecasts over an arbitrary horizon using one of the time-series learners:

We can plot the network of tasks required to train this Super Learner:
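A minimal sketch of how such a network can be built and plotted. It assumes sl3, nnls and visNetwork are installed, and it uses a hypothetical two-learner Super Learner on `mtcars` rather than the time-series example. `delayed_learner_train()` builds the graph of training steps without running them.

```r
library(sl3)

task <- make_sl3_Task(
  data = mtcars,
  covariates = c("wt", "hp", "cyl"),
  outcome = "mpg"
)

# Hypothetical library: mean and GLM, combined by a non-negative least squares meta-learner
stack <- Stack$new(Lrnr_mean$new(), Lrnr_glm$new())
sl <- Lrnr_sl$new(learners = stack, metalearner = Lrnr_nnls$new())

# Build (but do not yet compute) the graph of training and prediction steps
delayed_sl_fit <- delayed_learner_train(sl, task)
plot(delayed_sl_fit)
```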

Delayed II

Delayed III

shiny demo

delayed then allows us to parallelize the procedure across these tasks using
the future package.

N.B.: this feature is currently experimental and has not yet been thoroughly
tested on a range of parallel back-ends.
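A sketch of running the delayed graph in parallel. It assumes sl3, delayed, future and nnls are installed and reuses the hypothetical `mtcars` Super Learner. `Scheduler` and `FutureJob` come from the delayed package, and the future plan (here, two background R sessions) decides where each job runs.

```r
library(sl3)
library(delayed)
library(future)

# Run jobs in two background R sessions
plan(multisession, workers = 2)

task <- make_sl3_Task(
  data = mtcars,
  covariates = c("wt", "hp", "cyl"),
  outcome = "mpg"
)
stack <- Stack$new(Lrnr_mean$new(), Lrnr_glm$new())
sl <- Lrnr_sl$new(learners = stack, metalearner = Lrnr_nnls$new())

delayed_sl_fit <- delayed_learner_train(sl, task)

# Dispatch independent training steps as futures, then collect the fitted Super Learner
sched <- Scheduler$new(delayed_sl_fit, FutureJob, nworkers = 2, verbose = FALSE)
sl_fit <- sched$compute()
preds <- sl_fit$predict()

plan(sequential)  # restore sequential evaluation
```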

--

Performance comparisons can be found in the "SuperLearner Benchmarks" vignette
that accompanies this package.

???

Fitting a Super Learner is composed of many different training and prediction
steps, as the procedure requires that the learners in the stack and the
meta-learner be fit on cross-validation folds and on the full data.
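To make those steps concrete, here is a minimal base-R sketch of the procedure, written independently of sl3 and using only `mtcars`. It uses two hypothetical candidate learners (the outcome mean and a linear model) and V-fold cross-validated predictions. The meta-learner is a simple grid search over a single convex weight, which stands in for the meta-learners (such as non-negative least squares) used in practice.

```r
set.seed(1)
df <- mtcars
V <- 5
folds <- sample(rep(seq_len(V), length.out = nrow(df)))

# Each learner: fit on a training set, return a prediction function
fit_mean <- function(train) {
  m <- mean(train$mpg)
  function(new) rep(m, nrow(new))
}
fit_lm <- function(train) {
  f <- lm(mpg ~ wt + hp, data = train)
  function(new) unname(predict(f, newdata = new))
}
learners <- list(mean = fit_mean, lm = fit_lm)

# Step 1: fit every learner on each training fold, predict the held-out fold
cv_preds <- matrix(NA_real_, nrow(df), length(learners),
                   dimnames = list(NULL, names(learners)))
for (v in seq_len(V)) {
  train <- df[folds != v, ]
  valid <- df[folds == v, ]
  for (j in seq_along(learners)) {
    cv_preds[folds == v, j] <- learners[[j]](train)(valid)
  }
}

# Step 2: meta-learner, choosing w in [0, 1] for w * lm + (1 - w) * mean by CV MSE
weights <- seq(0, 1, by = 0.01)
cv_risk <- sapply(weights, function(w) {
  mean((df$mpg - (w * cv_preds[, "lm"] + (1 - w) * cv_preds[, "mean"]))^2)
})
w_best <- weights[which.min(cv_risk)]

# Step 3: refit each learner on the full data and combine with the chosen weight
full_fits <- lapply(learners, function(fit) fit(df))
sl_predict <- function(new) {
  w_best * full_fits$lm(new) + (1 - w_best) * full_fits$mean(new)
}
head(sl_predict(df))
```

Because the weight is chosen on cross-validated predictions, the combined learner's CV risk can be no worse than that of either candidate on its own (the endpoints `w = 0` and `w = 1` are on the grid).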

For more information on specifying future plans for parallelization, see
the documentation of the future
package.

class: center, middle

Thanks!

We have a great team: Jeremy Coyle, Nima Hejazi, Ivana Malenica, Oleg Sofrygin.