Visualization

Generation and plotting functions

mlr's visualization capabilities rely on generation functions which generate data for
plots, and plotting functions which plot this output using either ggplot2 or ggvis
(the latter being currently experimental).

This separation allows users to easily make custom visualizations by
taking advantage of the generation functions. The only data transformation that is handled inside
plotting functions is reshaping. The reshaped data is also accessible by calling the plotting
functions and then extracting the data from the ggplot object.

The functions are named accordingly.

Names of generation functions start with generate, followed by a CamelCase description of
their FunctionPurpose, followed by Data, i.e., generateFunctionPurposeData.
These functions output objects of class FunctionPurposeData.

Plotting functions are prefixed by plot followed by their purpose, i.e., plotFunctionPurpose.
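
The naming convention can be checked against the package itself. The following is a minimal sketch, assuming mlr is installed and attached; the variable names gen.funs and plot.funs are only illustrative.

```r
library(mlr)

# All objects exported by the attached mlr package
exported = ls("package:mlr")

# Generation functions follow the pattern generate<FunctionPurpose>Data
gen.funs = grep("^generate.*Data$", exported, value = TRUE)

# Plotting functions follow the pattern plot<FunctionPurpose>
plot.funs = grep("^plot", exported, value = TRUE)

gen.funs
plot.funs
```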

Some examples

In the example below we create a plot of classifier performance as a function of the decision
threshold for the binary classification problem sonar.task.
The generation function generateThreshVsPerfData creates an object of class
ThreshVsPerfData, which contains the data for the plot in its element $data.
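
A minimal sketch of this workflow follows. The choice of learner (linear discriminant analysis), the odd/even train/test split and the three measures are assumptions made for illustration; any learner that predicts probabilities should work the same way.

```r
library(mlr)

# Learner that predicts class probabilities (needed for thresholding)
lrn = makeLearner("classif.lda", predict.type = "prob")

# Train on the odd-numbered observations, predict on the even-numbered ones
n = getTaskSize(sonar.task)
mod = train(lrn, task = sonar.task, subset = seq(1, n, by = 2))
pred = predict(mod, task = sonar.task, subset = seq(2, n, by = 2))

# Generation function: performance values over a grid of thresholds
d = generateThreshVsPerfData(pred, measures = list(fpr, fnr, mmce))
class(d)
head(d$data)

# Plotting function: returns a ggplot object with one panel per measure
plt = plotThreshVsPerf(d)
plt
```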

Note that by default the names of the Measure objects, rather than their ids, are used to
annotate the panels of the plot produced by plotThreshVsPerf.

fpr$name
#> [1] "False positive rate"
fpr$id
#> [1] "fpr"

This applies not only to plotThreshVsPerf but also to other plotting functions that
show performance measures, for example plotLearningCurve.
You can use the ids instead of the names by setting pretty.names = FALSE.
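
As a short sketch of this option, the code below draws the threshold plot with measure ids as panel labels. The learner is an assumption for illustration, and the model is evaluated on its own training data only to keep the example brief.

```r
library(mlr)

# Fit and predict on the full task (for brevity only)
lrn = makeLearner("classif.lda", predict.type = "prob")
mod = train(lrn, task = sonar.task)
pred = predict(mod, task = sonar.task)
d = generateThreshVsPerfData(pred, measures = list(fpr, fnr, mmce))

# Panels are labelled with the ids "fpr", "fnr" and "mmce"
# instead of the longer measure names
plt = plotThreshVsPerf(d, pretty.names = FALSE)
plt
```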

Customizing plots

As mentioned above, it is easy to customize the built-in plots or to make your
own visualizations from scratch based on the generated data.

What will probably come up most often is changing labels and annotations.
Generally, this can be done by manipulating the ggplot object,
in this example the object returned by plotThreshVsPerf, using the usual ggplot2
functions like ylab or labeller.
Moreover, you can change the underlying data, either d$data (resulting from
generateThreshVsPerfData) or the possibly reshaped data contained in the
ggplot object (resulting from plotThreshVsPerf), most often by
renaming columns or factor levels.

Below are two examples of how to alter the axis and panel labels of the above plot.

Suppose you want to change the order of the panels and are not satisfied with the
panel names: for example, you find "Mean misclassification error" too long and
prefer "Error rate" instead. Moreover, you want the error rate to be displayed first.
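
One way to do this, sketched below, is to relabel and reorder the factor levels of the reshaped data stored in the ggplot object. The learner and the new axis labels ("Cutoff", "Performance") are assumptions for illustration, and the model is evaluated on its training data only for brevity.

```r
library(mlr)
library(ggplot2)

lrn = makeLearner("classif.lda", predict.type = "prob")
mod = train(lrn, task = sonar.task)
pred = predict(mod, task = sonar.task)
d = generateThreshVsPerfData(pred, measures = list(fpr, fnr, mmce))

# Use ids so the factor levels are short and predictable
plt = plotThreshVsPerf(d, pretty.names = FALSE)

# The ggplot object carries the reshaped (long-format) version of d$data
head(plt$data)

# Reorder the panels and rename them via the factor levels
plt$data$measure = factor(plt$data$measure,
  levels = c("mmce", "fpr", "fnr"),
  labels = c("Error rate", "False positive rate", "False negative rate"))

# Change the axis labels with the usual ggplot2 functions
plt = plt + xlab("Cutoff") + ylab("Performance")
plt
```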

Using the labeller function requires calling facet_wrap (or facet_grid). This can also be
useful if you want to change how the panels are positioned (number of rows and columns)
or influence the axis limits.
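
The labeller approach might look like the following sketch. The lookup vector measure.names, the two-column layout and the axis labels are illustrative assumptions; as before, the model is evaluated on its training data only for brevity.

```r
library(mlr)
library(ggplot2)

lrn = makeLearner("classif.lda", predict.type = "prob")
mod = train(lrn, task = sonar.task)
pred = predict(mod, task = sonar.task)
d = generateThreshVsPerfData(pred, measures = list(fpr, fnr, mmce))

plt = plotThreshVsPerf(d, pretty.names = FALSE)

# Lookup table from measure ids to the desired panel labels
measure.names = c(
  fpr = "False positive rate",
  fnr = "False negative rate",
  mmce = "Error rate"
)

# Re-facet: relabel the panels and arrange them in two columns.
# facet_wrap uses common axis limits for all panels by default.
plt = plt + facet_wrap(~ measure,
  labeller = labeller(measure = measure.names), ncol = 2)
plt = plt + xlab("Decision threshold") + ylab("Performance")
plt
```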

The decoupling of generation and plotting functions is especially practical if you
prefer traditional graphics or lattice. Here is a lattice plot which gives a
result similar to that of plotThreshVsPerf.
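
A sketch of such a lattice plot is given below. It works directly on d$data, which holds one column per measure plus the threshold column. The learner and the graphical settings are assumptions for illustration, and the model is evaluated on its training data only for brevity.

```r
library(mlr)
library(lattice)

lrn = makeLearner("classif.lda", predict.type = "prob")
mod = train(lrn, task = sonar.task)
pred = predict(mod, task = sonar.task)
d = generateThreshVsPerfData(pred, measures = list(fpr, fnr, mmce))

# One panel per measure (outer = TRUE), each with its own axis limits;
# the strips are labelled with the long measure names
p = xyplot(fpr + fnr + mmce ~ threshold, data = d$data, type = "l",
  ylab = "performance", outer = TRUE, scales = list(relation = "free"),
  strip = strip.custom(
    factor.levels = sapply(d$measures, function(x) x$name),
    par.strip.text = list(cex = 0.8)))
p
```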