Figure 1.

Performance. Results of testing the EM method on simulated mutation data. Each
simulated tumor belongs to one of up to three cancer types, with five independent
processes active in each type. The total number of processes per data set is thus
5 (blue), 10 (red) or 15 (yellow). Here, we show how different observables scale with
the sample size M (computed over 50 replicates for each combination of M and n).
(A) Number of processes present in the data, determined via the BIC. Shown are the
median (line) and the smallest and largest (shaded area) numbers of inferred processes.
(B) Correlation between the true and the inferred mutation spectra, plotted as the
difference from 1. (C) The run time of the inference program scales approximately linearly
with M at constant n; the fitted exponents are 0.93, 0.99 and 1.02. B and C show the median
with the 10% and 90% quantiles. BIC: Bayesian information criterion; EM: expectation-maximization.
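The three plotted quantities can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: all function names are hypothetical; the BIC is taken in its standard form k ln N - 2 ln L, with the maximized log-likelihood, the number of free parameters k and the number of observations N supplied by the caller; the correlation in (B) is assumed to be Pearson's, since the caption does not name the measure; and the exponent in (C) is assumed to be the slope of a least-squares fit of log run time against log M.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Standard BIC, k*ln(N) - 2*ln(L); lower values are preferred."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_num_processes(fits, n_obs):
    """Hypothetical helper: return the candidate number of processes with the lowest BIC.

    `fits` maps each candidate number of processes to a pair
    (maximized log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda p: bic(fits[p][0], fits[p][1], n_obs))


def one_minus_correlation(true_spectrum, inferred_spectrum):
    """1 - r between two spectra, as plotted in panel B.

    Assumption: r is the Pearson correlation; the caption does not specify the measure.
    """
    n = len(true_spectrum)
    mx = sum(true_spectrum) / n
    my = sum(inferred_spectrum) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(true_spectrum, inferred_spectrum))
    sx = math.sqrt(sum((x - mx) ** 2 for x in true_spectrum))
    sy = math.sqrt(sum((y - my) ** 2 for y in inferred_spectrum))
    return 1.0 - cov / (sx * sy)


def scaling_exponent(sample_sizes, runtimes):
    """Slope of the least-squares line through (log M, log t).

    Assumption: this is how the exponents in panel C are obtained.
    A slope near 1 means run time grows roughly linearly with M.
    """
    xs = [math.log(m) for m in sample_sizes]
    ys = [math.log(t) for t in runtimes]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # Synthetic run times exactly proportional to M (not data from the figure).
    sizes = [100, 200, 400, 800, 1600]
    times = [0.5 * m for m in sizes]
    print(round(scaling_exponent(sizes, times), 6))  # 1.0
```

An exponent close to 1, as for the fits in (C), indicates approximately linear scaling; an exponent of 2 would indicate quadratic growth.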