
Even in the era of Big Data, there are many real-world problems where the number of input features is of the same order of magnitude as the number of samples. Often many of these input features are irrelevant, so inferring the relevant ones is an important problem in order to prevent over-fitting. Automatic Relevance Determination solves this problem by applying Bayesian techniques.

15.
Consider each $(\alpha_i, \lambda_i)$ as defining a model $\mathcal{H}_i(\alpha, \lambda)$.
Yes! That means we can use our Bayesian Interpolation to find $\boldsymbol{w}, \boldsymbol{\alpha}, \boldsymbol{\lambda}$ with the highest evidence!
This is the idea behind BayesianRidge, found in sklearn.linear_model.
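Below is a minimal sketch of this in scikit-learn; the synthetic data and all numbers are illustrative assumptions, not from the slides. BayesianRidge estimates both precisions, $\alpha$ (noise) and $\lambda$ (weights), from the data by maximizing the evidence:

```python
# Minimal sketch: evidence-based hyperparameter estimation with BayesianRidge.
# The synthetic data below is purely illustrative.
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.RandomState(0)
X = rng.randn(50, 5)                   # 50 samples, 5 features
w_true = np.array([2.0, 0.0, -1.0, 0.0, 0.5])
y = X @ w_true + 0.1 * rng.randn(50)   # noisy linear targets

model = BayesianRidge()                # alpha and lambda are estimated from the data
model.fit(X, y)

print(model.coef_)    # posterior mean of w
print(model.alpha_)   # estimated noise precision
print(model.lambda_)  # estimated weight precision
```

Note that scikit-learn's alpha_ is the noise precision and lambda_ the weight precision, matching the roles of $\alpha$ and $\lambda$ above.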

16.
Consider that each weight has an individual variance, so that
$$p(\boldsymbol{w} \mid \boldsymbol{\lambda}) \sim \mathcal{N}(0, \Lambda^{-1}),$$
where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_H)$, $\lambda_h \in \mathbb{R}^+$.
Now, our minimization problem is:
$$\min_{\boldsymbol{w}} \; \alpha \lVert \Phi \boldsymbol{w} - \boldsymbol{t} \rVert^2 + \boldsymbol{w}^t \Lambda \boldsymbol{w}.$$
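One step the slide leaves implicit: setting the gradient of this objective to zero gives the standard closed-form minimizer (stated here for completeness):
$$\nabla_{\boldsymbol{w}} \left( \alpha \lVert \Phi \boldsymbol{w} - \boldsymbol{t} \rVert^2 + \boldsymbol{w}^t \Lambda \boldsymbol{w} \right) = 0 \;\Rightarrow\; \boldsymbol{w}^* = \alpha \left( \alpha \Phi^t \Phi + \Lambda \right)^{-1} \Phi^t \boldsymbol{t}.$$
In particular, as $\lambda_h \to \infty$ the component $w^*_h \to 0$, which is exactly the pruning described next.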
Pruning: if the precision $\lambda_h$ of feature $h$ is high, its weight $w_h$ is very likely to be close to zero and is therefore pruned.
This is called Sparse Bayesian Learning or Automatic Relevance Determination, found as ARDRegression under sklearn.linear_model.
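A hedged sketch of this pruning behavior (the data and choice of relevant feature indices are made up for illustration): ARDRegression learns one precision per weight and drives the weights of irrelevant features toward zero:

```python
# Minimal sketch: Automatic Relevance Determination with ARDRegression.
# Only three of the ten features below carry signal.
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.RandomState(0)
X = rng.randn(100, 10)
w_true = np.zeros(10)
w_true[[0, 3, 7]] = [1.5, -2.0, 0.8]   # the only relevant features
y = X @ w_true + 0.1 * rng.randn(100)

ard = ARDRegression()                  # one precision lambda_h per weight
ard.fit(X, y)

# Weights whose precision lambda_h grew large are driven to (near) zero,
# i.e. the corresponding features are pruned.
print(np.round(ard.coef_, 3))
print(ard.lambda_)                     # individual precisions, one per feature
```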

17.
Cross-validation can be used for the estimation of hyperparameters, but it suffers from the curse of dimensionality (inappropriate for low-statistics settings).
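A rough sketch of the contrast, assuming ElasticNet's two hyperparameters stand in for a generic 2-D grid search (an illustrative choice, not from the slides): cross-validation must score every point of the grid, and the grid grows exponentially with each extra hyperparameter, while BayesianRidge estimates its two precisions in a single evidence-maximizing fit:

```python
# Sketch: grid-search cross-validation vs. evidence maximization.
# ElasticNet's (alpha, l1_ratio) stand in for a generic 2-D hyperparameter grid.
import numpy as np
from sklearn.linear_model import ElasticNet, BayesianRidge
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.randn(40, 8)                          # deliberately few samples
y = X @ rng.randn(8) + 0.1 * rng.randn(40)

# Cross-validation: 10 x 9 = 90 grid points, each scored with 5-fold CV.
grid = {"alpha": np.logspace(-3, 1, 10), "l1_ratio": np.linspace(0.1, 0.9, 9)}
cv = GridSearchCV(ElasticNet(max_iter=10000), grid, cv=5).fit(X, y)

# Evidence maximization: both precisions fitted directly, no grid needed.
br = BayesianRidge().fit(X, y)
print(cv.best_params_, br.alpha_, br.lambda_)
```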
Source: Peter Ellerton, http://pactiss.org/2011/11/02/bayesian-inference-homo-bayesianis/