
13. Early Stopping: to avoid the ground state?
calculation | consulting why deep learning works
• any local minimum will do; the ground state is a state of overtraining
• early stopping trades the ground state (overtraining) for good generalization
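The early-stopping idea above can be sketched in a few lines: stop training once the validation loss has not improved for a while, so the solver never settles into the overtrained "ground state". This is a minimal illustration, not the deck's implementation; the model, the loss sequence, and the `patience` parameter are all hypothetical.

```python
def train_with_early_stopping(train_step, val_loss_fn, max_epochs=100, patience=5):
    """Stop when validation loss hasn't improved for `patience` epochs."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch in range(max_epochs):
        train_step()                      # one epoch of optimization
        loss = val_loss_fn()              # held-out validation loss
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break                         # overtraining suspected: stop early
    return best_epoch, best_loss

# toy run: validation loss falls, then rises (the overtraining signature)
losses = iter([1.0, 0.5, 0.3, 0.35, 0.4, 0.45, 0.5, 0.6, 0.7])
epoch, loss = train_with_early_stopping(lambda: None, lambda: next(losses), patience=3)
```

With these toy losses the loop halts three epochs after the minimum, returning the best (epoch, loss) pair rather than the final, overtrained one.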

15. Current Interpretation
• finding the ground state is easy (sic); generalizing is hard
• finding the ground state is irrelevant: any local minimum will do
• the ground state is a state of overtraining

25. REM: a toy model for real Glasses
the glass transition is not well understood,
but it is believed that entropy collapse 'drives' the glass transition

27. what is a real (structural) Glass?
• all liquids can be made into glasses if we cool them fast enough
• the glass transition is not a normal phase transition; it is not the melting point
• the arrangement of atoms is amorphous, but not completely random
• different cooling rates produce different glassy states
• universal phenomena, but not universal physics: molecular details affect the thermodynamics

29. REM: Dynamics on the Energy Landscape
let us assume some states trap the solver for some time;
of course, there is great effort to design solvers that can avoid such traps

30. Energy Landscapes: and Protein Folding
let us assume some states trap the solver in state E(j) for a short time,
and that the transitions E(j) -> E(j-1) are governed by finite, reversible transitions
(i.e. SGD oscillates back and forth for a while)
classic result(s): for T near the glass temperature (Tc),
the traversal times are slower than exponential!
in a physical system, like a protein or polymer,
it would take longer than the known lifetime of the universe
to find the ground (folded) state
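The trapping picture above can be made concrete with a toy Bouchaud-style trap model (my illustration, not from the slides): trap depths are exponentially distributed, escape follows an Arrhenius law, and for temperatures below the glass temperature the escape-time distribution becomes heavy-tailed, so traversal is slower than any exponential. All parameter values here are illustrative.

```python
import math
import random

def trap_escape_times(n_traps, T, T_g=1.0, tau0=1.0, seed=0):
    """Bouchaud-style trap model: depths E ~ Exp(1/T_g); escape time
    tau = tau0 * exp(E / T).  For T < T_g the escape times are so
    heavy-tailed that their mean diverges (glassy, slower-than-exponential
    dynamics); for T > T_g the mean is finite."""
    rng = random.Random(seed)
    times = []
    for _ in range(n_traps):
        E = rng.expovariate(1.0 / T_g)        # random trap depth
        times.append(tau0 * math.exp(E / T))  # Arrhenius escape time
    return times

# compare a "warm" solver (T > T_g) with one below the glass temperature
warm = trap_escape_times(10_000, T=2.0)
cold = trap_escape_times(10_000, T=0.5)
```

With the same random depths, the cold solver's escape times dominate the warm solver's in every trap, and the deepest traps dominate the total waiting time, which is the mechanism behind "slower than exponential" traversal near Tc.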

31. Protein Folding: the Levinthal Paradox
folding could take longer than the known lifetime of the universe?
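The arithmetic behind the paradox is a standard back-of-the-envelope estimate (the numbers below are the usual illustrative ones, not from the slides: ~3 backbone conformations per residue, a 100-residue chain, ~1e-13 seconds to sample one conformation):

```python
# Levinthal's estimate with standard illustrative numbers
conformations = 3 ** 100        # ~3 conformations per residue, 100 residues
time_per_sample = 1e-13         # seconds to try one conformation
search_time = conformations * time_per_sample  # exhaustive-search time, seconds

age_of_universe = 4.3e17        # seconds (~13.8 billion years)
ratio = search_time / age_of_universe
```

The exhaustive search would take on the order of 1e34 seconds, some seventeen orders of magnitude longer than the age of the universe, yet real proteins fold in milliseconds to seconds: that gap is the paradox.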

32. Protein Folding: around the Levinthal Paradox
http://arxiv.org/pdf/cond-mat/9904060v2.pdf
Old analogy between Protein folding and Hopfield Associative Memories
Natural pattern recognition could
• use a mechanism with a glass Temp (Tc) that is as low as possible
• avoid the glass transition entirely, via energetics
Nature (i.e. folding) cannot operate this way!