The idea that many classes of signals can be represented by linear combination of a small set of atoms of a dictionary has had a great impact on various signal processing applications, e.g., image compression, super resolution imaging and robust face recognition. For practical problems such a sparsifying dictionary is usually unknown ahead of time, and many heuristics have been proposed to learn an efficient dictionary from the given data. However, there is little theory explaining the empirical success of these heuristic methods. In this work, we prove that under mild conditions, the dictionary learning problem is actually locally well-posed: the desired solution is a local optimum of the $\ell_1$-norm minimization problem. More precisely, let $\mb A \in \Re^{m \times n}$ be an incoherent (and possibly overcomplete, i.e., $m < n$) dictionary, the coefficients $\mb X \in \Re^{n \times p}$ follow a random sparse model, and $\mb Y = \mb A \mb X$ be the observed data; then with high probability $(\mb A,\mb X)$ is a local optimum of the $\ell_1$-minimization problem:
\begin{equation*}\label{mainopt}
\displaystyle\mathop{\mathrm{minimize}}_{\mb A',\mb X'} \; \| \mb X' \|_1 \quad \text{s.t.} \quad \mb Y = \mb A' \mb X', \; \| \mb A'_i \|_2 = 1 \; \forall \, i,
\end{equation*}
provided the number of samples $p = \Omega(n^3 k)$. This is the first result showing that the dictionary learning problem is locally solvable for an overcomplete dictionary.
Our analysis draws on tools developed for the matrix completion problem. In particular, inspired by David Gross's golfing scheme, we derive relaxed optimality
conditions and construct dual variables to certify the local optimality conditions.