Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:1397-1406, 2018.

Abstract

We develop a new theoretical framework for analyzing the generalization error of deep learning and derive a new fast learning rate for two representative algorithms: empirical risk minimization and Bayesian deep learning. A series of theoretical analyses of deep learning has revealed its high expressive power and universal approximation capability. Our approach is to treat the ordinary finite dimensional deep neural network as a finite approximation of an infinite dimensional one. Our formulation of the infinite dimensional model naturally defines a reproducing kernel Hilbert space corresponding to each layer. The approximation error is evaluated by the degree of freedom of the reproducing kernel Hilbert space in each layer. We derive generalization error bounds for both empirical risk minimization and Bayesian deep learning, and show that a bias-variance trade-off appears in terms of the number of parameters of the finite dimensional approximation. We show that the optimal width of the internal layers can be determined through the degree of freedom, and we derive an optimal convergence rate that is faster than the $O(1/\sqrt{n})$ rate shown in existing studies.

Related Material

@InProceedings{pmlr-v84-suzuki18a,
title = {Fast generalization error bound of deep learning from a kernel perspective},
author = {Taiji Suzuki},
booktitle = {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics},
pages = {1397--1406},
year = {2018},
editor = {Amos Storkey and Fernando Perez-Cruz},
volume = {84},
series = {Proceedings of Machine Learning Research},
address = {Playa Blanca, Lanzarote, Canary Islands},
month = {09--11 Apr},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v84/suzuki18a/suzuki18a.pdf},
url = {http://proceedings.mlr.press/v84/suzuki18a.html},
abstract = {We develop a new theoretical framework for analyzing the generalization error of deep learning and derive a new fast learning rate for two representative algorithms: empirical risk minimization and Bayesian deep learning. A series of theoretical analyses of deep learning has revealed its high expressive power and universal approximation capability. Our approach is to treat the ordinary finite dimensional deep neural network as a finite approximation of an infinite dimensional one. Our formulation of the infinite dimensional model naturally defines a reproducing kernel Hilbert space corresponding to each layer. The approximation error is evaluated by the degree of freedom of the reproducing kernel Hilbert space in each layer. We derive generalization error bounds for both empirical risk minimization and Bayesian deep learning, and show that a bias-variance trade-off appears in terms of the number of parameters of the finite dimensional approximation. We show that the optimal width of the internal layers can be determined through the degree of freedom, and we derive an optimal convergence rate that is faster than the $O(1/\sqrt{n})$ rate shown in existing studies.}
}

%0 Conference Paper
%T Fast generalization error bound of deep learning from a kernel perspective
%A Taiji Suzuki
%B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2018
%E Amos Storkey
%E Fernando Perez-Cruz
%F pmlr-v84-suzuki18a
%I PMLR
%J Proceedings of Machine Learning Research
%P 1397--1406
%U http://proceedings.mlr.press
%V 84
%W PMLR
%X We develop a new theoretical framework for analyzing the generalization error of deep learning and derive a new fast learning rate for two representative algorithms: empirical risk minimization and Bayesian deep learning. A series of theoretical analyses of deep learning has revealed its high expressive power and universal approximation capability. Our approach is to treat the ordinary finite dimensional deep neural network as a finite approximation of an infinite dimensional one. Our formulation of the infinite dimensional model naturally defines a reproducing kernel Hilbert space corresponding to each layer. The approximation error is evaluated by the degree of freedom of the reproducing kernel Hilbert space in each layer. We derive generalization error bounds for both empirical risk minimization and Bayesian deep learning, and show that a bias-variance trade-off appears in terms of the number of parameters of the finite dimensional approximation. We show that the optimal width of the internal layers can be determined through the degree of freedom, and we derive an optimal convergence rate that is faster than the $O(1/\sqrt{n})$ rate shown in existing studies.

Suzuki, T. (2018). Fast generalization error bound of deep learning from a kernel perspective. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in PMLR 84:1397-1406.