Abstract [en]

The kernel-based regularization method has two core issues: kernel design and hyperparameter estimation. In this paper, we focus on the second issue and study the properties of several hyperparameter estimators, including the empirical Bayes (EB) estimator, two Stein's unbiased risk estimators (SUREs) (one related to impulse response reconstruction and the other to output prediction) and their corresponding oracle counterparts, with an emphasis on their asymptotic properties. To this end, we first derive and then rewrite the first-order optimality conditions of these hyperparameter estimators, leading to several insights into them. We then show that as the number of data goes to infinity, each of the two SUREs converges to the best hyperparameter minimizing the corresponding mean square error (MSE), while the more widely used EB estimator converges to another best hyperparameter, the one minimizing the expectation of the EB estimation criterion. This indicates that the two SUREs are asymptotically optimal in the corresponding MSE senses but the EB estimator is not. Surprisingly, the convergence rate of the two SUREs is slower than that of the EB estimator, and moreover, unlike that of the two SUREs, the convergence rate of the EB estimator is independent of the rate at which ΦᵀΦ/N converges to its limit, where Φ is the regression matrix and N is the number of data. A Monte Carlo simulation is provided to demonstrate the theoretical results.
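
As a concrete illustration of the two criteria compared above, the following minimal Python sketch tunes a single scale hyperparameter of a ridge-type regularized least-squares estimator by minimizing (i) the EB criterion (the negative log marginal likelihood) and (ii) the output-prediction SURE. The simple prior P(eta) = eta*I, the toy data, and all numerical values are illustrative assumptions, not the paper's setup.

```python
# A minimal numerical sketch (not the paper's derivations): compare the
# empirical Bayes (EB) criterion with the output-prediction SURE criterion
# for tuning a single scale hyperparameter eta.  Prior P(eta) = eta * I is
# a deliberate simplification of the designed kernels discussed above.
import numpy as np

rng = np.random.default_rng(0)
N, n, sigma2 = 200, 10, 0.5          # data points, parameters, noise variance
Phi = rng.standard_normal((N, n))    # regression matrix
theta0 = rng.standard_normal(n)      # "true" parameter vector
y = Phi @ theta0 + np.sqrt(sigma2) * rng.standard_normal(N)

def eb_criterion(eta):
    # Negative log marginal likelihood of y ~ N(0, eta*Phi Phi^T + sigma2*I)
    Sigma = eta * (Phi @ Phi.T) + sigma2 * np.eye(N)
    _, logdet = np.linalg.slogdet(Sigma)
    return y @ np.linalg.solve(Sigma, y) + logdet

def sure_y(eta):
    # Stein's unbiased risk estimate of the output-prediction MSE for the
    # linear smoother y_hat = H(eta) y:  ||y - H y||^2 + 2*sigma2*tr(H)
    # (the constant -N*sigma2 is dropped since it does not affect the argmin)
    H = Phi @ np.linalg.solve(Phi.T @ Phi + (sigma2 / eta) * np.eye(n), Phi.T)
    r = y - H @ y
    return r @ r + 2.0 * sigma2 * np.trace(H)

etas = np.logspace(-3, 2, 200)
eta_eb = etas[np.argmin([eb_criterion(e) for e in etas])]
eta_sure = etas[np.argmin([sure_y(e) for e in etas])]
print(f"EB picks eta = {eta_eb:.3g}, SURE picks eta = {eta_sure:.3g}")
```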

Abstract [en]

The first-order stable spline (SS-1) kernel (also known as the tuned/correlated (TC) kernel) is used extensively in regularized system identification, where the impulse response is modeled as a zero-mean Gaussian process whose covariance function is given by a well-designed and tuned kernel. In this paper, we discuss the maximum entropy properties of this kernel. In particular, we formulate the exact maximum entropy problem solved by the SS-1 kernel, without Gaussianity and uniform-sampling assumptions. Under a general sampling assumption, we also derive the special structure of the SS-1 kernel (e.g., its tridiagonal inverse and its factorization admit closed-form expressions) and give it a maximum entropy covariance completion interpretation.
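
The tridiagonal-inverse property mentioned above is easy to check numerically. The following sketch builds the TC/SS-1 kernel matrix K(i, j) = c·λ^max(i,j) on a regular grid (the grid size, c, and λ are illustrative) and verifies that its inverse is tridiagonal up to rounding error.

```python
# A small numerical check, consistent with the closed-form results discussed
# above: the TC / SS-1 kernel K(i, j) = c * lam**max(i, j) has a tridiagonal
# inverse.
import numpy as np

n, c, lam = 8, 1.0, 0.8
idx = np.arange(1, n + 1)
K = c * lam ** np.maximum.outer(idx, idx)    # TC (SS-1) kernel matrix

Kinv = np.linalg.inv(K)
off_tri = Kinv.copy()
for k in (-1, 0, 1):                          # subtract the three central bands
    off_tri -= np.diag(np.diag(Kinv, k), k)
print("max |entry| outside the tridiagonal band:", np.abs(off_tri).max())
# prints a value at rounding-error level, confirming tridiagonality
```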

Abstract [en]

Inspired by ideas taken from the machine learning literature, new regularization techniques have recently been introduced in linear system identification. In particular, all the adopted estimators solve a regularized least squares problem, differing in the nature of the penalty term assigned to the impulse response. Popular choices include atomic and nuclear norms (applied to Hankel matrices) as well as norms induced by the so-called stable spline kernels. In this paper, a comparative study of estimators based on these different types of regularizers is reported. Our findings reveal that stable spline kernels outperform approaches based on atomic and nuclear norms, since they suitably embed information on impulse response stability and smoothness. This point is illustrated using the Bayesian interpretation of regularization. We also design a new class of regularizers defined by "integral" versions of stable spline/TC kernels. Under quite realistic experimental conditions, the new estimators outperform classical prediction error methods even when the latter are equipped with an oracle for model order selection.
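
For concreteness, the following sketch shows the kernel-based regularized least-squares estimator on a toy FIR identification problem, with a TC/stable spline kernel as regularizer. The system, input, noise level, and (fixed) kernel hyperparameters are illustrative assumptions.

```python
# A minimal sketch of kernel-based regularized least squares for impulse
# response estimation, with the TC / stable spline kernel as regularizer.
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
n, N, sigma2 = 50, 300, 0.1
g0 = 0.9 ** np.arange(n) * np.sin(0.5 * np.arange(n))   # toy stable impulse response
u = rng.standard_normal(N)
y = lfilter(g0, [1.0], u) + np.sqrt(sigma2) * rng.standard_normal(N)

# Toeplitz-style regression matrix: row t collects u_t, u_{t-1}, ..., u_{t-n+1}
Phi = np.column_stack([np.concatenate([np.zeros(k), u[:N - k]]) for k in range(n)])

idx = np.arange(1, n + 1)
c, lam = 1.0, 0.9                                        # kernel hyperparameters (fixed here)
K = c * lam ** np.maximum.outer(idx, idx)                # TC kernel

# Regularized LS: g_hat = argmin ||y - Phi g||^2 + sigma2 * g' K^{-1} g,
# computed in its equivalent "posterior mean" form.
g_hat = K @ Phi.T @ np.linalg.solve(Phi @ K @ Phi.T + sigma2 * np.eye(N), y)
g_ls = np.linalg.lstsq(Phi, y, rcond=None)[0]            # unregularized baseline
print("regularized estimation error:", np.linalg.norm(g_hat - g0))
print("least-squares estimation error:", np.linalg.norm(g_ls - g0))
```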

Abstract [en]

An established method for grey-box identification is maximum-likelihood estimation, implemented in the nonlinear case via extended Kalman filtering. In applications of (nonlinear) model predictive control, an increasingly common approach to state estimation is moving horizon estimation, which employs (nonlinear) optimization directly on a model for a whole batch of data. This paper shows that, in the linear case, moving horizon estimation may also be used for joint parameter and state estimation, as long as a bias correction based on the Kalman filter is included. For the nonlinear case, two special cases are presented where the bias correction can be determined without approximation. A procedure for approximating the bias correction for general nonlinear systems is also outlined.
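
As a minimal illustration of the linear case, the sketch below poses moving horizon estimation as a batch weighted least-squares problem over a window of data for a scalar state-space model. The arrival cost and the Kalman-filter-based bias correction that the paper introduces for joint parameter estimation are deliberately omitted, and all model values are illustrative.

```python
# A minimal sketch of linear moving horizon estimation (MHE) as a batch
# least-squares problem over a window of data (arrival cost and the paper's
# bias correction omitted).
import numpy as np

rng = np.random.default_rng(2)
a, c = 0.95, 1.0                     # known model: x_{k+1} = a x_k + w_k, y_k = c x_k + v_k
q, r, M = 0.05, 0.1, 20              # process/measurement noise variances, horizon length

x = np.zeros(M); y = np.zeros(M)
for k in range(M - 1):
    y[k] = c * x[k] + np.sqrt(r) * rng.standard_normal()
    x[k + 1] = a * x[k] + np.sqrt(q) * rng.standard_normal()
y[M - 1] = c * x[M - 1] + np.sqrt(r) * rng.standard_normal()

# Stack measurement and dynamics residuals into one weighted LS problem in
# the unknown state trajectory x_0, ..., x_{M-1}.
H = np.zeros((2 * M - 1, M)); z = np.zeros(2 * M - 1)
for k in range(M):                   # measurement rows, weighted by 1/sqrt(r)
    H[k, k] = c / np.sqrt(r); z[k] = y[k] / np.sqrt(r)
for k in range(M - 1):               # dynamics rows, weighted by 1/sqrt(q)
    H[M + k, k + 1] = 1.0 / np.sqrt(q)
    H[M + k, k] = -a / np.sqrt(q)

x_hat = np.linalg.lstsq(H, z, rcond=None)[0]
print("mean abs state estimation error:", np.abs(x_hat - x).mean())
```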

Abstract [en]

Most of the currently used techniques for linear system identification are based on classical estimation paradigms coming from mathematical statistics. In particular, maximum likelihood and prediction error methods represent the mainstream approaches to identification of linear dynamic systems, with a long history of theoretical and algorithmic contributions. In parallel, alternative techniques have been developed in the machine learning community. Until recently, there has been little contact between these two worlds. The first aim of this survey is to make accessible to the control community the key mathematical tools and concepts, as well as the computational aspects, underpinning these learning techniques. In particular, we focus on kernel-based regularization and its connections with reproducing kernel Hilbert spaces and Bayesian estimation of Gaussian processes. The second aim is to demonstrate that learning techniques tailored to the specific features of dynamic systems may outperform conventional parametric approaches for identification of stable linear systems.


Abstract [en]

Anomaly detection in large populations is a challenging but highly relevant problem. It is essentially a multi-hypothesis problem, with a hypothesis for every division of the systems into normal and anomalous ones. The number of hypotheses grows rapidly with the number of systems, and approximate solutions become a necessity for any problem of practical interest. In this paper we take an optimization approach to this multi-hypothesis problem. It is first shown to be equivalent to a non-convex combinatorial optimization problem, which is then relaxed to a convex optimization problem that can be solved distributedly across the systems and that stays computationally tractable as the number of systems increases. An interesting property of the proposed method is that it can, under certain conditions, be shown to give exactly the same result as the combinatorial multi-hypothesis problem; the relaxation is hence tight.
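
Without access to the paper's exact formulation, the following hedged sketch conveys the flavor of such a relaxation: each system gets its own parameter vector, and a sum-of-norms penalty pulls most of them to a shared nominal value, so systems whose parameters stay away from the nominal one are flagged as anomalous. It uses the generic cvxpy solver rather than a distributed algorithm, and the model, penalty weight, and detection threshold are illustrative assumptions.

```python
# A hedged sketch of a convex relaxation for population-level anomaly
# detection: a sum-of-norms penalty drives most per-system parameters to a
# common nominal value; systems that resist are flagged as anomalous.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(3)
m, n, N = 20, 3, 40                      # systems, parameters per system, samples each
theta_nom = np.array([1.0, -0.5, 0.25])
Phis = [rng.standard_normal((N, n)) for _ in range(m)]
ys = []
for i, Phi in enumerate(Phis):
    th = theta_nom + (np.array([2.0, 0.0, 0.0]) if i < 2 else 0.0)  # systems 0, 1 anomalous
    ys.append(Phi @ th + 0.1 * rng.standard_normal(N))

Theta = cp.Variable((m, n)); nominal = cp.Variable(n); lam = 5.0
fit = sum(cp.sum_squares(ys[i] - Phis[i] @ Theta[i]) for i in range(m))
penalty = sum(cp.norm(Theta[i] - nominal, 2) for i in range(m))  # sum-of-norms
cp.Problem(cp.Minimize(fit + lam * penalty)).solve()

dev = np.linalg.norm(Theta.value - nominal.value, axis=1)
print("flagged as anomalous:", np.where(dev > 0.1)[0])   # expect systems 0 and 1
```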


Abstract [en]

Model estimation and structure detection with short data records are two issues that receive increasing interest in system identification. In this paper, a multiple-kernel-based regularization method is proposed to handle these issues. Multiple kernels are conic combinations of fixed kernels suitable for impulse response estimation, and they equip the kernel-based regularization method with three features. First, multiple kernels can better capture complicated dynamics than single kernels. Second, the estimation of their weights by maximizing the marginal likelihood favors sparse optimal weights, which enables this method to tackle various structure detection problems, e.g., sparse dynamic network identification and the segmentation of linear systems. Third, the marginal likelihood maximization problem is a difference-of-convex programming problem, so a locally optimal solution can be found efficiently using a majorization minimization algorithm and an interior-point method in which the cost of a single interior-point iteration grows linearly in the number of fixed kernels. Monte Carlo simulations show that the locally optimal solutions lead to good performance for randomly generated starting points.
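
A hedged sketch of the core construction follows: the kernel is a conic combination K(α) = Σ_j α_j K_j of fixed TC kernels, and the weights α are estimated by minimizing the negative log marginal likelihood under nonnegativity constraints. A generic bound-constrained solver stands in for the paper's majorization minimization / interior-point scheme, and the data-generating choices are illustrative assumptions.

```python
# A hedged sketch of multiple-kernel regularization: tune the conic weights
# of K(alpha) = sum_j alpha_j * K_j by marginal likelihood maximization,
# using a generic bound-constrained solver instead of the paper's algorithm.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n, N, sigma2 = 30, 200, 0.1
Phi = rng.standard_normal((N, n))
idx = np.arange(1, n + 1)
# Fixed TC kernels with different decay rates; the "true" prior uses one of them
Ks = [lam ** np.maximum.outer(idx, idx) for lam in (0.5, 0.7, 0.9, 0.95)]
g0 = np.linalg.cholesky(Ks[2] + 1e-9 * np.eye(n)) @ rng.standard_normal(n)
y = Phi @ g0 + np.sqrt(sigma2) * rng.standard_normal(N)

def neg_marginal_loglik(alpha):
    # y ~ N(0, Phi K(alpha) Phi^T + sigma2 I) under the multiple-kernel prior
    K = sum(a * Kj for a, Kj in zip(alpha, Ks))
    Sigma = Phi @ K @ Phi.T + sigma2 * np.eye(N)
    _, logdet = np.linalg.slogdet(Sigma)
    return y @ np.linalg.solve(Sigma, y) + logdet

res = minimize(neg_marginal_loglik, x0=np.ones(len(Ks)),
               bounds=[(0.0, None)] * len(Ks), method="L-BFGS-B")
print("estimated kernel weights:", np.round(res.x, 3))  # typically sparse
```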

Abstract [en]

System identification is about estimating models of dynamical systems from measured input-output data. Its traditional foundation is basic statistical techniques, such as maximum likelihood estimation and asymptotic analysis of bias and variance. Maximum likelihood estimation relies on minimization of criterion functions that are typically non-convex and may cause numerical search problems. Recent interest in identification algorithms has focused on techniques that are centered around convex formulations. This is partly the result of developments in machine learning and statistical learning theory. The development concerns issues of regularization for sparsity and for better-tuned bias/variance trade-offs. It also involves the use of subspace methods as well as nuclear norms as proxies for rank constraints. A quite different route to convexity is to use algebraic techniques to manipulate the model parameterizations. This article illustrates these recent developments.


Abstract [en]

This paper develops and illustrates a new maximum-likelihood-based method for the identification of Hammerstein-Wiener model structures. A central aspect is that a very general situation is considered, wherein multivariable data, non-invertible Hammerstein and Wiener nonlinearities, and colored stochastic disturbances both before and after the Wiener nonlinearity are all catered for. The method developed here addresses the blind Wiener estimation problem as a special case.
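
To fix ideas, the sketch below simulates the Hammerstein-Wiener structure being identified: a static input nonlinearity, a linear dynamic block, and a static (here non-invertible) output nonlinearity. The particular nonlinearities, linear dynamics, and noise level are illustrative assumptions, and the paper's estimation method itself is not reproduced here.

```python
# A minimal sketch of the Hammerstein-Wiener model structure: static input
# nonlinearity -> linear dynamics -> static output nonlinearity.
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(5)
N = 500
u = rng.uniform(-2, 2, N)

f = lambda u: np.clip(u, -1.0, 1.0)          # Hammerstein (input) nonlinearity: saturation
b, a = [0.0, 0.5], [1.0, -0.8]               # linear block: z_t = 0.8 z_{t-1} + 0.5 w_{t-1}
h = lambda z: z ** 2                          # Wiener (output) nonlinearity: non-invertible

w = f(u)                                      # static input block
z = lfilter(b, a, w)                          # linear dynamics
y = h(z) + 0.05 * rng.standard_normal(N)      # output nonlinearity + measurement noise
# Colored disturbances before the Wiener nonlinearity, as handled by the
# paper, would enter additively on z before h is applied.
```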

Abstract [en]

This paper proposes a general convex framework for the identification of switched linear systems. The proposed framework uses over-parameterization to avoid solving the otherwise combinatorially forbidding identification problem, and takes the form of a least-squares problem with a sum-of-norms regularization, a generalization of ℓ1-regularization. The regularization constant regulates the complexity and is used to trade off fit against the number of submodels.
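
A minimal sketch of this over-parameterized formulation follows (using cvxpy as a generic solver; the data, switch location, and regularization constant are illustrative): one parameter vector is assigned to every time step, and a sum-of-norms penalty on successive differences makes the estimated parameters change only at a few time instants.

```python
# A minimal sketch of sum-of-norms regularization for switched linear
# regression: one parameter vector per time step, with the differences
# between successive vectors penalized by a sum of Euclidean norms.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(6)
N, n = 100, 2
phi = rng.standard_normal((N, n))                        # regressors
theta_true = np.where(np.arange(N)[:, None] < 50,        # one switch at t = 50
                      np.array([1.0, -0.5]), np.array([-1.0, 0.8]))
y = np.sum(phi * theta_true, axis=1) + 0.05 * rng.standard_normal(N)

Theta = cp.Variable((N, n)); lam = 2.0
fit = cp.sum_squares(y - cp.sum(cp.multiply(phi, Theta), axis=1))
switches = sum(cp.norm(Theta[t] - Theta[t - 1], 2) for t in range(1, N))
cp.Problem(cp.Minimize(fit + lam * switches)).solve()    # sum-of-norms regularization

jumps = np.linalg.norm(np.diff(Theta.value, axis=0), axis=1)
print("detected switch times:", np.where(jumps > 0.1)[0] + 1)   # expect ~[50]
```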

Funder

Swedish Research Council

Note

Funding agencies: Swedish Foundation for Strategic Research (center MOVIII); Swedish Research Council (Linnaeus center CADICS); European Research Council (grant 267381); Sweden-America Foundation; Swedish Science Foundation.