ESTIMATION OF GAUSSIAN RANDOM VECTORS

The Conditional Mean and Covariance for Gaussian Random Vectors

Consider two random vectors x and z that are jointly normally (Gaussian) distributed. The estimate of the random variable x in terms of z according to the minimum mean square error (MMSE) criterion is the conditional mean of x given z.

The optimal estimator (in the MMSE sense) of x in terms of z is a linear function of z; this is a consequence of the Gaussian assumption. The conditional covariance, which measures the quality of the estimate, is independent of the observation z. The MMSE estimate, that is, the conditional mean of a Gaussian random vector given another Gaussian random vector (the measurement), is a linear combination of
1. The prior (unconditional) mean of the variable to be estimated;
2. The difference between the measurement and its prior mean.
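In standard notation (writing x̄, z̄ for the prior means and P_xx, P_xz, P_zz for the covariance blocks of the joint distribution), these statements correspond to:

```latex
\hat{x} = E[x \mid z] = \bar{x} + P_{xz} P_{zz}^{-1} (z - \bar{z}),
\qquad
P_{xx\mid z} = P_{xx} - P_{xz} P_{zz}^{-1} P_{zx}.
```

The first equation is linear in z, and the second does not depend on z, as claimed above.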

LINEAR MINIMUM MEAN SQUARE ERROR ESTIMATION

The Principle of Orthogonality

The MMSE estimate of a random variable x in terms of another random variable z is the conditional mean E[x | z]. In many problems the distributional information needed for the evaluation of the conditional mean is not available. Furthermore, even if it were available, the evaluation of the conditional mean could be prohibitively complicated. A method that (1) is simple, i.e., yields the estimate as a linear function of the observation(s), and (2) requires little information, namely only first and second moments, is highly desirable. Such a method, called linear MMSE estimation, relies on the principle of orthogonality.

The best linear estimate (in the MMSE sense) of a random variable in terms of another random variable (the observation(s)) is such that
1. The estimate is unbiased, i.e., the estimation error has mean zero, and
2. The estimation error is uncorrelated with the observation(s); that is, they are orthogonal.

Linear MMSE Estimation for Zero-Mean Random Variables in Terms of a (Normed Linear) Space of Random Variables

The set of real-valued scalar zero-mean random variables z_i, i = 1, ..., n, can be considered as vectors in an abstract vector space, or linear space.

A (complete) vector space in which one defines an inner product is a Hilbert space. Since the random variables under consideration are zero mean, the inner product E[z_i z_k] is their correlation, and sqrt(E[z_i^2]) satisfies the properties of a norm and can be taken as such. With this definition of the norm, linear dependence is defined by stating that the norm of a linear combination of vectors is zero. If ||z_1 - sum_{i=2}^m beta_i z_i|| = 0 for some coefficients beta_i, then z_1 is a linear combination of z_2, ..., z_m; that is, it is an element of the subspace spanned by z_2, ..., z_m.

Two vectors are orthogonal, denoted z_i ⊥ z_k, if and only if their inner product is zero, E[z_i z_k] = 0, which is equivalent to these zero-mean random variables being uncorrelated. The linear MMSE estimator of a zero-mean random variable x in terms of z_i, i = 1, ..., n, is of the form

x̂ = sum_{i=1}^n beta_i z_i,

where the coefficients have to be such that the norm of the estimation error is minimum. The linear MMSE estimate is also denoted by a circumflex ("hat"), even though it is not the conditional mean.

Thus the norm of the estimation error, ||x - sum_{i=1}^n beta_i z_i||, has to be minimized with respect to beta_i, i = 1, ..., n. Setting the derivatives with respect to the coefficients to zero is seen to be equivalent to requiring the following orthogonality property:

E[(x - sum_{i=1}^n beta_i z_i) z_k] = 0,   k = 1, ..., n.

This is the principle of orthogonality: in order for the error to have minimum norm, it has to be orthogonal to the observations. This is equivalent to stating that the estimate of x has to be the orthogonal projection of x onto the space spanned by the observations.
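The orthogonality conditions above are just the normal equations for the coefficients, and they are easy to check numerically. The following is a minimal sketch, with sample moments standing in for the true ones; the data-generation model is an illustrative assumption.

```python
# Sketch: linear MMSE estimation of a zero-mean scalar x from zero-mean
# observations z_i by solving the normal equations E[z z^T] beta = E[z x],
# which encode the orthogonality conditions.
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

# Zero-mean data: x is correlated with the two observations.
x = rng.standard_normal(N)
z = np.vstack([x + 0.5 * rng.standard_normal(N),
               2.0 * x + rng.standard_normal(N)])   # shape (2, N)

# Normal equations built from sample second moments.
Pzz = z @ z.T / N            # E[z z^T]
Pzx = z @ x / N              # E[z x]
beta = np.linalg.solve(Pzz, Pzx)

x_hat = beta @ z
err = x - x_hat

# Principle of orthogonality: the error is (empirically) uncorrelated
# with every observation z_i.
print(np.abs(err @ z.T / N).max())   # close to 0
```

Because beta solves the sample normal equations exactly, the sample correlation between the error and each z_i vanishes up to floating-point precision.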

Linear MMSE Estimation for Nonzero-Mean Random Variables

For a random variable x with nonzero mean, the best linear estimator is of the form

x̂ = beta_0 + sum_{i=1}^n beta_i z_i.

Since the MSE is the sum of the square of the mean of the error and its variance, in order to minimize it, the estimate should have the unbiasedness property E[x̂] = E[x].

The error corresponding to this estimate is

x - x̂ = (x - E[x]) - sum_{i=1}^n beta_i (z_i - E[z_i]),

which transforms the nonzero-mean case into the zero-mean case. The orthogonality principle then yields the coefficients beta_i from the normal equations for the centered variables. The resulting estimator is also known as the Best Linear Unbiased Estimator (BLUE).

Linear MMSE Estimation for Vector Random Variables

Consider vector-valued random variables x and z, which are not necessarily Gaussian or zero-mean, and the best linear estimate of x in terms of z. The criterion for "best" is the MMSE: find the estimator that minimizes the scalar MSE criterion, the expected value of the squared norm of the estimation error. The linear MMSE estimator is such that the estimation error is zero-mean (the estimate is unbiased) and orthogonal to the observation z.

The estimate is the orthogonal projection of the vector x onto the space spanned by the observation vector z. The orthogonality requirement is, in the multidimensional case, that each component of the error x - x̂ be orthogonal to each component of z. Solving for the weighting matrix A yields

x̂ = x̄ + P_xz P_zz^{-1} (z - z̄),

so the linear MMSE estimator for the multidimensional case is identical in form to the conditional mean from the Gaussian case.
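The vector estimator can be checked numerically in the same way as the scalar one. This is a minimal sketch; the dimensions, means, and measurement model are illustrative assumptions, and sample moments stand in for the true ones.

```python
# Sketch: vector linear MMSE estimator x_hat = x_bar + Pxz Pzz^{-1} (z - z_bar),
# with the orthogonality of the error to the observations verified empirically.
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

# A 2-vector x with nonzero mean, observed through a noisy 3-vector z.
x = rng.multivariate_normal([1.0, -2.0], [[2.0, 0.5], [0.5, 1.0]], size=N)
H = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
z = x @ H.T + rng.standard_normal((N, 3))

# Sample means and covariance blocks.
x_bar, z_bar = x.mean(axis=0), z.mean(axis=0)
xc, zc = x - x_bar, z - z_bar
Pxz = xc.T @ zc / N
Pzz = zc.T @ zc / N

# Rows are samples, so the estimator is applied in transposed (row) form.
x_hat = x_bar + (z - z_bar) @ np.linalg.solve(Pzz, Pxz.T)
err = x - x_hat

# Each component of the error is (empirically) orthogonal to each
# component of z, and the error has zero mean.
print(np.abs(err.T @ zc / N).max())   # close to 0
```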

The corresponding matrix MSE is given by

E[(x - x̂)(x - x̂)'] = P_xx - P_xz P_zz^{-1} P_zx,

which is an expression identical to the conditional covariance in the Gaussian case (strictly speaking, the matrix MSE is not a covariance matrix, since the estimate is not the conditional mean). The equations above are the fundamental equations of linear estimation.

Remarks

Note the distinction between the scalar MSE criterion, an inner product, and the matrix MSE, an outer product. The matrix MSE is sometimes called, with abuse of language, a covariance matrix. From the above derivations it follows that the best estimator (in the MMSE sense) for Gaussian random variables is identical to the best linear estimator for arbitrarily distributed random variables with the same first- and second-order moments. The linear estimator is the overall best if the random variables are Gaussian; otherwise, it is only the best within the class of linear estimators.

Linear MMSE Estimation Summary

The linear MMSE estimator of one random vector x in terms of another random vector z is such that the estimation error is
1. Zero-mean (the estimate is unbiased)
2. Uncorrelated with the measurements
These two properties imply that the error is orthogonal to the measurements: the principle of orthogonality. The expression of the linear MMSE estimator is identical to the expression of the conditional mean of Gaussian random vectors if they have the same first two moments. Similarly, the matrix MSE associated with the LMMSE estimator has the same expression as the conditional covariance in the Gaussian case. The linear MMSE estimator is
1. The overall best if the random variables are Gaussian
2. The best within the class of linear estimators otherwise

LEAST SQUARES ESTIMATION

The Batch LS Estimation

In the linear least squares (LS) problem it is desired to estimate the n_x-vector x, modeled as an unknown constant, from the linear observations (n_z-vectors)

z(i) = H(i) x + w(i),   i = 1, ..., k,

so as to minimize the quadratic error

J(k) = sum_{i=1}^k [z(i) - H(i) x]' R(i)^{-1} [z(i) - H(i) x].

The LS estimator that minimizes J is obtained by setting its gradient with respect to x to zero, which yields, assuming the required inverse exists,

x̂(k) = [H' R^{-1} H]^{-1} H' R^{-1} z,

where H and z stack the H(i) and z(i), and R is block-diagonal with blocks R(i). It can easily be shown that, since R is positive definite, the Hessian of J with respect to x is positive definite, and consequently the extremum point is a minimum. This is a batch estimator: the entire data set has to be processed simultaneously for every k. The LS estimator is unbiased, because the measurement errors w(i) are zero mean.
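The batch formula can be sketched directly in a few lines; the model sizes and noise levels below are illustrative assumptions (scalar measurements, so R is diagonal).

```python
# Sketch: batch weighted LS, x_hat = (H' R^{-1} H)^{-1} H' R^{-1} z,
# for a fixed 2-vector parameter observed through stacked scalar measurements.
import numpy as np

rng = np.random.default_rng(2)
x_true = np.array([3.0, -1.0])

k = 50
H = rng.standard_normal((k, 2))        # stacked measurement matrices H(i)
r = 0.1 + rng.random(k)                # per-measurement noise variances R(i)
z = H @ x_true + np.sqrt(r) * rng.standard_normal(k)

Rinv = np.diag(1.0 / r)                # block-diagonal R^{-1} (scalar blocks)
P = np.linalg.inv(H.T @ Rinv @ H)      # covariance of the LS estimate
x_hat = P @ H.T @ Rinv @ z

print(x_hat)                           # close to x_true
```

Note that P here is the error covariance [H' R^{-1} H]^{-1} discussed next, and its existence requires H' R^{-1} H to be invertible.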

The estimation error is x̃ = x - x̂, and the covariance matrix of the LS estimator is

P = [H' R^{-1} H]^{-1}.

The existence of the required inverse of H' R^{-1} H is equivalent to the covariance of the error being finite. This amounts to requiring the parameter x to be observable.

Relationship to the Maximum Likelihood (ML) Estimator

If the measurement errors w(i) are independent Gaussian random variables with mean zero and covariance R(i), then minimizing the LS criterion is equivalent to maximizing the likelihood function, and the LS and ML estimators coincide; LS is then a disguised ML technique.
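Concretely, with independent Gaussian errors the likelihood of the stacked measurements is

```latex
\Lambda(x) = \prod_{i=1}^{k} \mathcal{N}\!\big(z(i);\, H(i)x,\, R(i)\big)
\;\propto\; \exp\!\Big\{-\tfrac{1}{2}\sum_{i=1}^{k} [z(i)-H(i)x]^{\top} R(i)^{-1} [z(i)-H(i)x]\Big\},
```

so maximizing the likelihood over x is the same as minimizing the quadratic LS criterion J that appears in the exponent.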

The Recursive LS Estimator

In this case, k is interpreted as discrete time.

The information is additive here because of the following:
1. The problem is static: the parameter is fixed.
2. The observations are modeled as independent.
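The additivity can be written explicitly in information (inverse-covariance) form; in the standard recursive LS notation, with P(k) the covariance of the estimate after k measurements,

```latex
P(k+1)^{-1} = P(k)^{-1} + H(k+1)^{\top} R(k+1)^{-1} H(k+1),
```

i.e., each new independent measurement simply adds its information to the accumulated total.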

Alternative Expression for the Gain
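In the standard recursive LS notation, the gain has the two equivalent forms

```latex
W(k+1) = P(k+1)\, H(k+1)^{\top} R(k+1)^{-1}
       = P(k)\, H(k+1)^{\top}\big[H(k+1)\, P(k)\, H(k+1)^{\top} + R(k+1)\big]^{-1},
```

the second of which requires inverting only a matrix of the (usually smaller) measurement dimension rather than the information matrix.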

The Recursion for the Estimate

The above is the recursive parameter estimate updating equation, i.e., the recursive LS estimator, written as

x̂(k+1) = x̂(k) + W(k+1) [z(k+1) - H(k+1) x̂(k)].

The new (updated) estimate is therefore equal to the previous one plus a correction term. This correction term consists of the gain W(k+1) multiplying the residual, the difference between the observation z(k+1) and the predicted value of this observation. Since this is a recursive scheme, initialization is required, for example by using a batch technique on a small number of initial measurements, or by using an a priori initial estimate and an associated covariance.

The Residual Covariance S

S is the covariance of the residual, the zero-mean difference between the observation (with noise covariance R) and the predicted observation based on the estimate of x (with covariance P), which are independent; hence S(k+1) = H(k+1) P(k) H(k+1)' + R(k+1).
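The recursion described above can be sketched and checked against the batch solution; the scalar-measurement model and the two-measurement batch initialization are illustrative assumptions.

```python
# Sketch: recursive LS (gain / residual-covariance / covariance updates),
# initialized by a small batch and verified against the full batch estimate.
import numpy as np

rng = np.random.default_rng(3)
x_true = np.array([3.0, -1.0])
k = 50
H = rng.standard_normal((k, 2))
r = 0.1 + rng.random(k)                     # scalar noise variances R(i)
z = H @ x_true + np.sqrt(r) * rng.standard_normal(k)

# Batch solution for reference.
Rinv = np.diag(1.0 / r)
P_batch = np.linalg.inv(H.T @ Rinv @ H)
x_batch = P_batch @ H.T @ Rinv @ z

# Initialize from the first two measurements (batch), then recurse.
H0, Rinv0 = H[:2], np.diag(1.0 / r[:2])
P = np.linalg.inv(H0.T @ Rinv0 @ H0)
x_hat = P @ H0.T @ Rinv0 @ z[:2]

for i in range(2, k):
    h = H[i:i + 1]                          # 1 x 2 measurement matrix H(k+1)
    S = h @ P @ h.T + r[i]                  # residual covariance
    W = P @ h.T / S                         # gain W(k+1)
    x_hat = x_hat + (W * (z[i] - h @ x_hat)).ravel()   # estimate update
    P = P - W @ h @ P                       # covariance update

print(np.abs(x_hat - x_batch).max())        # recursive agrees with batch
```

Since the problem is static and the measurements are independent, the recursive estimate coincides with the batch estimate up to floating-point error.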
