
Shalabh & Christian Heumann
Simultaneous Prediction of Actual and Average Values of Study Variable Using Stein-rule Estimators
Technical Report Number 104, 2011
Department of Statistics, University of Munich

Simultaneous Prediction of Actual and Average Values of Study Variable Using Stein-rule Estimators

Shalabh, Department of Mathematics & Statistics, Indian Institute of Technology, Kanpur (INDIA)
Christian Heumann, Institute of Statistics, University of Munich, Munich (GERMANY)

Abstract

The simultaneous prediction of the average and actual values of the study variable in a linear regression model is considered in this paper. Generally, either the ordinary least squares estimator or Stein-rule estimators are employed to construct predictors for simultaneous prediction. In this paper, a linear combination of ordinary least squares and Stein-rule predictors is utilized to construct improved predictors. Their efficiency properties are derived using the small disturbance asymptotic theory, and dominance conditions for the superiority of the predictors over each other are analyzed.

1 Introduction:

Traditionally, predictions from a linear regression model are made either for the actual values of the study variable or for its average values. However, this may not be the case in many practical situations, and one may be required to predict both the actual and average values of the study variable simultaneously; see, e.g., Rao et al. (2008), Shalabh (1995), and Zellner (1994). As an illustrative example, consider a new drug that increases the duration of sleep in human beings. The manufacturer of such a drug will be more interested in knowing the average increase in sleep duration from a specific dose, for example, when designing an advertisement or sales campaign, and somewhat less interested in the actual increase in sleep duration. On the other hand, a user may be more interested in knowing the actual increase in sleep duration rather than the average increase. Suppose the statistician utilizes the theory of regression analysis for prediction. The statistician is expected to safeguard the interests of both the manufacturer and the user, who are interested in the prediction of the average and actual increase, respectively, although they may assign varying weights to the prediction of actual and average increases in sleep attributable to the specific dose of the new drug. The classical theory of prediction can predict either the actual value or the average value of the study variable, but not both simultaneously.

In view of the importance of simultaneous prediction of actual and average values of the study variable in a linear regression model, Shalabh (1995), see also Rao et al. (2008), has presented a framework for the simultaneous prediction of actual and average values of the study variable. Shalabh (1995) has examined the efficiency properties of predictions arising from least squares and Stein-rule estimation procedures.

The work on simultaneous prediction has been extended in various directions and from various perspectives in different models in the literature. Toutenburg and Shalabh (2000), Shalabh and Chandra (2002), and Dube and Manocha (2002) analyzed simultaneous prediction in the restricted regression model; Chaturvedi and Singh (2000) and Chaturvedi et al. (2008) employed Stein-rule estimators for simultaneous prediction; Chaturvedi et al. (2002) discussed simultaneous prediction in a multivariate setup with an unknown covariance matrix of the disturbance vector; and Shalabh et al. (2008) considered simultaneous prediction in measurement error models. In all such works, either the ordinary least squares (OLS) predictor or the Stein-rule (SR) predictor is utilized for predicting the actual and average values of the study variable. Each provides more efficient predictions under certain conditions, depending on whether it is used for actual or average value prediction. So a natural question arises: can we utilize the good properties of the two predictors and obtain an improved predictor? Based on this, we have utilized the OLS and SR predictors together and have proposed two predictors in this paper. Their efficiency properties are derived using the small disturbance asymptotic theory, and dominance conditions for the superiority of the predictors over each other are derived and analyzed.

The plan of this paper is as follows. Section 2 provides these predictions and presents their motivation. Their properties are analyzed in Sections 3 and 4 employing the small disturbance asymptotic theory. Some concluding remarks are placed in Section 5. Finally, the derivation of the main results is outlined in the Appendix.

2 Model Specification And Predictions:

Let us postulate the following linear regression model:

$$ y = X\beta + u \tag{2.1} $$

where $y$ is an $n \times 1$ vector of $n$ observations on the study variable, $X$ is an $n \times p$ matrix of $n$ observations on $p\,(> 2)$ explanatory variables, $\beta$ is a $p \times 1$ vector of $p$ regression coefficients and $u$ is an $n \times 1$ vector of disturbances following a multivariate normal distribution with mean vector $0$ and variance covariance matrix $\sigma^2 I_n$. It is assumed that the scalar $\sigma^2$ is unknown and the matrix $X$ has full column rank.

When the simultaneous prediction of average values $A_v = E(y)$ and actual values $A_c = y$ within the sample is to be considered, we may define our target function as

$$ T = \lambda A_v + (1 - \lambda) A_c \tag{2.2} $$

where $\lambda$ is a nonstochastic scalar between 0 and 1; see Shalabh (1995), Rao et al. (2008). The value of $\lambda$ may reflect the weight being given to the prediction of average values in relation to the prediction of actual values.

The least squares estimator of $\beta$ is given by

$$ b_L = (X'X)^{-1} X'y \tag{2.3} $$

which is the best linear unbiased estimator of $\beta$. Sometimes properties like linearity and unbiasedness may not be desirable. In such situations, it may be possible to obtain an estimator with reduced variability by relaxing the requirements of linearity and unbiasedness. The family of Stein-rule estimators gives rise to such estimators. The Stein-rule estimator of $\beta$ is defined by

$$ b_S = \left[ 1 - \frac{2(p-2)k}{(n-p+2)} \cdot \frac{y'\bar{H}y}{y'Hy} \right] b_L \tag{2.4} $$

where $H = X(X'X)^{-1}X'$, $\bar{H} = (I - H)$ and $k$ is any positive nonstochastic scalar; see, e.g., Judge and Bock (1978), Saleh (2006).

Based on (2.3) and (2.4), predictions for the values of the study variable are obtained by $Xb_L$ and $Xb_S$, which can be used for both the average values $A_v = E(y)$ and the actual values $A_c = y$.
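As a minimal sketch of (2.3) and (2.4), the two estimators can be computed with NumPy as follows. The function names `least_squares` and `stein_rule`, the simulated design matrix, the coefficient vector and the noise level are all illustrative assumptions and do not come from the report.

```python
import numpy as np


def least_squares(X, y):
    """Least squares estimator b_L = (X'X)^{-1} X'y of (2.3)."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def stein_rule(X, y, k):
    """Stein-rule estimator b_S of (2.4) for a nonstochastic scalar k."""
    n, p = X.shape
    b_L = least_squares(X, y)
    fitted = X @ b_L                 # H y
    residual = y - fitted            # (I - H) y
    rss = residual @ residual        # y'(I - H)y
    ess = fitted @ fitted            # y'Hy
    shrinkage = 1.0 - 2.0 * (p - 2) * k / (n - p + 2) * rss / ess
    return shrinkage * b_L


# Simulated data; every numerical value below is an illustrative assumption.
rng = np.random.default_rng(0)
n, p = 30, 4
X = rng.normal(size=(n, p))
beta = np.array([1.0, -0.5, 2.0, 0.3])
y = X @ beta + 0.5 * rng.normal(size=n)

b_L = least_squares(X, y)
b_S = stein_rule(X, y, k=0.5)
print("b_L:", b_L)
print("b_S:", b_S)
```

Because the bracket in (2.4) is a scalar, $b_S$ is simply a rescaled $b_L$; setting $k = 0$ in this sketch returns $b_L$ itself.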

For both the components $A_v$ and $A_c$ of $T$ defined by (2.2), we may use $Xb_L$ so that the vector of predictions for $T$ is given by

$$ T_{LL} = Xb_L. \tag{2.5} $$

Similarly, if we employ $Xb_S$ for both $A_v$ and $A_c$, we find the vector of predictions as

$$ T_{SS} = Xb_S = \left[ 1 - \frac{2(p-2)k}{(n-p+2)} \cdot \frac{y'\bar{H}y}{y'Hy} \right] Xb_L \tag{2.6} $$

with $H$ and $\bar{H} = I - H$ as defined below (2.4). On the other hand, if we use $Xb_L$ for $A_c$ and $Xb_S$ for $A_v$ in $T$, we get the vector of predictions as

$$ T_{SL} = \lambda Xb_S + (1 - \lambda) Xb_L = \left[ 1 - \frac{2(p-2)\lambda k}{(n-p+2)} \cdot \frac{y'\bar{H}y}{y'Hy} \right] Xb_L. \tag{2.7} $$

Similarly, utilizing $Xb_L$ for $A_v$ and $Xb_S$ for $A_c$ in $T$, we find yet another vector of predictions

$$ T_{LS} = \lambda Xb_L + (1 - \lambda) Xb_S = \left[ 1 - \frac{2(p-2)(1-\lambda)k}{(n-p+2)} \cdot \frac{y'\bar{H}y}{y'Hy} \right] Xb_L. \tag{2.8} $$

Our motivation underlying the formulation of (2.7) and (2.8) is as follows. If we compare $Xb_L$ and $Xb_S$ with respect to the criterion of total mean squared error, it is well known that $Xb_L$ is superior to $Xb_S$ for all positive values of $k$ when the aim is to predict $A_c$ (the actual values of the study variable). When the aim is to predict $A_v$ (the average values of the study variable), $Xb_S$ is superior to $Xb_L$ for positive values of $k$ below one. Thus if we use the superior predictions, i.e., $Xb_L$ for $A_c$ and $Xb_S$ for $A_v$, in $T$ defined by (2.2), we get $T_{SL}$. Conversely, if we use the inferior predictions, i.e., $Xb_L$ for $A_v$ and $Xb_S$ for $A_c$, we are led to $T_{LS}$.
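The four prediction vectors (2.5) to (2.8) differ only in the weight placed on the common shrinkage term, which the following sketch makes explicit. The function name `within_sample_predictions`, the dictionary keys and the simulated data are hypothetical choices made for illustration only.

```python
import numpy as np


def within_sample_predictions(X, y, k, lam):
    """Return T_LL, T_SS, T_SL and T_LS of (2.5)-(2.8) as a dict."""
    n, p = X.shape
    b_L = np.linalg.solve(X.T @ X, X.T @ y)
    fitted = X @ b_L                       # X b_L = H y
    residual = y - fitted                  # (I - H) y
    # Common term 2(p-2)k/(n-p+2) * y'(I-H)y / y'Hy
    c = 2.0 * (p - 2) * k / (n - p + 2) * (residual @ residual) / (fitted @ fitted)

    def shrunk(weight):
        # [1 - weight * c] X b_L
        return (1.0 - weight * c) * fitted

    return {
        "LL": shrunk(0.0),          # (2.5)
        "SS": shrunk(1.0),          # (2.6)
        "SL": shrunk(lam),          # (2.7)
        "LS": shrunk(1.0 - lam),    # (2.8)
    }


# Simulated data; all values are illustrative assumptions.
rng = np.random.default_rng(1)
n, p, k, lam = 25, 5, 0.8, 0.3
X = rng.normal(size=(n, p))
y = X @ np.arange(1.0, p + 1.0) + rng.normal(size=n)

preds = within_sample_predictions(X, y, k, lam)

# T_SL and T_LS are convex combinations of X b_S (= T_SS) and X b_L (= T_LL).
assert np.allclose(preds["SL"], lam * preds["SS"] + (1 - lam) * preds["LL"])
assert np.allclose(preds["LS"], lam * preds["LL"] + (1 - lam) * preds["SS"])
```

The two assertions confirm numerically that the bracketed forms on the right of (2.7) and (2.8) agree with the weighted combinations on the left.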

whence the following family of predictions can be defined:

$$ P_{fg} = \left[ 1 - \frac{2(p-2) g_f k}{(n-p+2)} \cdot \frac{y'\bar{H}y}{y'Hy} \right] X_f b_L \tag{2.16} $$

where $0 \le g_f \le 1$ is a nonstochastic scalar characterizing the predictions. If we set the value of $g_f$ as $0$, $1$, $\lambda$ and $(1 - \lambda)$, we obtain (2.12), (2.13), (2.14) and (2.15) respectively as special cases.

3 Asymptotic Efficiency Properties of Predictions Within The Sample:

It is easy to see that the predictions based on least squares are weakly unbiased in the sense that

$$ E(T_{LL} - T) = 0. \tag{3.1} $$

Further, the second order moment matrix is

$$ E(T_{LL} - T)(T_{LL} - T)' = \sigma^2 \left[ \lambda^2 I_n + (1 - 2\lambda) \bar{H} \right]. \tag{3.2} $$

Similar exact expressions for the bias vector and second order moment matrix of $P_g$ for any nonzero value of $g$ can be derived following, for instance, Judge and Bock (1978), but they would be sufficiently intricate that no clear inference regarding the efficiency properties could be deduced from them. We therefore propose to consider their asymptotic approximations employing the small disturbance asymptotic theory.

Theorem I: The asymptotic approximation for the bias vector of $P_g$ for nonzero values of $g$ to order $O(\sigma^2)$ is

$$ B(P_g) = E(P_g - T) = -\sigma^2 \frac{2(n-p)(p-2)gk}{(n-p+2)\,\beta'X'X\beta} X\beta \tag{3.3} $$

while the difference between the second order moment matrices of $P_g$ and $P_0 \equiv T_{LL}$ to order $O(\sigma^4)$ is given by

$$ D(P_g; P_0) = E(P_0 - T)(P_0 - T)' - E(P_g - T)(P_g - T)' = \sigma^4 \frac{4(n-p)(p-2)gk}{(n-p+2)\,\beta'X'X\beta} XCX' \tag{3.4} $$

where

$$ C = \lambda (X'X)^{-1} - \frac{2\lambda + (p-2)gk}{\beta'X'X\beta} \beta\beta'. \tag{3.5} $$

These results are derived in the Appendix.

From (3.3), we observe that $P_g$ is not weakly unbiased. However, if we define the norm of the bias vector to the order of our approximation as

$$ B(P_g)'B(P_g) = \sigma^4 \frac{4(n-p)^2 (p-2)^2 g^2 k^2}{(n-p+2)^2\,\beta'X'X\beta} \tag{3.6} $$

then with respect to the criterion of such a norm, $P_g$ is superior to $P_{g^*}$ for $g$ less than $g^*$. In particular, both $T_{SL}$ and $T_{LS}$ are better than $T_{SS}$ for $0 < \lambda < 1$. Further, $T_{SL}$ is superior or inferior to $T_{LS}$ according to whether $\lambda$ is less or greater than 0.5. When $\lambda = 0.5$, i.e., equal weight is assigned to the prediction of actual and average values of the study variable, $T_{LS}$ and $T_{SL}$ are equally good.

Next, let us compare the predictions with respect to the criterion of the second order moment matrix to order $O(\sigma^4)$. For this purpose, we utilize the following two results for any $p \times 1$ vector $a$ and $p \times p$ positive definite matrix $A$.

Result I: The matrix $(A^{-1} - aa')$ is positive definite if and only if $a'Aa$ is less than 1; see, e.g., Yancey, Judge and Bock (1974) for proof.

Result II: The matrix $(aa' - A^{-1})$ cannot be non-negative definite for $p > 1$; see, e.g., Guilkey and Price (1981).

Applying Result I to the matrix $C$ given by (3.5), we observe that it cannot be positive definite, whence it follows from (3.4) that $P_g$ cannot be superior to $P_0$ with respect to the criterion of the second order moment matrix to the order of our approximation. Similarly, using Result II, we find that the matrix $-C$ cannot be non-negative definite by virtue of our specification that $p$ exceeds 2. In other words, $P_0$ cannot be superior to $P_g$. It is thus seen that $P_g$ neither dominates $P_0$ nor is dominated by $P_0$ according to the second order moment matrix criterion.

For the comparison of $P_g$ and $P_{g^*}$, we observe from (3.4) that

$$ D(P_g; P_{g^*}) = E(P_{g^*} - T)(P_{g^*} - T)' - E(P_g - T)(P_g - T)' = \sigma^4 \frac{4(n-p)(p-2)k}{(n-p+2)\,\beta'X'X\beta} (g - g^*)\, X \left[ \lambda (X'X)^{-1} - \frac{2\lambda + (p-2)(g + g^*)k}{\beta'X'X\beta} \beta\beta' \right] X'. \tag{3.7} $$

Applying Result I and Result II once again, we find no clear dominance of $P_g$ over $P_{g^*}$.

Let us now compare the predictions with respect to the criterion of the trace of the second order moment matrix to order $O(\sigma^4)$. From (3.4), we see that

$$ \operatorname{tr} D(P_g; P_0) = \sigma^4 \frac{4(n-p)(p-2)^2 gk}{(n-p+2)\,\beta'X'X\beta} (\lambda - gk) \tag{3.8} $$

which is positive when

$$ k < \frac{\lambda}{g}. \tag{3.9} $$

Thus $P_1 \equiv T_{SS}$, $P_\lambda \equiv T_{SL}$ and $P_{1-\lambda} \equiv T_{LS}$ are better than $P_0 \equiv T_{LL}$ when $k$ is less than $\lambda$, $1$ and $\lambda/(1-\lambda)$ respectively. Just the reverse is true, i.e., $T_{LL}$ beats $T_{SS}$, $T_{SL}$ and $T_{LS}$ for $k$ exceeding $\lambda$, $1$ and $\lambda/(1-\lambda)$ respectively, which holds true at least so long as $k$ exceeds both $1$ and $\lambda/(1-\lambda)$.
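The step from (3.4) and (3.5) to the trace condition (3.8) and (3.9) can be checked numerically. The sketch below builds the matrix $C$, confirms that $\operatorname{tr}(XCX')$ collapses to $(p-2)(\lambda - gk)$, and shows that $C$ has eigenvalues of both signs, which is why the matrix criterion gives no dominance. The function names `c_matrix` and `trace_gain` and all simulated values are assumptions made for illustration.

```python
import numpy as np


def c_matrix(X, beta, lam, g, k):
    """Matrix C of (3.5)."""
    p = X.shape[1]
    theta = beta @ X.T @ X @ beta                      # beta'X'X beta
    weight = (2 * lam + (p - 2) * g * k) / theta
    return lam * np.linalg.inv(X.T @ X) - weight * np.outer(beta, beta)


def trace_gain(X, beta, sigma2, lam, g, k):
    """Trace of D(P_g; P_0) from (3.4), to order sigma^4."""
    n, p = X.shape
    theta = beta @ X.T @ X @ beta
    C = c_matrix(X, beta, lam, g, k)
    scale = sigma2 ** 2 * 4 * (n - p) * (p - 2) * g * k / ((n - p + 2) * theta)
    return scale * np.trace(X @ C @ X.T)


# Simulated design and coefficients; all values are illustrative assumptions.
rng = np.random.default_rng(2)
n, p = 20, 4
X = rng.normal(size=(n, p))
beta = np.array([1.0, 2.0, -1.0, 0.5])
lam, g = 0.4, 1.0          # g = 1 corresponds to T_SS

for k in (0.3, 0.5):
    C = c_matrix(X, beta, lam, g, k)
    # tr(X C X') reduces to (p - 2)(lam - g k), the factor appearing in (3.8).
    assert np.isclose(np.trace(X @ C @ X.T), (p - 2) * (lam - g * k))
    gain = trace_gain(X, beta, 1.0, lam, g, k)
    print(f"k={k}: gain positive = {gain > 0}, k < lam/g = {k < lam / g}")

# C is indefinite: it has both negative and positive eigenvalues.
eig = np.linalg.eigvalsh(c_matrix(X, beta, lam, g, 0.3))
print("indefinite C:", eig.min() < 0 < eig.max())
```

The sign of the trace gain depends only on $\lambda - gk$, so in this sketch it flips exactly where $k$ crosses $\lambda / g$, independently of the simulated $X$ and $\beta$.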

Taking the criterion to be the norm of the bias vector to the order of our approximation, as in (3.6), it is observed that both $T_{fSL}$ and $T_{fLS}$ are better than $T_{fSS}$. Further, $T_{fSL}$ is better than $T_{fLS}$ when $\lambda$ is less than 0.5. The reverse is true, i.e., $T_{fLS}$ is better than $T_{fSL}$, when $\lambda$ exceeds 0.5.

If we choose the criterion to be the second order moment matrix to order $O(\sigma^4)$ and use Results I and II stated in the preceding Section, it is found that neither is $P_{fg}$ better than $P_{f0}$ nor vice versa.

Finally, let us take the criterion as the trace of the second order moment matrix to order $O(\sigma^4)$. Proceeding in the same manner as indicated in the preceding Section, we can easily find the conditions for the superiority of one over the other. These are assembled in Table 2.

First we observe from (4.4) that

$$ \operatorname{tr} D(P_{fg}; P_{f0}) = \sigma^4 \frac{4(n-p)(p-2) g_f k\, \beta'X_f'X_f\beta}{(n-p+2)(\beta'X'X\beta)^2} \left[ \frac{\beta'X'X\beta}{\beta'X_f'X_f\beta} \operatorname{tr}(X'X)^{-1}X_f'X_f - 2 - (p-2) g_f k \right]. \tag{4.6} $$

The expression on the right hand side is positive when

$$ k < \frac{1}{(p-2) g_f} \left[ \frac{\beta'X'X\beta}{\beta'X_f'X_f\beta} \operatorname{tr}(X'X)^{-1}X_f'X_f - 2 \right] \tag{4.7} $$

provided that the quantity in the square brackets is positive. If we define

$$ q_1 = \frac{1}{p-2} \left( \frac{1}{\alpha_1} \sum_{i=2}^{p} \alpha_i - 1 \right) \tag{4.8} $$

$$ q_p = \frac{1}{p-2} \left( \frac{1}{\alpha_p} \sum_{i=1}^{p-1} \alpha_i - 1 \right) \tag{4.9} $$

with $\alpha_1, \alpha_2, \ldots, \alpha_p$ denoting the ordered eigenvalues of $X_f'X_f$ in the metric of $X'X$,
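As a rough numerical companion to condition (4.7), the sketch below evaluates its right hand side for a simulated pair of design matrices and also computes the eigenvalues of $X_f'X_f$ in the metric of $X'X$, taken here to be the eigenvalues of $(X'X)^{-1}X_f'X_f$. The names `k_upper_bound`, `metric_eigenvalues`, `lower` and `upper` are hypothetical; the $\beta$-free envelope is an assumption based on the standard fact that $\beta'X_f'X_f\beta / \beta'X'X\beta$ lies between the smallest and largest of these eigenvalues, and it is not claimed to reproduce the report's $q_1$ and $q_p$ labelling.

```python
import numpy as np


def k_upper_bound(X, X_f, beta, g_f):
    """Right-hand side of (4.7); meaningful only when the bracket is positive."""
    p = X.shape[1]
    A = X.T @ X
    B = X_f.T @ X_f
    ratio = (beta @ A @ beta) / (beta @ B @ beta)
    bracket = ratio * np.trace(np.linalg.solve(A, B)) - 2.0
    return bracket / ((p - 2) * g_f)


def metric_eigenvalues(X, X_f):
    """Eigenvalues of X_f'X_f in the metric of X'X, in descending order."""
    M = np.linalg.solve(X.T @ X, X_f.T @ X_f)          # (X'X)^{-1} X_f'X_f
    return np.sort(np.linalg.eigvals(M).real)[::-1]


# Simulated within-sample and out-of-sample designs; all values are assumptions.
rng = np.random.default_rng(3)
n, m, p = 30, 10, 4
X = rng.normal(size=(n, p))
X_f = rng.normal(size=(m, p))
beta = np.array([1.0, -1.0, 0.5, 2.0])
g_f = 0.5

alpha = metric_eigenvalues(X, X_f)
bound = k_upper_bound(X, X_f, beta, g_f)

# Envelope that does not depend on beta (Rayleigh-quotient argument).
lower = (alpha.sum() / alpha.max() - 2.0) / ((p - 2) * g_f)
upper = (alpha.sum() / alpha.min() - 2.0) / ((p - 2) * g_f)
print("lower, bound, upper:", lower, bound, upper)
```

Since $\beta$ is unknown in practice, a bound of this $\beta$-free kind is what would make the condition usable; the exact value from (4.7) always falls inside the envelope.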

5 Some Concluding Remarks:

If we take the performance criterion to be total mean squared error, it is well known that least squares predictions are better than Stein-rule predictions for the actual values of the study variable, while the opposite is true, i.e., Stein-rule predictions under some mild constraints are better than least squares predictions, for the average values of the study variable. This observation has prompted us to present two predictions when the objective is to predict both the actual and average values simultaneously.

The proposed predictions are biased, like Stein-rule predictions. However, if we look at the norms of the bias vectors to the order of our approximation, both are found to be superior to Stein-rule predictions. Next, we have compared the predictions according to the criterion of the second order moment matrix to the order of our approximation and have found that none of the four predictions is uniformly superior to the others. Finally, taking the criterion as the trace of the second order moment matrix, we have deduced conditions for the superiority of one over the other and have presented them in tabular form. These conditions are elegant and easy to apply in developing efficient predictions.

It may be remarked that our investigations can be easily extended along the lines of Ullah, Srivastava and Chandra (1983) to the case when the disturbances are not necessarily normally distributed.

Appendix

In order to find small disturbance asymptotic approximations for the bias vectors and mean squared error matrices, we replace $u$ in (2.1) by $\sigma v$ so that $v$ has a multivariate normal distribution with mean vector $0$ and variance covariance
