CUTOFF: A spatio-temporal imputation method

Abstract

Missing values occur frequently in many statistical applications and need to be handled carefully, especially when the data are collected spatio-temporally. We propose a method called CUTOFF imputation that uses the spatio-temporal nature of the data to impute missing values accurately and efficiently. The main feature of this method is that the estimate of a missing value is produced by incorporating similar observed temporal information from the value's nearest spatial neighbors. We also develop extensions that allow the method to accommodate other data generating processes. We develop a cross-validation procedure that optimally chooses the parameters for CUTOFF; the same procedure can be used with other imputation methods. We analyze rainfall data from 78 gauging stations in the Murray-Darling Basin in Australia using the CUTOFF imputation method and compare its performance to four well-studied competing imputation methods, namely k-nearest neighbors, singular value decomposition, multiple imputation and random forest. Empirical results show that our method captures the temporal patterns well and is effective at imputing large gaps in the data. Compared to the competing methods, CUTOFF is more accurate and much faster. We analyze two further data sets to demonstrate CUTOFF's wider applicability and to provide extra evidence of its validity and usefulness. Finally, a simulation study based on the Murray-Darling Basin data shows that our method performs well in both accuracy and computational efficiency.
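
The general idea of borrowing information from nearby stations can be illustrated with a deliberately simplified sketch. This is not the CUTOFF estimator, whose details the abstract does not give. It is a plain nearest-neighbor fill in which each missing value is replaced by the mean of the values observed at the same time step at the k spatially closest stations. The function name `impute_from_neighbors`, the `coords` argument, the use of Euclidean distance and the parameter `k` are all assumptions made for illustration.

```python
import math


def impute_from_neighbors(data, coords, k=2):
    """Fill NaNs in a time-by-station table from nearby stations.

    Hypothetical helper, not the CUTOFF method: a missing value at
    (time t, station s) is replaced by the mean of the values observed
    at time t at the k stations closest to s (Euclidean distance).

    data   -- list of rows, one per time step, one column per station
    coords -- list of (x, y) station coordinates, one per column
    """
    filled = [list(row) for row in data]
    n_stations = len(coords)
    for t, row in enumerate(data):
        for s, value in enumerate(row):
            if not math.isnan(value):
                continue  # observed value, nothing to impute
            # Other stations with an observation at time t, nearest first.
            candidates = sorted(
                (j for j in range(n_stations)
                 if j != s and not math.isnan(row[j])),
                key=lambda j: math.dist(coords[s], coords[j]),
            )
            nearest = candidates[:k]
            if nearest:  # leave the gap if no neighbor has data
                filled[t][s] = sum(row[j] for j in nearest) / len(nearest)
    return filled


# Example: three stations on a line, one gap at the second time step.
rain = [[1.0, 2.0, 3.0],
        [4.0, math.nan, 6.0]]
stations = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
completed = impute_from_neighbors(rain, stations, k=2)
```

A sketch like this uses only the spatial side of the data. The method described above additionally draws on similar observed temporal information at the neighboring stations, which this simplification leaves out.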
