Choose your preferred view mode

Please select whether you prefer to view the MDPI pages with a view tailored for mobile displays or to view the MDPI
pages in the normal scrollable desktop version. This selection will be stored into your cookies and used automatically
in next visits. You can also change the view style at any point from the main header when using the pages with your
mobile device.

Abstract

:
Crop yield forecasting plays a vital role in coping with the challenges of the impacts of climate change on agriculture. Improvements in the timeliness and accuracy of yield forecasting by incorporating near real-time remote sensing data and the use of sophisticated statistical methods can improve our capacity to respond effectively to these challenges. The objectives of this study were (i) to investigate the use of derived vegetation indices for the yield forecasting of spring wheat (Triticum aestivum L.) from the Moderate resolution Imaging Spectroradiometer (MODIS) at the ecodistrict scale across Western Canada with the Integrated Canadian Crop Yield Forecaster (ICCYF); and (ii) to compare the ICCYF-model based forecasts and their accuracy across two spatial scales-the ecodistrict and Census Agricultural Region (CAR), namely in CAR with previously reported ICCYF weak performance. Ecodistricts are areas with distinct climate, soil, landscape and ecological aspects, whereas CARs are census-based/statistically-delineated areas. Agroclimate variables combined respectively with MODIS-NDVI and MODIS-EVI indices were used as inputs for the in-season yield forecasting of spring wheat during the 2000–2010 period. Regression models were built based on a procedure of a leave-one-year-out. The results showed that both agroclimate + MODIS-NDVI and agroclimate + MODIS-EVI performed equally well predicting spring wheat yield at the ECD scale. The mean absolute error percentages (MAPE) of the models selected from both the two data sets ranged from 2% to 33% over the study period. The model efficiency index (MEI) varied between −1.1 and 0.99 and −1.8 and 0.99, respectively for the agroclimate + MODIS-NDVI and agroclimate + MODIS-EVI data sets. Moreover, significant improvement in forecasting skill (with decreasing MAPE of 40% and 5 times increasing MEI, on average) was obtained at the finer, ecodistrict spatial scale, compared to the coarser CAR scale. Forecast models need to consider the distribution of extreme values of predictor variables to improve the selection of remote sensing indices. Our findings indicate that statistical-based forecasting error could be significantly reduced by making use of MODIS-EVI and NDVI indices at different times in the crop growing season and within different sub-regions.

Keywords:

ecodistrict; yield forecasting; MODIS; ICCYF; spring wheat

1. Introduction

Satellite remote sensing (RS) data offer significant benefits to assessing agricultural yield and production throughout the cropping season due to their spatial coverage, their temporal and spectral resolutions, their availability to users in timely manner and their affordability (free of charge for some data) [1–6]. High temporal and spatial resolutions with sufficient lead time are often required for objective and near-real time crop production estimates over large areas. Although various vegetation indices (VIs) from several satellite sensors are being explored in crop condition monitoring, crop acreage estimates, crop yield forecasting and modeling at regional and national scales, recent studies are mostly focused on data from moderate spatial resolution satellite sensors, i.e., MODIS (Moderate Resolution Imaging Spectroradiometer) for agricultural production assessments (e.g., [7–15]). The high temporal and moderately low spatial resolution, as well as the free availability of MODIS data could partly explain such an interest [9]. The Normalized Difference Vegetation Index (NDVI), based on the contrast between the maximum absorption in the red portion and the maximum reflection in the near-infrared portion of the electromagnetic spectrum has been widely utilized for crop monitoring and agricultural statistics (e.g., [4,6,16]). Comparisons of the MODIS NDVI with the NOAA-AVHRR (National Oceanographic and Atmospheric Administration-Advanced Very High Resolution Radiometer) NDVI temporal profiles over numerous biome types have shown that the MODIS-based index performed with higher fidelity in terms of characterizing the seasonal phenology [17,18]. By including reflectance in the blue band of the electromagnetic spectrum (in addition to the red and near-infrared bands), the Enhanced Vegetation Index (EVI) helps minimizing soil background and atmosphere influences in reflectance data [4,19–21]. Both MODIS NDVI and EVI are commonly used in crop yield forecasting (e.g., [5,6,22–24]). MODIS-derived VIs are also ideal for crop monitoring over fragmented agricultural landscapes (i.e., size of the field close to the size of the pixel [25]).

Reliable and timely crop yield forecasting is crucial for policy strategies, trade and market opportunities. It is also important for achieving and sustaining global food security [26]. In Canada, grain crop production (i.e., wheat, barley, corn, canola, and soybean) plays a vital role in the economy (for example a total production of around 63.5 million metric tonnes in 2011 and $17 billion of farm cash receipts were recorded [27]). A recent modeling tool, the Integrated Canadian Crop Yield Forecaster, ICCYF, has been developed for generating in-season crop yield forecasts at regional scale [28]. The tool utilizes historical climate, near-real time RS data and crop field survey in generating probabilistic yield forecasts that are sequentially-updated within the growing season as data becomes available. The ICCYF combines robust statistical techniques for generating in-season yield forecasts well before the end of the growing season and for providing a probability distribution of the forecasted yields at a given spatial unit. Typically, the basic spatial unit considered in crop yield forecasting at regional scales relies on administrative statistical boundaries [7,13,28–30], even when introducing the notion of ecoregion (e.g., [31,32]). Crop yield forecasting at this unit relies on the availability of historical data at these scales. The current crop yield forecasts based on the ICCYF are performed at the Census Agricultural Regions (CARs), which are composed of groups of adjacent census divisions and are used by the Census of Agriculture for disseminating agricultural statistics [33]. However, given the climate variability and the future challenges of the impacts of climate change on agriculture, exploring crop yield forecasting at the ecodistrict (ECD) scale deserves special attention. An ECD is defined as a subdivision of one ecoregion characterized by relatively homogeneous biophysical and climatic conditions [34]. The differentiating characteristics depends on the regional landform, local surface form, permafrost distribution, soil development, textural group, vegetation cover/land use classes, range of annual precipitation, and mean temperature. The improvement in the timeliness and accuracy of crop yield forecasting through the incorporation of near real-time data (both agroclimate and remote sensing) and the use of sophisticated statistical methods should improve our capacity to respond effectively to the future challenges. Running such tools at spatial scales based on common environmental characteristics (i.e., ecological spatial division) is therefore of great interest both from an ecosystem point of view and for a better crop yield forecasting (i.e., aggregation/upscaling of predicted yields).

Furthermore, though the skillful forecasts or model accuracy of the ICCYF for forecasting spring wheat yields reached 90% across the CARs of the Canadian Prairie provinces [Alberta (AB), Saskatchewan (SK) and Manitoba (MB)] [28], the weak model performance in some regions is still challenging. Indeed, high forecast uncertainties could occur in regions with high historical yield variability and sparse distribution of climate stations. The forecasting tool may fail to capture the changes linked to the impact of persistent drought and flooding conditions, or extreme growing conditions and associated crop management practices [28,35]. Although the choice of the CAR scale is guided by the availability of historical data for the public, this coarse unit normally contains many different soil and climate zones and thus may have different yield response relationships. Exploring the crop yield modeling at the ECD level through the ICCYF will give more insights on how the forecast may be achieved for oriented user groups, as well as a better understanding of the forecast skill at finer scales. In addition, one future development activity of the ICCYF [36] aims to use data from moderate resolution satellites (i.e., MODIS) for assessing the crop vegetation status and to derive VIs.

Therefore, the objectives of this study were: (i) to investigate the use of other RS indices (MODIS NDVI and EVI) as alternative predictors for forecasting spring wheat yields within the forecast model framework (ICCYF) at the ECD scale; (ii) to compare the forecast model performance at the ECD and CAR scales (ecological spatial unit versus statistical spatial unit), especially in CARs with poorer ICCYF performance as pointed by Newlands et al. [28]; and (iii) to understand when to use one RS index over another in a given region and to understand the dynamics during the growing season. Although several studies have related RS data and/or agroclimate indices to spring wheat on the Canadian Prairies, no such studies have been conducted to relate those indices to crop yield at the ECD scale. Thus, our study takes advantage of historical yield data at the ECD scale across the agricultural landscape of western Canada. The findings should help in improving the forecast skill of the ICCYF, namely in regions with high forecast uncertainties and/or high historical yield variability.

2. Materials and Methods

2.1. Overview of the Integrated Canadian Crop Yield Forecaster (ICCYF)

The ICCYF is a modeling tool for crop yield forecasting and risk analysis based on the integration of geospatial and statistical data within a Geographic Information System, developed by Agriculture and Agri-Food Canada (AAFC). The main features [28,36] of the ICCYF are: (1) the integration of a physical based soil moisture model to generate climate based predictors and satellite derived information; (2) an automatic ranking and selection of best predictors using Robust Least Angle Regression Scheme (RLARS) and Leave-One-Out-Cross-Validation (LOOCV) scheme at run time, as well as a spatial correlation analysis among the neighboring spatial units; (3) a Bayesian method for sequential forecasting, i.e., estimation of the prior and posterior distributions of model predictors through a Markov Chain Monte Carlo (MCMC) scheme, and random forest-tree machine learning techniques to select the best predictors of unobserved variables at the time of forecast.

The crop yield modeling process using the ICCYF is depicted in Figure 1. To set up a yield model at the ECD level, the potential predictors (agroclimate and remote sensing vegetation indices) were put into a RLARS [37,38] to evaluate and rank the variables that contribute the majority of the variance in the predicted yield. A maximum number of predictors was set based on the sample size of the model building data (set at two in our analysis). The RLARS was applied to account for heteroscedasticity and outliers in the historical data (i.e., model training/calibration). The selected predictors were then subjected to a Robust Cross Validation (RCV) scheme [39] to further stabilized the model by eliminating any false predictors selected from contaminated data [39]. Then, the Bayesian statistical approach as described by Bornn and Zidek [40] for the spatial correlation analysis among the neighboring spatial units was applied. Historical data of both forecasting ECD and statistically selected neighboring ECDs were used to establish the prior distribution of the predictors. The posterior distribution of the predictors was obtained using the MCMC scheme [41]. For the in-season forecasting, the random forest-tree machine learning technique [42] was used to estimate the required unobserved variables. The estimated variables and the variables observed at near real time were finally used as input into the selected yield model to forecast the yield probability distribution for each ECD. The 10th percentile (worst 10%), the 50th percentile (median) and the 90th percentile (best 10%) were output as the probability measures [28,36]. A detailed description of the modeling methodology can be found in Newlands et al. [28] and Chipanshi et al. [36].

The final model at the spatial unit considered is a multivariate regression equation, written as follows:

Yt=γ0+γ1t+∑i=2nγiXi+ɛt

(1)

where Yi,t is the crop yield for year t, γ0 is the regression intercept, γ1t refers to the technology trend over time. Xi is the predictor variable, n is the total number of selected predictors, and εt is the error term (independent and normally distributed with mean zero and variance σ2). The technology trend in yield was assumed to be linear. Extreme values of input data are not taken into account.

2.2. Data

2.2.1. Agroclimate Data

The study region encompassed the three Canadian Prairie Provinces AB, SK and MB (Figure 2), which contains the majority of the agriculture land in Canada. These three provinces account for about 85% of Canada’s arable land [43].

The agroclimate variables (i.e., soil water availability, SWA, and crop water deficit index, WDI) were calculated using the Versatile Soil Moisture Budget (VSMB) model at a daily time step [44] and based on the daily temperatures and precipitation, soil physical parameters (obtained from the Canadian Soil Information Service [45]), and spring wheat information. Daily temperatures and precipitation from a total of 330 climate stations (Figure 2) were provided by Environment Canada and other partner institutions through the Drought Watch activity [46] operated at the National Agroclimate Information Service of Agriculture and Agri-Food Canada (NAIS-AAFC). Also included as agroclimate variable was the growing degree days (GDD) above a base temperature of 5 °C (calculated as the mean daily temperature above a certain threshold base temperature accumulated on a daily basis over a period of time). A detailed description of the calculation of SWA and WDI can be found in Newlands et al. [28]. These agroclimate variables were summed (i.e., GDD) or averaged (i.e., SWA and WDI) for each month from May to August during each growing season. They were then spatially averaged across all stations within a given ECD with equal weighting. For ECDs with no climate data, stations from neighboring ECDs were used.

2.2.2. MODIS-Derived NDVI and EVI Indices

The 16-day MODIS NDVI and EVI composites for Canada south of 60 degree (2000–2010 period) come from the NASA Land Processes Distributed Active Archive Center’s MOD13Q1 products [47]. These products are computed from atmospherically corrected bi-directional surface reflectances that have been masked for water, clouds, heavy aerosols, and cloud shadows [17]. A mask of the Canadian agricultural land [48] was used to process the composites at the ECD level. For our purpose the average value of pixels was kept as the VI value for a given ECD. For all years, the period of MODIS data coverage spanned from the day of year 129 (approximately 9 May) to day of year 273 (approximately 30 September). NDVI is calculated from reflectances in the red and near-infrared (NIR) portions of the spectrum. EVI incorporates reflectance in the blue portion of the spectrum in addition to the red and NIR. The equations are given as follows [19–21]:

NDVI=ρNIR−ρREDρNIR+ρRED

(2)

EVI=2.5ρNIR−ρREDρNIR+6ρRED−7.5ρBLUE+1

(3)

where ρNIR is the reflectance in the near-infrared portion (841–876 nm), ρRED is the reflectance in the red portion (620–670 nm), and ρBLUE refers to the reflectance in the blue portion (459–759 nm).

The distribution of MODIS-derived vegetation indices when polled across ECDs and years (Figure 3) showed that several values were considered as extreme (outliers) during the mid-season (16-day periods 22–30 corresponding to approximately June, July, and beginning of August) for NDVI values, and early in the growing season (i.e., May) and the mid-season (i.e., July) for the EVI values. This suggests the sensitivity of MODIS VIs during the beginning of the growing season (case of EVI) and during the mid-season (both the VIs).

2.2.3. Crop Yield Data

Historical spring wheat yields (all cultivars included) at the ECD scale were provided by Statistics Canada. This yield data for spring wheat was part of a larger historical crop yield federal government database spanning 1992–2010 for Canada’s 38 crops and agricultural area and was intended for AAFC research on crop production and environmental modeling. The estimated yield data were generated from the Field Crop Reporting Series (FCRS). FCRS is a series of six data collection activities using surveys and carried out in order to get accurate and timely estimates of seeding intentions, seeded and harvested area, production, yield and farm stocks of the principal field crops in Canada at the provincial level. Data collected are quality-controlled and compared to previous estimates and other sources when available in order to reduce errors associated to such sample surveys [49]. The official yield estimates of the year is derived from the the November Survey. Only yield data of the 2000–2010 period were included in our study in order to coincide with the MODIS data period (available from 2000 onwards).

2.3. Crop Yield Modeling and Assessment of Model Performance

The modeling approach is similar to that described by Newlands et al. [28]. The main differences rely on the spatial unit considered (i.e., ECD) and the RS VIs used. The ICCYF was adapted to fit the requirements of the statistical processes involved in the modeling for a 11-year period of input data (i.e., the number of predictors, the relative convergence tolerance for the fully iterated best model candidates, the number of iterations for the model convergence, etc.). Here we tested the EVI in addition to the NDVI. Agroclimate and MODIS-derived VIs were used as predictors of yield forecasting at each ECD level. Therefore, two input data sets were involved: (i) agroclimate plus NDVI indices; and (ii) agroclimate plus EVI indices. The average value of two consecutive 16-day periods of MODIS NDVI/EVI were used. For testing the robustness of the modeling at the ECD scale, the leave-one-year-out cross-validation as performed in Mkhabela et al. [10] and Chipanshi et al. [36]was achieved in this study. For each year of the study period, historical data excluding this year (considered as the forecast year) were used as training data for the yield model at each spatial unit. The forecast was then performed using the data of the forecast year during the growing season (i.e., July, August, and September). The forecasted yield was generated using the observed data from the start of the growing season until the last day of the previous month and the inputs estimated by the tool for the remainder of the growing season.

In order to compare the forecast skills within CARs with poorest ICCYF performance (i.e., CAR 4607, 4609 and 4610, Figure 2), additional runs were performed at the CAR scale over the same period using the input data set as that of [28]. This input data set included historical spring wheat yields, agroclimate variables (i.e., GDD, SWA, WDI), and NOAA-AVHRR NDVI at CAR scale across the Canadian Prairies. CARs with poorest ICCYF performance were previously determined in Newlands et al. [28]. Although RS VIs are already achieved within the ICCYF tool, the use of MODIS-derived VIs aims to investigate their potential for the future development activities. Note that only ECDs with crop land were used in our analysis (207 in this case).

Statistical indicators (i.e., root mean square error-RMSE, mean absolute percentage error-MAPE, and model efficiency index-MEI) were used to quantify the performance of the regression-based models at the ECD scale. The RMSE gives the weighted variations in errors (residual) between the predicted and observed yields. It was calculated as:

RMSE=1n∑i=1n(Pi−Oi)2

(4)

where n is the number of observations, Pi is the predicted yield and Oi is the observed yield.

The MAPE is an accuracy measure of the forecast quality and was calculated as:

MAPE=100⋅1n∑i=1n|Oi−PiOi|

(5)

The MEI could be considered as a measure of model skill [50]. The closer to 1 the MEI is, the more skillful the model. MEI was calculated as:

MEI=1−[∑i=1n(Oi−Pi)2/∑i=1n(Oi−O¯i)2]

(6)

where Ōi is the mean observed yield.

The ICCYF runs and statistical analyses were performed using R [51]. The geospatial analyses and mapping were performed using ArcGis 10.1 [52]. The mapping of the model performance throughout the paper was based on that crop land extent map.

3. Results and Discussion

3.1. Selected Predictors at the Ecodistrict Scale

The number of predictors selected by ECD differed from one ECD to another across the study region and according to the input data set used. MODIS-derived indices over the period weeks of the year 24 to 28 (NDVI(EVI)_26_28 and NDVI(EVI)_24_26) were among the predominant predictors (Table 1) during the 11-year period. This recurrence of July-related MODIS indices suggests that the crop status in mid-season is determining in the final spring wheat yield across the Western Canadian Prairies. These results are in agreement with previous studies which have reported that the optimal time for obtaining NDVI information to be related with final yield of spring-seeded crops including spring wheat for the Canadian Prairies was July [53]. Other predominant predictors included agroclimate indices GDD_6 (total GDD of June) and P_6 and P_7 (sums of precipitation in June and July, respectively). In addition, the number of precipitation and temperature related predictors, namely in June and July, reveals that these meteorological factors are important for the final yield of spring wheat.

The comparisons of model performance indicators showed that the results were quite similar when using a data set including either agroclimate and NDVI indices or agroclimate and EVI indices (Figure 4). Overall for models based on agroclimate plus MODIS-NDVI indices, RMSE values ranged from 43 to 957 kg·ha−1 over the 2000–2010 period. The ranges of the MEI and MAPE were −1.1 to 0.99, and 2% to 33%, respectively, for the same period. Satisfactory results were obtained with the coefficient of determination of yield models (median values equaled 0.63). The same ranges of performance indicators were obtained in case of models based on agroclimate indices plus MODIS-EVI indices, with differences occurring in the low values of MEI (i.e., MEI = −1.8) and high values of RMSE (i.e., RMSE = 975 kg·ha−1). High ranges of performance indicators suggest that the models did not capture well the extreme input values. Modeling crop yields by taking into account these extreme values could help improving the model forecast skill. Furthermore, it is worth to note that the MODIS VI is representative of all crops in the area and will be highly influenced by the dominant crops. The consideration of crop specific mask to derive VIs will probably result in improved crop yield forecast models.

The overall spatial pattern of model performance for each forecast year (2000–2010) was similar to that presented with the 2010 yield forecast (Figure 5). The spatial distribution of the ICCYF performance showed that the model performance did vary across the agricultural regions of the Canadian Prairies. ECDs with relatively high MAPE (≥15%, Figure 5) were located in central and central Alberta, central Saskatchewan, and western Manitoba, with the best model performance being across Saskatchewan (Figure 5). These agricultural regions are generally located in the semi-arid and arid climatic regions. Mkhabela et al. [10] found best prediction models for spring wheat in the semi-arid and arid zone when relating the CAR-scale crop yield to MODIS-NDVI data. This result has implications in climate change studies. The province of Saskatchewan encompass the three major climatic regions (i.e., sub-humid, semi-arid, and arid) of the Canadian Prairies [10]. As a continuing effort to integrate future climate scenarios data within the ICCYF, and hence investigate the climate variability impacts on agriculture across Canadian agricultural landscape, future analyses could be focused on this province.

MODIS NDVI and EVI performed equally well predicting spring wheat yield at the ECD scale. Bolton and Friedl [32] found that for predicting maize yields MODIS-EVI (coupled to other crop phenology metrics) provided relatively better results than MODIS-NDVI (R2 = 0.58 and 0.53 for EVI-based and NDVI-based regression models, respectively); but they had the same performance in predicting soybean yield. No crop specific mask was used in our study for retrieving the MODIS VIs. Using crop specific masks is part of the ongoing research and should provide additional information on crop phenology, useful for deriving potential predictors (i.e., phenology-based metrics) and for improving the model forecast skill. Further research also includes the use of additional RS indices such as the fraction of absorbed photosynthetically active radiation (FAPAR). The ICCYF runs at ECD scale could also be investigated with different crops such as oilseeds, corn and barley.

3.3. In-Season Crop Yield Forecasting Based on the Two Input Data Sets

The yield forecasts of spring wheat were performed for the months of July, August and September for the 11-year period of the study. Generally, high correlations were obtained for each month of forecast for both input data set used for the modeling: R values greater or equal 0.60, except in 2002 (Tables 2 and 3). Regarding the models based on agroclimate plus MODIS-NDVI indices (Table 2), RMSE values ranged from 388 to 940 kg·ha−1, 402 to 911 kg·ha−1, and 403 to 958 kg·ha−1 for July, August, and September forecasts. In case of models based on on agroclimate plus MODIS-NDVI indices (Table 3), the ranges of RMSE values were 361 to 925 kg·ha−1, 390 to 906 kg·ha−1, and 395 to 952 kg·ha−1 for July, August, and September forecasts. Relatively good ICCYF performance was obtained in July, compared with August and September forecasts. This trend was also observed when performing the forecast of spring wheat yield at CAR scale [36]. Ordinarily the forecasted yield gets closer to the actual yield towards the end of the growing season (as more data become available) for a given spatial unit of modeling. However, through the current ICCYF selection algorithm, a yield model may only contain predictors at a few critical stages (e.g., predictors based on observations before July). Thus, predictors based on observations of following months, when becoming available, will have no effect on the forecasted yield. The forecast error and reliability range remain the same for those spatial units. For instance, a great number of models based on July-related predictors and having worst performance will lessen the overall performance of the tool across the study agricultural region. Future research will include assessing how the lead time (time window for which potential predictors, namely RS data, are available before a forecast needs to be/is made) affects the forecast accuracy.

The comparison of July, August and September forecasts for all ecodistricts on the study region showed noticeable biases (expressed through the mean bias error, MBE) between the observed and predicted yields in 2000 (yield underestimation) and 2002 (yield overestimation) for models based on both two data sets (Tables 2 and 3). In 2002, many North American regions, including the Canadian Prairies, experienced severe drought conditions [35]. Based on our historical input data set, the ICCYF tool failed to capture such extreme conditions. In its current version, extreme value distribution is not taken into account in the forecast models, though the shortness of the study period did not enable more extreme conditions neither. Future works related to the fine tuning of statistical algorithms involved in the ICCYF should help integrating the impacts of extreme conditions.

The ICCYF runs at CAR scale based on the 11-year data period showed the general trend as found in Newlands et al. [28]: CARs with poorest model performance were located in south-eastern Manitoba (i.e., CARs 4607, 4609 and 4610). A general trend of the spatial distribution of model performance at CAR scale is depicted in Figure 5C. The coefficient of variation (CV), calculated as the ratio of the RMSE to the mean of the dependent variable (i.e., yield), in these CARs was at least greater than 15% over the entire study period. Comparing the ICCYF performance at a finer scale (here the ECD scale) with the CAR scale was very informative. The variability of the model performance within a given CAR was highlighted at the ecological scale. For example significant range of CVs occurred within some CARs (e.g., CARs in the province of Alberta, Figure 5A and B). However, there were some CARs, i.e., south-eastern of Saskatchewan, where the trend at ECD scale was the same at CAR scale (CV ≤ 10%, Figure 5).

In this study we emphasized the yield forecasting of spring wheat at the ecodistrict scale through an integrated crop yield forecasting approach (i.e., the ICCYF) because of the cohesiveness of the modelling unit in terms of climate and soils. The results related to the three CARs with weak model performance (as mentioned previously) showed that this performance was improved in most cases at the ECD scale: higher average MEI values and low average MAPE values (Figure 6). Overall, in CARs with poorer ICCYF performance as reported in previous study, the model performance was improved at that finer and ecological scale, compared to larger statistical unit scale (MAPEs reduced by 40% and MEI increased by five, on average). Downscaling the yield forecasting approach at such specific ecological scale helps in understanding the yield variability within a given statistical unit and improve the forecast skills. Among the three CARs involved in the study, only the CAR 4607 presented ECDs with model performance that were slightly improved. Indeed, ECD 841, 844, 846, 849, and 851 had weak or similar average MAPE (depending on the input data set considered) than that obtained at CAR. Whereas only ECD 851 had a model skill (MEI) less than the CAR-scale one. Although all the CARs with worst ICCYF selected in our study were located in a sub-humid climatic regions, ECDs with relatively low MAPE values belong mainly to the Lake Manitoba plain ecoregion [34]. This ecoregion is characterized as the warmest and most humid in the Canadian Prairies with annual mean precipitation ranging from 450 to 700 mm [54]. A negative relationship has been reported between agricultural productivity and MODISNDVI for a maximum threshold of 600 mm in Western Australia [55]. The worst ICCYF performance in the above mentioned ECDS could support this negative relationship between spring wheat yield and MODIS-derived VIs (though the predictors also included agroclimate variables in our case). The weak performance of the ICCYF in CARs across south-eastern Manitoba could be attributed to the soil surface water capacity. Taking into account RS indices that capture the spatio-temporal patterns in surface moisture status (e.g., Normalized Difference Water Index, NDWI [56], or CERES-Photosynthetically Absorbed Radiation [57]) will be explored in future works. Furthermore, exploring the forecast skill of existing approach at ecological scale gives an opportunity for more insights on the impacts of climate variability on crop yield and production.

4. Conclusions

Remote sensing vegetation indices coupled to agroclimate variables are increasingly being used for crop monitoring and crop yield forecasting at regional scales. This study aimed at assessing spring wheat yield forecasting at the ecodistrict scale through an integrated crop yield forecasting approach (i.e., the ICCYF) because of the cohesiveness of the modelling unit in terms of climate and soils. The MODIS NDVI and EVI were respectively coupled to the same agroclimate indices and the performance of the models selected on the basis of these two input data sets was compared across three Western Canadian provinces over a 11-year period (2000–2010). Overall, MODIS NDVI and EVI performed equally well predicting spring wheat yield at the ECD scale. Our analysis also showed that in coarser statistical units (i.e., CAR) with poorer ICCYF performance as reported in previous study, the model performance was improved at that finer and ecological scale. Our findings indicate that statistical-based forecasting error could be significantly reduced by making use of MODIS-EVI and NDVI indices at different times in the crop growing season and within different sub-regions. Downscaling the yield forecasting approach at such specific ecological scale helps in understanding the yield variability within a given statistical unit and improve the forecast skills. Furthermore, exploring the forecast skill of existing approach at ecological scale gives an opportunity for more insights on the impacts of climate variability on crop yield and production.

Acknowledgments

This research was funded by the research project on “Integrated crop yield forecasting across Canada’s agricultural landscape”, co-led by A. Chipanshi and N.K. Newlands. The first author (Louis Kouadio) was funded by a NSERC (Natural Sciences and Engineering Research Council of Canada) Visiting Post-doctoral Fellowship. We thank Zhong Yao for processing the climate data. The authors would also like to thank the many individuals involved in the Integrated Canadian Crop Yield Forecasting Across the Agriculture Landscape of Canada project. The authors acknowledge and thank Statistics Canada for providing the spring wheat yield data at the ecodistrict scale used in our analysis. This yield data for spring wheat is part of a larger historical crop yield federal government database spanning 1992–2010 for Canada’s 38 crops and agricultural area, and was provided/shared to Agriculture and Agri-Food Canada (AAFC). We thank the anonymous reviewers for their constructive comments.

Author Contributions

Louis Kouadio and Nathaniel K. Newlands are the principal authors of this manuscript having written the majority of the manuscript and performed the analyses. The remote sensing data at the ecodistrict scale across the agricultural landscape of Canada were processed by Andrew Davidson. Aston Chipanshi, Yinsuo Zhang, and Andrew Davidson have reviewed this manuscript and provided comments and suggestions.

Figure 2.
Ecodistricts and Census Agricultural Regions (CARs) across the Western Canadian Prairie (encompassing the provinces of Alberta (AB), Saskatchewan (SK) and Manitoba (MB)). The highlighted CARs (i.e., 4607, 4609, 4610) are those with poorer ICCYF performance for spring wheat [28].

Figure 2.
Ecodistricts and Census Agricultural Regions (CARs) across the Western Canadian Prairie (encompassing the provinces of Alberta (AB), Saskatchewan (SK) and Manitoba (MB)). The highlighted CARs (i.e., 4607, 4609, 4610) are those with poorer ICCYF performance for spring wheat [28].

Figure 3.
Distribution and variability of MODIS NDVI (A) and EVI (B) during the cropping season (May–August) based on historical data, 2000–2010 period. The ends of the boxplots indicate the upper and lower quantiles, the solid line indicates the median. The whiskers are 1.5 times of the box height towards upper and lower from the median. Asterisks are the outliers.

Figure 3.
Distribution and variability of MODIS NDVI (A) and EVI (B) during the cropping season (May–August) based on historical data, 2000–2010 period. The ends of the boxplots indicate the upper and lower quantiles, the solid line indicates the median. The whiskers are 1.5 times of the box height towards upper and lower from the median. Asterisks are the outliers.

Figure 4.
Ranges of the root mean square error (RMSE, bottom), model efficiency index (middle), and mean absolute percentage error (MAPE, bottom) of spring wheat yield models at the ecodistrict scale, 2000–2010 period. (A) the input data set includes agroclimate (i.e., growing degree days (GDD), soil water availability (SWA), precipitation (P), and crop water deficit index (WDI)) and MODIS-NDVI indices; (B) same agroclimate indices as previously and MODIS-EVI indices. Note: year was included as an additional input variable in all cases. The ends of the boxplots indicate the upper and lower quantiles, the solid line indicates the median. The whiskers indicate the minimum and maximum values.

Figure 4.
Ranges of the root mean square error (RMSE, bottom), model efficiency index (middle), and mean absolute percentage error (MAPE, bottom) of spring wheat yield models at the ecodistrict scale, 2000–2010 period. (A) the input data set includes agroclimate (i.e., growing degree days (GDD), soil water availability (SWA), precipitation (P), and crop water deficit index (WDI)) and MODIS-NDVI indices; (B) same agroclimate indices as previously and MODIS-EVI indices. Note: year was included as an additional input variable in all cases. The ends of the boxplots indicate the upper and lower quantiles, the solid line indicates the median. The whiskers indicate the minimum and maximum values.

Figure 5.
2010 yield forecast of spring wheat- Spatial distribution of model error (CV %) for all ecodistricts and Census Agricultural Regions in yield forecasting using agroclimate and remote sensing indices. (A) agroclimate indices (GDD, P, SWA, WDI) and MODIS-NDVI; (B) same agroclimate indices as previously plus MODIS-EVI; (C) WDI, GDD, SWA and AVHRR-NDVI. MODIS NDVI/EVI values are the average of two consecutive 16-day periods, while AVHRR-NDVI indices are 3-week moving averages. n.a.: not applicable. Note: year was included as an additional input variable in all cases. The mapping of the model performance was based on the crop land extent map.

Figure 5.
2010 yield forecast of spring wheat- Spatial distribution of model error (CV %) for all ecodistricts and Census Agricultural Regions in yield forecasting using agroclimate and remote sensing indices. (A) agroclimate indices (GDD, P, SWA, WDI) and MODIS-NDVI; (B) same agroclimate indices as previously plus MODIS-EVI; (C) WDI, GDD, SWA and AVHRR-NDVI. MODIS NDVI/EVI values are the average of two consecutive 16-day periods, while AVHRR-NDVI indices are 3-week moving averages. n.a.: not applicable. Note: year was included as an additional input variable in all cases. The mapping of the model performance was based on the crop land extent map.

Figure 6.
Average values of model efficiency index (A) and mean absolute error percentage (B) of yield models in selected Census Agricultural Regions (CARs) and their corresponding ecodistricts (ECD) during the 2000–2010 period. CARUID and ECDUID stand for CAR and ECD unit identifiers, respectively. Striped bars represent model performance measures at CAR scale. The selected CARs (i.e., 4607, 4609 and 4610; see Figure 2) are those with weak ICCYF performance at CAR scale [28]. The runs at the ECD scale are based on agroclimate (AgMet; i.e., GDD, P, SWA, WDI) and MODIS-NDVI/EVI indices. Whereas those at the CAR scale are based on agroclimate (GDD, SWA, WDI) and AVHRR-NDVI indices.

Figure 6.
Average values of model efficiency index (A) and mean absolute error percentage (B) of yield models in selected Census Agricultural Regions (CARs) and their corresponding ecodistricts (ECD) during the 2000–2010 period. CARUID and ECDUID stand for CAR and ECD unit identifiers, respectively. Striped bars represent model performance measures at CAR scale. The selected CARs (i.e., 4607, 4609 and 4610; see Figure 2) are those with weak ICCYF performance at CAR scale [28]. The runs at the ECD scale are based on agroclimate (AgMet; i.e., GDD, P, SWA, WDI) and MODIS-NDVI/EVI indices. Whereas those at the CAR scale are based on agroclimate (GDD, SWA, WDI) and AVHRR-NDVI indices.

Agroclimate variables include growing degree days (GDD), soil water availability (SWA), precipitation (P), and crop water deficit index (WDI). Numbers following these indices, i.e., 5, 6, 7, and 8, stand for May, June, July and August, respectively; The average of two consecutive 16-day periods of MODIS NDVI/EVI were considered. The number after NDVI/EVI refer to the periods used for the calculation within the year.

Table 2.
Comparison between observed and predicted yields of spring wheat at the ecodistrict scale in Western Canada during the 2000–2010 period. Yield models are obtained using the ICCYF tool (input data set including agroclimate and MODIS-NDVI variables). The forecasts are made for July, August and September. The results are pooled across all ecodistricts.

Table 2.
Comparison between observed and predicted yields of spring wheat at the ecodistrict scale in Western Canada during the 2000–2010 period. Yield models are obtained using the ICCYF tool (input data set including agroclimate and MODIS-NDVI variables). The forecasts are made for July, August and September. The results are pooled across all ecodistricts.

aCoefficient of correlation;bMean bias error (kg·ha−1); MBE is the average difference between the predicted and observed yields;cRoot mean square error (kg·ha−1).

Table 3.
Comparison between observed and predicted yields of spring wheat at the ecodistrict scale in Western Canada during the 2000–2010 period. Yield models are obtained using the ICCYF tool (input data set including agroclimate and MODIS-EVI variables). The forecasts are made for July, August and September. The results are pooled across all ecodistricts.

Table 3.
Comparison between observed and predicted yields of spring wheat at the ecodistrict scale in Western Canada during the 2000–2010 period. Yield models are obtained using the ICCYF tool (input data set including agroclimate and MODIS-EVI variables). The forecasts are made for July, August and September. The results are pooled across all ecodistricts.