Choose your preferred view mode

Please select whether you prefer to view the MDPI pages with a view tailored for mobile displays or to view the MDPI
pages in the normal scrollable desktop version. This selection will be stored into your cookies and used automatically
in next visits. You can also change the view style at any point from the main header when using the pages with your
mobile device.

Abstract

:

Moderate spatial resolution satellite data from the Landsat-8 OLI and Sentinel-2A MSI sensors together offer 10 m to 30 m multi-spectral reflective wavelength global coverage, providing the opportunity for improved combined sensor mapping and monitoring of the Earth’s surface. However, the standard geolocated Landsat-8 OLI L1T and Sentinel-2A MSI L1C data products are currently found to be misaligned. An approach for automated registration of Landsat-8 OLI L1T and Sentinel-2A MSI L1C data is presented and demonstrated using contemporaneous sensor data. The approach is computationally efficient because it implements feature point detection across four image pyramid levels to identify a sparse set of tie-points. Area-based least squares matching around the feature points with mismatch detection across the image pyramid levels is undertaken to provide reliable tie-points. The approach was assessed by examination of extracted tie-point spatial distributions and tie-point mapping transformations (translation, affine and second order polynomial), dense-matching prediction-error assessment, and by visual registration assessment. Two test sites over Cape Town and Limpopo province in South Africa that contained cloud and shadows were selected. A Landsat-8 L1T image and two Sentinel-2A L1C images sensed 16 and 26 days later were registered (Cape Town) to examine the robustness of the algorithm to surface, atmosphere and cloud changes, in addition to the registration of a Landsat-8 L1T and Sentinel-2A L1C image pair sensed 4 days apart (Limpopo province). The automatically extracted tie-points revealed sensor misregistration greater than one 30 m Landsat-8 pixel dimension for the two Cape Town image pairs, and greater than one 10 m Sentinel-2A pixel dimension for the Limpopo image pair. Transformation fitting assessments showed that the misregistration can be effectively characterized by an affine transformation. Hundreds of automatically located tie-points were extracted and had affine-transformation root-mean-square error fits of approximately 0.3 pixels at 10 m resolution and dense-matching prediction errors of similar magnitude. These results and visual assessment of the affine transformed data indicate that the methodology provides sub-pixel registration performance required for meaningful Landsat-8 OLI and Sentinel-2A MSI data comparison and combined data applications.

1. Introduction

Moderate spatial resolution satellite data from the similar polar-orbiting sun-synchronous Landsat-8 and Sentinel-2 sensors together provide the opportunity for improved mapping and monitoring of the Earth’s surface [1]. Landsat-8 carries the Operational Land Imager (OLI) and the Thermal Infrared Sensor (TIRS) that sense 11 spectral bands including eight 30 m reflective wavelength bands, one 15 m panchromatic band, and two 100 m thermal wavelength bands [2]. The Landsat-8 swath is approximately 185 km (15° field of view from an altitude of 705 km) and provides a global coverage of the Earth’s surface every 16 days [2]. Sentinel-2A carries the Multi Spectral Instrument (MSI) that has 13 spectral bands ranging from 0.433 μm to 2.19 μm, including four 10 m visible and near-infrared bands, six 20 m red edge, near-infrared and short wave infrared bands, and three 60 m bands [3]. The Sentinel-2A swath is approximately 290 km (20.6° field of view from an altitude of 786 km) and provides a global coverage every 10 days and with the planned launch of a follow on identical Sentinel-2B sensor will provide 5-day global coverage [3]. Combined, the Sentinel-2 and Landsat-8 sensors will provide 10 m to 30 m multi-spectral reflective wavelength global coverage approximately every 3 days.

This paper describes the automated registration of Landsat-8 and Sentinel-2A reflectance data into the same common coordinate system. The geometrically corrected sensor data are available for Sentinel-2A as L1C top-of-atmosphere (TOA) tiles [4,5] and for Landsat-8 as L1T TOA images defined in a Worldwide Reference System (WRS) path/row coordinate system [6,7]. The geolocation performance specification for Sentinel-2A is 12.5 m (3σ) [8] and for Landsat-8 is 12 m (90% circular error) [9]. However, the Sentinel-2A L1C and Landsat-8 L1T data are currently misaligned relative to each other by more than several 10 m pixels [10]. This is because although both sensor geolocation systems use parametric approaches, whereby information concerning the sensing geometry is modeled and the sensor exterior orientation parameters (attitude and position) are measured, they use different ground control and digital elevation models to refine the geolocation [8,9]. The Landsat geolocation uses a global sample of ground control points derived from the Global Land Survey (GLS) cloud-free single-date Landsat images that are defined for each WRS path/row for different decades [11,12]. The Sentinel-2A geolocation will be improved by using a global reference image derived from an orthorectified set of Sentinel-2A cloud-free images [13,14]. Unfortunately, a relative misalignment of the Landsat-8 L1T and Sentinel-2A L1C data has been observed that varies among Landsat WRS path/row locations due primarily to variable GLS path/row locational accuracies [9]. Consequently, the GLS data are being reprocessed to provide a better match with the ground control used for operational geolocation of the Sentinel-2A data [10]. A more detailed description of the causes of the sensor misregistration is provided in [15].

A large body of research has been published concerning the registration of satellite images [16]. Methods are divided broadly into area-based matching methods, whereby a small region of one image is moved systematically across the other image and the location that provides the highest reflectance correlation provides a tie-point, and feature-based methods where tie-points are found by locating the positions of high-contrast features common to both images [17]. Given a sufficient number of tie-points, a transformation function, usually expressed as two polynomial functions that map the x and the y pixel locations of one image to the other image, is derived, often by least-squares regression analysis. For highly distorted data such as airborne imagery, local transformation functions are needed [18,19,20]. Different satellite data registration methods have been refined and proposed, for example, with respect to the initial knowledge of the relative orientation and scale of the images, to include computational efficiencies required to improve the matching speed, and to robustly handle cloud occlusion, land surface changes and differences between sensors [16].

In this study, a hierarchical image matching approach, which was originally developed for registration of High-Resolution Imaging Science Experiment (HiRISE) single-band stereo images to derive digital elevation models of Mars [21], was refined for registration of Landsat-8 L1T and Sentinel-2A L1C data. This approach was adopted because it has been proven for operational automated processing. Moreover, it uses an efficient feature- and area-based matching approach, is robust to noise, generates a large number of dense matches in a computationally efficient manner, and works well when the relative orientation and location of the images to be registered are known approximately. In this study, the approach was refined and applied to the 10 m Sentinel-2A near-infrared (NIR) and the 30 m Landsat-8 NIR bands. The NIR bands were selected because NIR reflectance has a greater range over soil, vegetation, and water, than visible wavelengths [22], and so usually the NIR provides high spatial contrast suitable for area- and feature-based matching of images acquired with similar dates (as in this study). In addition, the NIR is less sensitive to atmospheric contamination than at visible wavelengths [23]. We note that the NIR is commonly used for registration of moderate and high spatial resolution satellite data [17,24,25]. The Landsat-8 15 m panchromatic band, which covers predominantly the green and red wavelengths (503 to 676 nm) [26], was not used in this study because there is no spectrally similar Sentinel-2A band.

The paper is organized as follows. First, the Sentinel-2A L1C and Landsat-8 L1T geometric data characteristics and the common projection and tiling scheme used to reproject the data are described and illustrated for the test data that are selected in South Africa where approximately contemporaneous Sentinel-2A and Landsat-8 were available. The registration methodology and assessment approach are then described. Quantitative and qualitative results are presented to illustrate the registration performance and examples of the misregistration between Landsat-8 and Sentinel-2A data. The paper concludes with a discussion on the results and implications for combined Landsat-8 and Sentinel-2A data applications.

2. Overview of Landsat-8 L1T and Sentinel-2A L1C Geometric Data Structure, the Common Map Projection, and the Study Test Data

2.1. Landsat-8 L1T and Sentinel-2A L1C Geometric Data Structure

The Sentinel-2A geometrically corrected L1C products are defined in approximately 110 × 110 km tiles referenced to the U.S. Military Grid Reference System [4,5]. The Landsat-8 geometrically corrected L1T products are defined by approximately 180 × 180 km images referenced by WRS path/row [6,7]. Both sensor data are geolocated in the Universal Transverse Mercator (UTM) map projection in the World Geodetic System 84 (WGS84) datum and with specified UTM zones. Each UTM zone covers 6° of longitude, and so forms the basis of a separate map projection [27]. The Sentinel-2A swath is approximately 290 km and encompasses more than one UTM zone. The Sentinel-2A L1C data are defined in tiles that spatially overlap, and adjacent L1C tiles may be defined in different UTM zones [4,5]. The Sentinel-2A L1C tiled data structure, although complex, ensures that the geographic coordinates of each L1C pixel are fixed. The Landsat-8 185 km swath is sufficiently narrow for each L1T image to be defined in a single UTM zone. However, the geographic coordinates of each Landsat L1T pixel are not fixed.

2.2. Map Projection Used for Registration

Over large areas, users of derived satellite information require spatially-explicit map products defined in a single projection rather than different UTM projections. Therefore, in this study, the Sentinel-2A L1C and Landsat-8 L1T data were reprojected to a common coordinate system. The equal-area sinusoidal projection and tiling scheme used to store the global Web Enabled Landsat (WELD) products were utilized. The global WELD products define monthly and annual 30 m Landsat nadir BRDF-adjusted reflectance (NBAR) surface reflectance derived by the algorithms described in [22,27,28] and are available at [29]. The global WELD tiles are nested within the standard 10° × 10° MODIS land product tiles and are defined in the sinusoidal equal projection used to store the MODIS land products [30]. Each global WELD tile covers about 159 × 159 km. There are 7 × 7 global WELD tiles within each MODIS land tile, and the filename includes the MODIS horizontal (0 to 35) and vertical (0 to 17) tile coordinates, and the nested WELD tile horizontal and vertical tile coordinates (0 to 6).

The Sentinel-2A L1C 10 m NIR (842 nm) and the Landsat-8 L1T 30 m NIR (864 nm) bands were reprojected independently into the global WELD tiles, each resulting tile was composed of 15885 × 15885 10 m pixels. The computationally-efficient inverse gridding approach [30,31] was used to systematically relate the locations of each 10 m global WELD tile pixel to the Sentinel-2A L1C and Landsat-8 coordinates. Care was taken to use the correct UTM zones defining the sensor data and to handle the different sensor pixel georeferencing schemes (Sentinel-2A uses pixel corner and Landsat-8 uses pixel center references). The General Cartographic Transformation Package (GCTP), developed by the United States Geological Survey, was used to transform coordinates between the sinusoidal and UTM map projections. The GCTP has been used to develop a number of applications including the MODIS global browse imagery [32] and the WELD products [28]. The Sentinel-2A L1C 10 m NIR data were resampled to 10 m by nearest neighbor resampling considering all the spatially overlapping L1C tile data. Although nearest neighbor resampling is computationally efficient as it assigns the closest pixel to any output coordinate location, it introduces local resampling shifts up to 0.5 of the input image pixel dimension. This was not an issue for the registration results because the sensor-to-sensor registration transformations were fitted globally, i.e., using a large number of tie points selected across each study image. However, the scale difference between the 10 m output grid and the 30 m Landsat data precluded reliable Landsat nearest neighbor resampling. Therefore, the Landsat-8 30 m NIR data were interpolated to 10 m by bilinear resampling [33], specifically, by fitting a hyperbolic paraboloid through the four neighboring 30 m pixel values and then interpolating the 10 m pixel value.

2.3. Test Data

The orbit and sensing geometry of Landsat-8 and Sentinel-2A are different, and consequently they acquire images over the same location at different times and on different days. In this study, two sites in South Africa, specifically over Cape Town (Figure 1) and over Limpopo province (Figure 2), were selected. Over Cape Town, a Landsat-8 L1T image sensed 22 November 2015 (for brevity referred to as week 47) and two Sentinel-2A L1C images, which were sensed 16 and 26 days later on 8 December and 18 December 2015 (referred to as weeks 49 and 51), were used to examine the robustness of the matching algorithm to surface, atmosphere and cloud changes. Over the Limpopo Province, the Landsat-8 L1T and Sentinel-2A data images were sensed just four days apart on 5 and 9 December 2015, respectively (both week 49). All the images contained cloud and/or topographic shadows to provide confidence that the matching could handle these phenomena. Both test sites include considerable (more than 500 m) changes in relief i.e., Table Mountain in Cape Town to the surrounding coastal plain, and the Southern parts of the Waterberg Massif to the Springbok Flats in the Limpopo study area.

3. Registration Method

3.1. Overview

The registration method was adapted from the hierarchical matching approach developed for application to Martian High-Resolution Imaging Science Experiment (HiRISE) satellite stereo images [21]. The approach in [21] has four main steps: (1) image pyramids, i.e., a hierarchy of low-pass filtered images from coarse spatial resolution to the native image resolution, are built for each image pair; (2) features are detected in one image at each pyramid layer; (3) area-based cross-correlation matching around the feature points is conducted sequentially from the coarse to the fine spatial resolution pyramid levels; (4) a set of tie-points defined at the native image spatial resolution are used, with knowledge of the sensor interior and exterior orientations, to derive a digital elevation model (DEM) using a dense grid-point matching guided by the tie-point locations.

The approach was adapted in this study (i) to handle the pairs of Sentinel-2A L1C and Landsat-8 L1T NIR data reprojected to the 10 m global WELD tiles (i.e., illustrated in Figure 1 and Figure 2, for brevity these are referred to below as Sentinel and Landsat image pairs); (ii) the area-based cross-correlation matching was replaced with the area-based least squares matching (LSM) that provides sub-pixel registration accuracy; (iii) a depth-first mismatch detection method was implemented that does not require knowledge of the sensors’ interior and exterior orientation parameters; (iv) the dense matching approach was not used to generate a DEM but rather to examine the spatial pattern of the registration prediction errors. The approach produces a set of corresponding tie-point locations that are defined in the Sentinel and Landsat image pair. The tie-points are then used to derive transformation functions that are used to reproject the Landsat image into registration with the Sentinel image. Figure 3 illustrates the processing flow. The implementation details are described below.

3.2. Image Pyramid Construction

A number of registration methods have been developed using image pyramids to improve the computational matching speed and sometimes to improve the match reliability [16,21,24,34]. The image pyramid approach was used in this study for both reasons and in particular because the depth-first mismatch detection method (Section 3.5) uses the multiple image pyramid levels to reduce mismatches. In this study, a four-level (120 m, 60 m, 30 m, 10 m) image pyramid was derived for the Landsat and Sentinel image pairs, applying a conventional low-pass Gaussian filter [35] to smooth the 10 m data to 30 m, 60 m and 120 m. The coarsest spatial resolution was defined at 120 m because the misregistration between the Landsat-8 L1T and Sentinel-2A L1C data was assumed to be smaller than this [10]. Four levels were used to ensure reliable depth-first mismatch detection described below.

3.3. Coarse Resolution 120 m Feature Point Detection

Feature-based matching techniques detect features, such as the intersections of linear structures and the centroids of distinct geometric objects, are often invariant to spectral, spatial, radiometric and temporal variations [36,37]. Consequently, they are commonly utilized for matching of satellite and other remotely sensed imagery [17,21,38,39,40]. Following the Li et al. approach [21], feature points were detected using the locally adaptive Förstner operator that typically detects corners, distinct points and centers of ellipsoid features [36]. The feature points were detected in the Sentinel-2A 120 m image as the Sentinel-2A data has higher native spatial resolution (10 m) than the Landsat-8 data (30 m). Each detected feature point was defined to the closest 120 m Sentinel-2A pixel. There were usually several thousand feature points detected from a 120 m Sentinel-2A image.

3.4. Least-Squares Area Based Image Matching

Area-based matching was undertaken by comparing the NIR reflectance values over a small square image patch (n × n pixels) surrounding each feature point location in the Sentinel image with the corresponding sized patch in the Landsat image. Rather than using conventional area-based cross-correlation matching [41,42], the least squares matching (LSM) approach was used [43,44]. The LSM approach models both geometric and radiometric transformations, which makes it suitable for matching images acquired on different dates and/or from different sensors, and also for matching images where the misalignment is not a simple translation [17,45]. LSM is a form of area-based matching and is undertaken between two small image patches by least-squares fitting of their reflectance values. It can provide sub-pixel matching accuracy up to 0.02 pixels depending on the image content [46]. Conventionally, a correlation coefficient, such as the normalized cross-correlation coefficient [42], is used to measure the similarity between one image patch (in the Sentinel image) and the corresponding image patch (in the Landsat image) which is resampled using the fitted geometric and radiometric transformation parameters. The least-squares fitting of these parameters is conducted iteratively, generating a new resampled patch each time, and checking that the correlation between the two patches increases after each new resampling, otherwise the iteration is terminated. The Landsat matched position corresponding to the center point of the Sentinel image patch is derived using the final fitted geometric transformation parameters. This produced matched pairs of Sentinel and Landsat pixel locations.

In the LSM implementation, the correlation coefficient was replaced by the spectral angle mapper (SAM) measure [47,48]. This was found to be helpful because of the spectral and temporal differences between the Sentinel and Landsat images. The SAM is insensitive to exogenous reflectance brightness variations as demonstrated using hyperspectral data [48,49,50] and multi-spectral multi-temporal data [51]. The SAM is conventionally derived between the spectral reflectance values of two pixels by calculating the angle subtended between their points in spectral feature space and the feature space origin (i.e., zero reflectance). In this implementation, the SAM was derived by comparing the n2 Sentinel NIR image patch pixel values with the corresponding n2 Landsat NIR values. For the South African test data in this study, the SAM was found to be more robust against threshold selection than the conventional normalized correlation coefficient, and a constant SAM threshold was sufficient for LSM matching for all four pyramid image levels.

3.5. Depth-First Mismatch Detection

Most matching algorithms incorporate strategies to remove mismatches that can occur, for example, in regions with repetitive structured terrain. A depth-first mismatch detection method was implemented that does not require knowledge of the sensors’ interior and exterior orientation parameters and that takes advantage of the sub-pixel matching accuracy provided by the LSM. The method is illustrated in Figure 4. Rather than undertaking mismatch detection independently on individual pyramid image levels, for example as in [21] or [24], the depth-first mismatch detection method utilizes the hierarchical pyramid image structure by considering the “depth” of a given feature point.

A scale-space approach [52] was used, based on the concept that correctly matched features should occur at all pyramid levels, and that at coarser spatial resolution, a smaller number of prominent features will be detected with less precise location than at higher spatial resolution. First, for each feature detected in the Sentinel 120 m data, the LSM matching was applied to find a corresponding location in the Landsat 120 m data. A check was undertaken to ensure a successful match. If the match was successful, then the Sentinel and Landsat (not shown in Figure 4) matched locations were projected to the 60 m level, and the LSM was repeated at these locations using the 60 m data. This matching-and-projection procedure was repeated to the 30 m and 10 m pyramid level data. If the match failed at any level, as illustrated by the red points in Figure 4, the locations were discarded. If the match was successful at all levels (illustrated by the two green points on the 10 m level in Figure 4), then the location in the Sentinel and Landsat data was recorded as a tie-point.

A match was considered successful if the SAM value was ≥0.995. This was a strict threshold as the SAM is bounded in the range [0, 1] and the SAM has a value of unity when the two image patches are identical. This was also helpful as cloudy 10 m pixels that are present in the 120 m, 60 m and 30 m pyramid levels will reduce the SAM values. Successful match locational criteria were also implemented. At the 120 m level, the difference between the Sentinel and Landsat locations was constrained to be less than 120 m to reduce matching blunders. At the 60 m, 30 m and 10 m levels, the Landsat LSM matched location was compared with the Landsat location projected from the level above, and if the Euclidean distance between the locations was greater than the one third of a pixel, it was also rejected. The LSM was undertaken considering 21 × 21 pixel patches in the Sentinel and Landsat images for the 120 m, 60 m and 30 m levels and a slightly larger 35 × 35 pixel patch for the 10 m level because of the loss of detail in the Landsat 10 m data due to its native 30 m resolution.

3.6. Transformation Coefficient Fitting

The above processes result in a set of corresponding pixel locations (i.e., tie-points) that are defined to the pixel in the Sentinel 10 m WELD tile and to sub-pixel precision in the Landsat 10 m WELD tile. These points were used to derive transformation functions used to reproject the Landsat WELD data into registration with the Sentinel WELD tile data. Conventionally, for moderately geometrically distorted imagery, polynomial functions that map the x and the y pixel locations of one image to the other image are derived by least-squares regression analysis of the tie-points. Conventional transformation functions were considered in this study to model translation, affine, and second order polynomial geometric distortions. Specifically: translations in both axes (Equation (1)), translation, rotation, and scaling in both axes and an obliquity (Equation (2)), and the same terms as Equation (2) with torsion and convexity in both axes (Equation (3)) [53].

xL=a0+xsyL=b0+ys

(1)

xL=a0+a1xs+a2ysyL=b0+b1xs+b2ys

(2)

xL=a0+a1xs+a2ys+a3xs2+a4xsys+a5ys2yL=b0+b1xs+b2ys+b3xs2+b4xsys+b5ys2

(3)

where xL, yL are the Landsat x and y WELD tile pixel coordinates, xs, ys are the Sentinel-2A WELD tile pixel coordinates, and ai and bi are the transformation coefficients derived by ordinary least squares regression of the tie-points found in the Landsat and Sentinel WELD tiles.

3.7. Image Registration

The pixel coordinates of each global WELD Sentinel tile pixel (xs, ys) were systematically projected into the global WELD Landsat tile coordinates (xL, yL), as Equations (1) and (2) or Equation (3), and then the coordinates (xL, yL) were reprojected using GCTP into the Landsat-8 L1T UTM map projection (Section 2.2). In this way, the Landsat-8 Level 1T data were only resampled once. The Landsat-8 L1T data were bilinear resampled to produce a new 30 m Landsat-8 WELD tile product that was registered with the corresponding Sentinel-2A WELD tile product.

4. Registration Assessment

4.1. Tie-Point Misregistration Assessment

The WELD tile x and y axis differences between the Landsat and Sentinel tie-point locations were derived as Equation (4). The differences were summarized quantitatively (minimum, maximum, mean and standard deviation) to characterize the sensor misregistration in units of 10 m pixels.

Δx,m=xS,m−xL,mΔy,m=yS,m−yL,m

(4)

where Δx,m and Δy,m are the WELD tile x and y axis differences between the Landsat tie-point location xL,m,yL,m and the corresponding Sentinel tie-point location xS,m,yS,m for tie-point m. The differences calculated by Equation (4) were visualized as vectors to examine the spatial distribution of the tie-point locations and the spatial pattern of the sensor misregistration. Note that, following standard convention, the image origin is defined as the top-left, i.e., the North-West, of each WELD tile, with +x to the East (i.e., to the right in the sample direction) and +y to the South (i.e., downwards in the line direction).

4.2. Transformation Coefficient Fitting Assessment

The capability of the different transformation functions (Section 3.6) to accurately fit the tie-point locations was assessed by consideration of the root-mean-square error (RMSE) as Equation (5).

RMSE=∑m=1n(xL,m−x^L,m)2+(yL,m−y^L,m)2n−t

(5)

where RMSE is the root-mean-square error (units 10 m pixels), xL,m,yL,m is the Landsat x and y WELD tile pixel coordinate for tie-point m, and x^L,m,y^L,m are the transformed Landsat x and y WELD tile pixel coordinate for tie-point m calculated by Equations (1) and (2) or Equation (3), n is number of tie-points, and t is the number of coefficients in the transformations, i.e., set as two for Equation (1), six for Equation (2) and twelve for Equation (3). Given n >> t, then smaller RMSE values are indicative of more accurate transformation function fitting.

For the Cape Town data (Figure 1), the temporal stability of the transformations was assessed by comparison of the transformations calculated for the two image pairs. Specifically, the pixel coordinates of the Cape Town WELD tile were systematically projected using the transformations derived from the Landsat-8 (week 47) and Sentinel-2A (week 49) tie-points, and also using the transformations derived from the Landsat-8 (week 47) and Sentinel-2A (week 51) tie-points, and then the Euclidean distance between them was computed as Equation (6).

dk=(xk47→49−xk47→51)2+(yk47→49−yk47→51)2

(6)

where dk is the Euclidean distance (units 10 m pixels) for projected WELD tile pixel k,(xk47→49,yk47→49) is the transformed pixel k location calculated as Equations (1) and (2) or Equation (3) for Landsat-8 (week 47) to Sentinel-2A (week 49) tie-points, and (xk47→51,yk47→51) is the location calculated with Equations (1) and (2) or Equation (3) for Landsat-8 (week 47) to Sentinel-2A (week 51) tie-points. Summary statistics of dk for all the 15885 × 15885 10 m WELD tile pixels were derived independently for each of the three transformations as Equations (1) and (2) or Equation (3).

4.3. Qualitative Visual Registration Assessment

Visual comparison of the registered Landsat and Sentinel data was undertaken using an approach similar to that used to visualize MODIS geolocation performance [30]. False color images composed of the NIR bands of the registered data were generated for spatial subsets containing high contrast features. The Sentinel NIR data were shown in red and the registered Landsat NIR data were shown in the blue and green bands. In this way, any sensor misregistration is exhibited by red and cyan tones over high contrast features and as greyscale tones elsewhere. The 10 m NIR Sentinel data were bilinear resampled from 10 m to 30 m, so they could be compared directly with the registered 30 m Landsat data.

4.4. Dense Grid-Point Matching Registration Assessment

The tie-point fitted transformations could be used to guide the least squares matching on a dense grid, and then the match results were used to locally register the Landsat and Sentinel data. This is not proposed as a practical registration solution, however, as it is computationally expensive and because spatial gaps between densely matched locations may remain due to, for example, clouds in either image. However, this approach was used to examine the Landsat and Sentinel misregistration in detail. Grid points were sampled every six 10 m pixel locations in the x and y axes across the 15885 × 15885 10 m WELD tile. At each grid point location, the corresponding location in the Landsat 10 m WELD tile was transformed as Equations (1) and (2) or Equation (3) and then used to guide the least squares matching between the Landsat and Sentinel WELD 10 m NIR data. Similar mismatch rejection as described in Section 3.5 was undertaken, using the SAM 0.995 threshold and setting successful match locational criteria as 1.0 pixel for transformations as Equations (2) and (3) and as 1.6 pixels for transformation Equation (1). These successful match locational criteria thresholds were set as approximately three times the transformation RMSE values (i.e., Equation (5) results described in Section 5). Maps of the prediction error defined as Equation (7) were derived for the three transformations.

ei=(xL,i−x^L,i)2+(yL,i−y^L,i)2

(7)

where for each WELD tile dense-matching grid point i,ei is the prediction error (units 10 m pixels), (x^L,i,y^L,i) is the transformed Landsat x and y WELD tile pixel coordinate calculated by Equations (1) and (2) or Equation (3), and (xL,i,yL,i) is the least squares matched Landsat x and y WELD tile pixel coordinate.

5. Results

5.1. Cape Town

5.1.1. Tie-Point Misregistration Assessment

A total of 1346 and 2351 feature points were detected from the Sentinel-2A 120 m week 49 and 51 images, respectively. Fewer feature points were available in the Sentinel-2A week 49 120 m image because of the greater cloud cover (Figure 1 bottom row) and because of the Gaussian smoothing of the cloud edges that reduced the availability of high contrast features. After the depth-first mismatch detection process, a total of 116 tie-points were defined between the Landsat-8 week 47 and the Sentinel-2A week 49 image pair, and 797 tie-points were defined between the Landsat-8 week 47 and Sentinel-2A week 51 image pair. Figure 5 illustrates the locations of the tie-points for the two pairs of images (red and green vectors). The tie points were found across the tile except in regions where cloud occurred in one or both images, and the vectors were highly consistent. For both image pairs, the Landsat-8 image was misaligned in a similar south-west direction relative to the Sentinel-2A images.

Table 1 summarizes the tie-point differences illustrated in Figure 5. For both image pairs, the x-axis and y-axis mean shift magnitudes are greater than 5.2 and 2.1 pixels with standard deviations of about 0.4 and 0.3 pixels, respectively. These 10 m pixel shifts are not insignificant and even at the 30 m Landsat pixel resolution will limit the ability to meaningfully compare Landast-8 and Sentinel-2A data.

5.1.2. Transformation Coefficient Fitting Assessment

Table 2 and Table 3 summarize the transformation coefficients derived by ordinary least squares regression fitting of the 797 and 116 tie-points extracted from the Cape Town image pairs (Figure (5)) for Equations (1)–(3). The transformation fittings’ RMSE values (Equation (5)) are also tabulated. As there are an order of magnitude more tie-points than transformation coefficients, the RMSE values are indicative of the transformation fitting accuracy. The translation transformation (Equation (1)) is the least accurate with RMSE values greater than 0.5 pixels for both image pairs. The affine (Equation (2)) and second order polynomial (Equation (3)) transformations have small RMSEs ranging from 0.286 to 0.302 pixels, which indicate a high level of reliability of the tie-points obtained by the depth-first LSM and also that these transformations are sufficient to model the Landsat-to-Sentinel misregistration. The RMSEs are not much different for the affine and second order polynomial transformations. This suggests that the geometric complexity provided by the second order polynomial transformation will not provide a notable improvement to the registration accuracy.

The transformation coefficients for the two image pairs are quite similar (Table 2 and Table 3). To investigate this, the Euclidean distances (Equation (6)) between the projected WELD tile pixel locations projected using the Table 2 and Table 3 coefficients were assessed for all the 15885 × 15885 10 m WELD tile pixels (Table 4) as described in Section 4.2. For the translation transformation, there is a constant 0.215 pixels difference because the translational differences are constant relative to each other across the WELD tile. The affine and second order polynomial transformations provide spatially variable Euclidean distances. Compared with the second order polynomial transformation, the affine transformation has smaller minimum, maximum, mean and standard deviation values of 0.000, 0.266, 0.114 and 0.055 pixels versus 0.002, 1.332, 0.245 and 0.257 pixels, respectively. Evidently, the second order polynomial transformation fitting is less stable than the affine transformation fitting for the image pairs and tie-points considered. This and the similar transformation fittings’ RMSE values (Table 2 and Table 3) suggest that the affine transformation is appropriate for registration of the sensor data.

5.1.3. Qualitative Visual Registration Assessment

Figure 6 illustrates the result of the Landsat-8 to Sentinel-2A registration undertaken using the affine transformation (Table 3). Sensor misregistration effects are quite apparent in Figure 6a, especially in the WELD x-axis orientation, which is reflected by the relatively larger absolute mean Δx than Δy tie-point values (Table 1). After the registration (Figure 6b), the sensor misregistration effects are considerably less pronounced. Surface changes that occurred between the Landsat-8 and Sentinel-2A acquisition dates are apparent in the registered data notably, such as differences in the beach profile on the south-east side of the harbor prominentary, the presence of clouds, and perhaps ship tracks in one image and not the other.

5.1.4. Dense-Matching Prediction-Error Assessment

Figure 7 shows the dense-matching prediction-error maps for the translation, affine and second order polynomial transformations between the Landsat-8 week 47 and Sentinel-2A week 51 image pair. The areas with no matched points or satellite observations are shown in black. The black “holes” are predominantly due to water bodies and clouds in either image, and also due to surface changes that caused matches to be rejected. The prediction errors for the translation (Figure 7a) are greater than those for the affine (Figure 7b) and second order polynomial (Figure 7c) transformations. The affine and second order polynomial transformations prediction errors have similar magnitude and spatial pattern. Their greatest differences occur in the south-west and this is likely due to the lack of tie-points in this region (Figure 5). This and the stripes that are aligned approximately parallel to the Sentinel-2A and Landsat-8 track directions are discussed further in Section 6. The prediction-error maps obtained considering the other Cape Town image pair are not illustrated but were visually similar except for the spatial distribution of the “holes” due to image differences (cloud cover and surface changes).

Table 5 summarizes the dense-matching prediction errors illustrated in Figure 7. The mean and standard prediction errors are tabulated. The maximum error is the same as the successful match locational criteria (Section 4.4) and the minimum error was zero pixels (to decimal places). The prediction errors are greater for the translation transformation, which is expected due to the relatively less accurate translation model fits (Table 2 and Table 3) and is reflected in the prediction error maps illustrated in Figure 7. The prediction errors are similar for the affine and second order polynomial transformations with means and standard deviations of less than 0.3 and 0.2 pixels, respectively. The errors for the Landsat-8 week 47 and Sentinel-2A week 49 image pair are higher and more variable than for the Landsat-8 week 47 and Sentinel-2A week 51 image pair, and this is likely due to the smaller number of tie-points used.

5.2. S.W. Limpopo

5.2.1. Tie-Point Misregistration Assessment

For the Limpopo study site, only one pair of images was considered (Figure 2). A total of 1840 feature points were detected from the Sentinel-2A 120 m week 49 image, and after the depth-first mismatch detection process, 180 tie-points were defined (Figure 8). The tie-points were most densely distributed in less cloudy agricultural and urban areas where the image intensity contrast was pronounced. The Landsat-8 image was misaligned in a south-east direction relative to the Sentinel-2A image. Not only was the misalignment orientation different from the Cape Town image pairs, but also the magnitude of the tie-point differences was different (Table 6). The x-axis and y-axis mean tie-point shifts were smaller with magnitudes greater than 1.5 and 1.7 pixels, respectively, but with similar standard deviations (Table 6) to those found for Cape Town (Table 1).

5.2.2. Transformation Coefficient Fitting Assessment

Table 7 summarizes the transformation coefficients and RMSE values. As for the Cape Town image pairs, the translation transformation is the least accurate (RMSE > 0.5 pixels) and the affine and second order polynomial transformations have similar RMSEs of 0.303 and 0.309 pixels, respectively, and so the affine transformation is deemed appropriate for registration of the sensor data.

5.2.3. Dense-Matching Prediction-Error Assessment

As for the Limpopo image pair, the dense-matching prediction errors for the translation transformation (Figure 9a) are greater than those for the affine (Figure 9b) and second order polynomial (Figure 9c) transformations. The affine and second order polynomial transformations prediction errors have similar magnitude and spatial pattern. There are stripes aligned parallel to the sensor track directions, especially evident in Figure 9a, that are discussed further in Section 6. Similar to the Cape Town prediction error statistics (Table 5), the prediction errors are greater for the translation transformation, and are similar for the affine and second order polynomial transformations (Table 8) that have means and standard deviations of about 0.21 and 0.17 pixels, respectively.

6. Discussion

Registration of satellite data to sub-pixel precision is a pre-requisite for meaningful data comparison and surface change detection [54,55]. The registration tie-points were derived by least squares matching, which provides sub-pixel location precision and handles the sensor spectral band differences and the non-linear geometric distortions present between the Landsat-8 and Sentinel-2A data. The tie-points were used to fit translation, affine and second order polynomial transformation functions, and for the three pairs of sensor data (different dates and locations), the translation transformation was less accurate with RMSE fit values greater than 0.5 pixels and greater dense-matching prediction errors. The affine and second order polynomial transformations had RMSE fit values of approximately 0.3 pixels and dense-matching prediction errors of similar magnitude. However, the tie-points derived from the two Sentinel-2A Cape Town images and the same Landsat-8 Cape Town image (Figure 5) provided less stable transformations relative to each other when the second order polynomial rather than the affine transformation was used. This is likely because of the greater sensitivity of the polynomial model to the different numbers and spatial distributions of tie-points [18,56] (Figure 5). These results suggest that an affine transformation is sufficient to register Landsat-8 L1T and Sentinel-2A L1C data when there are considerably more tie-points than transformation coefficients.

The causes of the observed sensor misregistration are complex. For both Cape Town image pairs, the Landsat-8 image was misaligned in a similar south-west direction relative to the Sentinel-2A images, and for the Limpopo image pair, the Landsat-8 misalignment was in a relative south-east direction. The relatively constant geographical pattern of tie-point-characterized sensor misregistration (Figure 5 and Figure 6) supports the hypothesis that the misregistration is due primarily to Landsat GLS path/row specific locational errors [9,15]. However, other sources of error may be present, including, for example, geometric relief distortion imposed by digital elevation model inaccuracies, although we observed no elevation related shifts, and inadequate knowledge and/or modeling of the sensor interior and exterior orientation. The detailed study on these error sources is beyond the scope of this paper.

The dense-matching prediction-error maps (Figure 7 and Figure 9) exhibited stripes that are aligned approximately parallel to the Sentinel-2A and Landsat-8 track directions. To investigate this further, maps of the x and y axis shifts used to compute the dense-matching prediction errors (Equation (7)) were generated. The results revealed similar patterns for all the study data. Figure 10a,b show the dense-matching x and y shifts for the Limpopo image pair. The mean x-shift and y-shift values were −1.592 and −1.674 pixels, respectively, which is a similar magnitude to the fitted translation transformation coefficients (Table 7). However, the shifts were unevenly distributed across the tile, with x-shift and y-shift standard deviations of 0.473 and 0.237 pixels, respectively. This illustrates why the translation transformation had larger errors than the other transformation types. Figure 10c shows the Sentinel-2A L1C tile and detector boundaries and Figure 10d shows the Landsat-8 L1T image boundaries. The dense-matching x and y axis shift maps have apparent zones with edges aligned approximately parallel to the detector and image boundaries. The x and y shifts across the zone boundaries are quite small however, usually less than 0.35 pixels (manually measured). Their cause is likely due to a combination of factors. Small geometric misalignments and/or radiometric calibration differences between the detector banks combined with directional reflectance affects may result in least squares matching differences between the sensor data that will be pronounced along the detector bank boundaries. Along scan directional reflectance variations of several percent are present in 15° field of view Landsat data [57] and are expected to be greater in wider field of view (20.5°) Sentinel-2 data. Consequently, different sensor viewing geometry and forward and backward scatter sensing conditions may introduce along scan reflectance variations that cause small least squares matching shifts. However, we note that for the data considered, the solar zenith angles were only a few degrees different and the images were sensed in the same scattering direction. The along track shifts evident at the bottom left of Figure 10 are coincident with the Landsat-8 L1T boundary, evident in Figure 10d, and may be caused by a different set of Landsat ground controls being used.

Finally, as with all registration methods, the accuracy will be dependent upon the availability of tie-points which will be limited in regions of unstructured terrain and where there are clouds and shadows. However, if the sensor misregistration is predominantly constant for each Landsat WRS path/row location, i.e., due primarily to GLS path/row locational errors [9,15], then tie-points extracted from matching many pairs of approximately contemporaneous Landsat-8 and Sentinel-2A data through time may provide a reliable set of tie-points, from either a “good” pair or a combination of multiple pairs. This will require further assessment on sensor’s multi-temporal intra-misregistration.

7. Conclusions

This study presented an approach for the automated registration of geolocated Landsat-8 OLI L1T and Sentinel-2A MSI L1C data. Registration errors between South African test data greater than the native sensor pixel resolutions were found and were effectively characterized by an affine transformation. Tie-points extracted from pairs of Landsat-8 L1T and Sentinel-2A L1C images revealed sensor misregistration greater than one 30 m Landsat-8 pixel dimension (for two Cape Town image pairs) and greater than one 10 m Sentinel-2A pixel dimension (for Limpopo image pair). This degree of registration error is not insignificant, and will limit the ability to meaningfully compare Landast-8 and Sentinel-2A data even at the 30 m Landsat-8 pixel resolution. As the degree of sensor misregistration is unknown at other global locations and times, further study considering a global distribution of Landsat-8 and Sentinel-2A contemporaneous data and considering the causes of the sensor misregistration is recommended.

The developed registration approach is computationally efficient because it implements feature point detection at reduced spatial resolution and then area-based least squares matching around the feature points with mismatch detection across four image pyramid levels to identify a sparse set of tie-points. Nevertheless, the registration of large spatial and temporal coverage of Landsat-8 and Sentinel-2A data will still be computationally challenging because of the considerable satellite data volume imposed by the 10 m Sentinel-2A bands. The performance of the approach on the selected test data, which contained clouds, shadows, land cover changes, and sensor acquisition date differences of up to 26 days, provide confidence in its robustness. The approach provided hundreds of automatically located tie-points that had least-squares fits with RMSE of 0.286, 0.302 and 0.303 10 m pixels under affine transformation for the three registered Landsat-8 and Sentinel-2A image pairs. Dense-matching prediction-error assessment considering every sixth pixel on a systematic grid revealed sub-pixel prediction errors (means < 0.3 pixels and standard deviations < 0.2 pixels at 10 m pixel resolution) for the fitted affine transformations. These results and visual assessment of the affine transformed data indicate the sub-pixel registration performance required for meaningful sensor data comparison and time series applications [54,55] including those illustrated in this special issue [58].

Acknowledgments

This research was funded by the NASA Land Cover/Land Use Change (LCLUC14-2): Multi-Source Land Imaging Science Program, Grant NNX15AK94G.

Author Contributions

L.Y. developed the algorithm and undertook the data analysis, and processed the data and developed the graphics with assistance from H.K., J.L. and H.H.; D.R. helped with the data analysis and structured and drafted the manuscript with assistance from L.Y.

Figure 10.
Dense-matching maps x and y axis shifts (units 10 m pixels) between the Limpopo Landsat-8 week 49 and Sentinel-2A week 49 image pair. The affine transformation was used to guide the dense matching (Section 4.4). The (a) x-shift xS,i−xL,i and (b) y-shift yL,i−yL,i are shown, where (xS,i,yS,i) is the Sentinel grid-point location and (xL,i,yL,i) is the corresponding least squares matched Landsat location for grid-point i. Locations where there are no matches are shown as black. Note that (xL,i,yL,i) is theoretically independent on the transformation type and so the translation and polynomial-based shift maps, which were very similar to the affine-based shift maps, are not shown; (c) shows the Sentinel-2A L1C tile and detector boundaries (red) and (d) shows the Landsat-8 L1T image boundaries (blue).

Figure 10.
Dense-matching maps x and y axis shifts (units 10 m pixels) between the Limpopo Landsat-8 week 49 and Sentinel-2A week 49 image pair. The affine transformation was used to guide the dense matching (Section 4.4). The (a) x-shift xS,i−xL,i and (b) y-shift yL,i−yL,i are shown, where (xS,i,yS,i) is the Sentinel grid-point location and (xL,i,yL,i) is the corresponding least squares matched Landsat location for grid-point i. Locations where there are no matches are shown as black. Note that (xL,i,yL,i) is theoretically independent on the transformation type and so the translation and polynomial-based shift maps, which were very similar to the affine-based shift maps, are not shown; (c) shows the Sentinel-2A L1C tile and detector boundaries (red) and (d) shows the Landsat-8 L1T image boundaries (blue).