Abstract

Russian sea ice charts digitized by the Arctic and Antarctic Research Institute (AARI) in the Sea Ice Grid (SIGRID) format are being regridded to the Equal Area SSM/I Earth Grid (EASE-Grid). Several methods of regridding are considered: nearest neighbor, drop-in-box, point min/max, interpolation, area min/max and area weighted averages. These methods are evaluated for their effect on data content, computer resource requirements, consistency of results, and how they deal with the descriptive and symbolic nature of SIGRID data. Two of these methods, nearest neighbor and area min/max, are identified as most appropriate to apply for regridding AARI data.

The nearest neighbor method is straightforward and not computationally demanding, but results in data loss and duplication. The area min/max approach is more complex, and requires more computer resources, but results in EASE-Grid images created from all the AARI data points. In testing the methods, errors were found in the digitization of the sea ice charts by AARI. Some of the charts were digitized with a starting longitude that causes data to be recorded at differing locations on the charts. The regridding routines were designed to work with data at consistent locations. However, by oversampling the data at 0.25 degree spacing, the erroneous data points may be transferred to the EASE-Grid projection.

Future work remains to select between the nearest neighbor and area min/max regridding approaches. These methods should be applied to a representative set of AARI data from the east and west sectors and the results evaluated and compared. If necessary, the approach using oversampled AARI data needs to be refined. Furthermore, quality control procedures, subsetting of the full EASE-Grid to polar regions, and code optimization need to be implemented before production of EASE images of the AARI data can be completed. [Ed. note: Subsequently, EASE-Grid nearest neighbor regridding with 12.5 km grid cell size was chosen.] See (NSIDC Special Report 6).

1. Introduction

This document describes the process of regridding sea ice data obtained from the Arctic and Antarctic Research Institute (AARI) in the SIGRID format to the Equal Area SSM/I Earth Grid (EASE-Grid) northern hemisphere projection. The many options for the regridding process are presented along with their advantages and disadvantages. In addition, some statistics and examples of regridding SIGRID data are presented. This document is intended to help define the optimum method for regridding AARI data. Although the criteria for selecting the best regridding technique vary from project to project, the general discussion in this document may help in selecting the method of regridding for other data sets at NSIDC.

2. Description of Grids

This section gives a general description of the SIGRID and EASE-Grid grids. Only general information and details relevant to the regridding process are presented. For more detailed descriptions consult the references for SIGRID (1) and EASE-Grid (2-5). The work conducted on this project was with northern hemisphere data.

2.1. AARI Implementation of SIGRID

SIGRID was developed for the World Meteorological Organization. It was designed to place information recorded on operational ice charts into a digitized format more suitable for statistical and climatological use. This was done by assigning numerical values to the ice parameters and recording them at specific grid points on the charts. As shown in Figure 1, the value at each grid point is representative for the ice conditions of the grid cell area around the point. Thus, only the value of the sea ice parameter at the grid point is recorded, even if the majority of the grid cell area has other values.

The SIGRID codes for ice form, stage of development, and concentration are listed in Tables 1-3:

Table 1

Table 2

Table 3

SIGRID data values can indicate an ice form of, say, icebergs (see Table 1) or a stage of ice development of, for instance, old ice (see Table 2). Essentially, all the data values of SIGRID are symbolic. The numbers are codes for values or a range of values for the ice parameters. For example, an AARI ice concentration code of 90 refers to 90% ice concentration and a value of 91 refers to the range of concentrations greater than 90% and less than 100% (see Table 3). The descriptive and symbolic nature of AARI data affects regridding approaches that combine data values (e.g., interpolation or averaging methods).

The SIGRID format uses a "geographical" grid. Essentially, this type of grid defines grid point locations according to latitude and longitude spacing. According to observations of the AARI implementation of SIGRID, the latitude spacing of the grid points is constant at 0.25°. The longitude spacing, in contrast, varies with latitude (Table 4). The longitude spacing in degrees of longitude is determined by multiplying the latitude spacing by a lon/lat ratio. The value of the ratio is different for each of eight defined latitude regions of the grid. As shown in Table 4, grid points at higher latitude regions have greater spacing in longitude. By selecting an initial latitude and longitude, the grid points are determined. If the initial latitude and longitude are not consistently selected to always yield the same grid points, the regridding procedures will give different results for files that have improperly selected initial points. This is discussed in detail in Section 5.

For moving data between these grids, it is important to look at how the grid points are spaced on the surface of the Earth. To do so, the surface of the Earth was approximated as a sphere with radius Re = 6378 km. Using the information in Table 4, the spacing (in kilometers) of the grid points along meridians (lines of longitude) and parallels (lines of latitude) was determined. The constant 0.25° spacing in latitude translates to a 27.83 km spacing on the Earth's surface.

The longitude spacing, in contrast, is more complex and varies non-linearly with latitude. As shown in Table 4, within each latitude region, the spacing decreases as latitude increases (the spacing varies with the cosine of the latitude). At the boundaries of the regions, where the lon/lat ratio changes, the longitude spacing increases abruptly (e.g., at 50° the spacing is 17.89 km; in the next region, at 50.25° latitude, the spacing jumps to 35.59 km). These characteristics are evident in Figures 2a and 2b.
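The spacing figures quoted above can be reproduced with a short calculation. The following Python sketch is illustrative only and is not one of the project's IDL routines. It assumes lon/lat ratios of 1 below 50° and 2 in the 50.25-70° region, which is inferred from the 0.25° and 0.50° longitude steps described in Section 3.1b; the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6378.0  # spherical Earth radius used in this section
LAT_STEP_DEG = 0.25       # constant SIGRID latitude spacing


def meridian_spacing_km():
    """Distance between adjacent grid points along a meridian."""
    return math.radians(LAT_STEP_DEG) * EARTH_RADIUS_KM


def parallel_spacing_km(lat_deg, lon_lat_ratio):
    """Distance between adjacent grid points along the parallel at lat_deg."""
    lon_step_deg = LAT_STEP_DEG * lon_lat_ratio
    return (math.radians(lon_step_deg) * EARTH_RADIUS_KM
            * math.cos(math.radians(lat_deg)))


print(round(meridian_spacing_km(), 2))          # 27.83
print(round(parallel_spacing_km(50.0, 1), 2))   # 17.89
print(round(parallel_spacing_km(50.25, 2), 2))  # 35.59
```

The cosine term produces the gradual decrease within a region, and the change in the ratio produces the jump at the region boundary.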

Like the spacing on parallels, the area of each SIGRID cell changes with latitude. Table 4 and Figure 3 show this variation.

2.2. EASE-Grid

The EASE-Grid projections were developed under the SSM/I Pathfinder program for use with SSM/I data. Three equal area projections were defined: a global, a northern hemisphere and a southern hemisphere. The EASE-Grid for the northern hemisphere is a Lambert Equal-Area Projection in north polar aspect. It is based on a spherical model of the Earth with radius Re = 6371.228 km. The important aspects of this projection are that it is equal area, that it is azimuthal (showing true direction from the center of the projection), and that its scale at a given distance from the center varies less from the scale at the center than in any of the other major azimuthal projections. Table 5 shows the change in scale factors with latitude.

The nominal cell size is 25.067525 km. In order to remain equal area, the latitude spacing increases and the longitude spacing decreases from the north pole to the equator to preserve a nominal cell area of 628.37956 km². The projection is shown in Figure 4 with coast lines and a graticule of 15° latitude and 30° longitude overlaid (black pixels). In this projection, the corner pixels are in the southern hemisphere (negative latitude) and are shown in grey. These pixels are not used to represent data and are considered blank or non-valid pixels.
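As an illustration of how a latitude and longitude map to a cell in this projection, the sketch below applies the Lambert azimuthal equal-area equations in north polar aspect, using the radius and cell size given above. The 721 x 721 array size, the pole position at cell (360, 360) and the axis orientation are assumptions taken from the general published EASE-Grid definition rather than from this report, and the function name is hypothetical.

```python
import math

R_KM = 6371.228      # EASE-Grid sphere radius (from the text)
CELL_KM = 25.067525  # nominal cell size (from the text)

# Assumed, not stated in this report: zero-based indices in a 721 x 721
# array with the north pole at the center of cell (360, 360).
ORIGIN_COL = 360.0
ORIGIN_ROW = 360.0


def ease_north_col_row(lat_deg, lon_deg):
    """Fractional (column, row) of a point in the northern EASE-Grid."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    # Distance from the pole, in units of grid cells.
    rho = 2.0 * R_KM / CELL_KM * math.sin(math.pi / 4.0 - phi / 2.0)
    col = ORIGIN_COL + rho * math.sin(lam)
    row = ORIGIN_ROW + rho * math.cos(lam)
    return col, row


# Rounding the fractional indices gives the cell that contains the point.
col, row = ease_north_col_row(75.0, 40.0)
print(round(col), round(row))
```

Under these assumptions the equator lies a little less than 360 cells from the pole, so the northern hemisphere fills a disc inscribed in the square array and the corner pixels fall in the southern hemisphere, as described above.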

2.3. Comparison of Grids

An important distinction between these grids is that the EASE-Grid is equal area and the SIGRID is not. Also of importance for transferring data from one grid to another is the disparate manner in which they have their grid points placed. The SIGRID has constant 0.25° (27.83 km) spacing along meridians, while the EASE-Grid varies from 0.318° at the equator to 0.225° at the pole (35.45 km at the equator to 25.0 km at the pole). Even more importantly, the SIGRID spacing along parallels decreases as latitude increases within the eight regions and has large step increases across boundaries of these regions. EASE-Grid, in contrast, smoothly increases from 0.159° (17.72 km) at the equator to 51.24091° (25.06746 km) one pixel below the pole (Figure 2).

The grid cell area of SIGRID follows the same pattern as its spacing along parallels, but the EASE-Grid area, of course, remains constant in each cell (Figure 3).

Another important factor in regridding is that the AARI implementation of SIGRID has a total possible number of 375,084 digitized points. The EASE-Grid projection has 405,845 grid cells. Thus, there are 30,761 more grid cells in EASE-Grid. For a resampling method which uses only one SIGRID value to determine a value in EASE (e.g., nearest neighbor) there will be some amount of SIGRID data that must be repeated in order to fill the entire EASE grid.

3. Regridding Process

This section describes several methods for representing the AARI data in the EASE-Grid northern hemisphere projection. The section includes examples of regridding AARI data; however, some gridding methods (e.g., area min/max, interpolation, and area weighted averages) have not been evaluated yet. Any of these options that seem appropriate will be tested in the future before a decision is made on the optimum regridding approach.

3.1. Gridding Direction

Because the data originate in the SIGRID format, the "forward" direction is defined as starting with AARI data points and placing them into EASE grid cells. The "inverse" direction is defined as starting with empty EASE grid cells, calculating where these cells lie on the surface of the Earth with respect to the SIGRID grid points, and determining the value to place in the EASE-Grid cell from these surrounding points.

3.1a. Forward regridding (AARI to EASE)

Initially, implementing the regridding process in the forward direction was considered. The AARI grid points were "dropped" onto the EASE-Grid (i.e., using the latitude and longitude of the AARI grid points, it was determined which EASE grid cell contained each point). Because of the disparity in grid point spacing (as discussed in Section 2.3), the AARI grid points did not fall uniformly into the EASE grid (Figure 5). The image demonstrates how the different grids compare. At the start of each of the AARI latitude regions, the AARI grid spacing on parallels is greater than the EASE spacing; thus, there is a greater proportion of empty (white) cells. In contrast, at the end of these regions the AARI spacing is less than EASE and there are more AARI grid points and, thus, a greater number of overfilled (blue) cells.

Table 6 shows the number and percentages of AARI points in EASE grid cells. Ideally, the closer the correspondence is to one-to-one, the less data manipulation has to take place in the regridding process. Over 68% of the EASE grid cells contain exactly 1 SIGRID grid point. However, nearly 20% of the EASE grid cells are un-filled (zero SIGRID points). Furthermore, over 11% of grid cells have 2 or more SIGRID points (over-filled). The empty cells present a problem for the forward regridding since the data presented to the user in the EASE-Grid projection would always contain some empty (un-filled) cells. Likewise, the over-filled cells necessitate a decision or calculation to determine the value for the grid cell. As shown in the next section, the inverse approach has the advantage of having every grid cell in EASE assigned a value based on the SIGRID data. The method of determining the value presented in EASE-Grid can range from a simple nearest-neighbor approach to a more complicated area weighted average calculation. For these reasons, using inverse regridding is suggested.

3.1b. Inverse regridding (EASE from AARI)

The inverse approach is made up of two steps: 1) locating each EASE grid cell in relation to the digitized SIGRID grid points, and 2) establishing a value for the EASE grid cells based on the values of the surrounding SIGRID grid points. This section discusses the first step and a following section, Section 3.3 Gridding Procedures, addresses the several options for the second step.

The locations of the EASE grid cell centers were determined using the "easeconv.pro" IDL routine. This routine establishes the center latitude and longitude for each pixel in the EASE-Grid northern hemisphere projection. Then, the center locations are compared to the locations of the SIGRID digitized grid points.

In order to visually represent the SIGRID digitized points a rectangular image is used (Figure 6). The image has a y-dimension (rows) of 361 pixels (90/0.25 + 1) and x-dimension (columns) of 1440 pixels (360/0.25). At latitudes greater than 50° (i.e., rows greater than 201), not all of the pixels represent a SIGRID digitized point. This is because the SIGRID format changes spacing from every 0.25° in longitude in the 0-50° latitude region to every 0.50° in longitude in the 50.25-70° latitude region (see Section 2.1 AARI Implementation of SIGRID). Thus, only every other pixel in the x-dimension represents an AARI digitized point. Likewise, at higher latitude regions there are increasingly fewer digitized points (light gray pixels).
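The indexing of this rectangular image can be sketched as follows. This is a minimal illustration, not the project's code: it assumes zero-based rows counted from the equator and columns counted from 0° longitude, and it takes the longitude step of the latitude region as a parameter because Table 4 is not reproduced here. The function names are hypothetical.

```python
PIXEL_DEG = 0.25
N_ROWS = 361   # 90 / 0.25 + 1
N_COLS = 1440  # 360 / 0.25


def image_row_col(lat_deg, lon_deg):
    """Zero-based (row, column) of the 0.25 degree pixel at lat/lon."""
    row = round(lat_deg / PIXEL_DEG)
    col = round((lon_deg % 360.0) / PIXEL_DEG) % N_COLS
    return row, col


def is_digitized(col, lon_step_deg, start_lon_deg=0.0):
    """True if the pixel column holds a SIGRID point for this longitude step."""
    stride = round(lon_step_deg / PIXEL_DEG)            # pixels between points
    start_col = round((start_lon_deg % 360.0) / PIXEL_DEG)
    return (col - start_col) % stride == 0


# In the 50.25-70 degree region (0.50 degree step), every other pixel is used.
print([is_digitized(c, 0.5) for c in range(4)])  # [True, False, True, False]
```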

3.2 Gridding Concepts (grid points and grid cells)

In dealing with the AARI digitized data the concepts of grid points and grid cells need to be defined. AARI values of sea ice concentration, ice form, etc. are recorded at specifically defined SIGRID grid points. These grid points have exact locations at latitude and longitude coordinates discussed in Section 2.1 and defined in the document proposing SIGRID(1). Furthermore, AARI considers the value at these grid points to apply to the associated grid cells. The area covering half way between the previous and next point in latitude and the previous and next point in longitude is the grid cell (see Figure 1). Thus, during the AARI digitization process, the grid cell on the chart may contain several values for sea ice concentration, but only the value at the grid point is recorded. Subsequently, we regard this value to apply to the entire grid cell area.

Whether the concept of grid points or grid cells is used affects the regridding process. A method which considers both the original (AARI) and target (EASE) grids in terms of grid points is the nearest neighbor resampling (Section 3.3a). This concept is simple and computationally efficient but may not represent the entire original data volume in the final grid. Similarly, the drop-in-box method (Section 3.3b) is simple and efficient, but it treats the original grid as grid cells and the destination as grid points. Again, this method results in some data loss. The point min/max (Section 3.3c) and interpolation (Section 3.3d) methods perform some computations or decision making that reduces data loss. These methods are more complex and require more computation time. The most complex methods presented here, area min/max (Section 3.3e) and area weighted averages (Section 3.3f), deal with both AARI and EASE grid cells and the analysis of overlapping grid cell areas.

In addition to data loss and computational efficiency, the symbolic nature of the AARI data has an effect on the results of the various regridding approaches and impacts the selection of the optimum method. Especially affected are the more complex methods that use more than one SIGRID value to determine a value in EASE-Grid. Furthermore, unknown values and empty grid cells can change the implementation and results of interpolation and averaging methods.

3.3 Gridding Procedures (determination of "value" to place in EASE grid cells)

The second step in the inverse regridding, once the locations of EASE grid cells with respect to the digitized SIGRID points are known, is determining the data value for each of the EASE grid cells. The method of determining the value can be as simple as selecting the value of the closest SIGRID point (nearest neighbor) to a more complex averaging of the surrounding values using a weighting based on the amount of SIGRID cell area falling in the EASE grid cell (area weighted averaging). In this section, examples of some of the methods show the starting AARI data values in a SIGRID rectangular image (Figure 6) and the resulting EASE image. Furthermore, the advantages and disadvantages of each approach are discussed.

3.3a Nearest Neighbor

The most straightforward method of determining the value to place in the EASE grid cells is to assign it the value of the closest SIGRID point. An IDL routine was written to establish the nearest neighbors for each EASE grid cell. The routine loops through the EASE grid pixels, first establishing their center latitude and longitude, then computing the distances to surrounding SIGRID points, and finally selecting the closest SIGRID point using distance (arc length) on the surface of the Earth.

Thus, for each EASE grid pixel, the closest SIGRID point is determined and its location (row and column) in the SIGRID rectangular array is stored in a file. In special cases when two or more SIGRID points are equidistant at the minimum distance, the grid point at the lower latitude and longitude is selected as the nearest neighbor. This convention is consistent with the drop-in-box regridding, see Section 3.3b.

The routine is only executed once to establish the nearest neighbors (program execution time is 4 hours). Subsequently, a different routine uses the row and column files to regrid AARI data files to the EASE projection. This is done simply by looping through the EASE pixels, reading the nearest neighbor AARI row and column from the stored files, looking in the AARI data file for the value digitized at this location, and assigning this value to the EASE grid cell.
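The two-stage procedure described above (build a lookup table once, then apply it to each file) can be sketched in Python as follows. This is an illustrative sketch rather than the IDL routine itself. The haversine formula, the 6378 km radius, the tie tolerance, and the ordering of ties by latitude first and longitude second are all assumptions, as are the function names.

```python
import math

EARTH_RADIUS_KM = 6378.0  # assumed: same spherical radius as in Section 2.1


def arc_length_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_sigrid_point(cell_lat, cell_lon, candidates, tol_km=1e-6):
    """Closest (lat, lon) candidate; ties go to the lower latitude, then longitude."""
    best, best_dist = None, None
    for point in candidates:
        dist = arc_length_km(cell_lat, cell_lon, point[0], point[1])
        closer = best is None or dist < best_dist - tol_km
        tied = best is not None and abs(dist - best_dist) <= tol_km
        if closer or (tied and point < best):
            best, best_dist = point, dist
    return best


def regrid(lookup, sigrid_values, fill=None):
    """Apply a stored {ease_cell: sigrid_point} lookup to one file's values."""
    return {cell: sigrid_values.get(point, fill)
            for cell, point in lookup.items()}


candidates = [(70.0, 10.0), (70.25, 10.0), (70.0, 10.5)]
lookup = {(0, 0): nearest_sigrid_point(70.05, 10.05, candidates)}
print(regrid(lookup, {(70.0, 10.0): 90}))
```

Because the lookup depends only on the two grid geometries, it can be computed once and reused for every AARI file, which is why the per-file step is fast.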

An example is shown for the AARI file w900904.sigrid. Table 7 shows information extracted about the contents of the SIGRID file. Figure 7 shows the AARI total ice concentration data displayed in the 0.25° spaced rectangular array (all examples in this document are for total ice concentration). A close-up of the region with the most data is shown in Figure 8. Figure 9 shows the resulting full EASE-Grid image using the nearest neighbor resampling. Figure 10 is a close-up of the polar region.

In general, the data seems to transfer well between the grids. Coast lines are followed closely; however, quality control procedures are recommended in order to assess this (see Section 7.4 - Quality Control). As the table and figures show, the SIGRID file contains data from 65° to 90° north latitude. In Figures 7 and 8, the SIGRID data has 100% total ice concentration (red pixels) extending to the pole (top line of the image). However, in the EASE-Grid images, data above 87° is not shown. The reason for this is seen by closely examining the SIGRID figures. In the images, black pixels represent cells that should not contain data (non-digitized points) and the darkest grey pixels represent cells that may contain data for the SIGRID format but do NOT for this particular file. At high latitudes (greater than 87°), there are data values in non-digitized pixels and no data represented where it should be in the digitizable (darkest grey) pixels (top of Figure 8). This is due to improper selection of the initial longitude. Table 7 shows that AARI chose -24° longitude. However, for proper digitization, the starting longitude must be divisible by the longitude spacing of the highest latitude region in the file. In this case, 90° is the highest latitude; thus, the initial longitude must be a multiple of 20°. If the highest latitude were in the 87.25-89° region, it would have to be a multiple of 5°. Thus, for the 87.25-89° region the proper initial longitude would be -25°. This difference is affirmed by noting that the pixels in this region are offset by four pixels, or 1° (Figure 7 and Figure 8). Similarly, above 89.5°, the digitized values are misplaced by 4° (the difference between -20° and -24°).

This error in the digitizing is a problem since the data are no longer consistently located at the same points. Since the regridding was established assuming a proper starting longitude, when data are not recorded in the right locations they do not get regridded to the EASE-Grid projection. Possible ways to work around this problem are addressed in Section 5.
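A simple check of the starting longitude could flag affected files before regridding. The sketch below is a hypothetical helper, not part of the existing routines. It takes the longitude step of a latitude region as a parameter and reports the nearest valid starting longitude, the offset in degrees, and the offset in 0.25° pixels.

```python
def start_longitude_offset(start_lon_deg, lon_step_deg, pixel_deg=0.25):
    """Return (nearest valid start, offset in degrees, offset in pixels)."""
    # A valid starting longitude is a whole multiple of the longitude step.
    valid = round(start_lon_deg / lon_step_deg) * lon_step_deg
    offset_deg = start_lon_deg - valid
    offset_pixels = round(abs(offset_deg) / pixel_deg)
    return valid, offset_deg, offset_pixels


# The file discussed above starts at -24 degrees.
print(start_longitude_offset(-24, 20))  # (-20, -4, 16)
print(start_longitude_offset(-24, 5))   # (-25, 1, 4)
```

A non-zero offset for any latitude region present in a file indicates that the digitized values in that region will not line up with the expected grid points.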

Similar to the analysis of how data points transfer in forward regridding (Section 3.1a), the number of times AARI grid points were selected as nearest neighbors to the EASE grid cell centers was considered. Table 8 shows this for each latitude region, for the entire grid, and for the polar region above 50° latitude. The percentages change in each region because the AARI and EASE grid spacings vary. For the full grid, approximately 10% of the possible digitized points are never selected as nearest neighbors. This means the data at these points does not get represented in the EASE projection. Thus, there is a 10% loss of data. Considering only the region above 50°, the data loss is 11.34%. In general, the data loss increases as the latitude increases because the AARI grid spacing is sometimes less than EASE (Figure 2). Thus, in these areas, when transferring data from SIGRID to EASE using nearest neighbor resampling, not all of the greater number of SIGRID points transfer to the fewer number of EASE grid cells.

In addition to the data loss, there is also an amount of data replication in the regridding. In Table 8, this is when an AARI point is used two or more times. This occurs when the EASE spacing is less than AARI. The amount of data replication in the full EASE-Grid is 18% for the nearest neighbor resampling.
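The loss and replication statistics can be derived directly from the stored nearest neighbor lookup by counting how often each SIGRID point is selected. The following sketch uses a toy lookup rather than the real row and column files; the function name and the data are hypothetical.

```python
from collections import Counter


def usage_fractions(selected_points, all_points):
    """Fractions of SIGRID points never selected and selected two or more times."""
    counts = Counter(selected_points)
    total = len(all_points)
    unused = sum(1 for p in all_points if counts[p] == 0)
    repeated = sum(1 for p in all_points if counts[p] >= 2)
    return unused / total, repeated / total


# Toy example: five SIGRID points, six EASE cells.
all_points = ["a", "b", "c", "d", "e"]
selected = ["a", "a", "b", "d", "d", "d"]  # nearest neighbor of each EASE cell
loss, replication = usage_fractions(selected, all_points)
print(loss, replication)  # 0.4 0.4
```

Because the lookup is fixed by the grid geometries, these fractions are the same for every AARI file.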

In summary, the nearest neighbor regridding is conceptually simple, easily implemented, and computationally fast. However, the results do suffer from a certain amount of data loss and duplication in the final EASE-Grid image. Lastly, the errors in the starting longitude corrupt the regridding results at higher latitudes.

3.3b Drop-in-box

The drop-in-box method is another simple approach to regridding the AARI data. The value of the EASE grid cell is equal to the value of the SIGRID cell that contains the center position (latitude and longitude) of the EASE cell. As with the nearest neighbor resampling, the center position of the EASE cells are determined and an IDL routine determines in which SIGRID cells the centers fall. Once again, the SIGRID cell locations, rows and columns, are saved in files. Subsequently, another routine regrids the AARI file to the EASE-Grid.
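The test for which SIGRID cell contains an EASE cell center can be sketched as a rounding operation, since each SIGRID cell extends half way to the neighboring points. The sketch below is illustrative only. It assumes the longitude step of the relevant latitude region is supplied by the caller and that grid points lie at whole multiples of the steps (a proper starting longitude); centers falling exactly on a cell edge are assigned to the lower latitude and longitude. The function name is hypothetical.

```python
import math


def containing_sigrid_cell(lat_deg, lon_deg, lon_step_deg, lat_step_deg=0.25):
    """Grid point (lat, lon) of the SIGRID cell that contains the position."""
    # ceil(x - 0.5) rounds to the nearest index and sends exact halves down,
    # so a center on a cell edge goes to the lower latitude or longitude.
    lat_index = math.ceil(lat_deg / lat_step_deg - 0.5)
    lon_index = math.ceil(lon_deg / lon_step_deg - 0.5)
    return lat_index * lat_step_deg, lon_index * lon_step_deg


print(containing_sigrid_cell(70.05, 10.05, 0.5))   # (70.0, 10.0)
print(containing_sigrid_cell(70.125, 10.0, 0.5))   # edge case: (70.0, 10.0)
```

This sketch does not model the irregular cell shapes at the boundaries between latitude regions, which is where the two methods can disagree.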

Comparing the drop-in-box row and column files to those of the nearest neighbor regridding shows almost identical results. The only discrepancies are for 112 pixels along the edges of the eight latitude regions. At these points, the situation depicted in Figure 11 is encountered. When a new latitude region is started the SIGRID spacing along longitude increases (sometimes doubling the previous value) and an EASE grid center may fall near the edge of a SIGRID cell. Its coordinates may, however, be closer to the SIGRID point below. Thus, for the 112 points where they differ, the drop-in-box method selects the SIGRID value one row above the nearest neighbor resampling.

In addition to giving results similar to the nearest neighbor approach, the drop-in-box resampling has the same characteristics with respect to errors in initial longitude, data loss and redundancy, and computational efficiency.

3.3c Point min/max

This method is similar to the drop-in-box method but in the forward gridding direction. The concept of the point min/max resampling is applied to forward regridding since more than one SIGRID point may fall into an EASE grid cell, as discussed in Section 3.1a. The point min/max regridding generates two EASE grid images for each AARI file: one with the maximum values and one with the minimum values. For grid cells where only one SIGRID point is present, the same information would be shown in both grids. However, the EASE images would still contain a significant number of empty cells, nearly 20% of the images (see Figure 5).

The two images combined would contain almost all of the information in an AARI file. The only exceptions would be for 16 EASE grid pixels which have three SIGRID points falling into them (see Table 6). In these 16 pixels one of the SIGRID points would not be presented in either the min or max image.

Another important aspect of this approach is that each image will not consistently show values that come from the same digitized location in SIGRID. For example, one time the max image may show the first of two SIGRID points that fall into an EASE grid cell. On another date, the second SIGRID point may have the highest value and be selected for the max image. Since the points are at different locations, the images from different dates may not be spatially consistent.

3.3d Interpolation

The interpolation resampling computes a value for the EASE grid cell by interpolating from the values of the four surrounding SIGRID points. This type of resampling makes use of all of the SIGRID data to construct the EASE images; thus, it avoids the data loss problem of the nearest neighbor and drop-in-box methods. The implementation would consist of saving lat/lon coordinates for the four associated interpolation points rather than the one nearest neighbor. In addition, the weighting factors would also be stored in files. These files would be generated one time to establish the details of the regridding. Subsequently, a program could be run to apply these to an AARI file in order to create an EASE image.

Despite the elimination of data loss by this method, there are two major disadvantages in applying this technique to AARI data. The first problem is that many AARI points in each file have missing and unknown values. Thus, instead of interpolating an EASE grid cell value from four points, the value may be calculated from three, two or one AARI value. This requires additional code in the regridding routine to handle each case and causes an increase in computation time. For instance, in dealing with AARI values of 70%, 60%, 80% and 100% sea ice concentration the interpolated value may be 80%. However, if one or more of these values is unknown, the routine must determine that fact and adjust weighting factors to compensate. If the 100% value is replaced by an unknown value, the result will come out differently.
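The adjustment of weighting factors for unknown values can be illustrated with a small sketch. It assumes, for illustration only, bilinear weights computed from the fractional position of the EASE cell center between four SIGRID points that form a rectangle in latitude and longitude, with unknown values represented as None. The function names and the equal weights in the example are hypothetical choices.

```python
def bilinear_weights(fx, fy):
    """Weights of the four corner points for fractional position (fx, fy)."""
    return [(1 - fx) * (1 - fy), fx * (1 - fy), (1 - fx) * fy, fx * fy]


def interpolate(values, weights):
    """Weighted mean of the known values; None marks an unknown value."""
    known = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in known)
    if total_weight == 0:
        return None  # nothing to interpolate from
    return sum(v * w for v, w in known) / total_weight


weights = bilinear_weights(0.5, 0.5)              # center of the four points
print(interpolate([70, 60, 80, 100], weights))    # 77.5
print(interpolate([70, 60, 80, None], weights))   # 70.0
```

The second call shows how the result shifts when the 100% value is unknown and the remaining weights are rescaled.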

The second problem in dealing with AARI data is more difficult to resolve. This is the descriptive and symbolic nature of some of the AARI data values. For example, AARI data on ice form has non-numerical information represented by numerical values (Table 1). These data do not work with the interpolation resampling. For example, what is the interpolated value between an ice form of strips and patches and an ice form of pancake ice? For this reason, no examples of this method have been generated and it is not recommended for application to the AARI data set.

3.3e Area min/max

The area min/max concept is explained by visualizing the SIGRID grid cells on the surface of the Earth and then overlaying the EASE grid cells on top of them (Figure 12). Next, for each EASE grid cell, determine which SIGRID cells have some portion of the area overlapping the EASE cell. Finally, as with the point min/max (Section 3.3c), place the minimum of these AARI values into one EASE image and the maximum value into another. For example, in Figure 12 the EASE grid cell with SIGRID values of 60%, 80% and 100% sea ice concentration would have a value of 60% in the minimum image and 100% in the maximum image.

The advantage of the area methods over the point methods (i.e., dealing purely with grid cells as opposed to EASE grid cells and SIGRID points) is that all EASE grid cells are filled. Instead of establishing which SIGRID points fall in the EASE cells, the SIGRID cells that have area overlapping EASE cells are determined. Once this is established, each AARI file can be regridded to the min/max images. As with the point min/max and interpolation approaches, empty cells and unknown values must be handled. However, in contrast to the interpolation method, no mathematical operations are done to combine individual SIGRID values and there is no problem with the descriptive nature of some of the AARI data.

In a sense, all data is used in the creation of the area min/max EASE images; thus, there is no data loss. However, the results are presented in two images and the previously discussed problem with spatial consistency remains.
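Once the overlapping SIGRID cells are known for each EASE cell, producing the two images is a simple selection. The sketch below assumes the overlap lists have already been computed and that the values are concentrations in percent that can be ordered; how descriptive codes should be ordered is not addressed here. Unknown values are represented as None, and all names are hypothetical.

```python
def area_min_max(overlaps, sigrid_values):
    """Build minimum and maximum images from overlapping SIGRID cells.

    overlaps: {ease_cell: [sigrid_cell, ...]} for cells with shared area.
    sigrid_values: {sigrid_cell: value or None} for one AARI file.
    """
    min_image, max_image = {}, {}
    for ease_cell, sigrid_cells in overlaps.items():
        known = [sigrid_values.get(c) for c in sigrid_cells]
        known = [v for v in known if v is not None]  # skip unknown or empty
        min_image[ease_cell] = min(known) if known else None
        max_image[ease_cell] = max(known) if known else None
    return min_image, max_image


# The EASE cell from the example above, overlapped by four SIGRID cells.
overlaps = {"E1": ["S1", "S2", "S3", "S4"]}
values = {"S1": 100, "S2": 100, "S3": 80, "S4": 60}
print(area_min_max(overlaps, values))  # ({'E1': 60}, {'E1': 100})
```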

3.3f Area weighted averages

The final proposed regridding method, area weighted averages, is the most computer intensive. This method is similar to that of area min/max; however, only one EASE image is created. The EASE grid cell values are weighted averages of the SIGRID cell values that have some area falling into the EASE cells. For example, consider Figure 12 where the first SIGRID cell has a value of 100% sea ice concentration and covers one sixteenth of the EASE cell. The second SIGRID cell also has 100% concentration and covers three sixteenths of the cell. The third has 80% ice concentration and covers three sixteenths of the EASE grid cell area. And the final SIGRID value of 60% covers the remaining nine sixteenths of area. The EASE grid cell value is computed from the sum of these concentrations, weighted by the fraction of the total EASE grid cell area that they cover (i.e., an area weighted average):

EASE value = (1/16)(100%) + (3/16)(100%) + (3/16)(80%) + (9/16)(60%) = 73.75%

If one of the SIGRID cells covering the EASE cell is empty or unknown, the weightings are adjusted to be fractions of the area covered by the valid values rather than the total area. Thus, if the 60% ice concentration value was instead digitized as the unknown value, the calculation (with total area = 7/16 of the original total area) now becomes:

EASE value = [(1/16)(100%) + (3/16)(100%) + (3/16)(80%)] / (7/16) = (1/7)(100%) + (3/7)(100%) + (3/7)(80%) ≈ 91.4%

Thus, the area weighted average approach can handle situations when unknown or empty cells are present.
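The calculation described in this section can be sketched as a short function. It assumes the overlap fractions are already known and represents unknown or empty SIGRID cells as None; the function name is hypothetical.

```python
def area_weighted_average(cells):
    """Area weighted mean of (value, area_fraction) pairs.

    Pairs whose value is None (unknown or empty) are dropped, and the
    remaining weights are taken as fractions of the valid area only.
    """
    valid = [(value, area) for value, area in cells if value is not None]
    valid_area = sum(area for _, area in valid)
    if valid_area == 0:
        return None  # no valid SIGRID value overlaps this EASE cell
    return sum(value * area for value, area in valid) / valid_area


# The example from the text: four SIGRID cells overlapping one EASE cell.
cells = [(100, 1 / 16), (100, 3 / 16), (80, 3 / 16), (60, 9 / 16)]
print(area_weighted_average(cells))  # 73.75

# The same cell with the 60% value digitized as unknown.
cells_unknown = [(100, 1 / 16), (100, 3 / 16), (80, 3 / 16), (None, 9 / 16)]
print(round(area_weighted_average(cells_unknown), 1))  # 91.4
```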

Unfortunately, like the interpolation approach, this method cannot handle combining descriptive data values such as those for ice form and some values of stage of development.

4. Comparison of Regridding Approaches

The objective of this section is to evaluate the six regridding methods in four categories: 1) the effects on data content in going from the AARI digitized format to the EASE grid images, 2) computer resource requirements, that is, the estimated amount of time to regrid an AARI file and the required storage, 3) the consistency of results with respect to spatial representation of the data, and 4) how each method deals with descriptive data such as ice form or symbolic data representing a range of values. Based on the evaluations in this section, a recommendation is made in Section 6 as to the optimum regridding method to apply to the AARI data.

4.1 Effect on Data Content

Transforming the data from the AARI digitized format to a gridded EASE image alters the data content. The most obvious example of this is for the nearest neighbor and drop-in-box approaches where approximately 10% of the AARI data is not represented in the EASE image and approximately 18% of the AARI grid points are duplicated in the final EASE image. The data loss and redundancy are consistent for every AARI file. In other words, the same SIGRID points are left out and repeated for each file.

For the point min/max approach, the two resulting images contain all but 16 of the original SIGRID values; however, the EASE images are presented to the user with 20% of the pixels empty. This approach is unsuited for regridding the AARI data.

The interpolation method, on the other hand, has the advantage of filling all the EASE grid cells and, in doing so, uses all SIGRID values to derive the EASE values. Ignoring the problems of combining non-numerical data, the effects of regridding on the EASE image are more subtle than those of the three previously discussed methods. For a completely digitized AARI file without unknown values, the interpolated EASE-Grid image would be a smoothed representation. At sharp boundaries in the AARI data where values change greatly from one pixel to the next (e.g., from a 100% to a 20% ice concentration), the EASE interpolated values would be intermediate (e.g., a smoothed value of 60%). The effect of unknown values on the data represented in EASE is another consideration. The simplest way to handle unknown values is to ignore them. Thus, if three of the four SIGRID points surrounding an EASE grid center are unknown, the EASE grid cell is assigned the value of the fourth point, even though that point may be the furthest away. More sophisticated methods of dealing with unknown data can be constructed, but they increase processing time.

The area min/max approach also uses all SIGRID values to fill EASE grid cells. By creating minimum and maximum images, this approach does not smooth the data like the interpolation approach. This approach seems to have the fewest problems with regard to altering data content; however, the user must deal with twice the volume of data.

The area weighted averages method has effects on data very similar to the interpolation approach.

4.2 Computer Resource Requirements

In terms of processing time, the simple methods of nearest neighbor and drop-in-box are the quickest, followed by area min/max and, finally, the area weighted averages. Currently, nearest neighbor resampling of a single AARI file (w900904.sigrid) takes approximately one minute to generate an EASE image of total ice concentration. After the regridding methods have been reviewed to identify the most likely candidates, further tests should be performed to regrid all information in a test set of AARI files.

In terms of computer storage space requirements, the area min/max method needs twice the amount of space as other methods. The storage per individual EASE-Grid image is 519,841 bytes, uncompressed. Since each AARI file contains information on more than one data type this number should be multiplied by the number of data types each file contains. For example, an AARI file may have data on total sea ice concentration, and sea ice concentration, form and stage of development for the 1st, 2nd, and 3rd thickest ice types, ten data types in total. The uncompressed storage requirement is 5,198,410 bytes. For area min/max the required space is doubled, or 10,396,820 bytes. The total storage requirement for the full data set depends on the total number of files and the number of data types per file. The total number of files is being assessed. Because the total storage volume is expected to be quite large, and because each AARI file contains data for only the polar region of the northern hemisphere, a subset of the full EASE-Grid covering the polar regions is being considered (see Section 7.3).
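The storage figures above follow from simple arithmetic, sketched here. The 721 by 721 grid dimensions and the one-byte-per-cell storage are assumptions: they are consistent with the quoted 519,841 bytes per image but are not stated in this section. The function name `storage_bytes` is hypothetical.

```python
EASE_COLS = 721  # assumed full-hemisphere EASE-Grid width (721 * 721 = 519,841)
EASE_ROWS = 721  # assumed full-hemisphere EASE-Grid height
BYTES_PER_CELL = 1  # assumed: one byte per grid cell, uncompressed


def storage_bytes(n_data_types, min_max=False):
    """Uncompressed storage for all EASE images derived from one AARI file."""
    per_image = EASE_COLS * EASE_ROWS * BYTES_PER_CELL
    # Area min/max produces two images (minimum and maximum) per data type.
    images = n_data_types * (2 if min_max else 1)
    return per_image * images


print(storage_bytes(1))                 # one image
print(storage_bytes(10))                # ten data types, single image each
print(storage_bytes(10, min_max=True))  # ten data types, min and max images
```

Multiplying the per-file figure by the number of files, once that number is known, gives the total for the full data set.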

4.3 Consistency of Results

This section seeks to evaluate each method based on whether the same SIGRID points are used consistently to derive the EASE-Grid values. For the nearest neighbor and drop-in-box approaches this consistency is present. The area min/max and area weighted average methods, in contrast, are more difficult to evaluate. For each AARI file, the methods do consider the same SIGRID cells in determining an EASE grid cell value. However, an AARI file may not have any value or may have an unknown value recorded in one or more of the SIGRID cells. This alters the calculation, and this fact should be made clear to the data user. Furthermore, for the area min/max approach, it should be made clear that the values in EASE grid cells of the minimum and maximum images do not always come from the same SIGRID cell (see the discussion in Section 3.3e).

Furthermore, with the area min/max or area weighted average methods, the regridding algorithm must deal with situations when part of the EASE-Grid cells are covered by land. Should there be a cutoff at 50% such that when more than half the cell is land covered, the pixel reported in EASE is specified as land? Is this the best value for the cutoff? Or, should the decision be made to always display the ice information?

4.4 Dealing with the Descriptive and Symbolic Nature of AARI Data

This section assesses how the regridding schemes handle combining AARI data values. Specifically, how are AARI data such as ice form or ice stage of development, which are descriptive rather than numerical in nature (such as an ice form of iceberg or a stage of development of old ice), combined? Also considered are the many instances in which the AARI data values refer to a range of values rather than an exact value. For example, an AARI data value of 83 for ice stage of development refers to young ice of 10-30 cm thickness, and a data value of 05 for ice form refers to a big floe of 500 meters to 2 kilometers across. Symbolic data of these types can cause problems for averaging or for combining data values in another way.

The simple techniques of nearest neighbor and drop-in-box do not combine SIGRID values to determine the values for EASE-Grid images; therefore, they have no problems dealing with this type of data. Likewise, the area min/max approach merely assesses the magnitude of the data values in relation to one another and does not perform any mathematical operations on the SIGRID values.

The interpolation and area weighted averages approaches do need to deal with this type of data. For example, when averaging or interpolating SIGRID data values of ice form the following data values (from Table 1) might be encountered:

How do you combine the numerical ranges for medium floe and big floe with the descriptive ice form of fast ice? More simply, if we just consider the medium floe and big floe data values what would be the intermediate interpolated or averaged data value? A simple numerical average would yield 4.5, but this does not reference an ice form in the AARI table on ice forms (Table 1). Perhaps we could consider averaging the ranges the data values refer to (i.e., the dimensions of the ice floes: 100-500m and 500m-2km) but again, the result does not fit within the AARI list of valid values for this data type.
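The difficulty can be made concrete with a short sketch. The lookup table below is a partial, illustrative stand-in for Table 1: it holds only the two floe codes discussed above (05 for big floe as stated in Section 4.4, and 04 for medium floe as implied by the 4.5 average), and the names `ICE_FORM` and `mean_code` are hypothetical.

```python
# Partial, illustrative subset of the ice form codes (not the full Table 1).
ICE_FORM = {
    4: "medium floe (100-500 m)",
    5: "big floe (500 m-2 km)",
}


def mean_code(codes):
    """Simple numerical average of symbolic code values."""
    return sum(codes) / len(codes)


codes = [4, 5]
average = mean_code(codes)

# The average is not a valid code, so it cannot be looked up.
print(average, average in ICE_FORM)

# Selecting the minimum or maximum code always returns a value that
# was present in the input, so it remains a valid table entry.
print(min(codes) in ICE_FORM, max(codes) in ICE_FORM)
```

This is why the methods that select an existing value (nearest neighbor, drop-in-box, area min/max) avoid the problem, while averaging and interpolation do not.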

5. Adjustments for erroneous starting longitudes in SIGRID

As demonstrated in an example of regridding an AARI file using nearest neighbor (Section 3.3a), some AARI sea ice charts were digitized using an improper initial longitude. This caused errors in the locations of SIGRID grid points at higher latitudes. The file tested in this example was for the AARI designated "western" sector. We have not looked at all "western" sector files yet to determine if the improper selection is consistent throughout. Furthermore, we have not gone through the "eastern" sector files to see if an appropriate initial longitude was consistently chosen in these data.

If there is a consistent choice of initial longitude in both "eastern" and "western" sectors and if they align the grid points exactly, then the regridding routines could be tailored to these starting points. Then, the regridding to EASE-Grid could be performed and produce consistent results. The EASE images would, however, differ from regridding of National Ice Center (NIC) or other implementations of the SIGRID format which use correct starting points. Thus, comparisons of two such data sets would be complicated (see Section 7.6).

If, however, the initial longitude changes in the AARI files, a different approach must be taken to regrid the data to the EASE-Grid. We must consider that we want the EASE image to be created in a consistent manner. So, although we could generate a regridding transfer for each initial longitude that occurs, the EASE images would be representing different SIGRID points. This spatial inconsistency in the regridding is undesirable.

The revised implementation of the SIGRID format, SIGRID-2 (Section 7.5), should address this problem. In case SIGRID-2 still contains errors, or its implementation by a different group (NIC or another agency) differs from that of AARI, we are working to devise a regridding scheme that gets around this problem. The approach replaces the rectangular SIGRID image that shows only the grid centers (Figure 6) on a grid with cell dimensions of 0.25° latitude by 0.25° longitude. In its place, we propose a rectangular grid that uses the same cell dimensions but shows the AARI values over the full grid cell area. For example, in the second AARI latitude region, the lon/lat ratio is two; thus, the dimensions of the SIGRID grid cells are 0.25° latitude by 0.50° longitude. Instead of representing the data value of this grid cell with one pixel at the center position, we represent the total grid cell area with two grid cells, each having dimensions of 0.25° latitude by 0.25° longitude. The centers of the smaller replicated grid cells would have to be adjusted to cover the same area as the original:

where the grid spacing is 0.25°. Similarly, SIGRID cells at higher latitudes would be represented by a number of 0.25° cells equal in number to the lon/lat ratio (see Table 4) and with the center of the first grid cell at:

and the centers of the rest of the cells replicated at subsequent longitude intervals of 0.25°. The first and following 0.25° grid cells would have the same value of the SIGRID grid point repeated in them.
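The replication of a SIGRID cell into 0.25° cells can be sketched as follows. The centering rule used here (the first center lies half of `(ratio - 1)` spacings west of the digitized center) is an assumption inferred from the requirement that the replicated cells cover the same area as the original cell. The function name `replicated_centers` is hypothetical, the lon/lat ratio is assumed to be a whole number, and wrap-around at ±180° longitude is not handled.

```python
def replicated_centers(lon_center, ratio, spacing=0.25):
    """Longitudes of the 0.25 degree cells that replace one SIGRID cell.

    lon_center -- longitude of the digitized SIGRID grid point (degrees)
    ratio      -- lon/lat ratio of the latitude region (assumed integer)
    spacing    -- width of each replicated cell (degrees)
    """
    # Shift west so that the block of `ratio` cells is centered on lon_center.
    first = lon_center - (ratio - 1) * spacing / 2.0
    # Remaining centers follow at intervals of one grid spacing.
    return [first + i * spacing for i in range(ratio)]


def replicate(lon_center, value, ratio, spacing=0.25):
    """Pair every replicated center with the (repeated) SIGRID value."""
    return [(lon, value) for lon in replicated_centers(lon_center, ratio, spacing)]
```

For a cell centered at 10.0° with a ratio of two, `replicated_centers(10.0, 2)` returns `[9.875, 10.125]`, which together span the same 0.50° of longitude as the original cell.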

The 0.25° grid is essentially an oversampled version of the original. Once this oversampled grid has been established, the different regridding methods can be applied to define the transfer between the 0.25° repeated grid and the EASE grid. In contrast to the original nearest neighbor resampling, the nearest neighbor of an EASE grid center will be one of these repeated 0.25° cells. Also, the locations of the cells are independent of the initial longitude. The placement of the replicated cells never changes.

The values in the AARI files need to be placed into this grid. Although the initial longitude may have been erroneous (as seen in Figure 8), the misplaced grid cell will still cover the same amount of area but will be represented by filling an appropriate number of 0.25° grid cells to the left and right of the digitized grid center. For the test file, "w900904.sigrid," the full AARI image analogous to Figure 7 is shown in Figure 13. Similarly, a zoomed view of the left side of the grid is shown in Figure 14 (analogous to Figure 8). The results of regridding to EASE (Figure 16) show that AARI values at improper locations at higher latitudes are successfully regridded to the EASE projection.

We are in the process of comparing nearest neighbor results using this approach to the results of the original. The two sets of results are expected to differ; the size of the difference and how it arises still need to be determined. This approach would work for a problematic data set in which the initial longitude changes from file to file. It would also work for comparing data from sources that use different starting longitudes.

6. Conclusions and Recommendations

This section reviews the application of the six regridding schemes to AARI data and rejects those that do not work well. Of those that perform well, two techniques are recommended for further evaluation to determine which is the optimum approach. Problems that still remain in conducting the regridding are identified. Future work plans are described in the next section, Section 7.

Of the six methods examined, the point min/max method can be rejected immediately because 20% of the cells in the EASE images created are empty. This artifact of the forward regridding does not create a desirable image to give the data user, despite the fact that all data are regridded to the two images.

The remaining methods are evaluated in the four categories of the previous section. These evaluations are summarized in Table 9. Using either the interpolation or area weighted average methods is not recommended since they have difficulty dealing with the symbolic nature of the AARI data. Two of the remaining three options, nearest neighbor and drop-in-box, have minimal differences with regard to their resulting EASE-Grid images. Using the nearest neighbor approach is recommended since it seems to be the closest to sampling the original sea ice charts. The sea ice charts had values digitized at the SIGRID points. These values were then considered to apply to the entire grid cell even though the grid cell on the charts may have contained other values. Finally, the area min/max approach is recommended. Although more demanding in processing time and computer storage, this approach does not have the data loss problems of the nearest neighbor regridding scheme. Further evaluation of these two techniques and testing on more AARI files is recommended.

The errors in selection of initial longitudes encountered during the regridding of AARI files require additional investigation. We should ascertain whether this remains a problem in SIGRID-2 or in a comparison of AARI data to NIC data. If the 0.25° replicated grid is required to deal with this problem, or if another solution is proposed, it should be thoroughly examined.

7. Future Work Plans

This document has detailed the work that has been completed in defining a regridding method to take AARI data in the SIGRID format and display it in the EASE-Grid northern hemisphere projection. This section outlines the additional work remaining to select the optimum regridding scheme (Section 7.1) and to produce a data set of EASE-Grid images of the AARI data (Section 7.2 and Section 7.3). Furthermore, quality control procedures needed to ensure the user receives correct and reliable data are proposed (Section 7.4). Finally, issues regarding the revision of the SIGRID format (Section 7.5), the proposal to compare AARI data with NIC and/or SSM/I data (Section 7.6), and the regridding of AARI data for the southern hemisphere (Section 7.7) are presented.

7.1 Further Evaluation of Nearest Neighbor and Area Min/Max Resampling

The remaining work for deciding between the nearest neighbor and area min/max resampling consists of:

implementing the area min/max regridding

using both methods to regrid a test set of AARI data (for instance, one year of data from the east and west sectors)

evaluating the results in terms of processing time, interpretation of images, and computer storage needs.

The bulk of the time to accomplish this task is required for the first step. The method sketched out in Section 3.3e needs to be refined and coded. Evaluating the two methods in terms of the interpretation of the images will be the second most time-consuming task; that is, describing what the grid-point-based nearest neighbor images represent as opposed to the two min/max images created under a grid cell paradigm.

7.2 Processing (optimization/re-coding)

In order to evaluate the different regridding schemes, programs were written in IDL. While IDL has advantages for displaying and manipulating images, it can be slow for processing tasks. Therefore, it is suggested that routines for creating the EASE images from the AARI data be written in C. For generating EASE images using nearest neighbor, the C code required is trivial and the speed increase in processing the images should be substantial. This is especially true when considering the task of processing the entire data set. In addition, the C code could include a subroutine to subset the data as outlined in Section 7.3.

As discussed in Section 7.5, the routine that parses the AARI files is currently written in IDL and should be re-coded in PERL. Finally, as discussed in Section 7.4 some quality control procedures should be incorporated at various stages in the processing: in the PERL script that parses the data, in the C code to regrid the AARI files, and subsequently in an IDL routine to visually check the EASE-Grid images.

7.3 Subset of Full EASE-Grid

The EASE-Grid projection was established for the full hemisphere; however, we are interested in displaying sea ice information which covers the polar regions. In order to eliminate the need to store the full image when it is only partially filled with data, a subset of the EASE-Grid should be used (see Figure 16). This subset covers the area in the northern hemisphere for which sea ice is likely to appear. The subset has the corner points at 29.7127° latitude and 135° W, 135° E, 45° W, and 45° E longitude. These coordinates match the outside corners of EASE grid cells at the column,row coordinates of (180,180), (540,180), (180,540), and (540,540) respectively. The subset matches that defined for the "Pathfinder AVHRR Polar Products." An example of the subset of the AARI file "w900904.sigrid" is shown in Figure 17. The subsetted image has dimensions of 361 columns by 361 rows and requires 130,321 bytes for uncompressed storage. This is only 25% of the size of the full EASE-Grid uncompressed image.
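Extracting the subset is a simple slice of the full grid, sketched below. The sketch assumes a 721 by 721 full grid with zero-based column and row indices and one byte per cell; these are consistent with the byte counts quoted in this report but are assumptions here. The names `subset` and `FULL_SIZE` are hypothetical.

```python
FULL_SIZE = 721         # assumed full EASE-Grid dimension (zero-based indices 0..720)
FIRST, LAST = 180, 540  # first and last column/row kept in the subset


def subset(grid):
    """Return rows and columns FIRST..LAST (inclusive) of a full EASE image."""
    return [row[FIRST:LAST + 1] for row in grid[FIRST:LAST + 1]]


# Placeholder full-hemisphere image, stored as a list of rows.
full = [[0] * FULL_SIZE for _ in range(FULL_SIZE)]
sub = subset(full)

n_rows, n_cols = len(sub), len(sub[0])
print(n_rows, n_cols)                             # subset dimensions
print(n_rows * n_cols)                            # cells (bytes at one byte per cell)
print(n_rows * n_cols / (FULL_SIZE * FULL_SIZE))  # fraction of the full image
```

Because the slice bounds are inclusive of both 180 and 540, the subset has 361 columns and rows rather than 360.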

7.4 Quality Control

The production of EASE-Grid images from the AARI data should incorporate some measure of quality control before the data are regridded, during the regridding process, and for the EASE-Grid image in order to ensure the user receives a correct and reliable data set. While the data are still in SIGRID format, some summary information should be generated for internal records at NSIDC and to provide to the user. One example is the amount of information in the data file, that is, the number of digitized points. Also needed is the distribution of that information: how many points are unknown or land, and how many of the digitized points contain information on the total ice concentration and on the 1st, 2nd and 3rd thickest ice types and their partial concentrations, stages of development and ice forms (for example, see Table 7).

During the regridding process some data regarding the transformation of data should be recorded. For instance, the amount of data loss and duplication for the nearest neighbor resampling or, for the area min/max approach, the number of pixels that were assigned to the min or max images when that value covered less than 10% of the EASE grid cell.

Finally, the final EASE image should be visually inspected and metadata should be generated. At some stage in the processing, a simple check should be made that all data values lie in the proper range for their data types. A check should also be made for sea ice values on land or land values in the ocean. Preliminary evaluation of the data shows that AARI sometimes digitized points as land and these points appear to be in the ocean after regridding. Likewise, some sea ice values appear on land.
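The two automated checks mentioned above might look like the following sketch. The function names, the flat-list representation of an image, and the set of valid values in the usage note are hypothetical; the actual valid ranges would come from the SIGRID code tables for each data type, and the land mask from whatever land/ocean reference is adopted.

```python
def out_of_range(values, valid):
    """Return (index, value) pairs for values outside the valid set."""
    return [(i, v) for i, v in enumerate(values) if v not in valid]


def ice_on_land(is_ice, is_land):
    """Return indices of cells flagged as ice that fall on land cells."""
    return [i for i, (ice, land) in enumerate(zip(is_ice, is_land))
            if ice and land]


def land_in_ocean(digitized_land, is_land):
    """Return indices of cells digitized as land that the mask says are ocean."""
    return [i for i, (dig, land) in enumerate(zip(digitized_land, is_land))
            if dig and not land]
```

For instance, with a hypothetical valid set of concentrations in steps of ten, `out_of_range([100, 20, 250, 60], set(range(0, 101, 10)))` returns `[(2, 250)]`, flagging the third value for inspection. The counts returned by these checks could be written to the metadata for each image.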

7.5 SIGRID-2

In 1994, a revision of the SIGRID format for storing the digitized sea ice charts was finalized. This revision corrected shortcomings in the original version. The document (6) describing the new format, SIGRID-2, lists the shortcomings as:

An arbitrary origin of grid lines results in the longitudes of grid points being non-coincident.

The designation by code figures of the variables and their quantitative values leads to the need to preserve a constant number of symbols in the data group and to use frequently the figures 99 (undetermined/unknown), thus resulting in an increase of the volume of the stored and reported information.

The coding procedure at an additional subdivision of the grid areas is very complicated and at the same time does not actually decrease the volume of coded data.

The abridged format does not contain all the data in the original version. As a result, the syntax of the data files under SIGRID-2 differs from the original.

7.6 Comparison of AARI Data with NIC and/or SSM/I Data

In the future, the AARI sea ice data may be compared to data from the National Ice Center (NIC) or measurements from the SSM/I or SMMR sensors. This section outlines a few of the considerations for performing such comparisons.

The first complication in comparing the data is that AARI, NIC and SSM/I or SMMR differ in the temporal sampling of the sea ice. AARI files have 10 day sampling, NIC files have 7 day sampling and SSM/I has two images per day (from the NSIDC EASE-Grid product) or one daily average image (from the NSIDC polar stereographic products). Furthermore, the AARI and NIC files could either be composites or averages. In a composite, all coverage in the time period is combined by selecting the "best" data. For an average image, if there is data from several days for the same point then that data is averaged rather than selecting the "best" value. Since the AARI and NIC differ in their sampling, a 7 day NIC file may only partially overlap the 10 days in the AARI files. So perhaps a longer time average of the two data sets should be compared, for example monthly NIC and monthly AARI averages. Also, in comparing an AARI or a NIC multi-day composite/average to SSM/I averaged images, we must consider that in the SIGRID data a pixel may represent a measurement from a single day, but the SSM/I is the result of averaging 7 or 14 measurements over the course of a week (10 or 20 for a 10 day period). In areas where the ice is variable (near the ice edge) or during times when the ice is variable (during melt or growth stages) such measurements could differ greatly.

A second consideration is that of spatial sampling. AARI and NIC data are compiled from many measurement techniques, all of which have differing spatial resolutions. Thus, you might be comparing AARI sea ice concentration compiled at 30 km with NIC data gathered at 10 km resolution or SSM/I at 25 km. Again, this may cause differences in situations where the ice is highly variable. Furthermore, in comparing a SIGRID data set to the SSM/I, the regridding method used to create the EASE images must be considered. If nearest neighbor is used, then the EASE cells are not truly showing the data gathered in that grid cell, but rather the data digitized at a point near the center of the grid cell. Likewise, the area min/max does not show values that occur in the entire cell, but rather the min or max that occur in some portion of the grid cell. Finally, the SIGRID data are composed of discrete points on the surface of the Earth, whereas the SSM/I represents an area. Comparing point data with area measurements is always difficult. Instead, regimes (for example, 2° by 2° areas of the Laptev and Kara Seas) should be compared, as opposed to the entire AARI coverage. This will ensure that there are consistently AARI data to compare with other data, since the AARI charts focus on these regions.

7.7 Southern Hemisphere

The AARI data set has files that cover the southern as well as the northern hemisphere. The regridding methods suggested for application to the SIGRID data in the northern hemisphere are also applicable to the southern hemisphere. However, although the techniques are independent of which hemisphere is regridded, the routines written to perform the regridding are not. Thus, all routines need to be adapted to the southern hemisphere. This does not include the pre-existing routine "easeconv.pro."

Format to Provide Sea Ice Data to the World Climate Program, (SIGRID-2). Prepared by the Steering Group for the Global Digital Sea Ice Data Bank, Commission for Marine Meteorology, World Meteorological Organization. Received from AARI at NSIDC on 13 April 1994.