Tips and tutorials for GIS and Remote Sensing…

Tag Archives: Data Handling

For working with ENVI files I normally use GDAL as code can then be applied to different formats. However, there are a couple of limitations with GDAL when working with hyperspectral data in ENVI format:

GDAL doesn’t copy every item from the header file to a new header file if they don’t fit in with the GDAL data model. Examples are FWHM and comments. Sometimes extra attributes are copied to the aux.xml file GDAL creates, these files aren’t read by ENVI or other programs based on IDL (e.g., ATCOR).

For data stored Band Interleaved by Line (BIL) rather than Band Sequential (BSQ) reading and writing a band at a time is inefficient as it is necessary to keep jumping around the file

To overcome these issues NERC-ARF-DAN use their own Python functions for reading / writing header files and loading BIL files a line at a time. These functions have been tidied up and released through the NERC-ARF Tools repository on GitHub (https://github.com/pmlrsg/arsf_tools). The functions depend on NumPy.

Unzip and within a ‘Command Prompt’ or ‘Terminal’ window navigate to the the folder using (for example):

cd Downloads\arsf_tools-master

Install the tools and library by typing:

python setup.py install

Note, if you are using Linux you can install the arsf_binary reader from https://github.com/arsf/arsf_binaryreader which is written in C++. The ‘arsf_envi_reader’ module will import this if available as it is faster than the standard NumPy BIL reader.

If you are a UK based researcher with access to the JASMIN system the library is already installed and can be loaded using:

module load contrib/arsf/arsf_tools

An simple example of reading each line of a file, adding 1 to every band and writing back out again is:

For developing algorithms using spatial data, in particular multiple files it is recommended to convert the files to another band-sequential format using the ‘gdal_translate’ so they can be read using programs which use GDAL, for example RIOS or RSGISLib.

To convert files from ENVI BIL to ENVI BSQ and copy all header attributes the ‘envi_header’ module can be used after converting the interleave with GDAL. For example:

When downloading a lot of large files (e.g., remote sensing data), it is difficult to do using a browser as you don’t want to set all the files downloading at once and sitting round waiting for one download to finish and another to start is tedious. Also you might want to download files to somewhere other than your ‘Downloads’ folder.

The script uses PycURL, which is available to install through conda using:

conda install pycurl

It takes a text file with a list of files to download as input, if you were downloading files using a browser you can right click on the line and select ‘Copy Link Location’ (the exact text will vary depending on the browser you are using) instead of downloading the file. Some files follow a logical pattern so you could use this to get a couple of links and then just copy and paste changing as required (e.g., the number of the file, location, date etc.,).

Any downloads which fail will be added to the file specified by the ‘–failslist’ argument.

By default the script will check if a file has already been downloaded and won’t download it again, you can skip this check using ‘–nofilecheck’. It is also possible to set time to pause between downloads with ‘–pause’, to avoid rejection from a server when doing big downloads. For all the available options run:

python CURLDownloadFileList.py --help

As it is running from the command line, you can set it running on one machine (e.g., a desktop at the office) and check on the progress remotely (e.g., from your laptop at home) using ssh. So the script keeps running when you you close the session you can run within GNU Screen, which is installed by default on OS X. To start it type:

screen

then type ctrl+a ctrl+d to detach the session. You can reattach using:

screen -R

to check the progress of your downloads.
Alternatively you can use tmux. However this isn’t available by default.

This post was going to be called ‘faffing around with data’, which might summarise many peoples feelings towards dealing with data that isn’t in the format they need it to be in. An earlier post described adding an ENVI header to binary files so they could be read into GDAL. This deals post details using Python to read tabular data stored in a text file, although many of the functions would apply to extracting any data from a text file.

Built in functions

Before setting out it’s a good idea to confirm the data can’t be read with any of the functions available in Python. If it opens in Excel, it’s likely these will work and they may be an easier option (no point reinventing the wheel!)

The CSV module in the standard Python library is really easy to use and can handle various delimiters:

Depending on how many files you have to process, a bit of clever use of find and replace in a text editor or using the sed command might allow you to get your data into a format that can be read by the built in functions. If this does end up being too messy it will at least provide some insight into the steps needed to write a script to read the data.

Writing a Custom Reader

The basic structure of a custom reader is to iterate through each line in the file and split it into columns.

Where ‘\s+’ is a regular expression for one or more spaces. When choosing the character to use as a replacement, make sure it’s not used anywhere else in the file.

Time Data

Time can be stored in a range of different formats. Python offers ways of storing and manipulating time and data data as part of the time module. One of the easiest ways of getting time from a text file and into the Python representation is to use the ‘struct_time’ function. If each component of the date (year, month etc.,) is stored as a separate column you can use the following:

Sometimes it’s necessary to join two CSV files together using a common field. I wrote a python script a while ago to perform this task using the CSV python library (JoinTablesCSV.py). The script is run from the command line and takes the input and output CSV files and the numbers of the column to match (starting at 0).

For example to join attributes from inMatchFileName.csv to inRefFileName.csv, by matching the first column you can use:

Note this assumes there is only one record for each ID, if there are multiple (as in a relational database) this simple aproach won’t work and actually loading the data into a relational database would be better. My recomendation for this is SQLite, a self-contained database (i.e., stored in a single file and doesn’t require setting up a server), that can be accessed from Python and R.

Quite often remote sensing data is supplied as a binary file with the dimensions and geographical information in a separate text file. If you can add the relevant information into an ENVI format header file you can easily read the data using GDAL. If you have a lot of files in this format it’s not too tricky to make a python script which will pull information from the supplied text file and populate the relevant fields in an ENVI header file.

If you have ENVI the easiest thing to do is open one of the files, fill in the parameters in the dialogue and then use the header file (‘*.hdr’) generated as a template. This might require playing around with parameters (data types, byte order etc.,) if not all the information you need is with your data. Make sure the data displays OK and it’s in the correct geographic location.

If you don’t have ENVI you can manually generate the file using template below as a guide or if you have another file which is similar (same projection etc.,) translate to ENVI format using gdal:

gdal_translate -of ENVI infile.kea outfile.env

Then use the header file you have just created as a template. It needs to have the same filename as the data with the extension hdr . A list of the data types the ‘data type’ number in the header corresponds to is available on the IDL website. You may need to experiment with the parameters a little, you can use TuiView to open the data and check everything looks OK.

Once you have a template you write a python script, similar to that below, to create a header replacing the parameters (size, upper left coordinate) as needed.

To create header files for all data in directory you need to loop through the data, and read parameters from the corresponding text file. There are lots of string functions build into python, the split function is particularly handy. Assuming a text file of the form: