Tips and tutorials for GIS and Remote Sensing…

Tag Archives: Batch Processing

Previous posts have covered atmospheric correction using the Atmospheric and Radiometric Correction of Satellite Imagery (ARCSI) software. When a large number of scenes require processing then commands to automate steps are desirable and greatly simplify the process.

ARCSI provides a number of commands which are useful for batch processing:

arcsisortlandsat.py sorts the landsat ‘tar.gz’ files into individual directories for each sensor (i.e., LS 5 TM) and also builds a standard directory structure for processing.

arcsiextractdata.py extracts the contents of \verb|tar| and \verb|tar.gz| archives with each archive being extracted into an individual directory.

arcsibuildcmdslist.py builds the ‘arcsi.py’ commands for each scene creating a shell script.

The files used for this tutorial are shown below, but others could be downloaded from from earthexplorer and used instead.

Build ARCSI Commands

To build the ‘arcsi.py’ commands, one for each input file, the following commands are used. Notice that these are very similar to the individual commands that you previously executed but now provide inputs to the ‘arcsibuildcmdslist.py’ command which selects a number of input files and generate a single shell script output.

Following the execution of these commands the following two files will have been created ‘LS5ARCSICmds.sh’ and ‘LS8ARCSICmds.sh’. These files contain the ‘arcsi.py’ commands to be executed. Open the files and take a look, you will notice that all the file paths have been convert to absolute paths which means the file can be executed from anywhere on the system as long as the input files are not moved.

Execute ARCSI Commands

To execute the ‘arcsi.py’ commands the easiest methods is to run each in turn using the following command:

sh LS5ARCSICmds.sh
sh LS8ARCSICmds.sh

This will run each of the commands sequentially. However, most computers now have multiple processing cores and to take advantage of those cores we can use the GNU parallel command line tool. Taking advantage of those cores means that processing can be completed much quicker and more efficiently.

cat LS5ARCSICmds.sh LS8ARCSICmds.sh | parallel -j 2

The switch ‘-j 2’ specifies the number of processing cores which can be used for this processing. If no switches are provided then all the cores will be used. As some parts of the process involve reading / writing large files, depending on the speed of you hard drive and the number of cores you have available you might find limiting the number of cores is needed. Please note that until all processing has completed nothing will be printed to the console. You can check the command are running using the top / htop command.

Once you have completed your processing you should clean up your system to remove any files you don’t need for any later processing steps. In most cases you will only need to keep the ‘tar.gz’ (so you can reprocess the RAW data at a later date if required) and the SREF product. It is recommended that you also retain the scripts you used for processing and a record of the commands you used for a) reference if you need to rerun the data and b) as documentation for the datasets so you know what parameters and options were used.

This post was adapted from notes for a module taught as part of the Masters course in GIS and Remote Sensing at Aberystwyth University, you can read more about the program here.