QGIS-SEXTANTE cookbook

martes, 9 de abril de 2013

My colleague Ilya Rosenfeld just published a few days ago a great post about Geoscript, a project that provides a wrapping of the main GeoTools elements for handling spatial data in several scripting languages, making it easier to access the power of GeoTools.

I took the example data from that blog post (irradiance data provided by the National Renewable Energy Lab) and tried to replicate part of the work using the QGIS Python console and, of course, calling SEXTANTE algorithms from it. We will be using SEXTANTE algorithms from the console, and also writing a Python script and an R script, both of them to be wrapped as new SEXTANTE algorithms.

Let's start. The first thing is to open the dataset, that you can download here

You can use the usual menus in QGIS, but we can open it directly from the QGIS Python console, with something like this

Now that there is just one polygon selected, any operation performed by another SEXTANTE algorithm will use only that polygon instead of the whole layer. We can use that to cut the original layer with Direct Normal Irrdiance (DNI) values and get a layer with only those polygons within Florida. The Cut shapes algorithm will help us with this, and we can call it using the following code

Remember that, when run in the console, SEXTANTE algorithms do not load their results into the QGIS canvas, so you have to manually load them in case you need to. The resulting layer, if loaded, should look like this.

Let's work now on the data we have in our new layer. There is no algorithm to summarize all the values corresponding to those polygons into an average one that can be assigned to the whole Florida state, but we can create it rather easily. Open the SEXTANTE script editor and enter the following code.

This is a small script that creates a new layer with one point representing the features in the layer (placed in the 'centroid of centroids' of those input features), and attributes representing the average value.

Save it as summarize.py in the scripts folder (remember that it is the defualt one when SEXTANTE show you the save file dialog). The name to refer to the algorithm is taken from the filename, in case you haven't provided one in the algorithm description (which we haven't done. We have just defined the group, not the algorihtm name), so this one will be named script:summarize

Now you can call it just double clicking on its name in the toolbox, and using the dialog that SEXTANTE will create based on your algorithm description.

However, you can also use the console to call it, and since we are mostly working on the console in this exercise, we will do it that way. Enter the following sentence to call the algorithm.

Notice how we are using the output of the previous algorithm as input for this one, with no need to load it in QGIS, and refering to it using the outSummarize dict returned when running the cutting algorithm
If we want to create a histogram, we have no tools in SEXTANTE to do so, at least not in our particular case in which the data is in a single row of the attributes table of a vector layer, and not in all its fields, but just a subset of them. However, it is not difficult to create an algorithm to do that, and we can rely on R to do it.

Open the SEXTANTE R editor and create a new algorithm with the following code.

##showplots
##layer=vector
barplot(as.matrix(layer@data[1, 4:15]))

Save it as monthly_barplot.rsx, and it will now be available to be called from all the SEXTANTE components and also from the console. Run it with the following code

Save it and you will see it in the toolbox, with the name dni barplot. This time we have added the name in the description header, so the name is not taken from the file. You can use any filename to save it

The algorithm has the name of the states attribute hardcoded, but you can edit it to make it more flexible. Take this just as an example.

The interest of this script is not to use it once to replicate the same process we have already run, but to use it repeatedly to create similar barplots for other states easily.

I case you want to produce a barplot for each state, you can automate that with a few lines of Python code:

sábado, 23 de febrero de 2013

Some of the algorithms contained in SEXTANTE do not actually do any analysis, but they just perform some common tasks such as management ones. Calling this algorithms from SEXTANTE is interesting, because you can save time by automating those task through the SEXTANTE modeler or the batch processing interface.

We will see in this entry how to bulk import layers into GeoServer using SEXTANTE, and how to add a bit of processing along the way, to ensure that the layers we import have optimal characteristics to provide maximum performance when responding to requests.

To import a single vector layer, double click on the Import vector into GeoServer

Set the parameters of the GeoServer instance that you want to import to (and make sure GeoServer is running) , the workspace where you want to import it, and the layer to import. Click OK and the layer will be imported.

The workspace must exist. If you want to create a new one, you can use the Create workspace algorithm before importing.

Importing to GeoServer from SEXTANTE is easy but, what if you have to import all the layers in a folder? That might take time. However, using the SEXTANTE batch processing interface can make the task much easier.

Go to the toolbox and, instead of double clicking on the name of the algorithm name, select the Import vector into GeoServer algorithm, right-click on it and choose the Run as batch process option. The following dialog will be shown.

Each row represents an execution of the algorithm. That is, in this case it represents one layer being imported into GeoServer. We have to fill the table, adding as many rows as needed, to perform the bulk import.

Select the Layer to import field in the top row. That correponds to the layer to import in the first execution of the algorithm. Click on the button in the field and you will see the typical file selection dialog. Browse to the folder where all the files you want to import are found, and select them all. Not just one of them, but all of them.

When you click on OK, SEXTANTE will understand that you want to use each selected file for a different run of the algorithm, and will add the rows that are needed to accomodate them.

Filling the other rows has to be done manually, but if you want to use the same values for all the rows in a given column (you will probably be importing to the same GeoServer instance and the same workspace), you can use the following trick: enter the value in the first row and then double-click in the column name. That will cause the value to be copied to all the other rows automatically.

Once the table is filled, just click on OK and the process will import all your layers into GeoServer. Open your GeoServer admin page to check that they have been imported.

As it happens with all SEXTANTE algorithms, this one is available in the GIS Python console as well. We can get some automation by calling it from a script. As an example, the following code imports all the layers that are loaded in the current QGIS project .

Let's now import raster layers instead of vector ones, but adding some extra processing before importing. What we are going to do is to make sure that the raster layer has a good format (in our case GeoTiff) in terms of performance, and a correct structure (we are going to ensure that it is internally tiled and has overviews).

The first thing to do is to export the raster layer as a GeoTiff using GDAL, and using the corresponding modifiers to add inner tiles of a size of 1024 x 1024 pixels. We do this with the GDAL Translate algorithm and the options shown in the figure below:

After that, we will run the Build pyramids algorithm, and add 4 levels of overviews, using the following configuration.

The last step will be importing into GeoServer, which is trivial if we already know how to import vector layers,since the interface of the algorithm to import raster ones is almost identical.

To make things better, we can put these three steps in a simple model, using the SEXTANTE modeler:

The URL and workspace fields have been harcoded in this case, and only the store name is shown as option when running the model.

That is the most convenient way of running this workflow, since the intermediate layer should not be loaded into QGIS but just generated (we can disable this when running the Translate algorithm, but as part of the model that is just defined once, and we do not have to worry about getting unnecesary layers in the QGIS canvas.)

Since models are just like any other algorithm, they can be run on the batch processing interface, so can do a bulk import of raster layers as well, including the preparation of the corresponding files before importing.

sábado, 2 de febrero de 2013

In order to have more layers than can be used to create meaningful examples for the processes included in the OpenGeo Suite, I took the task of extending the Medford dataset that currently ships with the Suite and is used for most of the examples and use cases. In particular, I wanted to add a DEM and a slope layer, since the dataset has several vector layers, but lacks useful raster ones. I used quite a few SEXTANTE algorithms to do this, so I have decided to write down the process and publish it here as a case study.

The first thing I did was to look for good elevation data. The National Elevation Dataset was clearly my first choice. As it is divided in tiles, metadata shapefiles are provided, with polygons that represent the extent of each tile.

Data can be dowloaded from the National Map download website, selecting the region you want to download. The process is rather cumbersome (you have to give an email address and a download link is sent to you!!), but I ended up having two raster layers that covered the Medford area, as shown below.

These layers had two problems:

They covered an area that was too large for what I wanted (I was interested in a smaller region around the Medford city center, shown in brown in the image above)

They were in two different files. (The Medford limits fall into just one single raster layer, but, as I said, I wanted some extra area around it.

Both of them are easily solvable using SEXTANTE.

First, I create a rectangle defining the area that want. To do it, I create a layer containing the bounding box of the layer with the limits of the Medford area, and then I buffer it, so as to have a raster layer that covers a bit more that the strictly necessary.

To calculate the bounding box , we can use the Polygon from layer extent algorithm

To buffer it, we have several alternatives. The GRASS v.buffer.distance algorithm has the option to leave square corners, which in this case it is interesting, since we are then going to use the resulting polygon to crop a raster layer, which has to be rectangular.

Here is the resulting bounding box obtained using the parameters shown above

With this layer that contains the bounding box of the raster layer I want to obtain, I can crop both of the raster layer that I downloaded, using the Clip Grid with Polygons algorithm.

Before clipping, the clipping vector layer and the raster layer to clip must be in the same CRS, since the algorithm assumes that. Originaly, they are not (the bounding box has the same CRS as the Medford limits layer, which is not the same as the downloaded DEM layers), although they appear to match, since QGIS is performing on-the-fly reprojecting before rendering, and also because both CRSs (EPSG 4326 and EPSG 4269) are rather "similar", so even without the on-the-fly reprojection everything would appear more or less "in place".

Reprojecting the clipping polygon can be done with the Reproject layer algorithm.

Now we can crop the raster layers.

Once the layers have been cropped, they can be merged using the Merge Raster Layers algorithm

A cellsize is needed for the merged layer. I used the same one of the input ones. You do not need to know it in advance before calling the algorithm. Just click on the button in the right-hand size of the text field and you will have a dialog to enter small mathematical formulas, and a list of frequently used values, among them the cellsizes and bounding coordinates of all available layers.

Note: You can save time merging first and then cropping, and you will avoid calling the Clip... algorith twice. However, if there are several layers to merge and they have a rather big size, you will end up with a large layer than it can later be difficult to process.

With that, we get the final DEM we want.

A slope layer can be computed with the Slope,Aspect,Curvature algorithm, but the DEM obtained in the last step is not suitable as input, since elevation values are in meters but cellsize is not expressed in meters (the layer uses a CRS with geographic coordinates). A new reprojection is needed. To reproject a raster layer, the GDAL Warp algorithm can be used. We reproject into a CRS with meters as units, so we can then correctly calculate the slope.

Here is the reprojected DEM.

With the new DEM, slope can now be computed.

The slope produced by the SAGA Slope,Aspect,Curvature algorithm is expressed in radians, but degrees are a more practical and common unit. The Metric conversions algorithm will help us to do the conversion.

A similar conversion can be made using the raster calculator (whether the SAGA one or the GRASS one).
Converting to a different unit, such as percentage, can also be done with the calculator.

Reprojecting the converted slope layer back with the GDAL Warp algorithm, we get the final layer we wanted.

The reprojection processes have caused the final layer to contain data outside the bounding box we calculated in one of the first steps. This can be solved by clipping it again.

All providers included in SEXTANTE can use and produce all data types supported by QGIS, even if the actual application doing the work does not support all those formats. SEXTANTE takes care of all the needed conversions, so the user doesn't have to worry about it.

This full support of many different formats can be used automate the conversion of a set of files into a given format. We will see in this post how to do it.

There are several modules specifically designed to convert/translate between formats. The GDAL and OGR providers contain modules for converting vector (ogr2ogr) and raster layer (translate) respectively. In the case of vector layers, there is also a quick trick that we can use, which does not require use any external application and does all the work directly within QGIS. The Create new layer with selected features algorithm outputs a new layer with just those features that are selected. However, if no feature is selected, it will take all of them (this is the default behaviour of SEXTANTE in case there is no selection, and it works like that for all providers), so saving a layer with no selection will create that same layer. We just have to use a different file extension, and we will get the same layer, but saved in a different file format.

The algorithm dialog is rather easy to understand, as it doesn't have much more that an input field and an output field.

Just select the layer to export and the destination file. The extension of that file will define the output file format.

This way of exporting a layer through SEXTANTE instead of using the "normal" QGIS elements (in this case the Save Selection as Vector File menu) does not make a great difference for a single layer, but in the case of having a large number of them, it will ease out work considerably. To profit from the productivity tools that are part SEXTANTE, we should run the algorithm not like we usually do, double clicking on its name, but using the batch processing interface. Right-click on the name of the algorithm and then select Execute as batch process. You will see the following dialog.

Each row represents an execution of the algorithm, so we have to add as many rows as layers we want to convert and then fill the table. It is even easier than that, because we can select a table cell in the Input layer column, click on the edit button to bring up the file selection dialog and select a bunch of files, and SEXTANTE will take care of creating the needed rows to accomodate all the selected files.

The next thing to do is to fill the output names. Click on the upper cell in the Output layer... column and browse to the folder where you want your resulting files to be stored. We are going to convert all our files into GML files, so enter ".gml" as the destination filename (notice that the first character is a dot). Don't worry about that name, because we are going to let SEXTANTE fill the names for us. The only important thing is the extension. Click on OK and you will see a dialog like this.

Now select the values as they are shown in the above figure. This tells SEXTANTE to fill not only the cell you have selected, but all the ones underneath, and adding in each case the filename in the Input layer field as suffix. Since the we entered en empty filename (only a file extension), the resulting names will be the same names as the input layer, but with a different extension.

The final table is ready to be executed.

Now click on OK and SEXTANTE will convert all your files.

Files generated by the batch processing interface are not loaded into QGIS, but you can go to the folder you entered to check that they have been correctly exported.

jueves, 31 de enero de 2013

This post is, once again, inspired by a question by a QGIS user in the QGIS mailing list. The problem is as follows:

Given a DEM and a polygon defining an area to analize, divide the area inside the polygon in subareas of a given surface S. This subareas will be used for defining a harvesting scheme, and since the risk of erosion is to be minimized, they should run along contours. In other word, the problem is similar to creating elevation classes, but instead of being defined by an elevation interval, the interval is based on the total area between class breaks.

This is an interesting problem, because it can be solved in many different ways, and using many different tools from the SEXTANTE toolbox. Here is just one possible solution, which I consider particularly elegant.

I will be using the same DEM used for other previous post, along with a simple polygon created on top of it.

First we clip the DEM with the polygon that represents the area to divide, using the SAGA Clip grid with polygons algorithm

Now we run the SAGA Sort grid to get a grid that has values ranging from 1 to n, n being the total number of cells (excluding no-data ones) in the input layer. The cell with the lowest value in the input layer has a value in the output layer of 1, the cell with the second lowest value gets a value of 2, and so on.

The resulting layer looks very similar to the original DEM. How similar it looks depend on how homogeneously distributed the original values are.

We can now reclassify this layer to get the desired zones. If we want a total area of , let's say, 1000ha (10000000 m2) for each zone, it is easy to calculate the number of cells in each zone, just dividing the zone area by the area of a size cell (the DEM that I used has a cellsize of 25 * 25 = 625m2). In our case, numcells = 10000000 / 625 = 16000.

Note: I have selected a large area for each zone, to get a result where they can be easily visualized, but in this context a much smaller are would be more logical.

We need group of 1600 cells, taken using the ordering defined by the sorted grid we have produced. The most straightforward way to do it is to use a reclassify algorithm. SAGA has one, and you need a table with the values defining each range and the resulting value. In our case, it might take a while to fill it, since there is a large number of group. GRASS has another algorithm, and the ranges are defined in a text file, which might help us, since we can create it more easily using a spreadsheet.

However, let's try to do it without reclassifying, looking for an easier way. Since our classes are regular, we can define them using a mathematical expression. Open the SAGA Grid calculator algorithm,, select the sorted layer as the only input grid (which we will refer to in the formula as a), and use the following formula: int( (a-1) / 117298 * 8)

Et voilà, this will directly produce the layer we are looking for.

How did that work? Let's have a closer look at the formula.

First, we normalize the layer to have values only in the (0,1) range, by substracting the min value (1) and dividing by the total range covered (117298). Then we multiply by the number of classes we want (8, which was obtained by dividing the total number of cells, 117298, by the size of each class in cells, 16000, and rounding it up), to get values in the range (0,8). Then we apply the int() function to truncate those values and have just integer class values.

How can we now that each class has the same number of pixels as the other ones? Easy. Just go to the properties of the resulting raster layer and compute a histogram. You will see that the histogram has the same constant y value at all integer x values.

If you are familiar with image analysis, you will easily recognize here a case of image equalization.

Notice that the class index are zero-based (and up to 7, not 8), since we used truncation.

The resulting layer looks similar to the one that would be obtained by just reclassiying the DEM instead of the ordered layer, but the result might differ depending on how the layer values are distributed.

lunes, 28 de enero de 2013

Among the long list of algorithms in SEXTANTE, selection algorithms are a very interesting group. Instead of producing new layers, they just modify the selection that exist in one of them. They cannot be used from the batch processing interface (since it works with layers that are not loaded in QGIS, so there is no selection at all), and using them from the SEXTANTE toolbox is not such an useful tool, but in the modeler and the Python console they can really help us simplify things and automate tasks.

This short post shows how to use them to automate a task that would otherwise involve a large amount of work, but that can be solved this way with just a few lines. It is inspired by this question recently asked at StackOverflow

The problem is as follows: given a points layer (species) with the presence of different species, and a polygon layer(zones) with vegetation types, generate N new polygon layer, each one with the areas where each specie can be found.

Assuming that the specie represented at each point is stored in an attribute named specie, the following script will solve out problem

Just two selections, and then we save the resulting selected features to a file whose name is created from the name of the corresponding specie.

This should work on the just-released 1.0.9 version of SEXTANTE. Just make sure you do not enable executing in a separate thread, since it seems to still cause some strange errors when using the console, and it tends to freeze QGIS when a runalg command is put in a loop.

martes, 8 de enero de 2013

To start the year, here is the first external contribution to this blog, a great post about the TauDEM algorithm provider for hydrological analysis. Alexander Bruy, an active QGIS developer, and most likely the most active SEXTANTE developer apart from myself, is responsible for this great entry that I am glad to share here.

Thanks, Alex!

-------

TauDEM (Terrain Analysis Using Digital Elevation Models) is a set of Digital Elevation Model (DEM) tools for the extraction and analysis of hydrologic information from topography as represented by a DEM. This is software developed at Utah State University (USU) for hydrologic digital elevation model analysis and watershed delineation.

TauDEM is distributed as a set of standalone command line executable programs for a Windows and source code for compiling and use on other systems. SEXTANTE contains only the interface description, so you need to install TauDEM 5.0.6 by yourself and configure SEXTANTE properly.

TauDEM extracts hydrologically useful information from raw digital elevation model data. The main idea is based on the concept of the hydrological flow field being represented by flow from each grid cell to one or more of its neighbors. For this to work the topography should not contain pits, defined as one or more grid cells surrounded completely by cells with higher elevation. The first step in hydrological analysis is should be pit removal. TauDEM uses a fill process to do this, raising the elevation of pits until they drain out. A DEM with pits removed is referred to as hydrologically correct and can be used to calculate flow directions for each grid cell.

So the first tool to run is Pit Remove. Select the DEM in the Elevation Grid field and execute it.

The output DEM is the same, because our input file has no pits. If you are absolutely sure that your input file has no pits, this step can be skipped, otherwise it is safer to run Pit Remove to ensure that all further analysis will be correct.

The next algorithm to run is D8 Flow Direction.

This takes as input the hydrologically correct elevation grid (which we create at the previous step) and outputs D8 flow direction and slope for each grid cell. The resulting D8 flow direction grid is shown below.

It uses an encoding of the direction of steepest descent from each grid cell using the numbers 1 to 8, to represent each one of the cells around a given one. D8 is the simplest flow direction model.

NOTE: all TauDEM algorithms have built-in help, so you always can read description of the algorithm, required inputs and produced results.

The next tool to run is D8 Contributing Area. It counts the number of grid cells draining through (out of) each grid cell based on D8 flow directions.

There are optional inputs (outlets and an input weight grid) which we leave empty for now. These inputs are described in the tool help and allow to restrict calculation to the specific area upstream of the designated outlet and to accumulate an input weight field, rather than just counting contributing area as a number of grid cells. There is also an option named Check for edge contamination (set to Yes by default). The resulting layer produced by this algorithms looks like this:

A logarithmic scale is better suited to the particular distribution of values in this layer

In the rendering below, pink color is used to show NODATA values, so as to illustrate edge contamination.The functions above used the D8 flow model that represents flow from each grid cell to one neighbor. TauDEM also provides the D∞ (D-Infinity) flow model that calculates the steepest outwards flow direction using triangular facets centered on each grid cell and distributes flow between neighboring grid cells based on flow direction angles.

Having flow directions and slope grids we can delineate and analyze stream networks and watersheds. The simplest stream network delineation method uses a threshold on contributing area.

First we need to define a stream using the Stream Definition by Threshold algorithm.

This tool defines a stream raster grid by applying a threshold to the input. In our case the input is a D8 contributing area grid and a threshold of 100 grid cells has been used.

The result depicts the stream network as a binary grid (but is not logically connected as a network shapefile yet).

The next step is to define the outlet point. This can be done by creating a point shapefile and placing the outlet point. Press the New Shapefile Layer button in the QGIS main window, and the following dialog will appear.

Select Point as shape type, and specify the same CRS as the DEM. TauDEM requires this, because it doesn’t do any spatial reference transformations. Now press OK and save the new file wherever you prefer. A new vector layer will be added to QGIS canvas. Select it and start editing. Zoom to a likely location for an outlet and create a new feature, then save and stop editing.

It is not required for the outlet to be precisely located where there will be a stream as TauDEM has a tool to move outlets to streams that we will used. Alternatively, if the outlet point location is available from some other source (e.g. location information about a stream gauge), it can be created from that information.

Here is the outlet we created manually.

As you can see, it is located outside the stream. So we use the Move Outlets To Stream tool to get outlet in correct place.

IMPORTANT: TauDEM creates shapefile in same CRS as input files, but it doesn’t create .prj file. You need to define projection for output file manually in order to get it in correct place. This can be done from layer context menu using Set Layer CRS item.

Here is our initial outlet (red) and correct one (green). Notice how the outlet has been moved to match the stream.

With the outlet positioned on the stream the stream network upstream of the outlet can be delineated.

We again run the D8 Contributing Area algorithm, but this time specifying an outlet shapefile to evaluate contributing area and effectively identify the watershed upstream of the outlet point (or points for multiple outlets).

The result is the contributing area only for the watershed upstream of the outlet.

The next step is to use the Stream Definition By Threshold algorithm to define streams using a specified contributing area threshold. We choose a threshold of 200 grid cells.

The result is a grid stream network upstream of the outlet

This network is still only represented as a grid. To convert it into vector elements represented using a shapefile, the Stream Reach and Watershed algorithm is used.

This algorithm produces many outputs including watershed grid and stream network in shape format and some text files with additional information. Here we visualize the watershed grid and stream network shapefile.

The attributes of a stream network link contains useful information such as downstream and upstream connectivity, stream order, length, slope, drainage area etc (detailed explanation of fields is available in the tool help).

Rather than do each step separately, we can create model with SEXTANTE modeler to run these algorithms together.