WCSTImport Guide

Description

WCSTImport is a utility application in the rasdaman software suite that imports georeferenced datasets into a WCS service supporting the Transaction Extension (see wiki:WCSTGuide). Its primary purpose is the ingestion of archives of georeferenced files.
This utility introduces two concepts:

Recipe - A recipe is a class implementing BaseRecipe that, based on a set of parameters (ingredients), imports a set of files into WCS as a well-defined structure (image, regular timeseries, irregular timeseries, etc.).

Ingredients - An ingredients file is a JSON file containing the parameters that define how the recipe should behave (e.g. the WCS endpoint and the CRS resolver are both ingredients).

Dependencies

The glob2, dateutil and lxml Python packages are required by wcst_import. On Debian the following commands should set them up:

Recipes

NOTE: from rasdaman version 9.5 onwards, the "crs_resolver" ingredient is no longer used in any recipe (General Recipe, Mosaic Map, Regular Timeseries, Irregular Timeseries).

Instead, the value is read from the secore_urls setting in the petascope.properties file of your rasdaman installation (the path to petascope.properties is configured in your wcst_import.sh file).

This keeps the default SECORE (crs_resolver) setting consistent between WCST_Import and Petascope.

For each recipe there is an ingredients file under the ingredients folder that contains an example of the available parameters. Below you can find a description of each ingredients file:

REGULAR Tiling

You can set arbitrary tile sizes for the tiling option in the ingredients file only if the tiling name is ALIGNED. If you want to take advantage of the tile index (tiling name REGULAR; see http://rasdaman.org/wiki/Tiling for details), each tile size must be a divisor of the total number of points on the corresponding axis.

Example:

A coverage has the axes Lat with 4320 points and Long with 8640 points.
The tile sizes should then be divisors of these totals
(e.g. "tiling": "REGULAR [0:0, 0:431, 0:863]"),
i.e. 432 points and 864 points per tile.
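
The divisor rule can be checked before writing the ingredients file. The following is a minimal sketch in plain Python; the helper names divisors and is_valid_regular_tile are illustrative and are not part of wcst_import.

```python
def divisors(total_points):
    """Return all tile sizes that divide an axis of total_points evenly."""
    return [n for n in range(1, total_points + 1) if total_points % n == 0]


def is_valid_regular_tile(total_points, tile_points):
    """A REGULAR tile size is valid when it divides the axis without remainder."""
    return total_points % tile_points == 0


# Axis sizes from the example above: Lat has 4320 points, Long has 8640 points.
print(is_valid_regular_tile(4320, 432))   # True: 4320 = 10 * 432
print(is_valid_regular_tile(8640, 864))   # True: 8640 = 10 * 864
print(is_valid_regular_tile(4320, 500))   # False: 500 does not divide 4320

# A rasql interval 0:431 contains 432 points, so the upper bound is size - 1.
lat_tile, long_tile = 432, 864
print('"tiling": "REGULAR [0:0, 0:%d, 0:%d]"' % (lat_tile - 1, long_tile - 1))
```

Calling divisors(4320) lists every admissible Lat tile size if you prefer to pick one from a list.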

Mosaic map

This recipe is well suited for importing a tiled map, not necessarily continuous: it places all given input files under a single coverage and handles their position in space. The parameters are explained below

(please note that the "//comment explaining things" comment syntax is not valid JSON, so remove the comments if you copy the parameters):

{
  "config": {
    // The endpoint of the WCS service with the WCST extension enabled
    "service_url": "http://localhost:8080/rasdaman/ows",
    // A directory where to store the intermediate results
    "tmp_directory": "/tmp/",
    // A default 2D crs to be used when the given files do not have one
    "default_crs": "http://opengis.net/def/OGC/0/Index2D",
    // If set to true, it will print the WCST requests and will not execute them. To actually execute them set it to false
    "mock": true,
    // If set to true, the process will not require any user confirmation, use with care, useful for production environments when deployment is automated
    "automated": false
  },
  "input": {
    // The name of the coverage, if the coverage already exists, we will update it with the new files
    "coverage_id": "MyCoverage",
    "paths": [
      // Any normal full (or relative to the ingredients file) path or regex that would work with the ls command. You can add as many as you wish, separated by commas
      "/var/data/*"
    ]
  },
  "recipe": {
    // The name of the recipe
    "name": "map_mosaic",
    "options": {
      // The tiling that you want to be done in rasdaman
      "tiling": "ALIGNED [0:500, 0:500]"
    }
  }
}

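Since the "//" comments have to be removed before the file is valid JSON, a small helper can do this for you. The sketch below is not part of wcst_import; strip_line_comments is a hypothetical helper that drops "//" comments while leaving "//" inside string values (such as URLs) untouched.

```python
import json


def strip_line_comments(text):
    """Remove // comments that appear outside of JSON string literals."""
    result = []
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            result.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # Keep the escaped character so an escaped quote does not end the string.
                result.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            result.append(ch)
        elif text.startswith("//", i):
            # Skip everything up to (but not including) the end of the line.
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        else:
            result.append(ch)
        i += 1
    return "".join(result)


commented = """
{
  "config": {
    // The endpoint of the WCS service with the WCST extension enabled
    "service_url": "http://localhost:8080/rasdaman/ows",
    "mock": true  // a trailing comment
  }
}
"""

ingredients = json.loads(strip_line_comments(commented))
print(ingredients["config"]["service_url"])
```

Tracking whether the scanner is inside a string is what keeps the "//" in "http://" from being treated as a comment.
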
Regular timeseries

This recipe is well suited for importing multiple 2-D slices created at regular intervals of time (e.g. sensor data, satellite imagery, etc.) as a 3-D cube with the third axis being a temporal one. The parameters are explained below

(please note that the "//comment explaining things" comment syntax is not valid JSON, so remove the comments if you copy the parameters):

{
  "config": {
    // The endpoint of the WCS service with the WCST extension enabled
    "service_url": "http://localhost:8080/rasdaman/ows",
    // A directory where to store the intermediate results
    "tmp_directory": "/tmp/",
    // A default 2D crs to be used when the given files do not have one
    "default_crs": "http://kahlua.eecs.jacobs-university.de:8080/def/OGC/0/Index2D",
    // If set to true, it will print the WCST requests and will not execute them. To actually execute them set it to false
    "mock": true,
    // If set to true, the process will not require any user confirmation, use with care, useful for production environments when deployment is automated
    "automated": false
  },
  "input": {
    // The name of the coverage, if the coverage already exists, we will update it with the new files
    "coverage_id": "MyCoverage",
    "paths": [
      // Any normal full (or relative to the ingredients file) path or regex that would work with the ls command. You can add as many as you wish, separated by commas
      "/var/data/*"
    ]
  },
  "recipe": {
    // The name of the recipe
    "name": "time_series_regular",
    "options": {
      // The starting date for the first slice
      "time_start": "2012-12-02T20:12:02",
      // The format of the time provided above, auto will try to guess it, otherwise use any combination of YYYY:MM:DD HH:mm:ss
      "time_format": "auto",
      // The crs to be used for the time axis
      "time_crs": "http://kahlua.eecs.jacobs-university.de:8080/def/crs/OGC/0/AnsiDate",
      // The distance between each slice in time, granularity seconds to days
      "time_step": "2 days 10 minutes 3 seconds",
      // The tiling that should be used for it
      "tiling": "ALIGNED [0:1000, 0:1000, 0:2]"
    }
  }
}

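To get a feeling for how time_start and time_step position the slices on the time axis, the arithmetic can be reproduced with the standard library. This is only an illustration: it assumes that the i-th input file is placed i steps after time_start, and the function name slice_time is hypothetical.

```python
from datetime import datetime, timedelta

# Values taken from the ingredients example above.
time_start = datetime.strptime("2012-12-02T20:12:02", "%Y-%m-%dT%H:%M:%S")
time_step = timedelta(days=2, minutes=10, seconds=3)  # "2 days 10 minutes 3 seconds"


def slice_time(index):
    """Timestamp of the slice at the given position (0 is the first slice)."""
    return time_start + index * time_step


timestamps = [slice_time(i).isoformat() for i in range(3)]
for stamp in timestamps:
    print(stamp)
# 2012-12-02T20:12:02
# 2012-12-04T20:22:05
# 2012-12-06T20:32:08
```
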
Irregular timeseries

This recipe is well suited for importing multiple 2-D slices created at irregular intervals of time that are known at import time into a 3-D cube with the third axis being a temporal one. The parameters are explained below

(please note that the "//comment explaining things" comment syntax is not valid JSON, so remove the comments if you copy the parameters):

NOTE: The irregular timeseries recipe supports two kinds of time parameter in "options" (metadata_tag or filename); choose the one that best fits your case.

{
  "config": {
    // The endpoint of the WCS service with the WCST extension enabled
    "service_url": "http://localhost:8080/rasdaman/ows",
    // A directory where to store the intermediate results
    "tmp_directory": "/tmp/",
    // A default 2D crs to be used when the given files do not have one
    "default_crs": "http://opengis.net/def/OGC/0/Index2D",
    // If set to true, it will print the WCST requests and will not execute them. To actually execute them set it to false
    "mock": true,
    // If set to true, the process will not require any user confirmation, use with care, useful for production environments when deployment is automated
    "automated": false
  },
  "input": {
    // The name of the coverage, if the coverage already exists, we will update it with the new files
    "coverage_id": "MyCoverage",
    "paths": [
      // Any normal full (or relative to the ingredients file) path or regex that would work with the ls command. You can add as many as you wish, separated by commas
      "/var/data/*"
    ]
  },
  "recipe": {
    // The name of the recipe
    "name": "time_series_irregular",
    "options": {
      // Information about the time parameter, two options are possible, choose either of them
      "time_parameter": {
        // Get the date for the slice from a tag that can be read by GDAL
        "metadata_tag": {
          // The name of such a tag
          "tag_name": "TIFFTAG_DATETIME"
        },
        // The format of the datetime value in the tag
        "datetime_format": "YYYY:MM:DD HH:mm:ss"
      },
      "time_parameter": {
        // Another option to extract the time. Use only one of the two!
        "filename": {
          // The regex has to contain groups of tokens, separated by parentheses. The group parameter specifies which regex group to use for retrieving the time value
          "regex": "(.*)_(.*)_(.+?)_(.*)",
          "group": "2"
        }
      },
      // The crs of the time axis
      "time_crs": "http://opengis.net/def/crs/OGC/0/AnsiDate",
      // The tiling to be used
      "tiling": "ALIGNED [0:10, 0:1000, 0:500]"
    }
  }
}

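Both ways of obtaining the time value can be tried out in isolation. The sketch below uses only the Python standard library: the file name and the tag value are made up for illustration, and the mapping of "YYYY:MM:DD HH:mm:ss" to the strptime pattern is an assumption, not a statement about how wcst_import parses dates internally.

```python
import re
from datetime import datetime

# Option "filename": pick the time value out of the file name with a regex group.
regex = r"(.*)_(.*)_(.+?)_(.*)"
group = int("2")                       # the ingredients file stores the group as a string
filename = "scene_20120209_b1_v2.tif"  # hypothetical file name

match = re.search(regex, filename)
time_from_name = match.group(group)
print(time_from_name)  # 20120209

# Option "metadata_tag": parse a value as it could appear in TIFFTAG_DATETIME.
tag_value = "2012:02:09 14:20:00"      # hypothetical tag content
time_from_tag = datetime.strptime(tag_value, "%Y:%m:%d %H:%M:%S")
print(time_from_tag.isoformat())  # 2012-02-09T14:20:00
```

Testing the regex against a few of your real file names before running the import is a cheap way to confirm that the chosen group really contains the date.
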
All possible ingredients

{
  "__comment__": [
    "Each possible parameter for every recipe is commented in this file. As JSON does not support comments above",
    "each field, a __comment__ field is placed that explains the semantics of the field below it.",
    "In some cases, a parameter might have different possible values (e.g. recipe). In this case, the field for the",
    "parameter will be doubled.",
    "This file is considered a developer documentation that gives an overview over all possible ingredients.",
    "Refer to the user documentation at http://rasdaman.org or to the individual files for more documentation."
  ],
  "config": {
    "__comment__": "The base url to the WCST service, i.e. not including ?service=WCS&acceptversion=2.0.0",
    "service_url": "http://localhost:8080/rasdaman/ows",
    "__comment__": "Temporary directory in which to create gml and data files, should be readable and writable by both rasdaman, petascope and current user",
    "tmp_directory": "/tmp/",
    "__comment__": "The default crs to be used for gdal files that do not have it",
    "default_crs": "http://opengis.net/def/crs/OGC/0/Index2D",
    "__comment__": "[OPTIONAL] If mock parameter is true then the wcst requests are printed to stdout and not executed",
    "mock": false,
    "__comment__": "[OPTIONAL] Set to true if no human input should be requested and everything should be completely automated",
    "automated": false,
    "__comment__": "[OPTIONAL] This parameter adds default null values for bands that *DO NOT* have a null value provided by the file itself. The value for this parameter should be an array containing the desired null value in rasdaman format for each band. E.g. for a coverage with 3 bands:",
    "default_null_values": ["9995:9999", "-9, -10, -87", "4"],
    "__comment__": "[OPTIONAL] In case the files are exposed via a web-server and not locally, you can add the root url here, otherwise the default is listed below",
    "url_root": "file://",
    "__comment__": "[OPTIONAL] In some cases the resolution is small enough to affect the precision of the transformation from domain coordinates to grid coordinates. To allow for corrections that will make the import possible, set this parameter to true.",
    "subset_correction": false,
    "__comment__": "[OPTIONAL] If set to true, it will skip files that were not imported and move to the next ones.",
    "skip": false,
    "__comment__": "[OPTIONAL] If a WCST request fails it will be retried a number of times before an error is thrown",
    "retry": true,
    "__comment__": "[OPTIONAL] Number of retries to be attempted.",
    "retries": 5,
    "__comment__": "[OPTIONAL] The number of seconds to wait before retrying after an error. You can also specify a floating number to represent subdivisions of seconds.",
    "retry_sleep": 1,
    "__comment__": "[OPTIONAL] Limit the slices that are imported to the ones that fit in the bounding box below. Each subset in the bounding box should be of form {low:0,high:100} in the format of the axis.",
    "slice_restriction": [
      {"low": 0, "high": 36000},
      {"low": 0, "high": 18000},
      {"low": "2012-02-09", "high": "2012-12-09T14:20", "type": "date"}
    ],
    "__comment__": "[OPTIONAL] The directory in which to store the resumer file. By default, it will be stored in the same folder as the ingredients file.",
    "resumer_dir_path": "/var/geodata/resumer/",
    "__comment__": "[OPTIONAL] The number of slices to show in the description.",
    "description_max_no_slices": 42,
    "__comment__": "[OPTIONAL] Allow files to be tracked in order to not reimport files that were already ingested",
    "track_files": true
  },
  "input": {
    "__comment__": "The id of the coverage. If it already exists, we will consider this operation an update",
    "coverage_id": "MyCoverage",
    "__comment__": "The input paths to take into consideration. A path can be a single file or a unix file regex.",
    "paths": ["/var/data/test_1.tif", "/var/data/dir/*"]
  },
  "recipe": {
    "__comment__": "The recipe name",
    "name": "map_mosaic",
    "__comment__": "A list of options required by the recipe",
    "options": {
      "__comment__": "[OPTIONAL] The tiling of the coverage in rasql format",
      "tiling": "ALIGNED [0:500, 0:500]",
      "__comment__": "[OPTIONAL] If you want to import in wms as well set this variable to true",
      "wms_import": true,
      "__comment__": "[OPTIONAL] Specify the names of the bands, in cases the automatic inference (default: field_1, ...) is not good enough",
      "band_names": ["red", "green", "blue"]
    }
  },
  "recipe": {
    "__comment__": "This recipe should be used to extract a large coverage from an existing WCS service",
    "name": "wcs_extract",
    "options": {
      "__comment__": "The coverage to be imported",
      "coverage_id": "SomeOtherCoverage",
      "__comment__": "The endpoint of the WCS where the coverage resides",
      "wcs_endpoint": "http://example.org/rasdaman/ows",
      "__comment__": "A partitioning scheme to be used. For each grid axis specify the maximum number of pixels that should be retrieved. The system uses this as a hint and can generate different partitioning schemes depending on the coverage structure",
      "partitioning_scheme": [4000, 4000, 1],
      "__comment__": "[OPTIONAL] The tiling of the coverage in rasql format",
      "tiling": "ALIGNED [0:4000, 0:4000, 4]",
      "__comment__": "[OPTIONAL] If you want to import in wms as well set this variable to true",
      "wms_import": true
    }
  },
  "recipe": {
    "__comment__": "The recipe name",
    "name": "time_series_regular",
    "__comment__": "A list of options required by the recipe",
    "options": {
      "__comment__": "The origin of the timeseries",
      "time_start": "2012-12-02T20:12:02",
      "__comment__": "The datetime format of the parameter above. Auto will try to guess it, any other datetime format is accepted",
      "time_format": "auto",
      "__comment__": "The time crs to be used with the 2d crs to create a compound crs for the whole coverage",
      "time_crs": "http://192.168.0.103:8080/def/crs/OGC/0/AnsiDate",
      "__comment__": "The time step between two slices, expressed in days, hours, minutes and seconds",
      "time_step": "2 days 10 minutes 3 seconds",
      "__comment__": "[OPTIONAL] The tiling of the coverage in rasql format",
      "tiling": "ALIGNED [0:1000, 0:1000, 0:2]",
      "__comment__": "[OPTIONAL] Specify the names of the bands, in cases the automatic inference (default: field_1, ...) is not good enough",
      "band_names": ["red", "green", "blue"]
    }
  },
  "recipe": {
    "__comment__": "The recipe name",
    "name": "time_series_irregular",
    "__comment__": "A list of options required by the recipe",
    "options": {
      "__comment__": "The time parameter describes to the recipe how to extract the datetime. Two options possible: metadata_tag OR filename",
      "time_parameter": {
        "metadata_tag": {
          "__comment__": "The name of the tag in the gdal file, the default is the one below",
          "tag_name": "TIFFTAG_DATETIME"
        },
        "filename": {
          "__comment__": "The regex has to contain groups of tokens, separated by parentheses. The group parameter specifies which regex group to use for retrieving the time value",
          "regex": "(.*)_(.*)_(.+?)_(.*)",
          "group": "2"
        },
        "__comment__": "The format of the value of the time parameter: 'auto' will try to guess it",
        "datetime_format": "YYYY:MM:DD HH:mm:ss"
      },
      "__comment__": "The time crs to be used with the 2d crs to create a compound crs for the whole coverage",
      "time_crs": "http://kahlua.eecs.jacobs-university.de:8080/def/crs/OGC/0/AnsiDate",
      "__comment__": "[OPTIONAL] The tiling of the coverage in rasql format",
      "tiling": "ALIGNED [0:10, 0:1000, 0:500]",
      "__comment__": "[OPTIONAL] Specify the names of the bands, in cases the automatic inference (default: field_1, ...) is not good enough",
      "band_names": ["red", "green", "blue"]
    }
  }
}
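
Because this overview file deliberately repeats keys such as "recipe" and "__comment__", it is meant to be read rather than used directly as an ingredients file. The short sketch below, using only Python's json module on a reduced made-up excerpt, shows why: a standard parser silently keeps only the last of several duplicate keys. The helper collect_pairs is illustrative.

```python
import json


def collect_pairs(pairs):
    """Keep every key/value pair instead of letting later keys overwrite earlier ones."""
    return list(pairs)


# A reduced excerpt with a doubled "recipe" field, as in the overview file.
text = '{"recipe": {"name": "map_mosaic"}, "recipe": {"name": "wcs_extract"}}'

plain = json.loads(text)
print(plain["recipe"]["name"])  # wcs_extract: the last duplicate wins

all_pairs = json.loads(text, object_pairs_hook=collect_pairs)
recipe_names = [dict(value)["name"] for key, value in all_pairs if key == "recipe"]
print(recipe_names)  # ['map_mosaic', 'wcs_extract']
```

When you derive your own ingredients file from the overview, keep exactly one "recipe" block and drop the "__comment__" fields you do not need.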

Creating your own recipe

The recipes above cover a frequent but limited subset of what is possible to model using a coverage. WCSTImport allows you to define your own recipes in order to fill these gaps.
In this tutorial we will create a recipe that can construct a 3D coverage from 2D georeferenced files. The 2D files that we want to target all have the same CRS and cover the same geographic area.
The time information that we want to retrieve is stored in each file in a GDAL-readable tag. The tag name and time format differ from dataset to dataset, so we want to take this information as an option to the recipe. We also want to be flexible with the time crs that we require, so we will add this as an option as well.

Based on this use case, the following ingredients file seems to fulfill our needs:
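
As a rough sketch, such an ingredients file could be generated as follows. The option names time_crs, time_tag and time_format are the ones checked by the validate method later in this tutorial; all values are illustrative and copied from the examples earlier in this guide, so adapt them to your setup.

```python
import json

# Hypothetical ingredients for the custom recipe built in this tutorial.
ingredients = {
    "config": {
        "service_url": "http://localhost:8080/rasdaman/ows",
        "tmp_directory": "/tmp/",
        "default_crs": "http://opengis.net/def/OGC/0/Index2D",
        "mock": True,
        "automated": False,
    },
    "input": {
        "coverage_id": "MyCoverage",
        "paths": ["/var/data/*"],
    },
    "recipe": {
        "name": "my_custom_recipe",
        "options": {
            # The crs of the time axis
            "time_crs": "http://opengis.net/def/crs/OGC/0/AnsiDate",
            # The GDAL tag that holds the time of each file
            "time_tag": "TIFFTAG_DATETIME",
            # The format of the value found in the tag
            "time_format": "auto",
        },
    },
}

# Print the file content; redirect it to a .json file to use it.
print(json.dumps(ingredients, indent=2))
```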

The last command is needed to tell Python that this folder contains Python sources; if you forget it, your recipe will not be detected automatically.
Let's first create an example of our ingredients file so we get a feeling for what we will be dealing with in the recipe. Our recipe will request just a few parameters from the user: the time crs, the time tag and the time format.
Let's now create our recipe by creating a file called recipe.py:

touch recipe.py
editor recipe.py

Use your favorite editor or IDE to work on the recipe (there are type annotations for most WCSTImport classes, so an IDE like PyCharm gives completion support out of the box). First, let's add the skeleton of the recipe. Please note that in this tutorial we omit the import section of the files; your IDE will help you auto-import them:

class Recipe(BaseRecipe):
    def __init__(self, session):
        """
        The recipe class for my_custom_recipe. To get an overview of the ingredients needed for this
        recipe check ingredients/my_custom_recipe
        :param Session session: the session for the import run
        """
        super(Recipe, self).__init__(session)
        self.options = session.get_recipe()['options']

    def validate(self):
        super(Recipe, self).validate()

    def describe(self):
        """
        Implementation of the base recipe describe method
        """
        pass

    def ingest(self):
        """
        Ingests the input files
        """
        pass

    def status(self):
        """
        Implementation of the status method
        :rtype (int, int)
        """
        pass

    @staticmethod
    def get_name():
        return "my_custom_recipe"

The first thing you need to do is to make sure the get_name() method returns the name of your recipe. This name is used to determine whether an ingredients file should be processed by your recipe.
Next, focus on the constructor. It receives a single parameter called session, which contains all the information collected from the user plus a couple more useful things.
You can check all the available methods of the class in the session.py file; for now we will just save the options provided by the user, available in session.get_recipe()['options'], in a class attribute.

Next, let's look at the validate method. In this method you validate the recipe options provided by the user. It is generally a good idea to call the super method to validate some of the general things, such as the availability of the WCST service, although this is not mandatory. We also want to validate our custom recipe options here. This is what the recipe looks like now:

class Recipe(BaseRecipe):
    def __init__(self, session):
        """
        The recipe class for my_custom_recipe. To get an overview of the ingredients needed for this
        recipe check ingredients/my_custom_recipe
        :param Session session: the session for the import run
        """
        super(Recipe, self).__init__(session)
        self.options = session.get_recipe()['options']

    def validate(self):
        super(Recipe, self).validate()
        if "time_crs" not in self.options or self.options['time_crs'] == "":
            raise RecipeValidationException("No valid time crs provided")

        if 'time_tag' not in self.options:
            raise RecipeValidationException("No valid time tag parameter provided")

        if 'time_format' not in self.options:
            raise RecipeValidationException("You have to provide a valid time format")

    def describe(self):
        """
        Implementation of the base recipe describe method
        """
        pass

    def ingest(self):
        """
        Ingests the input files
        """
        pass

    def status(self):
        """
        Implementation of the status method
        :rtype (int, int)
        """
        pass

    @staticmethod
    def get_name():
        return "my_custom_recipe"

Now that our recipe can validate the recipe options, let's move on to the describe method. This method lets you tell your users any relevant information about the ingestion before it actually starts. The irregular_timeseries recipe, for example, prints the timestamps of the first couple of slices so the user can check that they are correct. Your recipe should provide similar information based on what it does.
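
As an illustration of the kind of summary a describe method might produce, the sketch below formats the first few slice timestamps. It is independent of the framework: describe_slices and its max_slices parameter are hypothetical names, and the timestamps are made up.

```python
def describe_slices(timestamps, max_slices=5):
    """Build a short, human-readable preview of the slices that will be imported."""
    lines = ["Slice %d: %s" % (i + 1, t) for i, t in enumerate(timestamps[:max_slices])]
    remaining = len(timestamps) - max_slices
    if remaining > 0:
        # Do not flood the terminal; just say how many slices were left out.
        lines.append("... and %d more slice(s)" % remaining)
    return "\n".join(lines)


preview = describe_slices(
    ["2012-12-02T20:12:02", "2012-12-04T20:22:05", "2012-12-06T20:32:08"],
    max_slices=2,
)
print(preview)
# Slice 1: 2012-12-02T20:12:02
# Slice 2: 2012-12-04T20:22:05
# ... and 1 more slice(s)
```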

Next, we should define the ingest behaviour. The framework makes no assumptions about the correct way of ingesting; however, it offers a lot of utility functionality that helps you do it in a more standardized way. We will continue this tutorial by describing how to take advantage of this functionality, but note that using it is not required for the recipe to work.
The first thing you need to do is define an importer object. The importer takes a coverage object and ingests it using WCST requests. It has two public methods: ingest, which ingests the coverage into the WCST service (note: ingest performs an insert when the coverage does not exist yet and an update when it does; the importer handles both cases for you, so you don't have to worry about whether the coverage already exists), and get_progress, which returns a tuple containing the number of imported slices and the total number of slices. After adding the importer, the code should look like this:
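
A minimal, self-contained sketch of this pattern is given here. FakeImporter and RecipeSketch are stand-ins written for this illustration only, so the snippet runs without the framework; in a real recipe you would use the framework's Importer and a proper coverage object instead of the dictionary used below.

```python
class FakeImporter(object):
    """Stand-in for the framework's Importer, only for illustrating the call pattern."""

    def __init__(self, coverage):
        self.coverage = coverage
        self.imported = 0

    def ingest(self):
        # The real importer issues WCST insert/update requests; here we only count slices.
        for _ in self.coverage["slices"]:
            self.imported += 1

    def get_progress(self):
        # (number of imported slices, total number of slices)
        return self.imported, len(self.coverage["slices"])


class RecipeSketch(object):
    def __init__(self, coverage):
        self.coverage = coverage
        self.importer = None

    def _get_importer(self):
        # Create the importer once and reuse it, so ingest() and status() share state.
        if self.importer is None:
            self.importer = FakeImporter(self.coverage)
        return self.importer

    def ingest(self):
        self._get_importer().ingest()

    def status(self):
        return self._get_importer().get_progress()


recipe = RecipeSketch({"coverage_id": "MyCoverage", "slices": ["a.tif", "b.tif", "c.tif"]})
print(recipe.status())  # (0, 3)
recipe.ingest()
print(recipe.status())  # (3, 3)
```

Creating the importer lazily in one helper means the progress reported by status always refers to the same importer that performs the ingestion.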

In order to build the importer, we need to create a coverage object. Let's see how we can do that. The coverage constructor requires:

coverage_id: the id of the coverage

slices: a list of slices that compose the coverage. Each slice defines the position in the coverage and the data that should be defined at the specified position

range_fields: the range fields for the coverage

crs: the crs of the coverage

pixel_data_type: the type of the pixel in gdal format, e.g. Byte, Float32 etc

You can construct the coverage object in many ways; below we present one specific way of doing it. Let's start with the crs of the coverage. For our recipe we want a 3D crs, composed of the CRS of the 2D images and the time crs given in the options. The following two lines of code give us exactly this:

# Get the crs of one of the images using a GDAL helper class. We are assuming all images have the same CRS
gdal_dataset = GDALGmlUtil(self.session.get_files()[0].get_filepath())
# Get the crs of the coverage by compounding the two crses
crs = CRSUtil.get_compound_crs([gdal_dataset.get_crs(), self.options['time_crs']])

Let's also get the range fields for this coverage. We can again extract them from the 2D image, using a helper class that relies on GDAL to get the relevant information:

fields = GdalRangeFieldsGenerator(gdal_dataset).get_range_fields()

Let's also get the pixel base type, again using the gdal helper:

pixel_type = gdal_dataset.get_band_gdal_type()

Let's see what we have so far:

class Recipe(BaseRecipe):
    def __init__(self, session):
        """
        The recipe class for my_custom_recipe. To get an overview of the ingredients needed for this
        recipe check ingredients/my_custom_recipe
        :param Session session: the session for the import run
        """
        super(Recipe, self).__init__(session)
        self.options = session.get_recipe()['options']
        self.importer = None

    def validate(self):
        super(Recipe, self).validate()
        if "time_crs" not in self.options or self.options['time_crs'] == "":
            raise RecipeValidationException("No valid time crs provided")

        if 'time_tag' not in self.options:
            raise RecipeValidationException("No valid time tag parameter provided")

        if 'time_format' not in self.options:
            raise RecipeValidationException("You have to provide a valid time format")

    def describe(self):
        """
        Implementation of the base recipe describe method
        """
        pass

    def ingest(self):
        """
        Ingests the input files
        """
        self._get_importer().ingest()

    def status(self):
        """
        Implementation of the status method
        :rtype (int, int)
        """
        pass

    def _get_importer(self):
        if self.importer is None:
            self.importer = Importer(self._get_coverage())
        return self.importer

    def _get_coverage(self):
        # Get the crs of one of the images using a GDAL helper class. We are assuming all images have the same CRS
        gdal_dataset = GDALGmlUtil(self.session.get_files()[0].get_filepath())
        # Get the crs of the coverage by compounding the two crses
        crs = CRSUtil.get_compound_crs([gdal_dataset.get_crs(), self.options['time_crs']])
        fields = GdalRangeFieldsGenerator(gdal_dataset).get_range_fields()
        pixel_type = gdal_dataset.get_band_gdal_type()
        coverage_id = self.session.get_coverage_id()
        slices = self._get_slices(crs)
        return Coverage(coverage_id, slices, fields, crs, pixel_type)

    def _get_slices(self, crs):
        pass

    @staticmethod
    def get_name():
        return "my_custom_recipe"

As you can see, the only thing left to do is to implement the _get_slices() method. To do so, we need to iterate over all the input files and create a slice for each. Here's an example of how we could do that:

def _get_slices(self, crs):
    # Let's first extract all the axes from our crs
    crs_axes = CRSUtil(crs).get_axes()
    # Prepare a list container for our slices
    slices = []
    # Iterate over the files and create a slice for each one
    for infile in self.session.get_files():
        # We need to create the exact position in time and space in which to place this slice
        # For the space coordinates we can use the GDAL helper to extract it for us
        # The helper will return a list of subsets based on the crs axes that we extracted
        # and will fill the coordinates for the ones that it can (the easting and northing axes)
        subsets = GdalAxisFiller(crs_axes, GDALGmlUtil(infile.get_filepath())).fill()
        # Now we must fill the time axis as well and indicate the position in time
        for subset in subsets:
            # Find the time axis
            if subset.coverage_axis.axis.crs_axis.is_future():
                # Set the time position for it. Our recipe extracts it from a GDAL tag provided by the user
                subset.interval.low = GDALGmlUtil(infile.get_filepath()).get_datetime(self.options["time_tag"])
        # The data for this slice comes from the current input file
        slices.append(Slice(subsets, FileDataProvider(infile)))
    return slices

And we are done: we now have a valid coverage object. The last thing needed is to define the status method. This method needs to provide a status update to the framework so that it can be displayed to the user. We need to return the number of finished work items and the total number of work items. In our case we can measure this in terms of slices, and the importer can already provide this for us.
So all we need to do is the following:

def status(self):
    return self._get_importer().get_progress()

The complete recipe now looks like this:

class Recipe(BaseRecipe):
    def __init__(self, session):
        """
        The recipe class for my_custom_recipe. To get an overview of the ingredients needed for this
        recipe check ingredients/my_custom_recipe
        :param Session session: the session for the import run
        """
        super(Recipe, self).__init__(session)
        self.options = session.get_recipe()['options']
        self.importer = None

    def validate(self):
        super(Recipe, self).validate()
        if "time_crs" not in self.options or self.options['time_crs'] == "":
            raise RecipeValidationException("No valid time crs provided")

        if 'time_tag' not in self.options:
            raise RecipeValidationException("No valid time tag parameter provided")

        if 'time_format' not in self.options:
            raise RecipeValidationException("You have to provide a valid time format")

    def describe(self):
        """
        Implementation of the base recipe describe method
        """
        return "This is some description."

    def ingest(self):
        """
        Ingests the input files
        """
        self._get_importer().ingest()

    def status(self):
        return self._get_importer().get_progress()

    def _get_importer(self):
        if self.importer is None:
            self.importer = Importer(self._get_coverage())
        return self.importer

    def _get_coverage(self):
        # Get the crs of one of the images using a GDAL helper class. We are assuming all images have the same CRS
        gdal_dataset = GDALGmlUtil(self.session.get_files()[0].get_filepath())
        # Get the crs of the coverage by compounding the two crses
        crs = CRSUtil.get_compound_crs([gdal_dataset.get_crs(), self.options['time_crs']])
        fields = GdalRangeFieldsGenerator(gdal_dataset).get_range_fields()
        pixel_type = gdal_dataset.get_band_gdal_type()
        coverage_id = self.session.get_coverage_id()
        slices = self._get_slices(crs)
        return Coverage(coverage_id, slices, fields, crs, pixel_type)

    def _get_slices(self, crs):
        # Let's first extract all the axes from our crs
        crs_axes = CRSUtil(crs).get_axes()
        # Prepare a list container for our slices
        slices = []
        # Iterate over the files and create a slice for each one
        for infile in self.session.get_files():
            # We need to create the exact position in time and space in which to place this slice
            # For the space coordinates we can use the GDAL helper to extract it for us
            # The helper will return a list of subsets based on the crs axes that we extracted
            # and will fill the coordinates for the ones that it can (the easting and northing axes)
            subsets = GdalAxisFiller(crs_axes, GDALGmlUtil(infile.get_filepath())).fill()
            # Now we must fill the time axis as well and indicate the position in time
            for subset in subsets:
                # Find the time axis
                if subset.coverage_axis.axis.crs_axis.is_future():
                    # Set the time position for it. Our recipe extracts it from a GDAL tag provided by the user
                    subset.interval.low = GDALGmlUtil(infile.get_filepath()).get_datetime(self.options["time_tag"])
            # The data for this slice comes from the current input file
            slices.append(Slice(subsets, FileDataProvider(infile)))
        return slices

    @staticmethod
    def get_name():
        return "my_custom_recipe"

We now have a functional recipe. You can try the ingredients file against it and see how it works.