A technical blog concerning Python programming and GIS applications

Using Arcpy with multiprocessing – Part 1

This blog post is the first of a three-part series that outlines how to use Python multiprocessing with Arcpy. The series covers both the built-in Python multiprocessing library and the Parallel Python library, as each has issues in different situations.

If you are doing Python development, you may be interested in my Windows Dev Stack, which describes my development environment from high level technologies down to specific apps, and how they all work together.

This post will describe the general method by which a script or model can be parallelised, which involves splitting a dataset into small parts that may be solved independently, and running these parts concurrently on multiple processors. The purpose of this series is not to give you a one-line example that you can copy and paste into your code, but to step through the process and make the underlying principles clear. Please note that Esri also has a blog post describing use of Python’s multiprocessing library with Arcpy (in particular Spatial Analyst). It did not work for me in a complex situation involving Network Analyst, but it is worth checking out if you are doing intensive tasks with Spatial Analysis.
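As a rough illustration of the general method (split the work into independent parts, then run the parts concurrently), here is a minimal sketch written for Python 3 using the standard multiprocessing library. It does not use Arcpy at all: the function names split_into_chunks, process_chunk and run_parallel are made up for this illustration, and the "work" is a placeholder calculation standing in for the expensive geoprocessing step.

```python
import multiprocessing


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal parts."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def process_chunk(point_ids):
    """Placeholder worker (hypothetical): stands in for the slow per-point work."""
    return {point_id: point_id * point_id for point_id in point_ids}


def run_parallel(point_ids, n_processes=2):
    """Run process_chunk on each chunk in a separate process and merge the results."""
    chunks = split_into_chunks(point_ids, n_processes)
    with multiprocessing.Pool(processes=n_processes) as pool:
        partial_results = pool.map(process_chunk, chunks)
    merged = {}
    for partial in partial_results:
        merged.update(partial)
    return merged


if __name__ == "__main__":
    # The guard is required on Windows, where each worker re-imports this module.
    print(run_parallel(list(range(1, 11)), n_processes=2))
```

The worker is a top-level function because multiprocessing has to pickle it to send it to the child processes. The later posts in this series deal with what changes when the worker calls Arcpy.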

There are a few requirements before you can make an Arcpy script/model suitable for multiprocessing:

The most calculation-intensive (i.e. time-consuming) part of the code must be something that is (or can be) iterated (i.e. done again and again) and that can be made into a Python function, which is then parallelised; this process will be described in the following posts.

Once it is made into a function, there must be no issues with data access – each invocation of the function should either write to a different output database (Arc locks the entire *.gdb in use, not just the feature class being accessed) or pass data back in a Python structure and write it to an output database only at a later stage.
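The two data access options in the second requirement can be sketched in plain Python, again without Arcpy. Everything here is an assumption made for illustration: the helper names, the "worker_N.gdb" naming scheme and the shape of the returned dictionaries are not taken from any Esri API.

```python
import os


def output_gdb_for_worker(output_folder, worker_index):
    """Option 1: give every worker its own (hypothetical) output geodatabase path,
    so that no two processes ever try to write to the same *.gdb."""
    return os.path.join(output_folder, "worker_{0}.gdb".format(worker_index))


def merge_partial_results(partial_results):
    """Option 2: workers return plain dictionaries, and only the parent process
    combines them (and would later write them to the output database)."""
    merged = {}
    for partial in partial_results:
        for key, value in partial.items():
            if key in merged:
                # The same key from two workers means the chunks overlapped.
                raise ValueError("duplicate key from workers: {0!r}".format(key))
            merged[key] = value
    return merged
```

With the second option only one process touches the output database, so the locking problem does not arise, at the cost of holding all the results in memory until the workers have finished.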

The main purpose of these posts is to describe the process of parallelising a given problem, which I will explain using a made-up example that has no practical use whatsoever. The objective of the example is to identify the number and type, and accumulate a weight value, of all the Polygons within a certain distance of some Point features. In the example, the Polygon feature class has the following attributes: ‘polyType’ and ‘polyWeight’.

Method pseudo-code:

# get variables from Arc
# check all inputs are valid
# for Polygon types:
    # make feature layer of Polygons
    # make feature layer of Points
    # for rows in Points:
        # get PointID
        # select the Point row corresponding to PointID
        # select by location: Polygons within the search distance
        # for Polygon rows (within the selection):
            # store the sums of weighting and count to a Python dictionary by PointID and polygonType
# for rows in Points:
    # for Polygon types:
        # access data from dictionary by PointID and Type
        # write value to row
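
The dictionary steps in the pseudo-code above can be sketched in plain Python. This is only an illustration of the data structure, not the Arcpy code itself: it assumes the selected Polygons have already been reduced to (PointID, polyType, polyWeight) tuples, and the function names accumulate and values_for_point are made up for this sketch.

```python
def accumulate(selected_polygons):
    """Build {(point_id, poly_type): [count, weight_sum]} from an iterable of
    (point_id, poly_type, poly_weight) tuples (an assumed input format)."""
    totals = {}
    for point_id, poly_type, poly_weight in selected_polygons:
        entry = totals.setdefault((point_id, poly_type), [0, 0.0])
        entry[0] += 1            # number of Polygons of this type near the Point
        entry[1] += poly_weight  # accumulated weight for this type
    return totals


def values_for_point(totals, point_id, poly_types):
    """Second loop: look up (count, weight_sum) per type for one Point,
    using zeros where no Polygon of that type was found."""
    row = {}
    for poly_type in poly_types:
        count, weight = totals.get((point_id, poly_type), (0, 0.0))
        row[poly_type] = (count, weight)
    return row


if __name__ == "__main__":
    records = [(1, "A", 2.0), (1, "A", 3.0), (1, "B", 1.5), (2, "A", 4.0)]
    totals = accumulate(records)
    print(values_for_point(totals, 2, ["A", "B"]))
```

Keying the dictionary by a (PointID, polyType) tuple keeps the structure flat and easy to pass back from a worker process, since it contains only basic Python types.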