Total Open Station: a specialised format converter

03 Jan 2017

It’s 2017, and nine years ago I started writing a set of Python
scripts that would become Total Open
Station, a humble GPL-licensed tool to download and process data from
total station devices. I started from scratch, using the Python
standard library and pySerial as best I could, to create a small
but complete program. Under the hood, I’ve been “religiously”
following the UNIX philosophy of one tool that does one thing well,
which is embodied by the two command line programs that perform the
separate steps of:

downloading data via a serial connection

converting the raw data to formats that can be used in GIS or CAD
environments

And despite starting as an itch to scratch, I also wanted TOPS to be
used by others, to provide something that was absent from the free
software world at the time, and that is still unchallenged in that
respect. So a basic and ugly graphical interface was created,
too. That gives a more streamlined view of the work, and greatly
increases the number of potential users. Furthermore, TOPS can run not
just on Debian, Ubuntu or Fedora, but also on macOS and Windows, and it
is well known that users of the latter operating systems are not too
fond of working from a terminal.

Development has always been slow. After 2011 I had only occasional use
for the software myself and no access to a real total station, so my
interest shifted towards giving a good architecture to the program and
extending the number of formats that can be imported and exported.
This entailed rewriting the internal data structures to
allow for more flexibility, such as differentiating between point,
line and polygon geometries.
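
As an illustration of what differentiating between geometries can mean in practice, here is a minimal Python sketch. The `Feature` class, its field names and the `MIN_VERTICES` table are hypothetical, made up for this example, and are not the actual TOPS data structures.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (x, y, z) coordinates of a single surveyed vertex
Coordinate = Tuple[float, float, float]

# Minimum number of vertices that each geometry type needs
MIN_VERTICES = {"Point": 1, "LineString": 2, "Polygon": 3}


@dataclass
class Feature:
    """A surveyed feature (hypothetical, not the actual TOPS class)."""

    geometry_type: str
    coordinates: List[Coordinate]
    description: str = ""

    def __post_init__(self):
        # Reject geometry types the sketch does not know about
        if self.geometry_type not in MIN_VERTICES:
            raise ValueError(f"unknown geometry type: {self.geometry_type}")
        # Reject geometries that have too few vertices to make sense
        if len(self.coordinates) < MIN_VERTICES[self.geometry_type]:
            raise ValueError(f"too few vertices for a {self.geometry_type}")
        if self.geometry_type == "Point" and len(self.coordinates) != 1:
            raise ValueError("a Point has exactly one vertex")


station = Feature("Point", [(100.0, 200.0, 10.5)], "station")
wall = Feature("LineString", [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)], "wall")
```

With a structure like this, an output writer can decide what to do based on `geometry_type` instead of assuming that everything is a point.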

Today, I still find GUI programming out of my league and interests. If
I’m going to continue developing TOPS it’s for the satisfaction of
crafting a good piece of software, learning new techniques in Python
or maybe rewriting it entirely in a different programming language. It’s
clear that the core feature of TOPS is not being a workstation for
survey professionals (since it cannot compete with the existing market
of proprietary solutions that come attached to most devices), but
rather becoming a polyglot converter, capable of handling dozens of
raw data formats and flexibly exporting to good standard
formats. Flexibly exporting means that TOPS should have features to
filter data, to reproject data based on fixed base points with known
coordinates, to create separate output files or layers and so
on. Basically, to adapt to many more needs than it does now. From a
software perspective, there are a few notable examples that I’ve been
looking at for a long time: Sphinx, GPSBabel and Pandoc.

Sphinx is a documentation generator written in Python, the same
language I used for TOPS. You write a lightweight markup source, and
Sphinx can convert it to several formats such as HTML, ePub, LaTeX
(and PDF) and groff. You can write short manuals, like the one I wrote
for TOPS, or entire books. Sphinx accepts many options, mostly from a
configuration file, and I borrowed a few lines of code that I liked
for handling the internal dictionary (key-value hash) of all input and
output formats, importing only the selected module rather than every
module, most of which won’t be used. Sphinx is clearly excellent at what
it does, even though the similarities with TOPS are not many. After
all, TOPS has to deal with many undocumented raw formats while Sphinx
has the advantage of only one standard format. Sphinx was originally
written by Georg Brandl, one of the best Python developers and a
contributor to the standard library, in a highly elegant
object-oriented architecture that I’m not able to replicate.
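
The dictionary of formats with conditional import can be sketched roughly as below. This is neither the code from Sphinx nor the code in TOPS: the `FORMATS` registry and the `load_handler` function are hypothetical names, and two standard library modules stand in for real format parsers so that the example runs on its own.

```python
from importlib import import_module

# Hypothetical registry: format name -> (module name, attribute name).
# Standard library modules stand in for real format parsers here.
FORMATS = {
    "json": ("json", "loads"),
    "csv": ("csv", "reader"),
}


def load_handler(name):
    """Import the module for one format only when it is requested."""
    module_name, attribute = FORMATS[name]  # KeyError for unknown formats
    module = import_module(module_name)     # nothing else gets imported
    return getattr(module, attribute)


parse = load_handler("json")
data = parse('{"point": 1}')
```

Adding a format then means adding one entry to the dictionary, and modules for formats that are not selected are never loaded.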

GPSBabel is a venerable and excellent program for GPS data conversion
and transfer. It handles dozens of formats in read/write mode and
each format has “suboptions” that are specific to it. GPSBabel also
has advanced filtering capabilities, it can merge multiple input
files, and for a few years now it has had a minimal graphical
interface. Furthermore, GPSBabel is integrated in GIS programs like
QGIS and can work in a variety of ways thanks to its programmable
command line interface. A strong difference with TOPS is that many of
the GPS data formats are binary, and that the basic data structures of
waypoints, tracks and routes are essentially the same (contrast that
with the monster LandXML specification, or the dozens of possible
combinations in a Leica GSI file). GPSBabel is written in portable
C++, which I can barely read, so anything other than inspiration for
the user interface is out of the question.

Pandoc is a universal document converter that reads many markup
document formats and can convert to an even greater number of formats,
including PDF (via LaTeX), docx and OpenDocument. The baseline format for
Pandoc is an enriched Markdown. There are two very interesting
features of Pandoc as a source of inspiration for a converter: the
internal data representation and the Haskell programming language. The
internal representation of the document in Pandoc is an abstract
syntax tree that is not necessarily as expressive as the source
format (think of all the typography and formatting in a printed
document) but it can be serialised to/from JSON and allows filters to
work regardless of the input or output format. Haskell is a functional
language that I have never programmed in, although it lends itself to
creating complex and efficient programs that are easily extended.
Pandoc works from the command line and has a myriad of options – it’s
also rather common to invoke it from Makefiles or short scripts since
one tends to work iteratively on a document. I
could see a future version of TOPS being rewritten in Haskell.
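
To show why a format-neutral internal representation is attractive for a converter, here is a small Python sketch. It assumes a hypothetical representation made of plain dictionaries (the keys `id`, `x`, `y`, `z` and `desc` are invented for the example) and is not how Pandoc or TOPS store their data.

```python
import json

# Hypothetical intermediate representation: a list of plain dictionaries,
# as any input parser might produce them.
points = [
    {"id": 1, "x": 10.0, "y": 20.0, "z": 1.5, "desc": "wall"},
    {"id": 2, "x": 11.0, "y": 21.0, "z": 1.6, "desc": "floor"},
    {"id": 3, "x": 12.0, "y": 22.0, "z": 1.7, "desc": "wall"},
]


def keep_description(features, wanted):
    """Filter that knows nothing about input or output formats."""
    return [f for f in features if f["desc"] == wanted]


def to_json(features):
    """Serialise the intermediate representation to JSON text."""
    return json.dumps(features, indent=2)


def from_json(text):
    """Read the intermediate representation back from JSON text."""
    return json.loads(text)


# The filter works the same on data that went through JSON.
walls = keep_description(from_json(to_json(points)), "wall")
```

Because the filter only sees the intermediate representation, it could even be a separate program reading JSON from standard input, which is the approach Pandoc takes with its filters.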

Scriptability and mode of use both seem important concepts to keep in
mind for a data converter. For total stations, a common workflow is to
download raw data, archive the original files and then convert to
another format (or even insert directly into a spatial database). With
the two programs totalopenstation-cli-connector and
totalopenstation-cli-parser, such tasks are easily automated in a
single master script (or batch procedure) using a timestamp as an
identifier for the job and the archived files. This means that once
the right parameters for your needs are found, downloading, archiving
and loading survey data in your working environment is a matter of
seconds, with no point-and-click, no icons, no mistakes. Looking at
GPSBabel, I wonder whether keeping the two programs separate really
makes sense from a UX perspective, as it would be more intuitive to
have a single totalopenstation executable. In fact, this dual
approach is a direct consequence of the small footprint of
totalopenstation-cli-connector, which merely acts as a convenience
layer on top of pySerial.
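
The timestamp-based naming used by such a master script could look like the following Python sketch. The `plan_job` function, the archive directory and the file extensions are all hypothetical; the command line options of the two programs are deliberately left out, since they are not described here.

```python
from datetime import datetime
from pathlib import Path


def plan_job(archive_dir, now=None, raw_ext="raw", out_ext="csv"):
    """Build a job identifier and file names from a timestamp.

    Hypothetical helper: the directory and extensions are assumptions.
    """
    now = now or datetime.now()
    job_id = now.strftime("%Y%m%dT%H%M%S")
    base = Path(archive_dir)
    return {
        "job_id": job_id,
        "raw": base / f"{job_id}.{raw_ext}",        # archived download
        "converted": base / f"{job_id}.{out_ext}",  # converted output
    }


# A fixed timestamp makes the example reproducible.
job = plan_job("survey-archive", datetime(2017, 1, 3, 10, 30, 0))
```

The master script would then pass `job["raw"]` to the download step and both paths to the conversion step, for example with `subprocess.run`, so that every archived file can be traced back to its job.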

It’s also important to think about maintainability of code: I have
little interest in developing the perfect UI for TOPS, all the time
spent on development is taken from my spare time (since no one is
paying for TOPS) and it would be far more useful if dedicated plugins
existed for popular platforms (think QGIS, gvSIG, even ArcGIS supports
Python, not to mention CAD software). At this time TOPS supports ten
(yes, 10) input formats out of … hundreds, I think (some of which are
proprietary, binary formats). Expanding the list of supported formats
is the single aim that I see as reasonable and worth pursuing.