Background

Subsequently several mappers noticed that the coordinates of many sites described by that data appear to have been randomly offset from their true locations. The offsets have little in common, other than that the true latitude or longitude fairly frequently appears to have been truncated to two decimal places.

Proposal

In consultation with John, we located apparently far more reliable positional data for a near-superset of the sites described, sourced from the Australian Bureau of Meteorology (BoM). At his prompting I contacted BoM personnel regarding copyright issues, and subsequently received this response from one Ian Muirhead:

> > Thanks for the email asking permission to generate maps for public
> > access. It is acceptable to display this information on a map provided
> > that you abide by the conditions specified on the Bureau's copyright page:
> > http://www.bom.gov.au/other/copyright.shtml
> >
> > There are a few caveats associated with these station locations
> > which you should be aware of before using them. For example, station moves
> > over history are not displayed, there is no listing of the reference
> > datum associated with geographic location, and the 4 decimal point
> > precision does not necessary imply an equivalent certainty in the
> > metadata. Following your request, we are going to include this
> > information in the data file, but I will email you a copy once it has
> > been produced (hopefully within the next couple of weeks).

Possible Automation

Whilst awaiting the Bureau's response to my request for permission to use their data, I created an early form of the script reproduced below, with a view to providing some degree of automation. The key opportunity for initially merging nodes already entered into OpenStreetMap revolved around the existence of the "wmo:id" tag within many of the records. Of course there was a catch: elsewhere in the Bureau website lie buried several warnings similar to:

Some stations also possess a World Meteorological Organization (WMO) station number. The WMO number is
different to the Bureau of Meteorology number. It also uniquely specifies a station at any given time but
can be reassigned to another station if the new station takes priority in the global reporting network.
Only selected stations will have a WMO number. Significant stations may maintain their WMO number for
many decades.

This means that matching records based on WMO station identifier alone may not lead to expected results.
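One way to guard against a reassigned WMO number is to require a second, independent field to agree before treating two records as the same station. The following is only a minimal sketch of that idea, not the helper script itself: the dictionary keys `wmo` and `name` on the Bureau side, the function names, and the sample identifier `12345` are all hypothetical.

```python
def normalise_name(name):
    """Upper-case and collapse whitespace so trivial spelling differences compare equal."""
    return " ".join(name.upper().split())


def is_confident_match(bom_station, osm_tags):
    """Return True only when BOTH the WMO id and the station name agree.

    bom_station: dict with hypothetical keys "wmo" and "name".
    osm_tags:    dict of tags taken from an existing OSM node.
    """
    wmo = bom_station.get("wmo")
    if not wmo or osm_tags.get("wmo:id") != wmo:
        return False
    return normalise_name(bom_station["name"]) == normalise_name(osm_tags.get("name", ""))


# Illustrative records only; "12345" is a made-up identifier.
bom = {"wmo": "12345", "name": "Ballina Airport AWS"}
same_station = {"wmo:id": "12345", "name": "BALLINA  AIRPORT AWS"}
reassigned = {"wmo:id": "12345", "name": "Some Other Station"}

print(is_confident_match(bom, same_station))
print(is_confident_match(bom, reassigned))
```

A record that shares only the WMO id is not discarded by this check; it simply fails the confident test and would be left for a less certain category.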

Testing The Water

Upon request from the talk-au list I prepared a sample bag of changes, including at least one example from each category I intended to handle.

Feedback and Recommendations

Having a few other eyes view the samples described below certainly helped:

Change/improve attribution style.

The former "source:note" attribution is now performed by a pair of tags; a descriptive "attribution" tag (varies slightly according to merge case), and a fixed "attribution:url" tag, which always directs to the Bureau of Meteorology copyright web page.
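As a rough illustration of the tag pair described above, the sketch below adds both tags to a dictionary of node tags. The function name and the sample description text are hypothetical; only the two tag keys and the copyright page URL come from this page.

```python
# URL of the Bureau's copyright page, as quoted in the Bureau's reply above.
BOM_COPYRIGHT_URL = "http://www.bom.gov.au/other/copyright.shtml"


def apply_attribution(tags, description):
    """Return a copy of tags carrying the attribution pair.

    description is the free-text part, which varies by merge case;
    the URL is the same for every station.
    """
    updated = dict(tags)
    updated["attribution"] = description
    updated["attribution:url"] = BOM_COPYRIGHT_URL
    return updated


# Hypothetical description text, for illustration only.
example = apply_attribution({"name": "Ballina Airport AWS"},
                            "Location from Bureau of Meteorology station list")
print(example["attribution:url"])
```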

Make use of bounding boxes.

The XAPI extract of potentially matching pre-existing OSM monitoring_station nodes is now performed within a bounding box determined dynamically, based upon the geographical spread of the stations listed by the Bureau. This mainly entailed rearranging the former internal order of operations within the script and adding a scan/sort/extract of the extrema on each of the latitude and longitude columns of the Bureau's station list (which covers continental Australia, Antarctica, and neighbouring island chains as far north as e.g. the Marshall Islands).
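The extrema scan can be sketched in a few lines. This is an illustration under stated assumptions rather than the script's actual logic: the coordinates are made up, the function names are hypothetical, and the `[bbox=left,bottom,right,top]` predicate format is how I understand XAPI to express a bounding box, so check it against the XAPI documentation before relying on it.

```python
def bounding_box(stations):
    """Return (left, bottom, right, top) for a list of (lat, lon) pairs."""
    lats = [lat for lat, _ in stations]
    lons = [lon for _, lon in stations]
    return min(lons), min(lats), max(lons), max(lats)


def xapi_bbox_predicate(box):
    """Format a bounding box as an XAPI-style predicate (assumed syntax)."""
    left, bottom, right, top = box
    return "[bbox=%s,%s,%s,%s]" % (left, bottom, right, top)


# Made-up (lat, lon) pairs standing in for rows of the Bureau's station list.
sample = [(-28.8, 153.5), (-17.5, 149.6), (-42.9, 147.3)]
box = bounding_box(sample)
print(xapi_bbox_predicate(box))
```

Note that a simple minimum/maximum over longitude only gives a sensible box while all longitudes use the same convention; values greater than 180 would stretch the box past the edge of the map.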

Exclude obsolete stations.

One of the sample sites I chose for demonstration turned out to have closed in 1973. The Bureau data showed this, but the original script had omitted to check for this. It does now.
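A check of this kind might look like the sketch below. The field name `closed` is an assumption made for illustration; the actual layout of the Bureau's station list is not reproduced here, and the sample rows are invented.

```python
def is_open(station):
    """Treat a station as open when its (hypothetical) "closed" field is empty."""
    closed = str(station.get("closed") or "").strip()
    return closed == ""


def open_stations(stations):
    """Keep only the stations that have no closing date recorded."""
    return [s for s in stations if is_open(s)]


# Invented rows: one closed in 1973, two still operating.
rows = [
    {"name": "Old Site", "closed": "1973"},
    {"name": "Current Site", "closed": ""},
    {"name": "Another Current Site"},
]
print([s["name"] for s in open_stations(rows)])
```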

Add tagging to each station to:

tag instruments present at each site

Having already investigated this, I consider non-copyright-invasive methods of performing this update automatically to be unreliable, with the sole exception of whether a given site possesses a barometer. (The script has been modified to insert a "weather:barometer=yes" tag whenever the Bureau station list includes a "Barometer Height" entry and no "weather:barometer" tag is already present.)

warn mappers they need to add details of site instrumentation.

This suggestion was rejected on the basis that OpenStreetMap culture already encourages mappers to update details based upon observation. Why preach to the converted?
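The barometer rule described under the instrument tagging item above can be sketched as follows. The key `barometer_height` is a hypothetical name for the "Barometer Height" column, and the function is an illustration rather than an excerpt from the script.

```python
def add_barometer_tag(osm_tags, bom_station):
    """Return a copy of osm_tags with weather:barometer=yes added when justified.

    The tag is added only if the Bureau record has a barometer height and the
    node does not already carry a weather:barometer tag of any value.
    """
    tags = dict(osm_tags)
    has_height = str(bom_station.get("barometer_height") or "").strip() != ""
    if has_height and "weather:barometer" not in tags:
        tags["weather:barometer"] = "yes"
    return tags


# Invented values for illustration.
print(add_barometer_tag({"name": "A"}, {"barometer_height": "3.2"}))
print(add_barometer_tag({"name": "B", "weather:barometer": "no"}, {"barometer_height": "3.2"}))
print(add_barometer_tag({"name": "C"}, {"barometer_height": ""}))
```

Leaving an existing "weather:barometer" value untouched means a mapper's on-the-ground observation always takes precedence over the inference from the station list.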

Concern that "<site-name>" and "<site-name> AWS" entries may be duplicates.

The short answer is that some definitely are not, and some may in fact be. In view of the "fixme=not_reviewed" regimen established in the original NOAA load, I feel the inclusion is worthwhile, even if subsequent observations result in one entry being deleted again later.

The Helper Script

This script was used to assist in the classifying and merging of the Bureau station list with data already present in OpenStreetMap. In so far as possible, it produces output acceptable for import into JOSM (which is used as the actual conduit for importation of the results). It also attempts a cruder pre-formatting of records considered "too unusual" to be automatically handled, with a view to assisting manual importation at a later date.

The script was written with this requirement foremost: it does not perform any update automatically whatsoever, but tries to indicate how safe it would be to do so for each category of import.

It is structured to return several XML-format (well, good enough for JOSM to accept anyway!) output files, one per category of merged or formatted data, with each category reflecting a different degree of confidence in the merge method attempted. Any given BoM station may appear in only one place after splitting; but there is a catch-all (case 6, below) which may associate multiple OSM nodes with a potential matching BoM station.
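For readers unfamiliar with the file format involved, the sketch below builds a small OSM XML document of the kind JOSM can open, using negative node ids to mark nodes that do not yet exist on the server. It is written in Python purely for illustration and makes no claim about how the helper script produces its files; the generator string, function name, and sample station are hypothetical, and JOSM may expect further attributes on real data.

```python
import xml.etree.ElementTree as ET


def build_josm_document(stations):
    """Serialise stations as OSM XML; each new node gets a negative placeholder id."""
    root = ET.Element("osm", version="0.6", generator="bom-helper-sketch")
    for index, station in enumerate(stations, start=1):
        node = ET.SubElement(root, "node", id=str(-index),
                             lat=str(station["lat"]), lon=str(station["lon"]))
        for key, value in sorted(station["tags"].items()):
            ET.SubElement(node, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")


# Invented station for illustration only.
document = build_josm_document([
    {"lat": -28.8, "lon": 153.5,
     "tags": {"name": "Example Station", "fixme": "not_reviewed"}},
])
print(document)
```

Writing one such document per confidence category keeps the safer merges separate from those that need closer inspection before upload.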

There are five cases fully handled, and of course the sixth alluded to above which requires human inspection and assembly if possible. These are:

Safest, most reliable. BoM station carries WMO id and name both matching OSM node, which has been untouched since the original NOAA bulk load. Example: Ballina Airport AWS.

All bets are off. All stations remaining after the other cases have been eliminated. However, fairly simple manual inspection reveals a valid-looking example: Norah Head Lighthouse.

To anticipate complaint, Yes, I could have written the script in Perl or something else. But I didn't. It works. It is not wrong, it is just differently right. I could have done it in TCL or CLIST just to be a complete pain, so be nice.

Progress

Planned Implementation Date

In the absence of further objection I intended to proceed with this scheme on Monday, 21st June, 2010.

Last-Minute Glitches

As you may deduce, the import did not go quite as smoothly as I might have hoped! Nothing really more awkward than:

JOSM learning curve - multiple new node insertions have to be performed in "upload each object individually" mode.

The BoM uses the quaint colonial throwback convention of allocating locations East of longitude 180°, but West of the International Date Line, a longitude value greater than 180 (e.g. Papeete: S17.5333° E210.4000°). XAPI import responds to such insertion attempts with an enigmatic "The node is outside this world" error.
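Converting such values to the -180 to 180 range that OpenStreetMap expects is a matter of wrapping the longitude. A minimal sketch, with a hypothetical function name and rounding to four decimal places to match the precision of the Bureau's figures:

```python
def normalise_longitude(lon):
    """Wrap an east-positive longitude (possibly above 180) into [-180, 180)."""
    wrapped = ((lon + 180.0) % 360.0) - 180.0
    return round(wrapped, 4)


# Papeete, as quoted above: E210.4000 becomes W149.6.
print(normalise_longitude(210.4))   # -149.6
# Longitudes already in range pass through unchanged.
print(normalise_longitude(153.5))   # 153.5
```

Latitude needs no such treatment, since the Bureau's southern latitudes already fall within the usual -90 to 90 range once the S prefix is read as a negative sign.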

I had to be lucky enough to have somebody correct a station name between the time of OSM extraction and return of updated data.