Michael Goodchild, in his 2007 paper that started the research into Volunteered Geographic Information (VGI), mentioned OpenStreetMap (OSM), and since then there has been a lot of conflation of OSM and VGI. In some recent papers you can find statements such as ‘OpenstreetMap is considered as one of the most successful and popular VGI projects’ or ‘the most prominent VGI project OpenStreetMap’, so, at some level, the boundary between the two is being blurred. I’m part of the problem – for example, with the title of my 2010 paper ‘How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets’. However, the more I think about it, the more uncomfortable I am with this equivalence. I feel that the recent line from Neis & Zielstra (2014) is more accurate: ‘One of the most utilized, analyzed and cited VGI-platforms, with an increasing popularity over the past few years, is OpenStreetMap (OSM)’. I’ll explain why.

Let’s look at the whole area of OpenStreetMap studies. Over the past decade, several types of research paper have emerged.

First, there is a whole set of research projects that use OSM data because it’s easy to use and free to access (in computer vision or even string theory). These studies are not part of ‘OSM studies’ or VGI, as, for them, this is just data to be used.

[Image credit: Edward Betts, CC-BY-SA 2.0, via Wikimedia Commons]

Second, there are studies about OSM data: quality, evolution of objects and other aspects from researchers such as Peter Mooney, Pascal Neis, Alex Zipf and many others.

Third, there are studies that also look at the interactions between the contribution process and the data – for example, in trying to infer trustworthiness.

Fourth, there are studies that look at the wider societal aspects of OpenStreetMap, with people like Martin Dodge, Chris Perkins, and Jo Gerlach contributing to interesting discussions.

[Unfortunately, due to academic practices and publication outlets, many of these papers are locked behind paywalls, but that is another issue…]

In short, there is a significant body of knowledge regarding the nature of the project, the implications of what it produces, and ways to understand the information that emerges from it. Clearly, we now know that OSM produces good data and we are aware of the patterns of contribution. What is also clear is that many of these patterns are specific to OSM. Because of the importance of OSM to so many application areas (including illustrative maps in string theory!) these insights are very important. Some of these insights are expected to also be present in other VGI projects (hence the temptation to turn them into assertions about VGI), but this needs to be done carefully, only when there is evidence from other projects that this is the case. In short, we should avoid conflating VGI and OSM.

Having followed the project during this decade, there is much to reflect on – such as thinking about open research questions, things that the academic literature failed to notice about OSM, or the things that we do know about OSM and VGI because of the openness of the project. However, as I was preparing the talk for the INSPIRE conference, I started to think about the start dates of OSM (2004), TomTom Map Share (2007), Waze (2008) and Google Map Maker (2008). While there are conceptual and operational differences between these projects, in terms of ‘knowledge-based peer production systems’ they are fairly similar: all rely on a large number of contributors, all combine a large group of contributors who contribute little with a much smaller group of committed contributors who do the more complex work, and all are about mapping. Yet, OSM started 3 years before these other crowdsourced mapping projects, and all of them have more contributors than OSM.

Since OSM is described as the ‘Wikipedia of maps’, the analogy that I was starting to think of was that it’s a bit like a parallel history in which, in 2001, as Wikipedia starts, Encarta and Britannica look at the upstart and set up their own crowdsourcing operations, so within 3 years they are up and running. By 2011, Wikipedia continues as a copyright-free encyclopaedia with a sizeable community, but Encarta and Britannica have more contributors and more visibility.

Knowing OSM closely, I felt that this is not a fair analogy. While there are some organisational and contribution practices that can be used to claim that ‘it’s the fault of the licence’ or ‘it’s because of the project’s culture’ and therefore justify this not-so-flattering analogy, I sensed that there is something else that can explain what is going on.

Then, during my holiday in Italy, I was enjoying the offline TripAdvisor app for Florence, which uses OSM for navigation (in contrast to Google Maps, which is used in the online app), and an answer emerged. Within the OSM community, from the start, there has been some tension between the ‘map’ and ‘database’ views of the project. Is it about collecting data so that beautiful maps can be made, or is it about building a database that can be used for many applications?

Saying that OSM is about the map means that the analogy is correct, as it is very similar to Wikipedia – you want to share knowledge, so you put it online with a system that allows you to display it quickly and with tools that support easy editing and information sharing. If, on the other hand, OSM is about a database, then OSM is about something that is used at the back-end of other applications, much like a DBMS or an operating system. Although there are tools that help you to do things easily and quickly and to check the information that you’ve entered (e.g. displaying the information as a map), the main goal is the building of the back-end.

Maybe a better analogy is to think of OSM as the ‘Linux of maps’, which means that it is an infrastructure project that is expected to have a lot of visibility among the professionals who need it (system managers in the case of Linux, GIS/Geoweb developers for OSM), with a strong community that supports and contributes to it. In the same way that some tech-savvy people know about Linux but most people don’t, I suspect that TripAdvisor offline users don’t notice that they are using OSM – they are just happy to have a map.

The problem with the Linux analogy is that OSM is more than software – it is indeed a database of information about geography from all over the world (and therefore the Wikipedia analogy has its place). Therefore, it is somewhere in between. In a way, it provides a demonstration of the common claim in GIS circles that ‘spatial is special’. Geographical information is infrastructure in the same way that operating systems or DBMS are, but in this case it’s not enough to create an empty shell that can be filled in for the specific instance – there is a need for a significant amount of base information before you are able to start building your own application with additional information. This is also the philosophical difference that makes the licensing issues more complex!

In short, both the Linux and Wikipedia analogies are inadequate to capture what OSM is. It has been illuminating and fascinating to follow the project over its first decade, and may it continue successfully for many more decades to come.

Opening geodata is an interesting issue for the INSPIRE directive. INSPIRE was set up before the hype around Government 2.0 grew and the pressure to open data became apparent, so it was not explicitly designed with these aspects in mind. Therefore, the way in which the organisations that are implementing INSPIRE deal with the provision of open and linked data is bound to bring up interesting challenges.

Dealing with open and linked data was the topic that I followed on the second day of the INSPIRE 2014 conference. The notes below are my interpretation of some of the talks.

Tina Svan Colding discussed the Danish attempt to estimate the value (mostly economic) of open geographic data. The study was done in collaboration with Deloitte, and they started with a theory of change – expectations that they would see increased demand from existing customers and from new ones. The next assumption is that there will be new products, new companies and lower prices, and that this will lead to efficiency and better decision making across the public and private sectors, but also increase transparency for citizens. In short, they are trying to capture the monetary value, with some additional aspects on the side. They used statistics and interviews with key people in the public and private sectors, and followed that with a wider survey – all with existing users of the data. The number of users of their data increased from 800 to over 10,000 within a year. The Danish system requires users to register to get the data, so these are bulk numbers, but it also meant that they could contact users to ask further questions. Among the new users, many are citizens (66%) and NGOs (3%). A further 6% are in the public sector, which in principle had access in the past, but the improved accessibility made the data usable to new people in this sector. In the private sector, construction, utilities and many other companies are using the data. The environmental bodies are aiming to use data in new ways to make environmental consultation more engaging for audiences (is this another Deficit Model assumption – that people don’t engage because it’s difficult to access data?). Issues that people experienced include accessibility for users who don’t know that they need to use GIS and other datasets. They also identified requests for further data releases. In the public sector, 80% identified potential for savings with the data (though that is the type of expectation that they live within!).

Roope Tervo, from the Finnish Meteorological Institute, talked about the implementation of their open data portal. Their methodology was very much developed with users in mind and is a nice example of a user-centred data application. They hold a lot of data – from meteorological observations to air quality data (of course, it all depends on the role of the institute). They have chosen to use WFS for data download, with GML as the data format and coverage data in meteorological formats (e.g. GRIB). He showed that the selection of data models (all of which can be compatible with the legislation) can have very different outcomes in file size and in the complexity of parsing the information. It was nice to see that they considered user needs – though not formally. They created an open source JavaScript library that makes it easy to use the data – going beyond just releasing the data to supporting how it is used. They have API keys that are based on registration. They had to limit the number of requests per day, and the same for the view service. After a year, they have 5,000 users and 100,000 data downloads per day, and the numbers are increasing – though slowly. They are considering how to help clients with complex data models.
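To make the shape of such a service a little more concrete, here is a minimal sketch (in Python, using the requests library) of what fetching observations through a WFS 2.0 download service can look like. The endpoint URL, stored-query name and API-key parameter below are illustrative assumptions on my part, not the institute’s documented interface.

```python
# Minimal sketch of fetching data from a WFS 2.0 download service of the kind
# described above. Endpoint, stored query and apikey parameter are hypothetical.
import requests
import xml.etree.ElementTree as ET

ENDPOINT = "https://opendata.example.fi/wfs"   # hypothetical WFS endpoint
params = {
    "service": "WFS",
    "version": "2.0.0",
    "request": "GetFeature",
    "storedquery_id": "observations::simple",  # hypothetical stored query name
    "place": "Helsinki",
    "apikey": "YOUR-REGISTERED-KEY",            # keys issued on registration
}

response = requests.get(ENDPOINT, params=params, timeout=30)
response.raise_for_status()

# The payload is GML; element names depend on the data model chosen by the
# provider, which is exactly the file-size/parsing trade-off noted in the talk.
root = ET.fromstring(response.content)
print(f"Received {len(root)} top-level GML members")
```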

Panagiotis Tziachris was exploring the clash between the ‘heavy duty’ and complex INSPIRE standards and the usual lightweight approaches that are common in open data portals (I think he meant those in the commercial sector that allow some reuse of data). This is a project of 13 Mediterranean regions in Spain, Italy, Slovenia, Montenegro, Greece, Cyprus and Malta. The HOMER project (website http://homerproject.eu/) used different mechanisms, including hackathons, to share knowledge and experience between more experienced players and those that are new to the area. They found them to be a good way to share practical knowledge between partners. This is an interesting use of a purposeful hackathon among people who already know each other within a project, and I think that it can be useful in other cases. Interestingly, on the legal side, they had to go beyond the usual documents that are provided in an EU consortium: in order to allow partners to share information they created a memorandum of understanding, as this was needed to deal with IP and similar issues. Practices from open data were also adopted – such as the CKAN API, which is a common one for open data websites. They noticed a separation between central administration and local or regional administration – the competency of the more local organisations (municipality or region) is sometimes limited because knowledge is elsewhere (in central government), or they are at different stages of implementation, and disagreements on releasing the data can arise. Another issue is that open data is sometimes provided at regional portals, while a different organisation at the national level (the environment ministry or cadastre body) is responsible for INSPIRE. The lack of capabilities at different governmental levels adds to the challenges of setting up open data systems. Sometimes open data legislation is only about the final stage of the process and not about how to get there, while INSPIRE is all about the preparation and not about the release of data – this also creates mismatches.
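For readers unfamiliar with it, the CKAN action API mentioned above is a simple HTTP/JSON interface; the sketch below shows roughly how a dataset search against such a portal looks. The portal URL and search term are placeholders, not details from the HOMER project.

```python
# A small sketch of querying an open data portal through the CKAN action API.
# The portal URL and query are placeholders; package_search is a standard action.
import requests

PORTAL = "https://opendata.example.eu"          # hypothetical CKAN portal
resp = requests.get(
    f"{PORTAL}/api/3/action/package_search",
    params={"q": "land cover", "rows": 5},
    timeout=30,
)
resp.raise_for_status()
result = resp.json()["result"]

print(f"{result['count']} matching datasets")
for dataset in result["results"]:
    # each dataset lists downloadable resources with a format and a URL
    for res in dataset.get("resources", []):
        print(dataset["name"], res.get("format"), res.get("url"))
```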

Adam Iwaniak discussed how “over-engineering” makes the INSPIRE directive inoperable or irrelevant to users, on the basis of his experience in Poland. He asks “what are the user needs?” and demonstrated it by pointing out that after half a term of teaching students about the importance of metadata, when it came to actively searching for metadata in an assignment, the students didn’t use any of the specialist portals but just Google. Based on this and similar experiences, he suggested the creation of a thesaurus that describes keywords and features in the products, so that it allows searching according to user needs. Of course, the implementation is more complex, and he therefore suggests an approach that works within the semantic web and uses RDF definitions, making the data searchable and indexable by search engines so it can be found. The core message was to adapt the delivery of information to the way the user is most likely to search for it – so metadata is relevant when the producer makes sure that a search in Google finds it.
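As a rough illustration of this approach, the sketch below uses the rdflib library to describe a dataset with DCAT and Dublin Core terms, so that its keywords can be matched against a thesaurus and the description exposed to crawlers. The dataset URI and keywords are invented for the example, not taken from the talk.

```python
# A minimal sketch of publishing dataset metadata as RDF so it can be indexed
# and linked. The URIs and keywords are illustrative assumptions; DCAT and
# Dublin Core are standard vocabularies for describing datasets.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, DCTERMS

DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()
dataset = URIRef("https://example.gov.pl/data/topographic-objects")  # hypothetical
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Topographic objects database", lang="en")))
g.add((dataset, DCAT.keyword, Literal("buildings")))
g.add((dataset, DCAT.keyword, Literal("roads")))
g.add((dataset, DCAT.landingPage, URIRef("https://example.gov.pl/data")))

# Serialising to Turtle (or JSON-LD) produces a document that crawlers and
# semantic clients can pick up, rather than metadata locked inside a portal.
print(g.serialize(format="turtle"))
```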

Jesus Estrada Vilegas from the SmartOpenData project (http://www.smartopendata.eu/) discussed the implementation of some ideas that can work within the INSPIRE context while providing open data. In particular, he discussed Spanish and Portuguese data sharing. Within the project, they are providing access to the data by harmonizing it and then making it linked data. Not all the data is open, and the focus of their pilot is on agroforestry land management. They are testing delivery of the data in both INSPIRE-compliant formats and the internal organisational format, to see which is more efficient and useful. INSPIRE is a good point from which to start developing linked data, but there is also a need to compare it to other ways of linking the data.

Massimo Zotti talked about linked open data from earth observations in the context of business activities, since he works in a company that provides software for data portals. He explored the business model of open data, INSPIRE and the Copernicus programme. Data that comes from earth observation can be turned into information – for example, identifying the parts of the soil that get sealed and don’t allow water to be absorbed, or information about forest fires or floods, etc. These are the bits of useful information that are needed for decision making. Once there is information, it is possible to identify increases in land use or other aspects that can inform policy. However, we need to note that dealing with open data means that a lot of work is put into bringing datasets together. The standardisation of data transfer and the development of approaches that help in machine-to-machine analysis are important for this aim. By fusing datasets, they become more useful and relevant to the knowledge production process. A dashboard approach to displaying the information and the processing can help end users to access the linked data ‘cloud’. Standardisation of data is very important to facilitate such automatic analysis, and standard ontologies are also necessary. In my view, this is not so much a business model as a typical account of operations in the earth observation area, where a lot of energy is spent on justifying that it can be useful and important to decision making – but lacking quantification of the effort that is required to go through the process, and of the speed at which results can be achieved (will the answer come in time for the decision?). A member of the audience also raised the point that the assumption that machine-to-machine automatic models will produce valuable information all by themselves is questionable.

Maria Jose Vale talked about the Portuguese experience in delivering open data. The organisation that she works in deals with cadastre and land use information. She also discussed activities of the SmartOpenData project. She described the principles of open data that they considered: data must be complete, primary, timely, accessible and processable; data formats must be well known; there should be permanence; and usage costs should be addressed properly. For good governance there is a need to know the quality of the data and the reliability of delivery over time. Having automatic ways for the data to propagate to users is within these principles. The benefits of open data that she identified are mostly technical, but also include economic value (which is mentioned many times – though you need evidence similar to the Danish case to prove it!). The issues or challenges of open data include how to deal with fuzzy data when releasing it (my view: tell people that it needs cleaning); safety, as there are both national and personal issues; financial sustainability for the producers of the data; rates of update; and addressing user and government needs properly. In a case study that she described, they looked at land use and land cover changes to assess changes in river use in a river watershed. They needed about 15 datasets for the analysis, and used information from CORINE land cover from different years. For example, they have seen forest change to woodland because of fire, which also influences water quality. Data interoperability and linking data allow the integrated modelling of the evolution of the watershed.

Francisco Lopez-Pellicer covered the Spanish experience and the PlanetData project (http://www.planet-data.eu/), which looks at large-scale public data management, specifically in a pilot on VGI and linked data, with a background of SDI and INSPIRE. There is big potential, but many GI producers don’t exploit it yet. The issue is legacy GIS approaches such as WMS and WFS, which are standards endorsed in INSPIRE but do not necessarily fit into a linked data framework. In the work that he was involved in, they try to address complex GI problems with linked data. To do that, they convert a WMS into a linked data server by adding URIs and POST/PUT/DELETE operations on resources. The semantic client sees this as a linked data server, even though it can remain compliant with other standards. To try it, they use the open national map as the authoritative source and OpenStreetMap as the VGI source, and release them as linked data. They are exploring how to convert a large authoritative GI dataset into linked data and also link it to other sources. They are also using it as an experiment in crowdsourcing platform development – creating a tool that helps to assess the quality of each dataset. The aim is to run quality experiments and measure the data quality trade-offs associated with the use of authoritative or crowdsourced information. Their service can behave as both a WMS and a ‘Linked Map Server’. LinkedMap, which is the name of this service, provides the ability to edit the data and explore OpenStreetMap and the government data – they aim to run the experiment in the summer, and it can be found at http://linkedmap.unizar.es/. The reason for choosing WMS as a delivery standard is a previous crawl of the web which showed that WMS is the most widely available service, so it is assumed to be relevant to users, or at least one that most users can consume.
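The sketch below illustrates the general idea of a service that answers both as a WMS and as a set of addressable, editable resources. The URL patterns and payloads are hypothetical illustrations of the approach, not the actual LinkedMap interface.

```python
# A rough sketch of exposing a WMS-backed dataset both as a classic map service
# and as addressable linked-data resources. All URLs and identifiers here are
# hypothetical illustrations of the idea described in the talk.
import requests

BASE = "https://linkedmap.example.org"   # hypothetical service root

# 1) Classic WMS behaviour: render a map image for a bounding box.
wms_params = {
    "service": "WMS", "version": "1.3.0", "request": "GetMap",
    "layers": "roads", "crs": "EPSG:4326",
    "bbox": "41.6,-1.0,41.7,-0.8", "width": 512, "height": 512,
    "format": "image/png",
}
image = requests.get(f"{BASE}/wms", params=wms_params, timeout=30)

# 2) Linked-data behaviour: every feature gets its own URI that can be read
#    and edited with ordinary HTTP verbs (the POST/PUT/DELETE additions).
feature_uri = f"{BASE}/resource/roads/12345"      # hypothetical feature URI
feature = requests.get(feature_uri, headers={"Accept": "text/turtle"}, timeout=30)
requests.put(feature_uri, data=feature.text,       # e.g. push back a corrected name
             headers={"Content-Type": "text/turtle"}, timeout=30)
```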

Paul van Genuchten talked about the GeoCat experience in a range of projects, which include support to Environment Canada and other activities. INSPIRE meeting open data can be a clash of cultures, and he highlighted ‘neogeography’ as the term he uses to describe the open data culture (going back to the neogeo and paleogeo debate, which I thought was over and done – but clearly it is relevant in this context). INSPIRE recommends publishing data openly, and this is important to ensure that it gets a big potential audience, as well as the ‘innovation energy’ that exists among the ‘neogeo’/‘open data’ people. The common things within this culture are expectations that APIs are easy to use, that interfaces are clean, etc. But under the hood there are similarities in the way things work. There is a perceived complexity, among the community of open data users, in INSPIRE datasets. Many of the open data people are focused on and interested in OpenStreetMap, look at companies such as MapBox as role models, and use formats such as GeoJSON and TopoJSON. Data is versioned and managed in a git-like process. The most common projection is Web Mercator. There are now not only raster tiles but also vector tiles. These characteristics of the audience can be used by data providers to help people use their data, but there are also intermediaries that deliver the data and convert it to more ‘digestible’ forms. He noted CitySDK by Waag.org, which grabs data from INSPIRE and then delivers it to users in ways that suit open data practices. He demonstrated the case of Environment Canada, where they created a set of files that are suitable for both human and machine use.
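To show how lightweight these favoured formats are, here is a minimal GeoJSON FeatureCollection built in Python. The feature itself is a made-up example and stands in for the kind of ‘digestible’ output that intermediaries deliver, in contrast to full INSPIRE GML data models.

```python
# A minimal GeoJSON FeatureCollection in WGS84 longitude/latitude.
# The feature is purely illustrative.
import json

feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [4.8952, 52.3702]},  # lon, lat
            "properties": {"name": "Amsterdam", "source": "example"},
        }
    ],
}

print(json.dumps(feature_collection, indent=2))
```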

Ed Parsons finished the set of talks of the day (talk link: goo.gl/9uOy5N) with a talk about a multi-channel approach to maximising the benefits of INSPIRE. He highlighted that it’s not about linked data, although linked data is part of the solution to making data accessible. Accessibility always wins online – and people make compromises (e.g. sound quality on CD versus Spotify). Google Earth can be seen as a new channel that makes things accessible, and while the back-end is not new technology, the ease of access made a big difference. Denmark’s use of Minecraft to release geographic information is an example of another channel. Notice the change over the past 10 years in video delivery, for example: in the early days, video delivery was complex and required many steps, expensive software and infrastructure, and this is somewhat comparable to current practice in geographic information. Making things accessible through channels like YouTube, and the whole ecosystem around them, changed the way video is uploaded, consumed and used, and of course changes in devices (e.g. recording on the phone) made it even easier. Focusing on maps themselves, people might want different kinds of maps, and not only the latest searchable map that Google provides – e.g. an administrative map of medieval Denmark, or flood maps, or something that is specific and not part of general web mapping. In some cases, people are searching for something and you want to give them maps for some queries and images for others (as in searching for ‘Yosemite trails’ vs. ‘Yosemite’). There are plenty of maps that people find useful, and for that Google is now promoting Google Maps Gallery – with tools to upload, manage and display maps. It is also important to consider that mapping information needs to be accessible to people who are using mobile devices. The web infrastructure of Google (or ArcGIS Online) provides the scalability to deal with many users and the ability to deliver to different platforms such as mobile. The gallery allows people to brand their maps. Google wants to identify authoritative data that comes from official bodies, and then to have additional information that is displayed differently. But separating facts and authoritative information from commentary is difficult, and that is where semantics play an important role. He also noted that Google Maps Engine is just maps – a visual representation without an aim to provide GIS analysis tools.

There is something in the physical presence of a book that is pleasurable. Receiving my copy of Introducing Human Geographies was special, as I have contributed a chapter about Geographic Information Systems to the ‘cartographies’ section.
It might be a response to Ron Johnston’s critique of human geography textbooks, or a decision by the editors to extend the content of the book, but the book now contains three chapters that deal with maps and GIS. The contributions are ‘Power of maps’ by Jeremy Crampton, a chapter about ‘Geographical information systems’ by me, and ‘Counter geographies’ by Wen Lin. To some extent, we coordinated the writing, as this is a textbook for undergraduates in geography and we wanted to convey a coherent message.

Overall, you’ll notice a lot of references to participatory and collaborative mapping, with OpenStreetMap and PPGIS.net mentioned several times.

In my chapter I have covered both the quantitative/spatial science face of GIS, as well as the critical/participatory one. As the introduction to the section describes:

“Chapter 14 focuses on the place of Geographical Information Systems (GIS) within contemporary mapping. A GIS involves the representation of geographies in digital computers. … GIS is now a widespread and varied form of mapping, both within the academy and beyond. In the chapter, he speaks to that variety by considering the use of GIS both within practices such as location planning, where it is underpinned by the intellectual paradigm of spatial science and quantitative data, and within emergent fields of ‘critical’ and ‘qualitative GIS’, where GIS could be focused on representing the experiences of marginalized groups of people, for example. Generally, Muki argues against the equation of GIS with only one sort of Human Geography, showing how it can be used as a technology within various kinds of research. More specifically, his account shows how current work is pursuing those options through careful consideration of both the wider issues of power and representation present in mapping and the detailed, technical and scientific challenges within GIS development.”

To preview the chapter on Google Books, use this link. I hope that it will be a useful introduction to GIS for geography students.

During the symposium “The Future of PGIS: Learning from Practice?”, which was held at ITC-University of Twente on 26 June 2013, I gave a talk titled ‘Keeping the spirit alive’ – preservation of participatory GIS values in the Geoweb – which explored what the important values in participatory GIS are and how they translate to the Geoweb, Volunteered Geographic Information and current interests in crowdsourcing. You can watch the talk below.

The Eye on Earth Summit took place in Abu Dhabi on 12–15 December 2011, and focused on ‘the crucial importance of environmental and societal information and networking to decision-making’. The summit was an opportunity to evaluate the development of Principle 10 of the 1992 Rio Declaration, as well as Chapter 40 of Agenda 21, both of which focus on environmental information and decision making. The summit’s many speakers gave inspirational talks – with an impressive list including Jane Goodall highlighting the importance of information for education; Mathis Wackernagel updating on developments in the Ecological Footprint; Rob Swan on the importance of Antarctica; Sylvia Earle on how we should protect the oceans; Mark Plotkin, Rebecca Moore and Chief Almir Surui on indigenous mapping in the Amazon; and many others. The white papers that accompanied the summit can be found in the Working Groups section of the website, and are very helpful updates on the development of environmental information issues over the past 20 years and on emerging issues.

Discussing environmental information for a week made me revisit the framework and review the changes that have occurred over the past decade.

First, I’ll present the conceptual framework, which is based on 6 assertions. The framework was developed on the basis of a lengthy review, in early 1999, of the available information on environmental information systems (the review was published as CASA Working Paper 7). While synthesising all the information that I had found, some underlying assumptions started to emerge, and by articulating them, putting them together and showing how they were linked, I could make more sense of the information that I had found. This helped in answering questions such as ‘Why do environmental information systems receive so much attention from policy makers?’ and ‘Why are GIS appearing in so many environmental information systems?’. I have used the word ‘assertions’ as the underlying principles seem to be universally accepted and taken for granted. This is especially true for the 3 core assumptions (assertions 1–3 below).

1. Within the framework of sustainable development, all stakeholders should take part in decision-making processes. A direct result of this is a call for improved public participation in environmental decision making.

2. Environmental information is essential for environmental decision making.

3. Environmental information is exceptionally well suited to GIS (and vice versa). GIS development is closely related to developments in environmental research, and GIS output is considered to be highly advantageous in understanding and interpreting environmental data.

4. (Notice that this emerges from combining assertions 1 and 2.) To achieve public participation in environmental decision making, the public must gain access to environmental information, data and knowledge.

5. (Based on assertions 1 and 3.) GIS use and output is essential for good environmental decision making.

6. (Based on all the others.) Public Environmental Information Systems should be based on GIS technologies. Such systems are vital for public participation in environmental decision making.

Intriguingly, the Eye on Earth White Paper notes: ‘This is a very “Geospatial” centric view; however it does summarise the broader principles of Environmental Information and its use’. Yet my intention was not to develop a ‘Geospatial’-centric view – I was synthesising what I had found, and the keywords that I used in the search did not include GIS. Therefore, the framework should be seen as an attempt to explain why GIS is so prominent.

With this framework in mind, I have noticed a change over the past decade. Throughout the summit, GIS and ‘Geospatial’ systems were central – they were mentioned and demonstrated many times. I was somewhat surprised by how prominent they were in Sha Zukang’s speech (he is the Under-Secretary-General of the United Nations and Secretary-General of the Rio+20 Summit). They are much more central than they were when I carried out the survey, and I left the summit feeling that, for many speakers, presenters and delegates, it is now expected that GIS will be at the centre of any EIS. This wide acceptance does mean that initiatives such as the ‘Eye on Earth Network’, which is based on geographic information sharing, are now possible. In the past, because of the very different data structures and conceptual frameworks, it was more difficult to suggest such integration. The use of GIS as a lingua franca for people who are dealing with environmental information is surely helpful in creating an integrative picture of the situation at a specific place, across multiple domains of knowledge.

However, I see a cause for concern in the equivalence of GIS with EIS. As the literature in GIScience has discussed over the years, GIS is good at providing snapshots, but less effective at modelling processes or interpolating in both time and space, and, most importantly, it has a specific way of creating and processing information. For example, while GIS can be coupled with system dynamics modelling (which was used extensively in environmental studies – most notably in ‘Limits to Growth’), it is also possible to run such models and simulations in packages that don’t use geographic information at all – for example, in the STELLA package for system dynamics, or in bespoke models created with dedicated data models and algorithms. Importantly, the issue is not the technical one of coupling software packages such as STELLA or agent-based modelling tools with GIS. Some EIS and environmental challenges might benefit from different people thinking in different ways about various problems and solutions, without always being forced to consider how GIS plays a part in them.
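To illustrate that such models need no geographic information at all, here is a minimal, purely illustrative one-stock system dynamics simulation of the kind that STELLA-style tools formalise: a single stock (say, a pollutant load in a lake) with an inflow and a first-order decay. The variable names and numbers are arbitrary assumptions for the sketch.

```python
# A minimal non-spatial system dynamics sketch: one stock, one inflow,
# one first-order decay, integrated with a simple Euler step.
def simulate(years=20, stock=100.0, inflow=12.0, decay_rate=0.15, dt=1.0):
    """Return the stock trajectory over time for an illustrative one-stock model."""
    trajectory = [stock]
    for _ in range(int(years / dt)):
        outflow = decay_rate * stock          # first-order decay of the stock
        stock += (inflow - outflow) * dt      # stock-and-flow bookkeeping
        trajectory.append(stock)
    return trajectory

if __name__ == "__main__":
    for year, value in enumerate(simulate()):
        print(f"year {year:2d}: {value:6.1f}")
```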