Last weekend I had a short discussion with a well-respected OSM community member on some aspects of the ODbL and it ended more or less on a question, "then when does share alike kick in?" Given that it was 2am my answer wasn't particularly good and so I thought I should expand it a bit in writing. Particularly because I may have given the impression that it is a fairly complex matter, when in reality it is fairly simple.

Disclaimer: this is the personal opinion of a non-lawyer and it is not an official policy statement by either the LWG or the OSMF. There are a handful of grey areas that I will not touch on. For some of them the LWG is preparing clarifications for discussion that will be available soon. In other words, I am staying on safe ground.

Further it is well known that I'm not particularly in love with the ODbL, but on the other hand I do think it is a lot better than it is made out to be.

The ODbL has 3 concepts that are relevant to triggering share alike (verbatim quotes from the ODbL text):

"Derivative Database" - Means a database based upon the Database, and includes any translation, adaptation, arrangement, modification, or any other alteration of the Database or of a Substantial part of the Contents. This includes, but is not limited to, Extracting or Re-utilising the whole or a Substantial part of the Contents in a new Database.

"Collective Database" - Means this Database in unmodified form as part of a collection of independent databases in themselves that together are assembled into a collective whole. A work that constitutes a Collective Database will not be considered a Derivative Database.

"Publicly" - means to Persons other than You or under Your control by either more than 50% ownership or by the power to direct their activities (such as contracting with an independent consultant).

Starting with the last concept, share alike only kicks in when you "Publicly Use" a derivative database (see ODbL 1.0: 4.4(a) and 4.5(c)). In-house use, use by a contractor on your behalf and similar do not trigger share alike and are not of interest here. For the rest of this discussion please assume that whatever we are discussing, we are discussing it in the context of publicly using whatever you have created.

You are now probably already jumping up and down and shouting "And what about Produced Works?". Produced Works are only relevant to share alike in that if you "Publicly Use" a Produced Work (ODbL 1.0: 4.4(c)), any derivative database that was used in producing the Produced Work is considered "Publicly Used". Given that we are already assuming that, we do not need to consider Produced Works at all for the purpose of this discussion. It seems we have already considerably simplified the matter at hand.

If you read the ODbL, "Derivative Databases" are what share alike is in the end attached to: original OSM data, extracts of it and modifications to it are all datasets that are, no surprise, subject to mandatory ODbL licensing. But what happens if you are using other data together with OSM derived datasets? Going back to the definitions, we see that such use creates a Collective Database.

How does share alike apply to a Collective Database? Well according to 4.5(a) "For the avoidance of doubt, You are not required to license Collective Databases under this License if You incorporate this Database or a Derivative Database in the collection, but this License still applies to this Database or a Derivative Database as a part of the Collective Database;".

In other words if you simply lump together one or more datasets with data derived from OSM, you are only required to licence the OSM part of the Collective Database under the ODbL or a compatible licence.

Example: assume that you have a proprietary global database of waste bins and want to use that data together with OSM data. No problem, you can use your data together with OSM without any issue and there is no need to publish your proprietary dataset on ODbL terms.

Grey area alert: while the example is clear, there are some kinds of "lumping together" that need clarification.

Now given that OSM has a lot of waste bins already, the result might contain a lot of duplicates that you would like to remove. Again no problem, you can simply remove all waste bins from the OSM dataset. Now the resulting OSM data is clearly a Derivative Database and is subject to the share alike terms in the ODbL (as it was before), but it does not change the status of the collective whole which can still have different licences for its individual parts and the whole.

Grey area alert: this kind of Derivative Database (reduced and extracted unmodified OSM data) triggers a number of obligations that essentially nobody is adhering to.

This is the point the discussion had reached at 2am when the question "then when does share alike kick in?" was posed.

Well the answer is: "when you modify OSM data". The simplest example: you improve the position of a POI by changing the coordinates or you add further information to the POI, then you have to make the resulting dataset available on ODbL terms. Don't forget we are always assuming that you are Publicly Using the data.

A more interesting example: assume you have a proprietary database containing road geometry and, associated with that geometry, road surface information, and further that you have permission to integrate the surface information into OSM. You add surface tags to the OSM roads in your copy of the OSM data: yes, you have to publish the improved OSM data on ODbL terms.

The important thing to note is that it does not affect your original proprietary database. There is no infection or tainting of that dataset; you simply cannot keep the changes to the OSM data to yourself.

And what about the other way around? Assume you notice that OSM has some surface data that is better than that in your proprietary database and you replace the original information with that? Then the resulting dataset is subject to share alike and you need to make it available on ODbL terms.

To sum it up: When does share alike kick in? When you modify OSM data or apply modifications from OSM to third party data and use the results publicly.
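As a rough illustration only, the rule of thumb above can be written down as a small Python sketch. The names `Part` and `share_alike_parts` are hypothetical and made up for this example, the `osm_derived` flag is an assumption about how you would label your own data, and none of this is legal advice or covers the grey areas mentioned earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """One independent dataset inside a collective whole (hypothetical model)."""
    name: str
    # True for OSM data, extracts or modifications of it, and for third party
    # data into which OSM content has been merged (assumed labelling).
    osm_derived: bool


def share_alike_parts(parts, publicly_used):
    """Return the names of the parts that would need to be offered on ODbL terms."""
    if not publicly_used:
        # In-house use does not trigger share alike.
        return []
    return [p.name for p in parts if p.osm_derived]


# Waste bin example: proprietary bins kept alongside a reduced OSM extract.
collective = [Part("proprietary_waste_bins", False),
              Part("osm_without_waste_bins", True)]
print(share_alike_parts(collective, publicly_used=True))

# The other way around: OSM surface data copied into the proprietary roads.
merged = [Part("proprietary_roads_with_osm_surface", True)]
print(share_alike_parts(merged, publicly_used=True))
```

The sketch treats share alike as a property of each part rather than of the collective whole, which is the point of the waste bin example: only the OSM-derived part is listed, and nothing is listed at all without public use.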

I think that's a very clear explanation on how share-alike works for OSM in most cases.

One thing that always bothered me about share-alike in the ODbL is the severe restriction on what I can charge someone for making a derivative database available to them (namely only the costs of physical reproduction, and nothing in the case of internet distribution). OSM data has significant volume and combining it with other data can make it much larger. When you generate a produced work based on a high volume derivative database this is a serious problem. You either have to keep the data (at significant cost, no matter if anyone ever wants it or not) or you have to be prepared to re-create the derivative database at your own cost, since you cannot charge for the work. The only practically feasible way in such cases seems to be to make available the algorithm used, which however you might not want to do for some reason.

@imagico the LWG is acutely aware that there are cases in which providing the derivative database in full is a burden and doesn't actually make sense. It is likely that we will propose a guideline for discussion.

@cquest: On the subject of geocoding, perhaps Simon's last example will help. When you read the following quote, replace "surface data" with "geocoded lat/lon data".

And what about the other way around? Assume you notice that OSM has some surface data that is better than that in your proprietary database and you replace the original information with that? Then the resulting dataset is subject to share alike and you need to make it available on ODbL terms.

So we have the following cases (in all I assume "Public Use"):

1. You use OSM to geocode some data (e.g. convert an address into Lat/Lon) and then add this data into your own proprietary database. The resulting dataset is a "derivative database" = Share Alike.

2. You use OSM to geocode some data and you keep this independent of your proprietary database. The data extracted from OSM (list of geocoded addresses) is a "derivative database" (= Share Alike), but your proprietary database is untainted (= no need to apply share alike to your proprietary database).

3. You use OSM to geocode some data and you keep this independent of your proprietary database. You then assemble the two independent parts into a collective whole, for example (like the waste bin example) replacing addresses in a non-OSM database with the geocoded results. Now the resulting OSM data is clearly a Derivative Database and is subject to the share alike terms in the ODbL (as it was in point 2), but it does not change the status of the collective whole which can still have different licences for its individual parts and the whole.

The case I'm not so sure about is the one where you use OSM to geocode some data and then display markers on top of map tiles generated from a proprietary database. In my opinion the map tiles are not a database, and as such this situation would fall under point 2 above. If, however, this combination constitutes a Collective Database, then point 3 applies: you assemble the two independent parts into a collective whole, for example by showing the markers on a non-OSM map or (like the waste bin example) replacing addresses in a non-OSM database with the geocoded results. In any case points 2 and 3 essentially mean the same thing (i.e. the list of geocoded addresses extracted from OSM is a "derivative database" = Share Alike).

Obviously attribution still applies in all cases. So if you are using geocoding to display markers on your slippy map (map tiles) then you need a statement saying that the geocoded data (map markers) is from OSM.

Simon, in my personal opinion this is a good summary of the ODbL and share-alike. Thanks! Lots of people get confused about this.

@RobJN, for the case that you're "not sure about": the map tiles are irrelevant here because in the example you describe they never get combined with OSM data (they are just overlaid). However, the data driving the "markers" (points from some other database, geocoded using OSM) seems to me to be a derivative database, so I'd suggest that the marker data would then be share-alike.

Isn't 4.6 the most relevant part for most commercial use cases of the ODbL?
As Arnulf Christl pointed out today (with the example of SplashMaps) at FOSSGIS 2014 Berlin:

a. The entire Derivative Database;

or

b. A file containing all of the alterations made to the Database or the method of making the alterations to the Database (such as an algorithm), including any additional Contents, that make up all the differences between the Database and the Derivative Database.

The Derivative Database (under a.) or alteration file (under b.) must be available at no more than a reasonable production cost for physical distributions and free of charge if distributed over the internet.

There is no need to give the Public access to your precious private data as long as you describe the connector to your Data Source.

If I understood that correctly, you don't even have to expose the underbelly of your original data structure. One could describe only the access to a pruned, generalized, rasterized or otherwise preprocessed proxy Database. The Database does not even have to be accessible online.

Also it's semantically unclear if giving access to any additional Contents on request by physical mail applies to the distribution method of the derived database or to that of the alteration file.

By no means do you have to expose any original data. Also it's unclear to me whether giving access to any additional Contents changes the license of that content to ODbL (especially regarding reuse rights) or only allows a one-time access.