Samples for taxonomy_xml

Distributed with the taxonomy_xml module is a collection of
starter vocabularies intended both to illustrate the various
formats and to provide a few useful topic sets.

The content of each of the demo vocabularies was the
responsibility of the original publishers at the time it was
imported. All imports were done in a semi-automated manner
with no editorial input. I am not responsible for errors of
fact or spelling.
Structural problems, character encoding problems and the
occasional omission are probably my fault.
Caveat Lector.

Credit is given here to the institutions that made this data
available. All data redistributed here has been carefully
selected as being free of copyright for transformative
re-use.
In some cases, tools or instructions will also be
provided for you to import your own versions of vocabulary
libraries for reasons of either scale, timeliness or
copyright. Where copyright is the reason, you should read and
understand the terms of use of the respective data sources.
Usually the terms are "free for personal use but not
redistribution", and the taxonomy_xml module can enable that
use.

Dewey Decimal System

Subject area: Publishing, General Interest.

Taxonomy Format: CSV.

Although ownership of the Dewey Decimal system is claimed by
OCLC (Online Computer Library Center), they don't actually
provide a machine-readable list for download (or offer access
to one), so I was unable to use them as a source.
Instead I found a public library website that had released
the Dewey lists into the Public Domain. (The site has since
gone away.)

As samples, the taxonomy_xml module contains both a
100-term and a 1000-term* version of the Dewey classification
scheme, with the implied decimal hierarchy and the 'Dewey
Number' supplied as a synonym.
As the Dewey system is extremely simple, it is provided as
an example of the CSV format.

Geography & history (900)
+ History of ancient world (930)
+ + History of ancient world China (931)
+ + History of ancient world Egypt (932)
+ + History of ancient world Europe north & west of Italy (936)
+ + History of ancient world Greece (938)

* There aren't really 1000 terms in use at that level.
There are, however, many more subsections in a truly decimal
breakdown in some areas (not included).
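
The "implied decimal hierarchy" mentioned above can be sketched
in a few lines of Python. This is only an illustration, not the
module's own code: it assumes three-digit Dewey numbers, and the
function names (dewey_parent, dewey_depth, outline_line) are
hypothetical. It treats numbers ending in 00 as top-level
classes, numbers ending in 0 as children of the class, and
everything else as a child of the enclosing ten.

```python
from typing import Optional


def dewey_parent(code: int) -> Optional[int]:
    """Return the implied parent of a three-digit Dewey number, or None."""
    if not 0 <= code <= 999:
        raise ValueError(f"expected a number from 0 to 999, got {code}")
    if code % 100 == 0:
        return None                 # main class such as 900: top level
    if code % 10 == 0:
        return code - code % 100    # division such as 930 -> 900
    return code - code % 10         # section such as 931 -> 930


def dewey_depth(code: int) -> int:
    """Count how many implied ancestors a Dewey number has."""
    depth = 0
    parent = dewey_parent(code)
    while parent is not None:
        depth += 1
        parent = dewey_parent(parent)
    return depth


def outline_line(name: str, code: int) -> str:
    """Format a term in the '+ ' outline style used in the listing above."""
    return "+ " * dewey_depth(code) + f"{name} ({code})"


if __name__ == "__main__":
    for name, code in [
        ("Geography & history", 900),
        ("History of ancient world", 930),
        ("History of ancient world Egypt", 932),
    ]:
        print(outline_line(name, code))
```

Deriving the parent from the number alone means the CSV does not
need a separate parent column, which is presumably why the Dewey
set works well as a simple CSV sample.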

This data was imported by way of an XSL transformation from
an XML file, topicset.iptc-subjectcode.xml, taken from the
site in 2007. The IPTC also maintains several other useful
vocabularies on its (hard to bookmark) Resource page. Visit
them for more.
Services of New Zealand (SONZ) Suggested Vocabulary

Subject area: Government.

Taxonomy Format: CSV/Service.

The E-government Initiative from the New Zealand government
has produced the NZGLS thesauri, including a list of 2364
ratified keyword-type terms to be used when classifying
government services or interest areas. It is only lightly
hierarchical, and exists mainly as a synonym collapser and a
list of 'preferred' consistent terminology.

It contains many 'related terms' as well as several weaker
synonyms for many terms.
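
As a rough picture of what a "synonym collapser" does, the
Python sketch below maps any known synonym back to its preferred
term. It is an assumption-laden illustration only: the function
names are hypothetical, the sample terms are invented rather
than taken from the SONZ list, and matching is simply
case-insensitive.

```python
def build_synonym_index(preferred_to_synonyms):
    """Build a lookup from lower-cased term or synonym to preferred term."""
    index = {}
    for preferred, synonyms in preferred_to_synonyms.items():
        index[preferred.lower()] = preferred
        for synonym in synonyms:
            index[synonym.lower()] = preferred
    return index


def collapse(term, index):
    """Return the preferred term, or the input unchanged if it is unknown."""
    return index.get(term.strip().lower(), term)


if __name__ == "__main__":
    # Invented sample data, NOT real SONZ terms.
    sample = {"Motor vehicles": ["Cars", "Automobiles"]}
    index = build_synonym_index(sample)
    print(collapse("cars", index))
    print(collapse("Boats", index))
```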

This data is currently being retrieved directly from the
e.govt.nz website as a demonstration of the simplest
kind of web service the taxonomy_xml module supports. The
original file is provided as a CSV which is retrieved
directly from the URL when the taxonomy_xml admin selects
[Web Service][SONZ] as an import source.
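
The retrieval step described above amounts to fetching a CSV
file from a URL and reading its rows. A minimal Python sketch of
that idea follows; it is not the module's implementation, the
function name fetch_csv_rows is hypothetical, and the URL is
left for the caller to supply because the original address is
not given here. The UTF-8 default is also an assumption.

```python
import csv
import io
import urllib.request


def fetch_csv_rows(url, encoding="utf-8", timeout=30):
    """Download a CSV document from `url` and return its rows as lists."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        text = response.read().decode(encoding)
    # csv.reader copes with quoted fields and embedded commas.
    return list(csv.reader(io.StringIO(text)))
```

Calling fetch_csv_rows with the address of the published CSV
would return one list per line, ready to be turned into terms.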

This dataset is in fact the first test case, and the reason
I started developing syntax readers for Drupal Taxonomies.

Google Merchant "Product Type" taxonomy

Subject area: Commerce.

Taxonomy Format: CSV-ancestry.

The distributed version contains only the top two levels
(200 terms). The full thing - which you can download,
convert to CSV and import yourself - can go 5 levels
deep and contains close to 4000 terms.

This is an alternate CSV format, placing each term on its own
line with its ancestors repeated in the preceding columns.
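
To make the ancestry layout concrete, here is a small Python
sketch that reads such rows and recovers each term's parent. It
is an illustration under assumptions, not the module's parser:
the function name parse_ancestry_csv is hypothetical, and the
sample rows are invented rather than copied from the Google
taxonomy.

```python
import csv
import io

# Invented sample rows: the last non-empty cell is the term itself,
# and the cells before it are its ancestors, outermost first.
SAMPLE = """\
Electronics
Electronics,Audio
Electronics,Audio,Headphones
Furniture
Furniture,Chairs
"""


def parse_ancestry_csv(text):
    """Map each term's full path (a tuple) to its parent's path, or None."""
    parents = {}
    for row in csv.reader(io.StringIO(text)):
        path = tuple(cell.strip() for cell in row if cell.strip())
        if not path:
            continue  # skip blank lines
        # Everything before the last cell is the ancestry; empty means top level.
        parents[path] = path[:-1] or None
    return parents


if __name__ == "__main__":
    for path, parent in parse_ancestry_csv(SAMPLE).items():
        print(" > ".join(path), "| parent:", parent)
```

Keying on the full path rather than the bare term name keeps two
terms with the same label under different parents distinct.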