Files and Scripts Used in Managing W3C Translations

This document describes the various files used to store all W3C related
translations in one place, as well as the scripts to generate different
"views" of the same data. Beyond the importance and interest of its own, it
should be noted that managing translations at W3C may be considered as a
modest showcase for the usage of various W3C technologies. Data are stored in
several RDF files, some of them
specifically maintained for the translations, some of them of a more general
interest. The fact that RDF-based information originating from different
sources can easily be combined in one project shows the value of the Semantic Web approach. Also, all the
generated files are based on Unicode and follow all the guidelines of the W3C Internationalization
Activity.

(If you are a W3C Team member, and you want to know how to update the
information on the W3C site, please consult the additional information file.)

The primary RDF file containing the translations is: RDFData/Trans2005.rdf
(and will be Trans2006.rdf, Trans2007.rdf, etc).
The file contains two types of information: the translation data themselves
(grouped by language), preceded by some information on the languages proper.
To make the file easier to edit, the translators' data is factored out into
a separate RDFData/translators.rdf file.
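As a minimal sketch of reading such a file, the snippet below groups translation records by language using only the standard library XML parser. The element and property names (and the sample data) are illustrative assumptions, not the actual vocabulary of the W3C file:

```python
# Group translation records by language, sketching the structure of
# Trans2005.rdf. Vocabulary and data below are assumptions for illustration.
import xml.etree.ElementTree as ET
from collections import defaultdict

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                     xmlns:trans="http://example.org/trans#">
  <trans:Translation rdf:about="http://example.org/fr/css1">
    <trans:language>fr</trans:language>
    <trans:translationFrom rdf:resource="http://www.w3.org/TR/1999/REC-CSS1-19990111"/>
  </trans:Translation>
  <trans:Translation rdf:about="http://example.org/hu/css1">
    <trans:language>hu</trans:language>
    <trans:translationFrom rdf:resource="http://www.w3.org/TR/1999/REC-CSS1-19990111"/>
  </trans:Translation>
</rdf:RDF>"""

TRANS = "{http://example.org/trans#}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

def by_language(xml_text):
    """Return a mapping from language code to original-document URIs."""
    groups = defaultdict(list)
    for t in ET.fromstring(xml_text).findall(TRANS + "Translation"):
        lang = t.findtext(TRANS + "language")
        orig = t.find(TRANS + "translationFrom").get(RDF + "resource")
        groups[lang].append(orig)
    return dict(groups)
```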

For each translation, there is a reference to the original document (using
the trans:translationFrom property). The resource referred to by
this property is, in fact, the dated URI of the document but, most
importantly, it is the same resource as used in another RDF file on technical
reports at W3C: tr.rdf.

The caveat with tr.rdf, however, is that it does not include
all documents that are frequently translated (eg, W3C in 7 points or the
WAI Quick Tips), nor does it include information such as the domain or
shorter titles. To supply this missing information, a third RDF file is available:
recs.rdf. This RDF file contains
some additional data for all Recommendations, plus some entries for
documents like the ones cited above.
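The lookup chain this implies can be sketched as follows: consult tr.rdf for what it knows, and let recs.rdf supply the supplements and the extra documents. The file contents below are illustrative samples, not real data:

```python
# Sketch of merging tr.rdf metadata with the supplements in recs.rdf.
# All URIs and field values below are illustrative samples.
TR_RDF = {
    "http://www.w3.org/TR/1999/REC-CSS1-19990111":
        {"title": "Cascading Style Sheets, level 1"},
}
RECS_RDF = {
    "http://www.w3.org/TR/1999/REC-CSS1-19990111":
        {"domain": "Interaction", "shortTitle": "CSS1"},
    "http://www.w3.org/Consortium/Points/":
        {"title": "W3C in 7 Points", "domain": "About W3C"},
}

def metadata(uri):
    """Combine what tr.rdf records with the additions from recs.rdf."""
    return {**TR_RDF.get(uri, {}), **RECS_RDF.get(uri, {})}
```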

The rest of the translation data are fairly straightforward, as are the
language descriptions; neither needs further explanation.

Contact list of the most important translators. Each person who appears at
least twice as a translator has their contact information stored here, to
reduce duplication and the risk of errors in updates. Note that for persons
whose names use non-Latin scripts (Arabic, Chinese, etc), a Latinized
version is also stored, when known, to make the display more accessible.
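The display rule this describes can be sketched in a few lines: prefer the name in its native script, appending the Latinized form when one is recorded. The field names are assumptions for illustration:

```python
# Sketch of the name-display rule: native script first, Latinized form in
# parentheses when known. Field names are illustrative assumptions.
def display_name(person):
    name = person["name"]
    latin = person.get("latinName")
    return f"{name} ({latin})" if latin else name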

Overview of all translations, ordered by domains and documents.
Each document can be addressed directly by using, as a fragment identifier,
the document ID from the W3C TR pages (ie, its "short name", essentially
the last part of its undated URI).

The set of Python scripts also has a number of (public) CGI entry points.
Both the language and the technology view can be queried, and both can
return various "views". Here are the six possibilities (all examples use
French translations and CSS1 as the technology):

Query the translations of CSS1, returning a full XHTML page. The
identifier of the technology is the document ID used as a fragment
identifier in the W3C TR pages (you can also look at the menus on the
translations' home page for the exact codes). See also the note below.

Query all French translations, returning the relevant information as
XML-encoded RDF.
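A call to such a CGI entry point amounts to building a query URL; the sketch below shows the shape of that call, though the endpoint path and parameter names here are assumptions, not the real W3C URIs:

```python
# Sketch of building a CGI query URL for the technology view.
# The endpoint path and parameter names are hypothetical.
from urllib.parse import urlencode

BASE = "https://www.w3.org/2003/03/Translations/byTechnology"  # hypothetical

def query_url(technology, output="html"):
    return BASE + "?" + urlencode({"technology": technology, "output": output})
```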

W3C technologies are sometimes published as a collection of Recommendations
rather than as one document (for example, XML Schema is published as Parts
0, 1, and 2). The separate RDF repository describes those, with
resources identified by rdf:ID. In the calls above the
technology identifier can also use these rdf:ID values, and the
translations will be listed for all constituents.
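The expansion this describes can be sketched as a simple lookup: a collection identifier expands into its constituent documents, while a plain document ID stands for itself. The identifiers below are illustrative:

```python
# Sketch of expanding a collection rdf:ID into its constituent documents.
# Identifiers below are illustrative, not the real rdf:ID values.
COLLECTIONS = {
    "xmlschema": ["xmlschema-0", "xmlschema-1", "xmlschema-2"],
}

def constituents(identifier):
    """A collection ID expands; a plain document ID stands for itself."""
    return COLLECTIONS.get(identifier, [identifier])
```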

Some languages have local "versions":
fr-ca for Canadian French, pt-br for Brazilian
Portuguese, etc. These codes can be queried directly if one is interested
in, say, Brazilian Portuguese translations only. If the "main" language code
is used, all local versions are displayed as well (if there are translations
marked as such).
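The matching rule this implies can be sketched as: a "main" code also matches its local versions, while a local code matches only itself.

```python
# Sketch of the language-matching rule: "pt" matches pt and pt-br,
# but "pt-br" matches only that variant.
def matches(requested, recorded):
    return recorded == requested or recorded.startswith(requested + "-")
```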

The scripts return the information in Unicode, more specifically
in UTF-8. Ie, if you plan to include the output in your own XHTML
pages, your server should declare the UTF-8 encoding in the HTTP
response header.
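Declaring the encoding amounts to one header line; a minimal CGI-style sketch:

```python
# Sketch of a CGI-style response declaring UTF-8 in the Content-Type header
# before the generated markup.
def response(body):
    header = "Content-Type: text/html; charset=utf-8\r\n\r\n"
    return header + body
```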

The only caveat with this tool is that it must be used with “real”
documents, ie, it cannot be used with the groups of documents described
in the previous section. Nor can the comma tool return anything other than
fully formatted HTML pages.