4 thoughts on “Semantic Web and the practice of Statistical Agencies”

Well, what did we learn, and why did we use resource descriptions at Statistics Denmark?

To tell the whole truth and nothing but the truth, we at Statistics Denmark did not expect anything to happen. However, we did follow a recommendation on best practices for internet communication prepared by the Danish Ministry of Science, Technology and Innovation (www.itst.dk).

Each year the Ministry stages a competition and review of governmental internet sites. In the review, the sites are ranked according to a set of criteria, and resource descriptions are one of these criteria. So the simple answer is that we use Dublin Core because it is recommended to us and because it is important to our ranking in the yearly review of official Danish websites, not because of its direct application for end users.

The original thinking behind the recommendation was that search engines should and would do a large part of their indexing from the Dublin Core meta tags. But of course we now know that Google works in a completely different way. And we know that Google is the only search engine used by visitors to http://www.dst.dk.

So in reality we could remove the Dublin Core tags without affecting our users.

From time to time we use the Dublin Core fields to insert synonyms aimed at our own robot-based search engine. But at the moment we do not see our “tagging” as a stepping stone to any kind of semantic web. It is my understanding that there are information specialists who do indeed use the information found in the Dublin Core tags.

In the Czech Republic there is a requirement to use a standardized metadata description for all electronic documents, in accordance with Act No. 365/2000 Coll., with effect from 1 January 2007. The Czech Statistical Office discharged this duty in March 2006. The Ministry of Informatics prepared a directive based on the Act, which contains an obligatory description of the metadata system, and this is based on Dublin Core.

Besides this statutory duty, the CZSO strives to follow the recommendation, which ensures better data accessibility. Structured data is one way to offer our users additional information that may often be relevant to them (for example, a guarantee of the content or of how recent the information is).
Beyond the obligation set by the Act, we expected that by using metadata (for example page description, keywords) we could make our web pages easier to find and improve their ranking in search results. The metadata displayed on individual pages serves primarily for automatic processing by the “catalogue”, i.e. a special file that keeps metadata for each information source and enables transparent and structured data searching. The “catalogue” allows metadata to be collected from different databases built on different technologies, etc.
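As a rough illustration of how a catalogue could pick up the metadata displayed on a page, here is a minimal Python sketch. It assumes the common convention of embedding Dublin Core in HTML as `<meta name="DC.element" content="...">` tags. The class name, the `harvest` function, the sample page and the URL are all hypothetical and do not describe the actual CZSO implementation.

```python
from html.parser import HTMLParser


class DublinCoreHarvester(HTMLParser):
    """Collect <meta name="DC.*" content="..."> tags from one HTML page."""

    def __init__(self):
        super().__init__()
        # Maps a Dublin Core element name to the list of values found,
        # because elements such as "subject" may be repeated on a page.
        self.record = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.lower().startswith("dc.") and content:
            element = name[3:].lower()  # "DC.title" -> "title"
            self.record.setdefault(element, []).append(content)


def harvest(html_text):
    """Return the Dublin Core record found in html_text as a dict."""
    harvester = DublinCoreHarvester()
    harvester.feed(html_text)
    harvester.close()
    return harvester.record


# Hypothetical page; only the DC.* tags end up in the record.
SAMPLE_PAGE = """<html><head>
<meta name="DC.title" content="Example statistics page">
<meta name="DC.subject" content="population">
<meta name="DC.subject" content="census">
<meta name="DC.date" content="2006-03-01">
<meta name="description" content="Not a Dublin Core tag">
</head><body></body></html>"""

# A "catalogue" in this sketch is simply a mapping from page URL to record.
catalogue = {"https://example.org/page": harvest(SAMPLE_PAGE)}
```

Because every record has the same Dublin Core shape, records harvested from different sources could be merged into one such mapping regardless of the technology behind each source.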
Using standardized metadata should benefit both sides. Users receive a guarantee of data validity and recency; in turn, the traffic to and ranking of the institution’s website should be higher. Structured data could also be used in communication with other state institutions. But to tell the truth, we cannot at present verify whether the use of standardized metadata has positively influenced the ranking of the CZSO web pages or met every requirement.

Standardized metadata are also used in internal searching on the CZSO web pages.
To optimise the search results, the CZSO uses a controlled vocabulary. It contains 74 keywords selected from an analysis of searched words. Each keyword has its own group of alternative words (synonyms) and a limited number (5) of “selected links”. When the user enters a keyword or a synonym in the search engine, the system displays a page with the searched word and the “selected links” that are most closely related to the searched word or theme.
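The controlled vocabulary described here can be sketched as a small lookup structure. The following Python sketch is only an illustration under assumptions: the entry contents, the link paths and the names `VocabularyEntry`, `build_index` and `lookup` are invented for the example. Only the limit of five selected links per keyword comes from the description above.

```python
from dataclasses import dataclass, field

MAX_SELECTED_LINKS = 5  # limit stated in the description above


@dataclass
class VocabularyEntry:
    keyword: str
    synonyms: list = field(default_factory=list)
    selected_links: list = field(default_factory=list)

    def __post_init__(self):
        # Enforce the cap on "selected links" per keyword.
        if len(self.selected_links) > MAX_SELECTED_LINKS:
            raise ValueError(
                f"{self.keyword!r} has more than {MAX_SELECTED_LINKS} selected links"
            )


def build_index(entries):
    """Map every keyword and synonym (case-insensitively) to its entry."""
    index = {}
    for entry in entries:
        for term in [entry.keyword, *entry.synonyms]:
            index[term.casefold()] = entry
    return index


def lookup(index, query):
    """Return the vocabulary entry for a search word, or None if unknown."""
    return index.get(query.strip().casefold())


# Hypothetical vocabulary content, for illustration only.
entries = [
    VocabularyEntry(
        keyword="inflation",
        synonyms=["consumer prices", "CPI"],
        selected_links=["/inflation/overview", "/inflation/time-series"],
    ),
    VocabularyEntry(
        keyword="population",
        synonyms=["inhabitants", "census"],
        selected_links=["/population/overview"],
    ),
]
index = build_index(entries)
```

With this index, a search for a synonym such as `lookup(index, "cpi")` resolves to the "inflation" entry, whose `selected_links` could then be shown alongside the ordinary search results.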
Moreover, the CZSO uses a list of thematic groups. Each page created is assigned to one or more thematic groups. When entering a search word, the user can select a thematic group in which to search for the word.
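Restricting a search to one thematic group could look roughly like the Python sketch below. The page records, the group names and the `search` function are hypothetical, and simple substring matching stands in for whatever the real search engine does.

```python
def search(pages, word, group=None):
    """Return URLs of pages containing word, optionally within one thematic group."""
    needle = word.casefold()
    hits = []
    for page in pages:
        # Skip pages outside the selected thematic group, if one was chosen.
        if group is not None and group not in page["groups"]:
            continue
        if needle in page["text"].casefold():
            hits.append(page["url"])
    return hits


# Hypothetical pages, each assigned to one or more thematic groups.
pages = [
    {"url": "/a", "groups": {"prices"}, "text": "Index of consumer prices"},
    {"url": "/b", "groups": {"labour", "prices"}, "text": "Wages and prices"},
    {"url": "/c", "groups": {"labour"}, "text": "Unemployment rate"},
]
```

Calling `search(pages, "prices")` looks across all pages, while `search(pages, "prices", group="labour")` only considers pages assigned to the "labour" group.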
The catalogue used on the CZSO web pages for data searching follows the Dublin Core structure exactly; these data are subsequently provided to the full-text search engine. In the future, the vision is to load these records into other public catalogues without any fundamental changes.
We also hope that standardized metadata will help in interconnecting the Public database and the current web presentation.
A practical illustration of the use of structured data is the National Statistical Portal. Its main objective is to assemble statistical information from all public institutions in one place, and the obligatory metadata description (in our case Dublin Core) plays an essential informative role here.
In this case, the catalogue should accumulate metadata on statistical information from all available information sources and then provide them for structured searching. In the future it may be possible to create a Public Administration Information Sources Catalogue. This catalogue would contain a list of data obtained from all bodies of the public administration sector.