Following an earlier discussion on the mailing list, a directory hierarchy has been set up, and each directory contains an index.xml to describe its content. For a directory with actual test files, it may look like:
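The listing below is only a sketch of such an index, not the actual repository format. The element names (index, file), the default namespace URI and the file names are assumptions made for illustration. Only the use of the Dublin Core (DC) namespace and of MIME types is taken from this post.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical index.xml: element names, the default namespace URI
     and the file names are illustrative assumptions. -->
<index xmlns="http://example.org/testfiles/index"
       xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- Dublin Core metadata mixed into the main namespace -->
  <dc:title>Example test files</dc:title>
  <!-- One entry per test file, with its MIME type -->
  <file name="example1.mol" type="chemical/x-mdl-molfile">
    <dc:description>A small molecule used as a smoke test.</dc:description>
  </file>
  <file name="example2.cml" type="chemical/x-cml"/>
</index>
```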

To improve and ensure some quality, the XML must be valid, not just well-formed, so that I can set up XSLT stylesheets to create XHTML indices and summaries. Therefore, I wanted to set up a schema for the index.xml files. My first thought was to use XML Schema, which supports XML Namespaces and has well-defined (and extensible) data types. I have hacked in it in the past, but the details have slipped my mind. I already worked with DTDs in 1998, around the time that the XML specification was declared a recommendation. Originating from the SGML era, the DTD format is not XML based, has no knowledge of namespaces, and offers only a limited number of data types.

Then there is RELAX NG. It is XML based, uses the same data types as XML Schema, and has support for namespaces. Since I had to look up the specs for either DTD or XML Schema for the details anyway (e.g. on how to allow the DC namespace in the main namespace), why not try something new? Well, I was amazed. RELAX NG has the syntactic simplicity of DTD, but the functionality of XML Schema. So, in 30 minutes I hacked up an XML spec for the test file repository, including a (too short) list of recognized MIME types. It is just a combination of some <element>, <attribute>, <oneOrMore>, etc. elements. The result is available as schema.relaxng in SVN.
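To give an impression of how little is needed, here is a minimal sketch of such a RELAX NG schema. It is not the schema.relaxng from SVN. The element names (index, file), the attribute names (name, type), the default namespace URI and the two MIME types listed are assumptions chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch, not the actual schema.relaxng.
     The ns attribute sets the namespace for unprefixed element names;
     the dc prefix lets Dublin Core elements appear alongside them. -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         ns="http://example.org/testfiles/index"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="index">
      <optional>
        <element name="dc:title"><text/></element>
      </optional>
      <!-- An index must describe at least one file -->
      <oneOrMore>
        <element name="file">
          <!-- Data types are borrowed from XML Schema -->
          <attribute name="name"><data type="string"/></attribute>
          <!-- The (too short) list of recognized MIME types -->
          <attribute name="type">
            <choice>
              <value>chemical/x-mdl-molfile</value>
              <value>chemical/x-cml</value>
            </choice>
          </attribute>
          <optional>
            <element name="dc:description"><text/></element>
          </optional>
        </element>
      </oneOrMore>
    </element>
  </start>
</grammar>
```

Extending the list of MIME types is then a matter of adding another <value> inside the <choice>.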


