
jimboh2k writes "The Australian Bureau of Statistics will use the 2011 Census of Population and Housing as a dry run for the XML-based open standards DDI and SDMX, in a bid to make machine-to-machine data exchange easier and allow users to better search for and access census datasets. The census will be the first time the open standards are used by an Australian Federal Government agency."

I'm perplexed why people continue to use XML when there is YAML. What is it that makes XML so attractive as a durable format? It's not human readable in any practical sense, and YAML very much is. Since its delimiters are complicated and variable, it's harder to parse in ad hoc ways than YAML (lines and whitespace), which means that for rapidly extracting things there are no shortcuts to instantiating a whole document. It's hard to grep. And both formats can fully do the other one's job, so they are interchangeable.

To see how clean YAML is to read for humans and to parse by machine, look at a Sample Document [wikipedia.org]. And here's something truly impressive: a YAML Quick Reference Card [yaml.org] written entirely in YAML itself. Not only is it a marvelously short card, it's both human and machine readable. It's a superset of JSON too.
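For a flavor of what the parent means, here's a tiny hand-written sketch (the field names are invented for illustration, not taken from the linked card):

```yaml
# An invoice-style record: structure comes from indentation,
# strings rarely need quoting, and comments are allowed.
invoice: 34843
date: 2001-01-23
bill-to:
  given: Chris
  family: Dumars
items:
  - sku: BL394D
    quantity: 4
    price: 450.00
```

Compare that to the equivalent XML with its opening and closing tags around every field, and the readability argument mostly makes itself.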

Interesting. How does YAML handle validation and user defined grammars?

Multiple ways, of varying stringency. For the simple case you can define types (e.g. floats, ints, or user-defined types). For the vast majority of uses that's all the validation you need. If you want to define a schema, there are several different ones in use; Kwalify and Rx are two. Finally, there are YAML-to-XML converters, so you can just convert the YAML to XML and use your favorite XML validator. Thus the validation itself, other than the types, is not baked into the definition and thus is left to external tools.
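For the curious: a Kwalify-style schema is itself written in YAML. A minimal sketch (the field names here are made up for illustration) might look like:

```yaml
# Kwalify-style schema: validates a mapping with a required
# string "name" and an optional integer "age".
type: map
mapping:
  name:
    type: str
    required: yes
  age:
    type: int
```

So the schema language reuses the data language, which keeps the whole stack readable.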

Great for human readability. Terrible (due to some python-like indent rules) for humans to add content to.

Meanwhile, XML might not be quite as nice as YAML for reading, but it is easier to figure out where you made a mistake, assuming you're pretty printing it (but the best thing is that pretty printing it is unnecessary).

Great for human readability. Terrible (due to some python-like indent rules) for humans to add content to.

Oh, come on, man. This is like the ancient, long-discarded whitespace lament about Python. I was once like you, before I started writing Python. Then I saw the light of why whitespace indenting is so great. I could explain, but I'm not sure I could have convinced even myself before trying it.

Bottom line: it's freakin' easy to get the whitespace right, and any decent editor with context-sensitive tabs does it for you: Emacs, Vim, BBEdit, Eclipse. Are there any that don't?

This is a NON ISSUE

Meanwhile, XML might not be quite as nice as YAML for reading, but it is easier to figure out where you made a mistake, assuming you're pretty printing it (but the best thing is that pretty printing it is unnecessary).

Ha! You make me laugh. So now we need special editors and printers for XML reading? Weren't we just complaining about whitespace? Now you pretty-print to put perfect whitespace into XML?

I'm with you on the python whitespace thing, but for YAML it's different. We're not talking about writing code here. It can be tricky to get the whitespace right but it's a damn sight easier than learning and reading XML syntax. Remember that 99% of the time machines process these files and we only care to make reading easy (where YAML whitespace is a non-issue) and human editing easy, where it isn't too bad. Composing from scratch by hand isn't really something you're going to be doing with YAML (or XML).

Great for human readability. Terrible (due to some python-like indent rules) for humans to add content to.

Apparently you are not aware that YAML, being a superset of JSON, can be written entirely in JSON, or a mix of the two. In JSON you don't need to use whitespace. So you use the whitespace in YAML when it makes sense (nearly always), and when you get into absurd edge cases you toss in a little JSON syntax where appropriate.

So sorry, you just don't have a case to make here unless you want to say something bad about JSON as well.
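To make the point concrete, block and flow (JSON-style) syntax can be mixed freely in a single document. A small sketch (names invented for illustration):

```yaml
# Block style for the common case...
server:
  host: example.org
  # ...and JSON-ish flow style where indentation would get awkward.
  ports: [8080, 8443]
  limits: {connections: 100, timeout: 30}
```

The flow collections are literally the JSON subset, so indentation sensitivity simply stops applying inside them.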

I use JSON (and occasionally YAML), but only for data interchange formats where I don't expect a human to need to modify it.

Yes, I am aware that JSON and YAML are closely related. And a few times I tried to write up files in JSON, just as a mockup of my intended data structure. Yes, I used a real editor with proper tab indenting. It still got pretty unreadable. I use Data::Dumper whenever I want the data format to be as explicit as possible, but only for debugging.

I'm perplexed why people continue to use XML when there is YAML. What is it that makes XML so attractive as a durable format? It's not human readable in any practical sense, and YAML very much is. Since its delimiters are complicated and variable, it's harder to parse in ad hoc ways than YAML (lines and whitespace), which means that for rapidly extracting things there are no shortcuts to instantiating a whole document. It's hard to grep. And both formats can fully do the other one's job, so they are interchangeable.

I would actually dispute all of your comments, but picking up on the last point in bold, one of XML's key features is "mixed content [w3schools.com]", which is apparently (according to http://yaml.org/xml.html [yaml.org]) not possible in YAML.
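A quick illustration of what "mixed content" means, using Python's standard-library XML parser (a sketch of the concept, not anything from TFA):

```python
import xml.etree.ElementTree as ET

# Mixed content: character data interleaved with child elements
# inside one parent -- something YAML's mappings and sequences
# have no direct way to represent.
doc = ET.fromstring("<p>This is <b>bold</b> text.</p>")

print(repr(doc.text))            # text before the first child: 'This is '
print(repr(doc.find("b").text))  # 'bold'
print(repr(doc.find("b").tail))  # text after </b>: ' text.'
```

A YAML mapping can hold the pieces as separate scalars, but there's no native way to say "this text flows around that element", which is exactly what marked-up documents need.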

Can you point me, please, to the reference on how one can define in YAML the equivalent of a schema? You know, to act as the "contract" for the data exchange protocol... extensions (to allow 3rd-party custom data sections) and namespaces (to isolate the 3rd-party extensions that I'm not interested in) would be a real bonus.

The real answer is: who cares? They're both easy [enough] to parse data formats. It's about as interesting as arguing about what your favorite editor is and why. Or your favorite database. Everyone knows the ins and outs, and nobody cares (except maybe you and the person you're arguing with). We all have libraries. We all have parsers. It really doesn't matter.

The trivial answer to your question is: because YAML is very new in the grand scheme of things. And it's not so different that it's really in

* Immediately restore the long-form census;
* Make as many government datasets as possible available to the public online free of
charge at opendata.gc.ca in an open and searchable format, starting with Statistics
Canada data, including data from the long-form census;

This is why an Australian invented Wikileaks... I mean... "information wants to be free" and such...

and with open source you share your code freely to help everyone

Hey, where does it say that they'll share the code?
TFA quote:

with the ABS directing software developer Space-Time Research to utilise the standards for both input and output of all data collected next year.

So:
1. It is the data that will be shared (govt takes preemptive - still legal - actions against Wikileaks? ;) )
2. The guys doing the software are Space-Time Research [spacetimeresearch.com] - as far as I know, a fair way from an open source establishment (note: I have no affiliation with them)

Meanwhile, in other government agencies and in private enterprise there are open file formats, such as the geophysical SEG-D and SEG-Y formats, that have been used since at least the 1980s. That means you can read data files from 1982 on current software. Closed file formats are an "innovation" of Microsoft and similar companies. It's not really any different from the bastards that write unreadable code in an attempt to provide job security. Hopefully in the future some of the practices of elements of Microsoft and man

Last I heard it was a migration to open source and they were successfully using open source desktop applications. The operating system may be Windows rather than Linux but this still seems to be a victory for open source. On the desktop the applications are far more important than the operating system.

TFA mentions "open source" in the opening and only once. I reckon the reporter (or the proof-readers? or editor?) had a slip of the fingers on the keyboard. 'Tis clear they mean Open Standards.

The census will become the first time the open standards are used by an Australian Federal Government agency.

What the hell are you talking about? We use a variety of open standards every minute of every day across every department with any modern IT assets. I think what you meant to say was that this is the first time open standards are being used by an Australian Federal Government agency to communicate with the general public. Even then, it's not exactly news; it was going to happen eventually.