The Advantages of XML Over CSV

Question: What is the advantage of XML over, say, a comma-delimited file? If you're using it for data transfers between platforms, I would think that always having to explicitly tag each data element adds unnecessary overhead to your data file. Trust me, I want to be wrong here; I know that there has to be more to it than that.

Answer:

There are a number of major problems with a comma-delimited file (or CSV, comma-separated values, which is the standard term for such documents) as a data tool. A CSV is a flat table of data. This is fine if your data is itself flat, but most programmers have found that, despite their best intentions, data has a tendency to organize itself into nested structures. A relational database, after all, consists of numerous flat tables with relational links to other flat tables, and the techniques to make those relations can become hideously complicated. An XML file, on the other hand, is intrinsically hierarchical (object-oriented, if you will), although you can still build relational information into it for specifying more complex relationships.

It's embarrassingly easy to corrupt your data with a CSV. Unless every writer quotes its fields and every reader honors that quoting, the data itself cannot contain a comma (which makes naively parsed CSVs useless for many string applications), and if even a single column is missing you may find that your last name has become "442 W Gibralter Ave.". An XML document has various levels of validation and parsing protection, so an invalid data entry becomes obvious far sooner.
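The column-shift failure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-column layout (last_name, street, city), the sample records, and the XML element names are all hypothetical, and the XML side uses the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical column layout; a headerless CSV relies on position alone.
HEADERS = ["last_name", "street", "city"]

def naive_parse(line):
    """Split on commas and pair the values with HEADERS by position."""
    return dict(zip(HEADERS, line.split(",")))

good = naive_parse("Smith,442 W Gibralter Ave.,Olympia")
# The last name was dropped upstream, so every value shifts one column left.
shifted = naive_parse("442 W Gibralter Ave.,Olympia")

# The same incomplete record in XML: the missing element is simply absent,
# and the remaining fields keep their meaning.
record = ET.fromstring(
    "<user><street>442 W Gibralter Ave.</street><city>Olympia</city></user>"
)

def is_well_formed(text):
    """Return True if the text parses as XML, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(shifted["last_name"])       # the street address, accepted silently
print(record.find("last_name"))   # None: the gap is visible
print(is_well_formed("<user><street>442 W Gibralter Ave.</user>"))
```

Note that the well-formedness check only catches structural damage such as an unclosed tag. Confirming that required elements are actually present takes a DTD or schema, which this sketch does not attempt.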

You have to build the parsing and containing structure for a CSV document yourself, while you get that for free with the XML DOM. What this means is that with the DOM you can easily retrieve the specific value of a field and manipulate it, while you have to jump through some serious hoops to work with the same value in a CSV. Also, you can easily query an XML structure with one or two lines of code. To do the same for a CSV would take considerably more, especially if you wanted the queries to be flexible.
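As a rough illustration of the "one or two lines" claim, here is a sketch in Python. It uses the standard library's xml.etree.ElementTree and its limited XPath subset rather than the W3C DOM the answer refers to, and the document, element names, and values are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical user base; the structure and names are assumptions.
DOC = """<users>
  <user id="1">
    <name>Ada</name>
    <address><street>12 Elm St.</street><city>Olympia</city></address>
  </user>
  <user id="2">
    <name>Grace</name>
    <address><street>9 Oak Ave.</street><city>Tacoma</city></address>
  </user>
</users>"""

root = ET.fromstring(DOC)

# Query: the street of every address whose city is Olympia.
streets = [s.text for s in root.findall("./user/address[city='Olympia']/street")]

# Retrieve one specific field by attribute value and change it in place.
root.find("./user[@id='2']/name").text = "Grace H."
updated = root.findtext("./user[@id='2']/name")

print(streets)
print(updated)
```

The parser supplies the containing structure, so the query is a path expression rather than hand-written column bookkeeping.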

You can assign attributes to a given XML element beyond simply the data value of that element. Thus with XML, you could create an image element that contained not only a URL, but also dimensions, locations, load instructions, and anything else that you may feel is pertinent to the image object.

Persisting a CSV is as complicated as reading one in, and subject to the same level of error introduction. An XML document maintains its state internally, so it can protect against invalid data even when persisted.
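The attribute and persistence points above can be sketched together in Python with xml.etree.ElementTree. The element name, the attribute names (href, width, height, load), and the URL are hypothetical choices for the example, not a standard vocabulary.

```python
import xml.etree.ElementTree as ET

# An image element carrying a URL plus extra metadata as attributes.
image = ET.Element(
    "image",
    {
        "href": "http://example.com/logo.gif",  # placeholder URL
        "width": "120",
        "height": "60",
        "load": "deferred",
    },
)
image.text = "Company logo"

# Persist to a string, then read it back with the same library.
serialized = ET.tostring(image, encoding="unicode")
restored = ET.fromstring(serialized)

print(serialized)
print(restored.get("width"), restored.get("height"))
```

Writing and reading go through the same parser and serializer, so no custom persistence code is needed and the attributes survive the round trip intact.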

Unless you know the headers, a CSV file is meaningless, and even with headers you're basically guessing. The hierarchical nature of an XML document imparts a greater degree of context to the information, so that you know that a street is part of an address, which is part of a user's record, which is part of a user base.

An XML file is on average roughly three times as large as a CSV file, assuming that the XML file has the same headers and data as the CSV file; that is, a 4K file becomes a 12K file. At 1200 bps this discrepancy could be significant, but at 56 Kbps it means that transfer times jump from roughly 1 to 3 seconds, and transfer times across Ethernet or even DSL become insignificant. Thus the "inefficiency" of sending XML versus CSV is small enough as to not make much difference, especially given the advantages that XML offers.
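The transfer-time arithmetic above can be checked with a short Python calculation. It assumes 1K means 1024 bytes, takes the 3x size factor from the text, and uses nominal line rates with no protocol overhead or compression, so real transfers will be somewhat slower than these figures. The 10 Mbit/s Ethernet rate is an assumption added for comparison.

```python
def transfer_seconds(size_bytes, bits_per_second):
    """Idealized transfer time: payload bits divided by the line rate."""
    return size_bytes * 8 / bits_per_second

CSV_BYTES = 4 * 1024        # the 4K CSV file from the text
XML_BYTES = 3 * CSV_BYTES   # the 12K XML equivalent

# Nominal line rates in bits per second.
LINKS = [
    ("1200 bps modem", 1_200),
    ("56K modem", 56_000),
    ("10 Mbit Ethernet (assumed)", 10_000_000),
]

for label, bps in LINKS:
    csv_time = transfer_seconds(CSV_BYTES, bps)
    xml_time = transfer_seconds(XML_BYTES, bps)
    print(f"{label}: CSV {csv_time:.3f} s, XML {xml_time:.3f} s")
```

The ratio between the two times is always 3, but the absolute gap shrinks from close to a minute on the slowest link to a few milliseconds on Ethernet, which is the point the answer is making.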

In order to transform the data into some other form, a CSV file requires explicit programming. An XML file can use a special filter language called XSL (Extensible Stylesheet Language) to modify itself, and that filter can be swapped out for other filters with a single line of code.

XML is like HTML in that it is a universal language (although it differs from HTML in that, while HTML is a very well defined dialect, XML is actually a generic meta-language). It's becoming commonly accepted: just today, the United Nations and the OASIS trade group agreed to establish a formal working group for trade-based XML standards. This will drive e-commerce into the next millennium. A CSV file is a CSV file, meaningless except to the person using it.

XML supports Unicode implicitly, which means that it can be used for any language. A CSV file carries no encoding declaration and is most likely in 7-bit ASCII or some 8-bit extension of it, so it handles anything other than English unreliably.
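A small Python sketch of the encoding point, using the standard library's xml.etree.ElementTree. The sample text and the choice of ISO-8859-1 and UTF-8 are assumptions made for the example. The XML bytes announce their own encoding, so the parser decodes them correctly, while the CSV bytes give the reader nothing to go on.

```python
import xml.etree.ElementTree as ET

# The XML declaration names the encoding, so the parser can decode the bytes.
xml_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><city>Zürich</city>'
).encode("latin-1")
city = ET.fromstring(xml_bytes).text

# CSV bytes written as UTF-8 but read by a consumer that guesses Latin-1.
csv_bytes = "Zürich,Schweiz".encode("utf-8")
misread = csv_bytes.decode("latin-1").split(",")[0]

print(city)
print(misread)  # garbled: the two UTF-8 bytes of the umlaut become two characters
```

A CSV reader can of course be told the right encoding out of band; the difference is that XML carries that information inside the file.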

In other words, you can think of XML as the persistent form of a data object, with an incredible amount of functionality, a universal format, a high degree of legibility, and an intrinsic extensibility. A CSV (or any other flat file, for that matter) would be unable to do this.