
Honey? Does this format make my data look fat?

CNet is reporting that Google is ditching XML for a faster, more compact alternative known as Protocol Buffers. I'm going to type this post really fast before Don finds out and starts laughing at me, because he's always had a thing against XML, claiming it was too bloated and slow.

Google thought of using XML as a lingua franca to send messages between its different servers. But XML can be complicated to work with and, more significantly, creates large files that can slow application performance.

I disagree with the statement that it is XML that creates large files. No, it's not. It's people that create large files in a data format, and that can happen regardless of whether the format is binary or not. If you've ever worked in digital cartography or drafting, then you know what I'm talking about. AutoCAD files are huge, and they're binary. It's the application and the people designing the application, combined with the amount of data being stored or transferred, that determines whether a file ends up large or small. While binary is almost always more compact and more efficient than XML, that isn't always the case, nor is it inevitable that XML files will end up large and bloated.
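To make the point concrete, here's a quick sketch comparing the same two-integer record serialized three ways: verbose XML, terse XML, and raw binary. The record, element names, and layout are all invented for illustration; only the relative sizes matter.

```python
import struct

# Hypothetical record: a point with two 32-bit integer coordinates,
# serialized three different ways.
verbose = (b'<coordinatePoint><xCoordinate>1042</xCoordinate>'
           b'<yCoordinate>2084</yCoordinate></coordinatePoint>')
terse = b'<p x="1042" y="2084"/>'
binary = struct.pack('<ii', 1042, 2084)  # two little-endian int32s

# Sizes in bytes: the verbose XML is over four times the terse XML,
# even though both are XML and carry identical data.
print(len(verbose), len(terse), len(binary))
```

The binary form is smallest, but the gap between the two XML forms is wider than the gap between terse XML and binary: design choices, not the format itself, account for most of the bloat.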

Bad code can be just as inefficient, slow, and bloated as inefficient use of a data format. I'm not saying Google's engineers have written bad code or that they are going to write bad code; in fact, they probably won't, given their track record. But blaming poor performance on a data format is like blaming poor car performance on the car's frame. There are just too many other factors that go into application performance to single out a data format. Network conditions, server load, server platform, coding techniques, and so on can all impact the performance of an application, positively and negatively.

While it's certainly likely that Google will see an improvement in performance by moving to its new data exchange format, it's going to be losing something at the same time. It's losing the simple integration and interoperability that come from a standards-based technology like XML. We've been moving away from EAI-like technology that requires coding and development to integrate applications since the advent of SOA, so it's surprising to see such a services-oriented organization as Google move back into the dark ages of integration with this decision. XML became the lingua franca of integration because it's much easier to integrate into a meta-data driven architecture, which is really one of the foundational pillars of Web 2.0 and SOA.

I will admit that Protocol Buffers are intriguing, and that given the performance needs of an organization like Google, it may well be necessary to move away from XML, due at least in part to the performance of modern parsers, to something more processor-efficient, which certainly sounds like Protocol Buffers. But it's the rare organization that needs that kind of speed, and for the most part XML will continue to suit the majority of folks just fine.

You're wrong, and you know it. XML is fat by definition: tags are overhead, and numbers-as-text are at least as large as a compact binary encoding (and equal only when the number is a single digit). XML is the ultimate personification of bloatware.

People can make inefficient communications protocols in binary, but you can't make XML efficient. If Google needs the performance boost and they've found a way to do it, all the more power to 'em. The fact that most companies don't need the performance boost has no bearing on Google's stance - they DO need it.
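Don's size claim is easy to check against the base-128 varint scheme, which is how Protocol Buffers actually encode integers on the wire. The helper below is my own minimal sketch, not code from any library:

```python
def encode_varint(n):
    """Base-128 varint: 7 payload bits per byte; the high bit of
    each byte signals whether another byte follows."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes coming
        else:
            out.append(byte)
            return bytes(out)

# Decimal-text size vs. varint size, in bytes.
for value in (7, 1042, 1234567890):
    print(value, len(str(value)), len(encode_varint(value)))
```

A single digit like 7 costs one byte either way; beyond that the varint pulls ahead (1042 is four bytes of text but two of varint, and 1234567890 is ten versus five), and the text form never gets smaller than the varint.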

@Alex @Wes Thanks for clarifying. The post is not clear on that; it implies that Google tried XML and dismissed it for being bloated and slow. They may have tested it out, but that isn't clear, and it isn't often that any organization publicly discusses what they've tried and ditched.

@Don OH RLY? Perhaps you'd enjoy sleeping on the couch tonight? :-) XML isn't any fatter than JSON, or a ton of other text-based formats. What you're really arguing against is text-based data formats, not necessarily just XML.
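A rough size check bears the comparison out. The record below and both serializations are invented for illustration; nothing here comes from Google or any real API:

```python
import json

# Invented record for illustration.
record = {"name": "Lori", "site": "DevCentral"}

as_json = json.dumps(record, separators=(',', ':')).encode()
xml_elements = b'<r><name>Lori</name><site>DevCentral</site></r>'
xml_attrs = b'<r name="Lori" site="DevCentral"/>'

# Byte counts for the same data in JSON, element-style XML,
# and attribute-style XML.
print(len(as_json), len(xml_elements), len(xml_attrs))
```

Element-style XML pays for its closing tags, but attribute-style XML lands within a byte of the compact JSON: the overhead is a property of text formats generally, not of XML in particular.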

It's a trade-off, as always. You trade interoperability and standards (XML) for speed and compactness (binary). Sometimes the former is more important than the latter, else we wouldn't be seeing the mainstream adoption of SOA and SOAP and XML in general.

That's not the only advantage of XML. Being meta-data driven means it's a lot easier to integrate. Sure, you can get interoperability with PACS, but it requires coding, which necessarily lengthens the development life cycle.

You also gain agility through XML-based standards like SOAP and WSDL, because the interface is separated from the implementation, and modifications to the underlying code don't necessarily require client recompilation. That's not true when interface changes have to be coded by hand.

Lori

