It seems like the anti-XML cranks are cranking it up again. Right after I got through dueling with a factually challenged low-UID poster on Slashdot, I saw “8 Reasons why XML Sucks” pop up in an aggregator.

A few points in their roster actually seem valid. Most are weak or flat-out invalid, though, so let’s take a peek.

First up, they bash “Character Encoding”, referring to the fact that you need to specify an encoding for the text in an XML file, and it’s entirely possible to declare encoding X and actually use encoding Y, which screws up parsing. They also bash having to entity-escape things like < and &, and people over-using CDATA sections.

Unfortunately for them, this section does a fine job of illustrating my problems with their diatribe. They mix and match multiple types of problems and blame it all on XML, which is not really accurate. In this case:

  • Encoding and entity-escaping problems are only an issue with hand-writing XML — any XML generator worth its salt will take care of it for you programmatically if you are generating XML in an application. Moreover, any non-textual data format CAN’T BE HAND-WRITTEN AT ALL, at least not in any reasonable fashion (ever try to hand-encode an MPEG frame?). XML is imperfect for hand-written data, but it is light-years better than hand-written data in a binary format.
  • The CDATA bash needs to be aimed at those who overuse it, not at XML itself. This is a popular theme in XML bashing: blame the format for bad users. It’d be like my blaming a blogging software vendor because their article has issues.
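To make the “generator worth its salt” point concrete, here is a minimal sketch using Python’s standard-library xml.etree.ElementTree (my choice for illustration; other mainstream XML APIs behave similarly). The build_note() helper and the element names are hypothetical. Note that the code never escapes anything or writes an encoding declaration by hand:

```python
import xml.etree.ElementTree as ET

def build_note(title: str, body: str) -> bytes:
    """Build a small XML document; the library handles escaping and encoding."""
    root = ET.Element("note")
    # Raw text goes in as-is; the serializer turns & into &amp; and < into &lt;.
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "body").text = body
    # encoding="utf-8" makes tostring() emit UTF-8 bytes along with a matching
    # XML declaration, so the declared and actual encodings cannot disagree.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)

doc = build_note("Fish & Chips", "price < €5")
print(doc.decode("utf-8"))
```

Parsing the result back with ET.fromstring(doc) yields the original strings, ampersand, angle bracket, euro sign and all.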

Next up, they bash “Embedding Binary Data”, which is indisputable. XML is not geared for that situation and should not be used when that is a requirement.
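For what it’s worth, the usual workaround when binary data simply must travel inside XML is Base64, and a quick sketch shows why that is a poor fit. This uses Python’s standard base64 module; the attachment element and its encoding attribute are hypothetical names for illustration:

```python
import base64
import os
import xml.etree.ElementTree as ET

payload = os.urandom(3000)  # stand-in for an image or other binary blob

# Base64 turns every 3 bytes into 4 ASCII characters so the data is legal XML text.
elem = ET.Element("attachment", {"encoding": "base64"})
elem.text = base64.b64encode(payload).decode("ascii")

encoded_len = len(elem.text)
print(len(payload), encoded_len)  # 3000 4000, a one-third size increase

# The receiver has to decode it again before the bytes are usable.
restored = base64.b64decode(elem.text)
```

That one-third bloat, plus the encode/decode step on each end, is a big part of why XML is the wrong tool when binary payloads are a core requirement.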

Next, “Indentation Characters Mixed With Data”, which goes back to the hand-written XML “problem”. Yes, if you are hand-writing or hand-inspecting XML, dealing with indentation can be a hassle, but it beats the hell out of using a hex editor to hand-write or hand-inspect binary data formats. And, if you are generating or consuming XML programmatically, it’s not an issue.
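As a minimal sketch of the “consuming XML programmatically” case, here Python’s xml.etree.ElementTree reads a compact document and an indented copy of it; the order/item/qty structure and read_order() are made up for illustration:

```python
import xml.etree.ElementTree as ET

compact = "<order><item>widget</item><qty>3</qty></order>"
pretty = """<order>
    <item>widget</item>
    <qty>3</qty>
</order>"""

def read_order(xml_text: str) -> tuple:
    root = ET.fromstring(xml_text)
    # findtext() returns only the leaf element's own text. The indentation
    # whitespace lives in the parent's .text and the children's .tail,
    # so it never reaches this code.
    return root.findtext("item"), int(root.findtext("qty"))

print(read_order(compact))
print(read_order(pretty))
```

Both calls produce the same tuple; the indentation only matters to a human reading the raw file.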

Next, “Textual Representation of Numeric Data” is a case of “you say tom-ay-toe, I say to-mah-toe”. They don’t like the fact that numbers take up more space as text than they do in binary. I don’t like the fact that binary floating-point representations cannot exactly represent many decimal values, such as 0.1. Both are problems. Neither is a big deal, except when it is (extremely low-bandwidth situations in their case, currency calculations in mine).
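Here is the currency side of that argument as a small Python sketch. The price element and its currency attribute are hypothetical; the point is that a decimal string in XML can go straight into an exact decimal type, while a binary float has already lost precision:

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Binary floating point cannot represent 0.10 or 0.20 exactly.
print(0.10 + 0.20 == 0.30)  # False

# Kept as text in XML, the amount can be parsed into an exact decimal value.
price = ET.fromstring("<price currency='USD'>0.10</price>")
total = Decimal(price.text) + Decimal("0.20")
print(total)  # 0.30
```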

Next, “Unnecessarily Verbose” is another long-standing gripe tossed out by XML bashers. To some extent, I agree, and therefore prefer something like JSON in simple cases. However, there are very, very few situations where the verbosity will be an issue, mostly for really big data needs (which is why Google created Protocol Buffers) and really low-bandwidth scenarios. For your garden-variety developer, or blogger, or whatever, the extra characters have no real negative impact.
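For a rough sense of the verbosity gap (an illustration, not a benchmark), here is the same hypothetical product record serialized both ways with Python’s standard json and xml.etree.ElementTree modules:

```python
import json
import xml.etree.ElementTree as ET

record = {"id": 42, "name": "Widget", "price": "9.99"}

# Compact JSON: no whitespace between tokens.
as_json = json.dumps(record, separators=(",", ":"))

# One child element per field; every value is repeated between open/close tags.
root = ET.Element("product")
for key, value in record.items():
    ET.SubElement(root, key).text = str(value)
as_xml = ET.tostring(root, encoding="unicode")

print(len(as_json), as_json)
print(len(as_xml), as_xml)
```

Here that works out to 68 characters of XML versus 40 of JSON. That overhead matters if you are shipping billions of records or squeezing bytes through a tiny pipe, and hardly at all for a config file or a blog feed.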

Next, “Human un-readability” is a re-hash of their earlier complaints. Again, while XML is not perfectly readable, it’s way better than trying to make sense of bit soup. When developers are stuck embedding nonsense text like “green eggs and ham” in their binary structures, just to be able to pick out key sub-structures when staring at a hex editor screen, XML looks easy by comparison.

Next, “A single XML root element” is a somewhat valid complaint. Historically, XML had a single root element because SGML did, IIRC. There are certainly cases where having multiple roots can be handy, as they outline, and some parsers even support it.
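A quick Python sketch of both halves of that: a strict parser (here xml.etree.ElementTree, which sits on top of expat) rejects a multi-root fragment, and the common workaround is to wrap the fragment in a synthetic root before parsing. The event/events element names are hypothetical, and the trick assumes the fragment carries no XML declaration of its own:

```python
import xml.etree.ElementTree as ET

fragment = "<event id='1'/><event id='2'/>"

# Two top-level elements are not a well-formed XML document.
try:
    ET.fromstring(fragment)
except ET.ParseError as err:
    print("rejected:", err)

# Workaround: wrap the fragment in a made-up root element, then parse.
wrapped = ET.fromstring(f"<events>{fragment}</events>")
print([e.get("id") for e in wrapped])  # ['1', '2']
```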

Finally, “SOAP – Oh My God” would be a valid complaint if that’s actually what they complained about. Heaven knows I’m no fan of the SOAP structure. However, they claim:

Once you start parsing the XML document you parse it all!

which is only true if you’re some programming n00b. There are many types of XML parsers, and only some of them (e.g., DOM) always parse the whole document before handing you anything. Others (SAX, XML pull parsers) stream through the document, reporting elements as they go, so you can stop as soon as you have what you need. In scenarios like the one they describe, you’d want one of those. Blaming XML because programmers use the wrong tools is just plain goofy.
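To show what “stop as soon as you have what you need” looks like, here is a sketch using Python’s ElementTree.iterparse(), an incremental parser in the same spirit as the pull parsers mentioned above (not SAX proper). The envelope is a simplified, namespace-free stand-in for a SOAP message, and read_token() is a hypothetical helper:

```python
import io
import xml.etree.ElementTree as ET
from typing import Optional

# A SOAP-like envelope with a small header and a huge body we do not care about.
envelope = (
    b"<Envelope>\n  <Header><Token>abc123</Token></Header>\n  <Body>"
    + b"<Row>data</Row>" * 100_000
    + b"</Body>\n</Envelope>"
)

def read_token(stream) -> Optional[str]:
    # iterparse() reads the stream in chunks and yields each element as its
    # end tag is parsed, instead of building the whole tree up front.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "Token":
            return elem.text  # bail out; the rest of the body is never parsed
    return None

stream = io.BytesIO(envelope)
print(read_token(stream))  # abc123
print(stream.tell(), "of", len(envelope), "bytes read")
```

In this setup the parser typically stops after its first chunk, leaving well over a megabyte of body untouched. A real SOAP message would use namespaced tags (e.g., {namespace-uri}Token), which this sketch leaves out for brevity.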

Don’t get me wrong: XML gets misused all the time. It gets applied to circumstances where it is unsuitable, people misuse features like CDATA, and so on. And if somebody wants to suggest a replacement format that offers most of XML’s features (e.g., namespaces, schemas) and cures some of its ills (e.g., verbosity), I’m all ears. But if you’re going to take the time to write a several-page blog post bashing XML, at least do your readers a service and do a decent job at it.