XML, JSON or Binary: Does It Matter?

Of course it matters. But with a tiny amount of effort, it doesn't have to matter much.

This document describes a way of thinking about structured data and its use in distributed applications. When implementing web-based APIs, developers traditionally use a well-supported message format such as JSON or XML. In applications where there is a perceived need for processing speed, custom binary formats are sometimes used. Each format has its own benefits and drawbacks, and this document will not argue their relative merits. It will, however, recommend an approach that abstracts away the details of the specific message format (XML, JSON, binary) and lets the application developer focus on more important aspects of application design.

We start by examining the benefits and limitations of each message format, then describe an XML DTD and a sample tagged binary format, both of which can be mechanically converted into the other formats described here. We then describe how these formats can be used with HTTP(S) and conclude with a simple software architecture that allows HTTP(S) clients to specify their message format preference.

Ultimately, we conclude that if you use the XML DTD described in this document, it is easy to mechanically convert between formats. Because HTTP(S) allows clients to specify their preferred message format for responses with the Accept: header, it truly doesn't matter which is used: XML, JSON or binary.

Why Use XML?

XML is a well-respected and well-understood specification for communicating structured data. Its parsing rules are unambiguous, and a large ecosystem of tools has emerged to support applications that use it.

XML does come with drawbacks, however. The most notable "problem" with XML is that it is not self-describing at the semantic level. The 'X' in XML stands for 'eXtensible' and this flexibility comes with a cost: you must use a schema (or DTD) to describe the relationship of elements within an XML document so applications consuming it can validate the document.

XML Schemas (XSD, etc.) and Document Type Definitions (DTDs) work reasonably well in tightly-coupled applications where the document producer and consumer can easily agree on the document's schema. But in loosely coupled application environments, it is difficult (or impossible) for a document producer to know what specific schema a consumer is expecting.

This is not a difficult problem for most applications targeting browsers. Web applications are typically delivered as JavaScript programs, downloaded each time the end user visits the application's URL. Because the client-side JavaScript application is downloaded from the server, the server's operator can ensure that the client and the server-side application components use the same schema for the messages they exchange.

But for non-browser targeted APIs exposed through HTTP(S), version skew between server and client remains an issue.

Users of XML can choose a less strict standard for XML documents: application developers can require that documents be "well formed" instead of "valid." This relieves developers of the responsibility of ensuring client and server agree on a specific schema, but it also forfeits many of XML's benefits.

Why Not Use JSON?

JSON (originally JavaScript Object Notation) was intended as a lightweight format for communicating structured data. Parsing JSON is notably easier than parsing XML, and JSON includes "built-in" type semantics. Even though JSON specifically uses JavaScript's type semantics, users of other languages (PHP, C#, Java, etc.) have found it easy enough to use. It turns out that a 31-bit integer is a 31-bit integer regardless of what programming language you use.

Proponents of JSON point to its "schema-less" notation as a benefit over XML in loosely coupled systems. XML adherents note that JSON's relative paucity of types precludes rich representation of semantic intent. But if you are building loosely coupled systems, you are probably only requiring documents to be "well formed" instead of "valid," so you have probably already lost much of XML's expressive capability.

JSON is also "self-describing": you don't need additional information to convert a JSON document into a JavaScript object (or a C data structure, or a Java map, or ...).

And Here's An Important Point

In loosely coupled systems, the consumer of a message may need to accept a later version of a message and attempt to reason about it. JSON's schema-less / self-describing nature allows the "message parsing layer" to defer reasoning about semantic intent to the application layer. You can do the same thing with XML, but only if you use a DTD that won't change between message versions.

This "N+1 Ability" is crucial to deploying systems that do not break when version skew is introduced.

Specifically, if a validating XML parser encounters a "version N+1 message" that uses an element unknown to its own "version N" schema, it will stop parsing and raise an exception. This is probably the correct behaviour in a tightly-coupled system where message producers and consumers are in close agreement. But it is almost never the correct behaviour for loosely coupled systems, where participants may not coordinate which version of a schema is being used.
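A minimal Python sketch of the difference, assuming a hypothetical "version N" message that carries only `id` and `name` fields (these field names and the extra `priority` field are illustrative, not taken from any real protocol). `parse_strict` behaves like a validating parser; `parse_tolerant` keeps what it understands and ignores the rest:

```python
import json

# Fields that a hypothetical "version N" consumer understands.
KNOWN_FIELDS = {"id", "name"}


def parse_strict(text: str) -> dict:
    """Like a validating parser: any unknown field is fatal."""
    msg = json.loads(text)
    unknown = set(msg) - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return msg


def parse_tolerant(text: str) -> dict:
    """Keep the fields this version understands; ignore the rest."""
    msg = json.loads(text)
    return {key: value for key, value in msg.items() if key in KNOWN_FIELDS}


# A "version N+1" message adds a field the version N consumer has never seen.
v_next = '{"id": 7, "name": "widget", "priority": "high"}'

print(parse_tolerant(v_next))  # {'id': 7, 'name': 'widget'}
try:
    parse_strict(v_next)
except ValueError as err:
    print("strict parser rejected the message:", err)
```

The tolerant consumer can still insist on the fields it actually needs; it simply declines to fail on fields it has never seen.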

Making XML "Self-Describing"

As mentioned above, it is difficult to use XML's full capabilities in a loosely coupled system. But you can certainly use a subset of XML features and retain several of its benefits. To make XML suitable for loosely coupled systems, you should either:

Relax the requirements for XML documents to be "valid" and instead require them to be "well-formed."
Or use a DTD such as Apple's P-List or VWRAP LLSD/XML, whose DTDs describe the type of elements and not their semantic intent.

An example message using VWRAP XML serialization follows:


   <?xml version="1.0" encoding="UTF-8"?>
   <llsd>
    <array>
     <integer>42</integer>
     <uuid>6bad258e-06f0-4a87-a659-493117c9c162</uuid>
     <map>
      <key>hot</key>
      <string>cold</string>
      <key>higgs_boson_rest_mass</key>
      <undef/>
      <key>info_page</key>
      <uri>https://example.org/6bad258e-06f0</uri>
      <key>status_report_due_by</key>
      <date>2008-10-13T19:00:00Z</date>
     </map>
    </array>
   </llsd>

Key take-aways from this example include:

XML tags describe the type of the data they enclose.
The structure of the document is preserved.
XML tags do not describe semantic intent. Instead, semantic intent is implied by a value's position in an array or by the <key> element preceding the data tag in a map.
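Because each tag names a type, a consumer can turn the example into native values without knowing what any field means. The following is a minimal Python sketch that handles only the element types appearing in the example; it is not a complete LLSD implementation, the boolean spellings it accepts are an assumption, and the function name `llsd_to_python` is hypothetical. The `<date>` content must be a valid ISO 8601 timestamp for `datetime.fromisoformat()` to accept it:

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime


def llsd_to_python(elem: ET.Element):
    """Convert one LLSD/XML value element into a native Python value.

    The tag alone determines the type. What a value *means* (its position in
    an array, or the <key> before it in a map) is left to the application.
    """
    tag, text = elem.tag, elem.text or ""
    if tag == "undef":
        return None
    if tag == "boolean":
        return text.strip().lower() in ("1", "true")  # accepted spellings: assumption
    if tag == "integer":
        return int(text)
    if tag == "real":
        return float(text)
    if tag in ("string", "uri"):
        return text
    if tag == "uuid":
        return uuid.UUID(text)
    if tag == "date":
        # fromisoformat() before Python 3.11 does not accept a trailing "Z".
        return datetime.fromisoformat(text.replace("Z", "+00:00"))
    if tag == "array":
        return [llsd_to_python(child) for child in elem]
    if tag == "map":
        children = list(elem)
        # Children alternate: <key>name</key>, then the value element.
        return {children[i].text or "": llsd_to_python(children[i + 1])
                for i in range(0, len(children), 2)}
    raise ValueError(f"unsupported LLSD element: {tag}")


doc = """<llsd><array>
  <integer>42</integer>
  <uuid>6bad258e-06f0-4a87-a659-493117c9c162</uuid>
  <map>
    <key>hot</key><string>cold</string>
    <key>higgs_boson_rest_mass</key><undef/>
    <key>info_page</key><uri>https://example.org/6bad258e-06f0</uri>
    <key>status_report_due_by</key><date>2008-10-13T19:00:00Z</date>
  </map>
</array></llsd>"""

value = llsd_to_python(ET.fromstring(doc)[0])  # [0]: the single value inside <llsd>
print(value)
```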

Another useful characteristic of P-List and VWRAP LLSD documents is that they can be automatically converted into JSON. Here is the same example document serialized as a JSON message:


[
  42,
  "6bad258e-06f0-4a87-a659-493117c9c162",
  {
    "hot": "cold",
    "higgs_boson_rest_mass": null,
    "info_page": "https://example.org/6bad258e-06f0",
    "status_report_due_by": "2008-10-13T19:00:00Z"
  }
]
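The two directions are not symmetric. As a minimal sketch, assume a naive converter (the name `json_to_llsd` is hypothetical) that sees only JSON's own types. Every JSON string can only become `<string>`, so the `<uuid>`, `<uri>` and `<date>` elements of the original LLSD document cannot be recovered; the `true`/`false` boolean spelling is also an assumption:

```python
import json
import xml.etree.ElementTree as ET


def json_to_llsd(value) -> ET.Element:
    """Map a decoded JSON value onto LLSD/XML using JSON's types only."""
    if value is None:
        return ET.Element("undef")
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        elem = ET.Element("boolean")
        elem.text = "true" if value else "false"
    elif isinstance(value, int):
        elem = ET.Element("integer")
        elem.text = str(value)
    elif isinstance(value, float):
        elem = ET.Element("real")
        elem.text = repr(value)
    elif isinstance(value, str):
        # A UUID, a URI and a date all arrive here as plain strings.
        elem = ET.Element("string")
        elem.text = value
    elif isinstance(value, list):
        elem = ET.Element("array")
        for item in value:
            elem.append(json_to_llsd(item))
    elif isinstance(value, dict):
        elem = ET.Element("map")
        for key, item in value.items():
            ET.SubElement(elem, "key").text = key
            elem.append(json_to_llsd(item))
    else:
        raise TypeError(f"not a JSON value: {value!r}")
    return elem


doc = json.loads('[42, "6bad258e-06f0-4a87-a659-493117c9c162",'
                 ' {"info_page": "https://example.org/6bad258e-06f0"}]')
llsd = ET.Element("llsd")
llsd.append(json_to_llsd(doc))
print(ET.tostring(llsd, encoding="unicode"))
```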

Careful examination will reveal that it is easy to convert the first format into the second. Converting the second into the first is slightly more difficult: the JSON version no longer records that one string was a UUID, another a URI and another a date, so a converter must recover those types from somewhere else. In the VWRAP environment, this problem was solved using an abstract type system and interface description language, which are outside the scope of this document. A simple solution would be to mirror JSON types in an XML DTD like this:


  <!ELEMENT dsd (undef|boolean|number|string|array|map)*>
  <!ELEMENT undef EMPTY>
  <!ELEMENT boolean (#PCDATA)>
  <!ELEMENT number (#PCDATA)>
  <!ELEMENT string (#PCDATA)>
  <!ELEMENT array (undef|boolean|number|string|array|map)*>
  <!ELEMENT map (key,(undef|boolean|number|string|array|map))*>
  <!ELEMENT key (#PCDATA)>
  <!ATTLIST string xml:space (default|preserve) 'preserve'>

This DTD has the advantage that it specifies XML documents which can be mechanically converted into JSON and back again with no loss of information. The disadvantage is that it uses none of XML's capacity to carry semantic intent.
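A minimal round-trip sketch in Python, assuming values have already been decoded from JSON with the standard `json` module. `to_dsd` and `from_dsd` are hypothetical names, and the `true`/`false` spelling for booleans and the use of `repr()` for numbers are assumptions, since the DTD only says these elements contain character data:

```python
import json
import xml.etree.ElementTree as ET


def to_dsd(value) -> ET.Element:
    """Serialize a JSON-compatible Python value as a DSD element."""
    if value is None:
        return ET.Element("undef")
    if isinstance(value, bool):  # must precede the number check
        elem = ET.Element("boolean")
        elem.text = "true" if value else "false"
    elif isinstance(value, (int, float)):
        elem = ET.Element("number")
        elem.text = repr(value)
    elif isinstance(value, str):
        elem = ET.Element("string")
        elem.text = value
    elif isinstance(value, list):
        elem = ET.Element("array")
        for item in value:
            elem.append(to_dsd(item))
    elif isinstance(value, dict):
        elem = ET.Element("map")
        for key, item in value.items():
            ET.SubElement(elem, "key").text = key
            elem.append(to_dsd(item))
    else:
        raise TypeError(f"not a JSON value: {value!r}")
    return elem


def from_dsd(elem: ET.Element):
    """Deserialize a DSD element back into a JSON-compatible Python value."""
    text = elem.text or ""  # ElementTree reports empty content as None
    if elem.tag == "undef":
        return None
    if elem.tag == "boolean":
        return text == "true"
    if elem.tag == "number":
        try:
            return int(text)  # keep integers as integers...
        except ValueError:
            return float(text)  # ...and parse everything else as a float
    if elem.tag == "string":
        return text
    if elem.tag == "array":
        return [from_dsd(child) for child in elem]
    if elem.tag == "map":
        children = list(elem)
        return {children[i].text or "": from_dsd(children[i + 1])
                for i in range(0, len(children), 2)}
    raise ValueError(f"unexpected DSD element: {elem.tag}")


original = json.loads(
    '[42, "6bad258e-06f0-4a87-a659-493117c9c162",'
    ' {"hot": "cold", "higgs_boson_rest_mass": null, "ok": true, "pi": 3.14}]'
)
root = ET.Element("dsd")
root.append(to_dsd(original))
xml_text = ET.tostring(root, encoding="unicode")
restored = from_dsd(ET.fromstring(xml_text)[0])
print(restored == original)  # True
```

Checking `bool` before the numeric case matters in Python, because `bool` is a subclass of `int`.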

Should a Message Parser Care About Semantic Intent?

And this is the crux of the matter. Should a message parser care about the richness of semantic intent, as XML seems to, or should it defer to application-layer logic to determine the meaning of elements in a protocol message?

The authors of this paper believe that message parsers in loosely coupled systems (the part of an application that deserializes protocol messages into internal formats used directly by application software) should not attempt to interpret the semantic intent of protocol messages.

Tightly coupled systems, on the other hand, may find it easier to use messaging systems that throw exceptions when they receive semantically nonsensical messages. In environments where one organization controls both the producer and the consumer of a message, it is much easier to dictate which protocol versions are used. The situation is akin to the assert() macro in C programming: a failed assertion aborts the program, but if you really don't know what to do with malformed input from a user, this may be the best alternative.

If your messaging middleware raises an exception when it receives a message whose semantics it can't understand, the application layer never gets a chance to reason about the intent of the message. Depending on your application, that may be the best course of action.

What About Binary Message Formats?

We mentioned binary formats in the sections above. When should they be used? The authors of this document believe the answer is "more or less never." However, we concede there are situations where binary protocol data units are justified. But in modern systems, the time required to parse a message from a human-readable text format into a machine-readable binary format is inconsequential for all but the most extreme cases.

Human readable text formats offer significant benefits for debugging.

If you feel you must use a binary format, please consider using a self-describing binary format so debugging probes can deconstruct protocol messages without resorting to an external dictionary. An example of such a system can be found in the Binary Serialization section of the VWRAP Abstract Type System draft referenced at the end of this document.

If you do use a binary format, it is important to think about why. Two primary reasons are:

Eliminate serialization and deserialization overhead when processing messages
Reduce the size of protocol messages

Both are laudable goals, but with modern systems and networks they may be premature optimizations. Serializing and deserializing JSON is less compute-intensive than ASN.1/BER or XML parsing. In loosely coupled systems, you can't assume you know the details of the computer architecture of the system receiving a message. Senders cannot simply copy an in-memory data structure into a message and assume the receiver will be able to consume it efficiently. Endianness, word size and memory alignment restrictions may make it difficult for a receiving system to simply copy a data structure from a protocol message into working memory and use it directly. There are also significant security issues with this approach. Fortunately, few developers would approve of such a system.
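A small Python illustration of the endianness problem, using the standard `struct` module; the two-field record is hypothetical. The same values produce different bytes depending on byte order, and a receiver that guesses wrong reads a different number without any error (word size and alignment add further variation, not shown here):

```python
import struct

# A hypothetical in-memory record: a 32-bit id followed by 16-bit flags.
record_id, flags = 0x01020304, 0x0005

little = struct.pack("<IH", record_id, flags)  # little-endian layout (e.g. x86)
big = struct.pack(">IH", record_id, flags)     # big-endian ("network") layout

print(little.hex())  # 040302010500
print(big.hex())     # 010203040005

# A receiver that assumes the wrong byte order silently decodes other numbers.
misread_id, misread_flags = struct.unpack(">IH", little)
print(hex(misread_id), hex(misread_flags))  # 0x4030201 0x500
```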

Common binary formats such as MessagePack, Protocol Buffers, Avro and Thrift's TBinaryProtocol avoid these serious security and interoperability issues, but at the cost of processing time. All binary formats share the disadvantage that they are often harder to reason about when debugging: which is easier for a human to comprehend, a JSON or XML message or a compressed binary one?

But there will always be situations where binary message serialization should be considered, so the advice given in this paper is: in loosely coupled systems, use binary serialization formats that are self-describing (i.e., formats whose messages can be decoded without an external schema or dictionary).

The authors of this paper also suggest you use a binary format that can be mechanically translated into JSON or XML.
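As an illustration only, here is a hypothetical tagged binary format in Python. It is *not* the VWRAP binary serialization; the tag bytes and the big-endian, length-prefixed layout are assumptions chosen for the sketch. Every value starts with a one-byte type tag, so a debugging probe can walk a message without an external dictionary, and every value maps directly onto a JSON type:

```python
import json
import struct


def encode(value) -> bytes:
    """Encode a JSON-compatible value; every item starts with a 1-byte type tag."""
    if value is None:
        return b"!"
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return b"1" if value else b"0"
    if isinstance(value, int):
        return b"i" + struct.pack(">q", value)  # 64-bit signed, big-endian
    if isinstance(value, float):
        return b"r" + struct.pack(">d", value)  # IEEE 754 double, big-endian
    if isinstance(value, str):
        data = value.encode("utf-8")
        return b"s" + struct.pack(">I", len(data)) + data
    if isinstance(value, list):
        return b"[" + struct.pack(">I", len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        out = [b"{", struct.pack(">I", len(value))]
        for key, item in value.items():
            k = key.encode("utf-8")
            out += [struct.pack(">I", len(k)), k, encode(item)]
        return b"".join(out)
    raise TypeError(f"not a JSON value: {value!r}")


def decode(buf: bytes, pos: int = 0):
    """Decode one value starting at pos; return (value, next_pos)."""
    tag = buf[pos:pos + 1]
    pos += 1
    if tag == b"!":
        return None, pos
    if tag in (b"1", b"0"):
        return tag == b"1", pos
    if tag == b"i":
        return struct.unpack_from(">q", buf, pos)[0], pos + 8
    if tag == b"r":
        return struct.unpack_from(">d", buf, pos)[0], pos + 8
    if tag == b"s":
        (n,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        return buf[pos:pos + n].decode("utf-8"), pos + n
    if tag == b"[":
        (count,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        items = []
        for _ in range(count):
            item, pos = decode(buf, pos)
            items.append(item)
        return items, pos
    if tag == b"{":
        (count,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        result = {}
        for _ in range(count):
            (n,) = struct.unpack_from(">I", buf, pos)
            pos += 4
            key = buf[pos:pos + n].decode("utf-8")
            pos += n
            result[key], pos = decode(buf, pos)
        return result, pos
    raise ValueError(f"unknown type tag {tag!r} at offset {pos - 1}")


message = [42, "6bad258e-06f0-4a87-a659-493117c9c162",
           {"hot": "cold", "higgs_boson_rest_mass": None, "ok": True, "pi": 3.5}]
wire = encode(message)
value, end = decode(wire)
print(json.dumps(value))  # the binary form converts mechanically to JSON
```

Integers are limited to 64 bits in this sketch; `struct.pack` raises `struct.error` for anything larger.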

On the Advantages of HTTP(S)

HTTP has been an overwhelming success over the previous twenty-five years. Other protocols (such as BEEP, IIOP, AMQP or XMPP) certainly have advantages over HTTP for specific use cases, but HTTP's simplicity and the ubiquity of tools which relieve the application developer from "boring" housekeeping tasks make it "first among equals" for network applications. In short, if you are building a distributed application, consider using HTTP first and turn to some other mechanism only when you find HTTP does not meet your needs.

A few of HTTP's advantages:

Ubiquity: It is hard to find a piece of network equipment that does not understand HTTP.
Text Based: It is *much* easier to decode the header of an HTTP message than an IIOP message.
Caching: For applications that would benefit from content caching, HTTP has well-defined semantics for caching information.
Proxying: HTTP has well-defined semantics for proxying requests through proxies and application firewalls.
Flexibility: With Accept: and Content-Type: headers, you can tunnel just about anything over HTTP.

HTTP messages are most commonly carried over TCP/IP, but it is also possible to use HTTP with any reasonable transport, including UNIX domain sockets, FIFOs / pipes, and serial links. Any transport that supports error-free, bi-directional communication can (in theory) be used.

Using HTTP Content Negotiation (i.e., Accept: Headers)

The HTTP specification describes several headers which enable "Content Negotiation." For our purposes, we only consider the Accept: and Content-Type: headers. If we limit responses to API requests to formats which can be mechanically converted between each other, it doesn't matter which format is used, since the client can consume the message directly or convert it into a format it can consume. If the client does have a preference, it can communicate this preference by adding an Accept: header to the request.

Accept: headers include a list of acceptable media types a client is willing to consume. According to the HTTP specifications, the server picks a specific media type for the response and communicates its choice to the client using the Content-Type: header.
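Server-side selection can be sketched in a few lines of Python. This is a simplified sketch with hypothetical function names: it honours q-values and `*/*`-style wildcards but does not implement every precedence rule in the HTTP specification (for example, an explicit `q=0` on a specific type is not honoured if `*/*` is also listed):

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # the default weight when no q parameter is given
        for param in fields[1:]:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((media_range, q))
    # sorted() is stable, so equally weighted ranges keep the client's order.
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def choose_media_type(accept: Optional[str], supported: List[str]) -> Optional[str]:
    """Return the supported media type the client prefers, or None (406)."""
    if not accept:
        return supported[0]  # no Accept header: use the server's default
    for media_range, q in parse_accept(accept):
        if q <= 0:
            continue
        for candidate in supported:
            major = candidate.split("/")[0]
            if media_range in (candidate, f"{major}/*", "*/*"):
                return candidate
    return None


SUPPORTED = ["application/dsd+json", "application/dsd+xml", "application/dsd+binary"]
print(choose_media_type("application/dsd+xml;q=0.9, application/dsd+json", SUPPORTED))
# application/dsd+json
```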

For our purposes, if the client did not have a preference (JSON, XML or binary), it could omit the Accept: header. If it did, it would include a header listing the acceptable format. For example:


   Accept: application/dsd+json

In response, the server should always specify the serialization format used with the Content-Type: header, like so:


   Content-Type: application/dsd+json

The authors have used the following media types to describe the specific transfer syntax used in HTTP messages:


  application/dsd+text    - "Dynamic Structured Data (DSD)" text format
  application/dsd+json    - JSON
  application/dsd+xml     - XML using the DSD DTD (described above)
  application/dsd+binary  - A binary format related to the DSD text format
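The architecture outlined at the start can be sketched as a WSGI application in Python. Everything here is hypothetical illustration: `SERIALIZERS`, `get_resource` and the simplified negotiation (first supported type listed, ignoring q-values) are assumptions, and only the JSON and XML media types are implemented. Supporting another transfer syntax means adding one entry to the table; the application data never changes:

```python
import json
import xml.etree.ElementTree as ET


def dsd_element(value) -> ET.Element:
    """Build a DSD element (per the DTD above) for a JSON-compatible value."""
    if value is None:
        return ET.Element("undef")
    if isinstance(value, bool):  # bool first: it is a subclass of int
        elem = ET.Element("boolean")
        elem.text = "true" if value else "false"
    elif isinstance(value, (int, float)):
        elem = ET.Element("number")
        elem.text = repr(value)
    elif isinstance(value, str):
        elem = ET.Element("string")
        elem.text = value
    elif isinstance(value, list):
        elem = ET.Element("array")
        for item in value:
            elem.append(dsd_element(item))
    elif isinstance(value, dict):
        elem = ET.Element("map")
        for key, item in value.items():
            ET.SubElement(elem, "key").text = key
            elem.append(dsd_element(item))
    else:
        raise TypeError(f"not a JSON value: {value!r}")
    return elem


def to_dsd_xml(value) -> bytes:
    root = ET.Element("dsd")
    root.append(dsd_element(value))
    return ET.tostring(root, encoding="utf-8")


# Hypothetical registry: media type -> serializer returning bytes.
SERIALIZERS = {
    "application/dsd+json": lambda value: json.dumps(value).encode("utf-8"),
    "application/dsd+xml": to_dsd_xml,
}
DEFAULT_TYPE = "application/dsd+json"


def get_resource():
    """Hypothetical application data; the transport layer never inspects its meaning."""
    return [42, {"hot": "cold", "higgs_boson_rest_mass": None}]


def app(environ, start_response):
    """WSGI app that chooses the response serialization from the Accept header."""
    accept = environ.get("HTTP_ACCEPT", "")
    # Simplified negotiation: media types in the order listed, parameters ignored.
    offered = [p.split(";")[0].strip().lower() for p in accept.split(",") if p.strip()]
    media_type = next((t for t in offered if t in SERIALIZERS), None)
    if media_type is None and (not offered or "*/*" in offered):
        media_type = DEFAULT_TYPE  # no usable preference expressed
    if media_type is None:
        start_response("406 Not Acceptable", [("Content-Type", "text/plain")])
        return [("supported: " + ", ".join(SERIALIZERS)).encode("utf-8")]
    body = SERIALIZERS[media_type](get_resource())
    start_response("200 OK", [("Content-Type", media_type),
                              ("Content-Length", str(len(body))),
                              ("Vary", "Accept")])
    return [body]


# Exercise the app directly, without a server.
captured = {}


def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


body = b"".join(app({"HTTP_ACCEPT": "application/dsd+xml"}, start_response))
print(captured["status"], captured["headers"]["Content-Type"])
print(body.decode("utf-8"))
```

The `Vary: Accept` header tells caches that the response body depends on the request's Accept: header, which matters for the caching advantage listed earlier. For local experiments the app can be served with `wsgiref.simple_server.make_server`.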

Conclusion

This document has tried to demonstrate it is possible to:

Mechanically convert structured data between JSON, XML and Binary formats
Communicate with HTTP(S) clients using a message serialization of the client's choice

The benefits for loosely coupled systems include:

No code change required when a new client wants to use a new serialization format
Moves reasoning about the semantic intent of message elements into the application layer
Provides (some) defense against the "version N+1 message producer" problem

It does come with some costs, including:

Much of XML's expressive richness is lost
Self-describing message formats are often larger than comparable non-self-describing formats.

It is the experience of the authors, however, that for loosely coupled systems the benefits of the "transfer syntax neutral" approach described here outweigh the costs, and we encourage system developers to consider using it.

References

Apple P-List DTD

VWRAP : Abstract Type System for the Transmission of Dynamic Structured Data

HTTP Content Negotiation

Dynamic Structured Data