Using XML as a File Format

In my last post, I talked about some persistence issues we are working through and how persisting to XML could make things easier for us. In this post, I want to talk about using XML as a file format. If we do switch to an XML-based file format for our products, I have some questions about how to use XML properly.

Looking around, I can see at least two different categories of XML file formats:

  1. The XML file is a dump of your in-memory state at the time of saving.
  2. The XML file is a system unto itself and should be designed as a standalone deliverable of the product.

If you read my last post, you could guess that the system we built for persisting objects falls squarely into category #1. If you were to view the resulting XML, I doubt you’d be overwhelmed by the beauty of its structure. The XML is not being used to its full potential: no namespaces, no XPointer or XInclude, just a raw dump of state stored in XML elements with very little use of attributes.
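To make category #1 concrete, here is a minimal Python sketch of a reflection-style dump. The `Shape` class and the `dump_state` function are hypothetical illustrations, not the persistence code from my last post. Every field simply becomes a child element named after the attribute, with no namespaces and no attributes.

```python
import xml.etree.ElementTree as ET


class Shape:
    """Hypothetical in-memory object standing in for a real product class."""

    def __init__(self, name, x, y, width, height):
        self.name = name
        self.x = x
        self.y = y
        self.width = width
        self.height = height


def dump_state(obj):
    """Category #1: mirror the object's fields one-to-one as child elements.

    Element names are just the class and attribute names, so the file
    layout is coupled directly to the in-memory object model.
    """
    root = ET.Element(type(obj).__name__)
    for field, value in vars(obj).items():
        child = ET.SubElement(root, field)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


print(dump_state(Shape("box", 10, 20, 100, 50)))
# prints: <Shape><name>box</name><x>10</x><y>20</y><width>100</width><height>50</height></Shape>
```

Because the element names come straight from the class, renaming a field in the code silently changes the file format, which is exactly why a dump like this is hard to treat as a standalone deliverable.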

On the other hand, if you look at the XML created by an MS Office product, you will find a decent-looking XML document. Namespaces are used, and the XML elements don’t appear to map directly to the product’s underlying object model.
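For contrast, here is a rough sketch of the category #2 mindset. The `urn:example:drawing:1.0` namespace, the `draw:` element names, and the helper functions are all made up for illustration; this is not the MS Office schema. The format puts its elements in a namespace, names them after document concepts rather than classes, and stores scalar values as attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for an illustrative drawing format.
DRAW_NS = "urn:example:drawing:1.0"
ET.register_namespace("draw", DRAW_NS)


def q(tag):
    """Return a tag name in ElementTree's {namespace}local notation."""
    return f"{{{DRAW_NS}}}{tag}"


def write_document(shapes):
    """Serialize (name, x, y, width, height) tuples into the designed format."""
    root = ET.Element(q("document"))
    page = ET.SubElement(root, q("page"))
    for name, x, y, width, height in shapes:
        # Scalar values live in attributes; the element name describes a
        # document concept ("rect"), not an in-memory class.
        ET.SubElement(page, q("rect"), {
            "id": name,
            "x": str(x),
            "y": str(y),
            "width": str(width),
            "height": str(height),
        })
    return ET.tostring(root, encoding="unicode")


def read_rects(xml_text):
    """Read the file back through a prefix-to-URI map, as a consumer would."""
    root = ET.fromstring(xml_text)
    ns = {"draw": DRAW_NS}
    return [(r.get("id"), int(r.get("width")))
            for r in root.iterfind("draw:page/draw:rect", ns)]


doc = write_document([("box", 10, 20, 100, 50)])
print(doc)
print(read_rects(doc))  # [('box', 100)]
```

In this sketch, a consumer needs only the namespace URI and the documented element names, not our class hierarchy, so the in-memory model could change without breaking the file.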

Want a better example? Look at a document file created with OpenOffice. These guys really cared about the XML file structure, and their format uses many XML standards. In fact, there isn’t just a single XML file: they create a ZIP archive, place several XML files in it, and include a manifest file that describes the other files in the archive. Whenever possible, they use or extend existing XML-based specs to implement their format. This is clearly not a direct mapping of their object models.
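The ZIP-archive-plus-manifest approach can be sketched with Python's standard `zipfile` and `xml.etree.ElementTree` modules. This is a simplified, in-memory imitation of an OpenDocument package: the part names (`mimetype`, `content.xml`, `styles.xml`, `META-INF/manifest.xml`) and the manifest namespace follow the OpenDocument layout, but the parts are near-empty placeholders. A real file also contains parts such as `meta.xml` and `settings.xml`. The `build_package` and `list_manifest` helpers are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by OpenDocument manifest and content files.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_MIME = "application/vnd.oasis.opendocument.text"

MANIFEST_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="{MANIFEST_NS}">
  <manifest:file-entry manifest:full-path="/" manifest:media-type="{TEXT_MIME}"/>
  <manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>
  <manifest:file-entry manifest:full-path="styles.xml" manifest:media-type="text/xml"/>
</manifest:manifest>
"""


def build_package():
    """Build a tiny ODF-style package in memory (placeholder content only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # ODF expects 'mimetype' as the first entry, stored uncompressed.
        zf.writestr("mimetype", TEXT_MIME, compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml",
                    f'<office:document-content xmlns:office="{OFFICE_NS}"/>')
        zf.writestr("styles.xml",
                    f'<office:document-styles xmlns:office="{OFFICE_NS}"/>')
        zf.writestr("META-INF/manifest.xml", MANIFEST_XML)
    buf.seek(0)
    return buf


def list_manifest(package):
    """Return (full-path, media-type) pairs listed in META-INF/manifest.xml."""
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read("META-INF/manifest.xml"))
    return [(entry.get(f"{{{MANIFEST_NS}}}full-path"),
             entry.get(f"{{{MANIFEST_NS}}}media-type"))
            for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry")]


package = build_package()
with zipfile.ZipFile(package) as zf:
    print(zf.namelist())  # ['mimetype', 'content.xml', 'styles.xml', 'META-INF/manifest.xml']
for path, media_type in list_manifest(package):
    print(path, media_type)
```

Splitting content, styles, and the manifest into separate parts is a decision about the file itself, independent of how the application happens to arrange its objects in memory.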

How important is the XML structure when using XML as a native file format? We could always suggest that people use XSLT to transform our XML to suit their needs. That sounds reasonable, right?