Persisting Can be Painful

We are considering changing from an OLE Structured Storage based file format to an XML based file format for our flagship product. OLE Structured Storage has served us well, but we are starting to hit walls, especially with versioning and backward/forward compatibility. Yes, forward compatibility!

The product, mainly C++, has objects that save and load their state using a storage system. The storage system is IStorage/IStream and all of the objects know it. That’s a minor problem that can be fixed. The bigger problem is that the object hierarchy defines the structure of the file stream. That means any change to the object hierarchy changes the structure of the file stream, and any significant change in the object model (say, version 1.0 to version 2.0) makes it very painful for v2.0 to read v1.0 files. Why the headache? Since each object is responsible for saving and loading its own piece, an object in v2.0 may need to remember what it looked like in v1.0. Even worse, what if an object in v1.0 no longer exists in v2.0? It can get pretty ugly.
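To make the headache concrete, here is a minimal sketch of what "remembering what it looked like in v1.0" tends to turn into. Everything in it is hypothetical: the Customer class, the loyaltyPoints field, and the use of std::istream as a stand-in for IStream are illustrations, not our actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical v2.0 object that must still understand its v1.0 layout.
// std::istream stands in for IStream to keep the sketch portable.
struct Customer {
    std::string name;
    std::int32_t loyaltyPoints = 0;  // assumed to be new in v2.0

    void load(std::istream& in, int fileVersion) {
        assert(fileVersion >= 1);
        std::getline(in, name);      // present in every version
        if (fileVersion >= 2) {
            in >> loyaltyPoints;     // only present in v2.0 files
        }
        // Every later format change adds another branch like the one above.
    }
};
```

Multiply that branch by every object in the hierarchy and every release, and the load code ends up carrying the full history of the file format.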

A different product, still in development, uses a system that abstracts away the “how” of persistent storage. The objects that make up the business logic save and load their state using an abstract storage interface. It has implementations for persisting to binary streams, SQL databases and XML streams. It’s not revolutionary; COM’s IPersist has allowed developers to do this kind of thing for years. However, we find ourselves using the XML storage more than the others. In fact, we use it for more than persisting to files. We use it for copying state to the clipboard and for maintaining our Undo/Redo system as well. But I am getting ahead of myself.
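For readers who have not seen this pattern, a rough sketch of the idea follows. The names (IStateWriter, XmlStateWriter, Circle) are made up for illustration and are not the interface our product uses; the XML writer also skips escaping and attributes to stay short.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical abstract storage interface; objects see only this.
class IStateWriter {
public:
    virtual ~IStateWriter() = default;
    virtual void beginObject(const std::string& name) = 0;
    virtual void writeValue(const std::string& key, const std::string& value) = 0;
    virtual void endObject() = 0;
};

// One possible backend: emits a simple XML fragment (no escaping, for brevity).
class XmlStateWriter : public IStateWriter {
public:
    void beginObject(const std::string& name) override {
        out_ << '<' << name << '>';
        open_.push_back(name);
    }
    void writeValue(const std::string& key, const std::string& value) override {
        out_ << '<' << key << '>' << value << "</" << key << '>';
    }
    void endObject() override {
        assert(!open_.empty());  // endObject must match a beginObject
        out_ << "</" << open_.back() << '>';
        open_.pop_back();
    }
    std::string str() const { return out_.str(); }

private:
    std::ostringstream out_;
    std::vector<std::string> open_;  // stack of currently open element names
};

// A business object that knows only the abstract interface.
struct Circle {
    int radius = 1;
    void save(IStateWriter& w) const {
        w.beginObject("Circle");
        w.writeValue("Radius", std::to_string(radius));
        w.endObject();
    }
};
```

Because Circle::save takes only the interface, the same call could feed a file, a clipboard buffer, or an undo snapshot, depending on which writer is passed in. A binary or SQL backend would be another class implementing IStateWriter.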

I was reading a post by Chris Pratley where he talks about Word, RTF and XML:

RTF is used for this purpose instead since it is easier to deal with than Word binary for apps other than Word (remember that is why we created it – it stands for Rich Text interchange Format). The new XML format is designed for exactly that purpose – and it is easier to work with than RTF. You can create the WordML doc (or even a minimal subset) on a server using XML tools, then send the XML to Word on the client and Word will load it up. If you’re missing a lot of the Word specific stuff, that’s OK – Word will fill in the missing bits with defaults. In fact, you can skip generating the doc on the server if you want – just generate an XML data file in your own schema and provide an XSLT for Word to use when opening the file. That pushes a lot of the processing onto the client.

An idea for how to get past some of our versioning issues started to form:

  1. What if our objects persisted to XML?
  2. What if each version only cared about the structure of its XML?
  3. What if we used XSLT or XQuery to convert one version’s XML to another?

In my mind, forward compatibility (an old version opening a newer version’s file) would be hard. Backward compatibility would be easier. Just transform the old XML to the current version’s structure and load it. We could allow newer versions to save files in an older version’s XML structure. That would be one way of handling forward compatibility. We are researching this concept to see where it can take us.
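As a thinking aid, the backward-compatibility half of this could be organized as a chain of per-version upgrade steps. The sketch below is not XSLT: plain C++ functions stand in for the stylesheets, and a flat map of fields stands in for a parsed XML document. Doc, Migrator and the Name-to-Title rename are all hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Stand-in for a parsed XML document: a version number plus flat fields.
struct Doc {
    int version;
    std::map<std::string, std::string> fields;
};

// Stand-in for one XSLT stylesheet: converts version N's structure to N+1's.
using Upgrade = std::function<Doc(const Doc&)>;

// Hypothetical registry: key N holds the step that converts version N to N+1.
class Migrator {
public:
    void addStep(int fromVersion, Upgrade step) {
        assert(step != nullptr);
        steps_[fromVersion] = std::move(step);
    }

    // Applies steps in order until the document reaches targetVersion.
    Doc upgradeTo(Doc doc, int targetVersion) const {
        while (doc.version < targetVersion) {
            auto it = steps_.find(doc.version);
            if (it == steps_.end()) {
                throw std::runtime_error("no upgrade step from version " +
                                         std::to_string(doc.version));
            }
            Doc next = it->second(doc);
            next.version = doc.version + 1;
            doc = std::move(next);
        }
        return doc;
    }

private:
    std::map<int, Upgrade> steps_;
};

// Example step: v1 stored "Name"; assume v2 renamed it to "Title".
Doc upgradeV1ToV2(const Doc& v1) {
    Doc v2 = v1;
    auto it = v2.fields.find("Name");
    if (it != v2.fields.end()) {
        v2.fields["Title"] = it->second;
        v2.fields.erase("Name");
    }
    return v2;
}
```

The point of the chain is that the current version's objects only ever load the current structure; each old format is handled by one step written once. Saving in an older version's structure would need steps running in the other direction, which this sketch does not attempt.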