Getting Started With P2P

I finally have a reason to try to implement P2P in a development project. I have used TCP sockets on several occasions and I have played around with UDP broadcasts as well. What I am trying to do with P2P needs to be more reliable than the UDP broadcasts.

The first thing I did was Google for “P2P framework library”. Unfortunately, not many relevant hits came back. The most popular was the Java-based JXTA toolkit. I work in C++, so it does not help me. Microsoft has a Peer-to-Peer toolkit, but it requires WinXP SP1 with Advanced Networking. I need to support OSes older than that.

However, both toolkits appear to have exactly what I would want in a P2P toolkit:

  • Discovery
  • Peer Groups
  • Messaging (including multicast)

Working in C++ on Win32 (Win9x/WinNT/Win2K/WinXP) keeps me from using these toolkits.

I would really like to use an existing toolkit to abstract the P2P implementation details. It’s going to be hard enough to get my own data sharing protocol working on top of P2P.

Other toolkits I have looked at include HOWL (an implementation of Zeroconf/Rendezvous) and BEEP. HOWL seems to only enable discovery. BEEP seems to only provide the messaging. I might look into merging the two together. I am also looking into wrappers for Multicast over Winsock.

Update: I have decided to work with Multicast for now. In the end, I’ll need to create some kind of message protocol at which point I’ll look at the RFC for BEEP.
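
For the curious, the Winsock side of this is not much code. Here is a minimal sketch of the receive half: bind a UDP socket, join a multicast group, and wait for a peer announcement. The group address and port are placeholder values I picked for illustration, and error checking is stripped down to keep it short.

    #include <winsock2.h>
    #include <ws2tcpip.h>
    #pragma comment(lib, "ws2_32.lib")

    int main()
    {
        WSADATA wsa;
        WSAStartup(MAKEWORD(2, 2), &wsa);

        SOCKET s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

        // Bind to the discovery port on any local interface.
        sockaddr_in local = {0};
        local.sin_family = AF_INET;
        local.sin_addr.s_addr = htonl(INADDR_ANY);
        local.sin_port = htons(5150);                    // placeholder port
        bind(s, (sockaddr*)&local, sizeof(local));

        // Join the group; anything peers send to 239.255.0.1:5150 arrives here.
        ip_mreq mreq;
        mreq.imr_multiaddr.s_addr = inet_addr("239.255.0.1");  // placeholder group
        mreq.imr_interface.s_addr = htonl(INADDR_ANY);
        setsockopt(s, IPPROTO_IP, IP_ADD_MEMBERSHIP, (char*)&mreq, sizeof(mreq));

        // Wait for a single announcement from a peer.
        char buf[512];
        sockaddr_in from;
        int fromLen = sizeof(from);
        int received = recvfrom(s, buf, sizeof(buf), 0, (sockaddr*)&from, &fromLen);
        if (received > 0)
        {
            // ...parse the announcement, then talk to the peer directly (e.g. over TCP)...
        }

        closesocket(s);
        WSACleanup();
        return 0;
    }

The sending half is just sendto() aimed at the same group address, so every listening peer hears the announcement.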

Selling The Dream

It’s bound to happen sooner or later. Every so often Scoble (Microsoft blogging wunderkind) posts something that makes me question his sensibilities. One of his latest posts stopped me dead in my tracks. The post was a reply to a Jon Udell post on UI technologies in Longhorn. I thought Udell’s post asked questions any professional should be asking. Scoble tries to bring it into focus by saying:

Ahh, but Jon, the real play here is one of programmer productivity

Programmer productivity. As if every new technology/language introduced in the last few decades failed to deliver on that same promise. As if Longhorn and .NET are the productivity “tipping point.” As if being fluent in multiple technologies/languages is a bad thing. As if being fluent in multiple technologies/languages will go away.

Robert, you’re overselling the reality. I will use .NET to create commercial, shrinkwrap and enterprise software someday. I am becoming fluent in .NET technologies. But you should take a break from the Kool-Aid (although I hear the grape flavor is hard to put down). I see an advantage in understanding multiple technologies/languages and being able to choose the best one for a particular situation. If anything, .NET will make the technology soup developers work with worse, not better. Just like every other “productivity improvement” before it.

High on XML

I have talked about our experimentation with XML as a file format. This is going very well. Since we started working with XML and its related technologies, other ideas started popping up. It’s not like XML is some magic concept, but it does open your mind enough to see other opportunities. Like creating a centralized Web Service to act as a repository for projects. You don’t need XML to do something like that, but since XML and Web Services go together, it’s easier to visualize how a service like that could be built off of (or into) our persistence system.

I think it’s because there are so many examples of XML being used for various things. Anytime you see XML being used for some purpose, you can ask yourself if your use of XML, whatever that may be, can be applied the way someone else uses XML. Another example could be syndication. RSS and Atom are XML-based syndication formats. Can I syndicate our file format? Maybe just modifications. I’ll have to float that one internally…

Task-based UI Design

Developing new products is great for trying new things. There is no legacy and lots of freedom. One of our areas screaming for experimentation is the UI. UI models have been changing in recent years. Many products are taking on more of a Web look. If you start digging into the design reasons, you’ll run across discussions of Task-based UI or Inductive UI. It’s not hard to find examples, especially from Microsoft: Windows XP uses Task Panes and Web-isms in many parts of the Shell; Office XP also makes heavy use of Task Panes; applications like Quicken and Money have very Web-like UIs.

According to Microsoft, Task-based UIs are supposed to address the following problems:

  • Users don’t seem to construct an adequate mental model of the product.
  • Even many long-time users never master common procedures.
  • Users must work hard to figure out each feature or screen.

The solution offered by Task-based UIs is simple, task-oriented views that show users what they can do now and what they can do next.

Obviously, slapping some Task Panes in your application doesn’t magically make it easier to use. This method is focused on the User Experience, but I would go further and say it’s User Assistance. As such, we are involving our Documentation team in establishing the design and guidelines for use in the application. This method also seems to play well with the teachings of Cooper and Raskin. We are looking forward to seeing the effects on overall usability.

RAD Test Scripting

The other day, some developer coworkers and I were talking about ways to make it easier for our Test group to create and maintain test scripts. Test scripts are a very important piece of the software development puzzle. But test scripts don’t find new defects. Test scripts are really good at finding breakage. That’s why scripts are run on new builds as BVTs or on release candidates as part of a larger regression suite. For me, few things do more to keep a project from hitting its deadlines than uncontrolled breakage.

To find new defects, you really can’t beat good test cases and exploratory testing. We wanted to find ways to keep our Test group working on finding new defects, not struggling to maintain scripts. Our current test scripting tool is just what it claims to be: A scripting tool. The problem with scripting is that it’s programming and programming brings a whole set of issues to the party. We’d like to hide those issues from most of the testers and only have a small team worry about them, if possible.

The genius idea we hit upon was a sort of RAD system for creating test scripts. Ah yes, well I did say we are developers. The general idea is to componentize the creation of scripts. Testers could piece together various script components to create test cases and suites. Of course, there would be a nice Windows application to allow testers to build their projects. We would save the project in an intermediate format and “compile” to native script that our scripting tool would execute.
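
To make that a little more concrete, here is a rough sketch of what the “compile” step could look like. The names (ScriptStep, IScriptEmitter, CompileSuite) are invented for illustration; the real intermediate format would carry much more information per component.

    #include <string>
    #include <vector>

    // One reusable building block a tester drops into a test case.
    struct ScriptStep
    {
        std::string component;   // e.g. "OpenDocument", "VerifyWindowTitle"
        std::string argument;
    };

    // One emitter per scripting tool; it knows how to express a step
    // in that tool's native language.
    class IScriptEmitter
    {
    public:
        virtual ~IScriptEmitter() {}
        virtual std::string Emit(const ScriptStep& step) = 0;
    };

    // The "compile" step: walk the intermediate project and spit out
    // native script text for whatever tool is currently in use.
    std::string CompileSuite(const std::vector<ScriptStep>& steps,
                             IScriptEmitter& emitter)
    {
        std::string script;
        for (size_t i = 0; i < steps.size(); ++i)
            script += emitter.Emit(steps[i]) + "\n";
        return script;   // hand this off to the test tool for execution
    }

The nice part is that supporting a different scripting tool only means writing another emitter, which leads to the next point.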

We even found an additional benefit of the system: By maintaining our test suites in an intermediate format, we avoid script tool vendor lock-in. When we switch to a new test scripting tool, we just need to write a new “compiler” to convert to the new script language. Vendor lock-in is a real problem. How many commercial test scripting tools do you know that use a common, portable language? It can make switching vendors very costly.

We are currently prototyping a system. I’ll let you know how it goes.

Test Scripts == Source Code

How is it that a software development company that understands the value and importance of the code their developers write doesn’t place the same value on the code their testers write? Good development processes apply to testers too. Regression test scripts are the health indicator of the development process. Build Verification Tests (BVTs) or Smoke Tests are very important and, typically, rely on code written by testers.

Some mistakes I have seen in the past include:

  • Not storing scripts in a source control system
  • Not using a common/shared framework to build scripts
  • Not spending time to design or review scripts
  • Not building robust scripts that can be maintained across development cycles or even within the current cycle

There is a benefit from using test scripts. There is a cost as well. Use the same good sense you apply to your development process in your testing process.

Defects Love New Code

People who know more about software development processes than me, and write books on the subject, frequently talk about metrics like Defect Density (defects per thousand lines of code). Say your development team averages 10 D/KLOC. If you start working on the next version of a product with 1M LOC and eventually add 100K LOC, you could look into your crystal ball and expect 1K defects. Holy crap! Honestly, it could be worse. In my experience, adding code to a 1M line product is likely to break lots of the existing code as well.

What if you start a product completely from scratch and end up with 1M LOC? 10K defects could give the faint of heart a reason to jump out a window. How will you ever know when to ship your product? I have written enough code in various products for various companies to know that those people writing those books know what they’re talking about. I have seen it work out just like they predicted it would. Bad things happen to new code. I have come to expect it.

Sometimes people, usually pointy-haired types, get really caught up in things like defect counts. It drives me nuts. Defects are a fact of life in software development. You need to document and prioritize the defects discovered in your products. When it gets close to ship time, look at a cost benefit analysis of each defect. Compare the cost to fix to the cost of leaving it in the product. At some point, you will have to ship your product. It will have defects in it. Knowing what they are is a Good Thing. Making sure the bad ones were fixed is a Good Thing.

Joel On Software has a good article on the cost of fixing defects. Steve McConnell also has some good articles about when a product is ready to ship. They apply some rational thinking and decision making, instead of simply the “fix all bugs!” or “zero defect!” mantra pointy-haired types like to spout.

Using XML As File Format

In my last post, I talked about some persistence issues we are working through and how persisting to XML could make things easier for us. In this post, I want to talk about using XML as a file format. Assuming we switch to an XML based file format for our products, there are questions I have regarding the proper use of XML.

Looking around, I can see at least 2 different categories:

  1. XML file is a dump of your in-memory state at the time of saving.
  2. XML file is a system unto itself and should be designed as a standalone deliverable of the product.

If you read my last post, you could guess that the system we built for persisting objects falls clearly into category #1. If you were to view the resultant XML, I doubt you’d be overwhelmed by the beauty of its structure. The XML is not being used to its full potential. No namespaces, no XPointer or XInclude. Just a raw dump of state stored in XML elements, with very little use of attributes.

On the other hand, if you were to look at the XML created by an MS Office product you would find a decent looking XML document. Namespaces are used and the XML elements don’t appear to map directly to the product’s underlying object model.

Want a better example? Look at a document file created using OpenOffice. These guys really cared about the XML file structure and use many XML standards. In fact, there isn’t just a single XML file. They create a ZIP archive and place several XML files in the archive. They also include a manifest file that describes the other files contained in the archive. Whenever possible, they use or extend existing XML-based specs to implement their format. Clearly not a direct mapping to their object models.

How important is the XML structure when using XML as a native file format? We could always suggest people use XSLT to change our XML to suit their needs. That sounds reasonable, right?

Persisting Can be Painful

We are considering changing from an OLE Structured Storage based file format to an XML based file format for our flagship product. OLE Structured Storage has served us well, but we are starting to hit walls, especially with versioning and backward/forward compatibility. Yes, forward compatibility!

The product, mainly C++, has objects that save and load their state using a storage system. The storage system is IStorage/IStream and all of the objects know it. That’s a minor problem that can be fixed. The bigger problem is the fact that the object hierarchy defines the structure of the file stream. That means any change to the object hierarchy changes the structure of the file stream. Or, any significant change in the object model (version 1.0 to version 2.0) makes it very painful for v2.0 to read v1.0 files. Why the headache? Since each object is responsible for saving/loading its own piece, an object in v2.0 may need to remember what it looked like in v1.0. Even worse, what if an object in v1.0 no longer exists in v2.0? It can get pretty ugly.

A different product, still in development, uses a system that abstracts away the “how” of the persistent storage. The objects that make up the business logic can save and load their state using an abstract storage interface. It has implementations for persisting to binary streams, SQL databases and XML streams. It’s not revolutionary; COM’s IPersist has allowed developers to do this kind of thing for years. However, we find ourselves using the XML storage more than the others. In fact, we use it for more than persisting to files. We use it for copying state to the clipboard and maintaining our Undo/Redo system as well. But I am getting ahead of myself.
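
To give a feel for the shape of it (the names here are invented, and the real interface is richer than this), it boils down to something like:

    #include <string>

    // The objects never see the medium, only this interface.
    class IStateArchive
    {
    public:
        virtual ~IStateArchive() {}
        virtual void BeginNode(const std::string& name) = 0;
        virtual void EndNode() = 0;
        virtual void Write(const std::string& key, const std::string& value) = 0;
        virtual std::string Read(const std::string& key) = 0;
    };

    // A business object persists itself the same way no matter where
    // the state ends up.
    class Customer
    {
    public:
        void Save(IStateArchive& ar) const
        {
            ar.BeginNode("Customer");
            ar.Write("Name", m_name);
            ar.EndNode();
        }

        void Load(IStateArchive& ar)
        {
            ar.BeginNode("Customer");
            m_name = ar.Read("Name");
            ar.EndNode();
        }

    private:
        std::string m_name;
    };

    // Concrete archives (XML, binary stream, SQL) decide how BeginNode/Write
    // map onto elements, bytes or tables.

Customer::Save has no idea whether it is writing a file, filling the clipboard or recording an Undo step, which is exactly why the XML implementation gets reused so much.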

I was reading a post by Chris Pratley where he talks about Word, RTF and XML:

RTF is used for this purpose instead since it is easier to deal with than Word binary for apps other than Word (remember that is why we created it – it stands for Rich Text interchange Format). The new XML format is designed for exactly that purpose – and it is easier to work with than RTF. You can create the WordML doc (or even a minimal subset) on a server using XML tools, then send the XML to Word on the client and Word will load it up. If you’re missing a lot of the Word specific stuff, that’s OK – Word will fill in the missing bits with defaults. In fact, you can skip generating the doc on the server if you want – just generate an XML data file in your own schema and provide an XSLT for Word to use when opening the file. That pushes a lot of the processing onto the client.

An idea for how to get past some of our versioning issues started to form:

  1. What if our objects persisted to XML?
  2. What if each version only cared about the structure of its XML?
  3. What if we used XSLT or XQuery to convert one version’s XML to another?

In my mind, forward compatibility (an old version opening a newer version’s file) would be hard. Backward compatibility would be easier. Just transform the old XML to the current version’s structure and load it. We could allow newer versions to save files in an older version’s XML structure. That would be one way of handling forward compatibility. We are researching this concept to see where it can take us.
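
As a sketch of the backward compatibility piece, assuming MSXML is available, loading an old file could start with something like the following. The function name and the idea of keeping one upgrade stylesheet per old version are my own assumptions for illustration.

    // COM must already be initialized by the caller.
    #import "msxml4.dll"   // or whichever MSXML version is installed
    #include <string>

    std::wstring UpgradeToCurrentSchema(const wchar_t* oldFilePath,
                                        const wchar_t* stylesheetPath)
    {
        MSXML2::IXMLDOMDocumentPtr source(__uuidof(MSXML2::DOMDocument40));
        MSXML2::IXMLDOMDocumentPtr xslt(__uuidof(MSXML2::DOMDocument40));

        source->async = VARIANT_FALSE;
        xslt->async = VARIANT_FALSE;

        if (source->load(oldFilePath) == VARIANT_FALSE ||
            xslt->load(stylesheetPath) == VARIANT_FALSE)
            return std::wstring();   // caller treats empty as "could not upgrade"

        // Apply the per-version stylesheet; the result is XML in the
        // current version's structure.
        _bstr_t upgraded = source->transformNode(xslt);
        const wchar_t* text = static_cast<const wchar_t*>(upgraded);
        return text ? std::wstring(text) : std::wstring();
    }

The transformed output would then go through the same loading code the current version already uses for its own files.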