Any programmer who has used version control will find the idea of -not- using it painful, and for good reason. Nevertheless, there are a few projects here for which it’s simply unavailable. As a general rule, any time we’ve had this issue at Tenthline, the problem has been the data formats we’re using.

I really dislike mailing documents back and forth. It’s inefficient, and it’s very easy for one of the authors to end up out of the loop, if the latest revision doesn’t get mailed out to everyone. Nevertheless, that’s the infrastructure I’m currently stuck with, if only because of the nature of our clients. As per their demands, all changes must be tracked in Microsoft Word documents. All hail the proprietary .doc format!

I’d rather track changes with a wiki, or, better yet, .rtf files and Subversion, but thanks to the dependence on Microsoft’s own mechanisms, that’s not an option. Fortunately, we keep our own internal data on wikis, so I can always find the latest version of a reference.

On another note, Content Server, the CMS platform that Tenthline does a lot of development on, has built-in version control, which isn’t bad, but it suffers from being entirely proprietary. Sometimes I appreciate having a choice. Content Server keeps its data in a database which is largely inaccessible except through its own tools and UI. For end users, this isn’t bad. It’s certainly easier than getting them to manage their own solution. And the complex data stored by a CMS more or less demands a database. For developers, however, the inability to access our own .jsp code as flat files can be a real hindrance when doing heavy-duty development.

These problems both stem from proprietary access to data. Word’s own revision tracking doesn’t quite translate to any other tool (though Apple’s Pages does a pretty good job most of the time) and Content Server’s internal database structure is too complex to attack with SQL. And yet, when you think about it, it’s text. Documents and source code. Why in the world should access to text be difficult? Or, more to the point, why isn’t this data accessible through anything other than the program it was written in?

There are a number of guiding principles for UNIX, and one of them is as follows. “Write programs to handle text streams, because that is a universal interface. Avoid stringently columnar or binary input formats. Don’t insist on interactive input.”

The sentiment behind these rules could be described as follows. “Keep your data dumb and human-readable.” This is the same driving philosophy that led to YAML as an alternative to XML. If your data storage is human-readable text, then it’s going to be much easier to write software to understand it. The more proprietary information you add into the data, the harder it is to write a tool that parses it.
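
To make that concrete, here’s a minimal sketch in Python of what plain text buys you. It assumes two revisions of a document are available as ordinary strings; the file names and the sample content are made up for illustration. With nothing but the standard library’s difflib module, you get a readable change history, which is exactly what is hard to pull out of a proprietary format.

```python
import difflib
from typing import List


def diff_revisions(old_text: str, new_text: str) -> List[str]:
    """Return a unified diff between two plain-text revisions."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    # The file names are hypothetical labels for the diff header only.
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="draft_v1.txt",
            tofile="draft_v2.txt",
            lineterm="",
        )
    )


# Two made-up revisions of a small, human-readable document.
old = "Title: Proposal\nStatus: draft\nOwner: alice\n"
new = "Title: Proposal\nStatus: approved\nOwner: alice\n"

for line in diff_revisions(old, new):
    print(line)
```

The changed line shows up as a removal of “Status: draft” and an addition of “Status: approved”, and the same few lines of code would work on any other line-oriented text, whether that’s a document or source code.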

Good tools, in many ways, make the developer. Good data formats enable good tools. It’s a good lesson to keep in mind when designing software. The output data your software produces should be general, and easily readable by other programs. Failure in this area binds the data to your specific application. That might work well if you’re Microsoft, but it’s guaranteed to annoy your power users.
