Proprietary Data Formats Considered Annoying
Tuesday, January 22nd, 2008 by Michael
Any programmer who has used version control will find the idea of -not- using it painful, and for good reason. Nevertheless, there are a few projects here for which it’s simply unavailable. As a general rule, whenever we’ve had this problem at Tenthline, the cause has been the data formats we’re using.
I really dislike mailing documents back and forth. It’s inefficient, and it’s very easy for one of the authors to end up out of the loop if the latest revision doesn’t get mailed out to everyone. Nevertheless, that’s the infrastructure I’m currently stuck with, if only because of the nature of our clients. Per their demands, all changes must be tracked in Microsoft Word documents. All hail the proprietary .doc format!
I’d rather track changes with a wiki or, better yet, .rtf files and Subversion, but given the dependence on Microsoft’s own mechanisms, that’s not an option. Fortunately, we keep our own internal data on wikis, so I can always find the latest reference whenever I need it.
On another note, Content Server, the CMS platform that Tenthline does a lot of development on, has built-in version control. That isn’t bad in itself, but it’s entirely proprietary. Sometimes I’d appreciate having a choice, and Content Server doesn’t offer one: it keeps its data in a database that is largely inaccessible except through its own tools and UI. For end users, this isn’t bad. It’s certainly easier than getting them to manage their own solution, and the complex data stored by a CMS more or less demands a database. For developers, however, being unable to get at our own .jsp code as flat files can be a real handicap during heavy-duty development.
Both of these problems stem from proprietary access to data. Word’s revision tracking doesn’t quite translate to any other tool (though Apple’s Pages does a pretty good job most of the time), and Content Server’s internal database structure is too complex to attack with SQL. And yet, when you think about it, it’s all text: documents and source code. Why in the world should access to text be difficult? Or, more to the point, why isn’t this data accessible through anything other than the program it was written in?
UNIX has a number of guiding principles, and one of them reads: “Write programs to handle text streams, because that is a universal interface. Avoid stringently columnar or binary input formats. Don’t insist on interactive input.”
The sentiment behind these rules could be summed up as “Keep your data dumb and human readable.” This is the same philosophy that led to YAML as an alternative to XML. If your data is stored as human-readable text, it’s going to be much easier to write software that understands it. The more proprietary information you add to the data, the harder it is to write a tool that parses it.
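To make the “dumb, human-readable data” point concrete, here is a minimal sketch in Python. It assumes a hypothetical flat format of blank-line-separated “key: value” records; this is a tiny illustrative subset of what YAML allows, not a YAML parser, and the function name parse_records and the sample data are made up for this example. Because the format is plain line-oriented text, the whole parser fits in a couple of dozen lines.

```python
import io
from typing import Dict, Iterable, Iterator


def parse_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Parse blank-line-separated blocks of 'key: value' lines.

    Hypothetical format for illustration only (not YAML):
    - a blank line ends the current record
    - lines starting with '#' are comments
    - only the first ':' splits key from value
    """
    record: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the record being built, if any.
            if record:
                yield record
                record = {}
            continue
        if line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {line!r}")
        record[key.strip()] = value.strip()
    if record:
        # The final record may not be followed by a blank line.
        yield record


# Made-up sample data in the hypothetical format.
SAMPLE = """\
# two records in a plain-text format
title: Proprietary Data Formats Considered Annoying
author: Michael

title: Example note
author: example
"""

if __name__ == "__main__":
    # Any text stream works: an open file, sys.stdin, or an in-memory buffer.
    for rec in parse_records(io.StringIO(SAMPLE)):
        print(rec["title"], "-", rec["author"])
```

Because parse_records accepts any iterable of lines, the same function works on an open file, sys.stdin, or a plain list of strings, which is the “text streams as a universal interface” idea in miniature.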
Good tools, in many ways, make the developer, and good data formats enable good tools. It’s a lesson worth keeping in mind when designing software: the output your software produces should be general and easily readable by other programs. Failure in this area binds the data to your specific application. That might work well if you’re Microsoft, but it’s guaranteed to annoy your power users.
Tags: Content Management, Programming




January 22nd, 2008 at 11:14 am
FatClipse is an Eclipse plug-in that Fatwire’s currently developing to provide an IDE of sorts for Content Server development. It seems as though this was badly needed — check out some of the comments on the front page:
“is it possible, that at last there is light at the end of the long dark tunnel … ?”
I think this illustrates your point nicely. =D
January 22nd, 2008 at 11:17 am
Yes indeed
That being said, consider how difficult this must have been to make. (I’m basing that on the fact that it wasn’t released, oh, two years ago.) Now consider that I don’t really use Eclipse — though I imagine I’m going to have to learn in a real hurry. How hard would this be to make for my favourite IDE?
Still, glad to see the improvement.
January 22nd, 2008 at 11:22 am
I suppose it depends on your IDE. Eclipse has a pretty nice plug-in system but I imagine that this could be duplicated for IntelliJ IDEA and NetBeans.
January 22nd, 2008 at 11:33 am
Time spent tweaking my tools is time -not- spent working with clients. The difficulty of transitioning into a new IDE is not that great compared to porting a plug-in.
Of course, I could wait for someone else to port it, but that’s just the same problem as before. Depending on someone else to provide access to my own data is not a place I want to be.
January 22nd, 2008 at 11:52 am
By the by, if you want to do revision tracking on documents using Subversion, RTF probably isn’t the best format. Yes, it’s universally available, but it isn’t really plain text, and plain text is what tools like Subversion are best at tracking changes in. Even an XML format like DocBook would be easier to work with.
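The plain-text point is easy to demonstrate. Line-oriented tools compare files line by line, so a small edit to a plain-text document shows up as one changed line. The sketch below uses Python’s standard difflib module as a stand-in for Subversion’s own diff; the helper name line_diff, the file labels, and the two sample revisions are hypothetical.

```python
import difflib
from typing import List


def line_diff(old: str, new: str) -> List[str]:
    """Return a unified diff of two text revisions, compared line by line."""
    return list(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile="rev1.txt",  # hypothetical file labels
            tofile="rev2.txt",
        )
    )


# Two made-up revisions of a short plain-text document.
REV1 = "Intro paragraph.\nAll changes are mailed out.\nClosing paragraph.\n"
REV2 = "Intro paragraph.\nAll changes are committed.\nClosing paragraph.\n"

if __name__ == "__main__":
    # Only the edited line appears as a -/+ pair; the rest is context.
    print("".join(line_diff(REV1, REV2)), end="")
```

With a binary format like .doc there are no meaningful lines to compare: Subversion will store such a file, but it treats it as binary and won’t show a readable line-by-line diff.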
January 22nd, 2008 at 12:44 pm
I’ve never had an issue with .rtf files, but I see what you’re saying. Still, XML? I’d rather use Textile, myself, but I’ve never gotten too hung up on separating content from presentation. That separation is invaluable when your content changes regularly (e.g. just about every web app ever written) but less so for docs.
Whatever I use, it had better be readable by a client with zero technical experience. Exporting as a .pdf isn’t an option — not when the client might want to make edits.
January 22nd, 2008 at 1:51 pm
True. For stuff that the client needs to work with directly, an XML dialect is a terrible option. Sure, they could easily see the current document in their favourite format, but their changes would have to be made to the XML. As long as you don’t really need to look at the diffs, I could see RTF being a much better option in that case. =P
For internal use or for end-user documentation that is produced entirely by us, DocBook would work really well, I think. Many open source software projects already use it with great success. I think it is even a standard in KDE development.