As is the way of things, I’ve been diverted from the delights of VBA, on to some other work. This is a migration of the system definition documentation for a major financial system, from traditional Word documents to an XML-based architecture called DITA. (Somewhat fancifully, the ‘D’ stands for Darwin, which is appropriate in this, his bicentenary year).
The production of technical documentation is undergoing something of a revolution at present. This is due to the maturing of a raft of technologies based on the XML markup language. Broadly speaking, these technologies provide a solution that sits between monolithic documents-as-files (such as Word documents), and relational databases, as complex aggregations of fine-grained information records.
With a Word document, the unit of content is the same as the unit of presentation: a file is edited, and the same file is printed. This applies equally to web pages (HTML), but with the page as the unit. In practice, it is difficult and time-consuming to identify, extract and recombine fragments of documents to produce new deliverables.
With a relational database, the information records can be conjoined, aggregated and filtered in very complex ways, using a query language (SQL). However, databases are not really suited to holding large free-text elements, like a section of a document. Also, there is no notion of hierarchical structuring in query output, in contrast to the hierarchy of chapters, sections and sub-sections that we are familiar with in documents.
The XML-based solutions aim to provide a middle course. Content is created and held in a form that is structured enough to identify, extract and recombine fragments of documents to produce new deliverables. At the same time, the content does not carry information about presentation (either the target format or the details of layout). This is provided by transformations of the content to produce deliverables in different formats, such as Word or PDF for printable documents, or hypertext (XHTML) for web presentation or online Help.
The challenge is to come up with an information model that defines and relates appropriate topics (i.e. basic chunks), in ways that allow querying, selection and combination in flexible ways. There’s a trade-off here between flexibility and chunk size. Too fine-grained and it’s impossible to manage; too coarse-grained and you’re back with monolithic documents. There’s a wider trend towards ‘medium-sized’ information chunks: think of blog posts, like this one, or Wiki pages.