Posts Tagged 'Documentation'

XML Export from Excel – part 3

In Part 2, we had some VBA code that exported data from financial-report workbooks into XML files. What we can do now is query these files to produce summary tables – rather like views of a pivot table – and include these in a PDF or XHTML report, to provide an ‘executive summary’. (Note: we’re leaving VBA behind here, but stay tuned).

Here’s what our exported XML data looks like:

<?xml version="1.0" standalone="yes"?>
<bureport xmlns:xsi='>

And here’s the kind of report we might generate (this is XHTML):

The idea is that this is a presentable/printable report. We could have a lot more in terms of sections, supporting text, fancy layout, etc.

So how do we get from the XML data to the summary report? Here’s one possibility:

  1. import the XML files into an XML database
  2. run queries over the database to generate summary tables as individual DITA topics
  3. include (content reference) the summary tables from our master-report topic, which has the overall structure and additional content
  4. generate the output report in the desired format(s).

Sounds a bit of a performance – why don’t we just collate all our data in a ‘database’ Excel workbook, knock up a few pivot tables, and copy-paste these into a Word document? Basically, flexibility.  Firstly, a pivot table isn’t a general query mechanism: we might want all sorts of summary information, not just dimension-aggregations. Secondly, the individual summary tables might be included in different combinations in different reports. Also, we want the reports to reference fixed locations, where they can pick up the latest (e.g. last month’s) summary tables automatically.

For the XML database, I’m using XMLMind’s QizX Free Engine. You can use this either from a command-line or via a simple client app. There’s an XQuery query to generate each table (they’re very similar): here‘s the one for the totals by Business Unit (BU).

The summary tables are minimal DITA topics containing a <simpletable>:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE topic ...>
<topic id="FR_BU_Totals_Data">
 <title>BU Totals Data</title>
    <simpletable id="BU_Totals_Data">

The “FR” prefix on the topic id stands for “Financial Report”. Note the id also on the <simpletable>.

Now, in the DITA topic for our master report, we just have a <simpletable> element that con(tent)ref(erence)s the summary table in its separate topic:

<simpletable conref="..."/>

where the “…” is a reference of the form: filepath#topicid/elementid.

Now we just generate the report output in our chosen format. Multiple reports can be built into a larger structure: a set of XHTML pages with appropriate linking, or a multi-section PDF document, both with tables of contents, etc.

Stages #1, #2 and #4 above can all be done by command-line tools, so it should be possible to automate the entire process (including the VBA bit).

I think that deserves a cup of tea…

XML Export from Excel – part 1

I thought I’d look at using Excel as an information-gathering front-end for XML-based data, which can then feed into production of documentary reports (HTML, PDF). Or put the other way round, a reporting facility for Excel.

Here’s a simple scenario. My company has various Business Units (BUs) which produce and sell various products. Every Quarter, each BU produces a Sales/Costs report as an Excel workbook. Something like this:

The basic data is in the blue ProductTable. The pink cells are simple totals (=SUM(ProductTable[Sales]), etc, using structured table references). Incidentally, I should have the period as 2009Q4, etc, so that values sort lexically (2009Q4, 2010Q1, …).

To export this data as XML, we need to create a map from worksheet cells to XML schema elements. This is what the schema looks like in the Excel XML Source pane:

Each element is mapped to the appropriate cells by dragging-and-dropping. The root element is <bureport>.

Now, we could just click  >Developer >XML >Export. But there are a couple of reasons why we might want to do it from code. Firstly, each period, we will be receiving a report from each BU, and it would be nice to do the export as a batch operation.

Secondly, the XML map inside Excel seems to not keep a reference to the schema from which it was created. This means that when the XML is exported the root element comes out as just <bureport>, whereas we want it to have a reference to the schema:

<bureport xmlns:xsi=’; xsi:noNamespaceSchemaLocation=’bureport.xsd’>

where the critical bit is the filename at the end: bureport.xsd. For simplicity, this is assumed to be in the same directory as the bureport XML file.

Maybe I’m missing something here, but it seems odd that Excel internalizes the schema, but doesn’t have an option to associate a schema location with a map, and then generate a root element with that location, as above, in exported files.

Now, to begin with, I’m going to ignore the batch-operation over multiple workbooks, because it should be straightforward. So the exporting code is going to go into the bureport workbook (template) itself, and I’ll just run it directly from each workbook, from the VBA side.

The code needs to:

  • construct the output file name from the BU name and period (e.g. Report_Applications_4Q2009.xml)
  • open a TextStream on this (new) file
  • get the complete XML string from the bureport map
  • substitute the basic <bureport> element with the full one (see above)
  • write the modified XML string to the TextStream.

Code tomorrow.

Learning VBA

Interesting post (and comments) over at Daily Dose of Excel (referencing another post). Basically saying that people should be learning and using (a bit of) VBA, as part of their basic efficiency/productivity – assuming they use Office apps, of course. For example, you wonder how many people copy-and-paste quantities of data manually (probably making mistakes) on a regular basis, when a few lines of code would do it for them.

Unfortunately, it is part of our culture that programming is regarded as something terribly technical and obscure, not for ordinary people. Microsoft themselves are at fault for putting ever more layers of user interface in front of the application functionality. Lots of PC users do not know what a Command Prompt is. I admit that VBA as a language is a bit quirky and inconsistent (the Set Keyword, for example), but it’s not unusable.

There’s a similar issue over in the domain of XML documentation. There are technical writers (‘technical’, note) who feel that everything should be entirely wrapped up in a GUI. This might be feasible if you are prepared to do what the tools let you do, and nothing else. But I think having some knowledge of the underlying languages is helpful.

It also allows you to use the many free tools, which tend to be light on GUIs. For example, a set of XML document files on disk can be queried (and even updated) using the XQuery language, effectively turning the files into a content database. XQuery is a very clean, functional language, with a SELECT-like construct to do the querying (much easier than VBA!). There are free (versions of) tools that index files and run queries.

So, yes, I think that there should be greater encouragement for ‘serious’ users of tools to roll their sleeves up and get into some (gentle or not so gentle) programming. At least they’d appreciate a bit more what developers do!

Adding HTML Help – 2

I was previously wondering how to link a VBA project to HTML Help (that is, linking to a page in a .chm file). I’m thinking of the Help as pages in a User Manual, not control-level Help (“The Delete button deletes the current item”, etc).

Having got some compiled Help (a CHM file), we need to link into it. The simplest way seems to be to use the Application.Help method. This takes a path to a CHM file, and a numerical key – for example:

Application.Help “C:\…….\myhelp.chm”, 1002

There are a couple of issues with this. The first is that each call to this method (on the same file) opens a new Help Viewer window. I haven’t found a way round this. The second issue is that the numerical keys have to be defined in a header file prior to compilation – something like this:

#define VBA_Creating_a_Widget     1001
#define VBA_Filtering_Widgets        1002

where the names are those of the topic files in the HTML Help project (minus the .html suffix). This mapping is compiled into the help project to produce the CHM file. However, generation from DITA (XML) source goes straight to the compiled CHM, so there’s no chance of adding in a map. (You could decompile the CHM and then add the map before recompiling).

Fortunately, it doesn’t matter, because there’s another way of linking to the Help from VBA. This is to use a more general function, HtmlHelp, available from the Help OCX (HHCtrl.ocx).

You need to declare this function in a public module:

Declare Function HtmlHelp Lib "HHCtrl.ocx" _
   Alias "HtmlHelpA" _
   (ByVal hwndCaller As Long, _
   ByVal pszFile As String, _
   ByVal uCommand As Long, _
   dwData As Any) As Long

You also need a couple of constants (just to give us meaningful names):


The constant declarations  need to be in each module that calls the function.

We can now call the function from a Command button handler, or whatever:

Private Sub HelpCommand_Click()
  Call HtmlHelp (0, Dotslash("myhelp.chm"), _
      HH_DISPLAY_TOPIC, ByVal "VBA_Filtering_Widgets.html")
End Sub

The zero argument is to do with the windowing – the Help opens in a sibling window. Note the ByVal passing of the string (because it’s to an external library). Dotslash is just my shorthand for an absolute path (which is recommended):

Public Function Dotslash(filename As String) As String
  Dotslash = ActiveWorkbook.Path + "\" + filename
End Function

The other mode argument, HH_HELP_CONTEXT, is for the mapped-id lookup, which is what Application.Help does.

This is just scratching the surface. There’s more information on MSDN.

Adding HTML Help

We’re all familiar with HTML Help for applications – usually in the form of .chm files (with the yellow question-mark icon). Typically, constructing these this involves getting to grips with the Microsoft HTML Help Workshop. This includes a tool for building Help projects, and compiling them into .chm files.

This is okay for fine-grained context-sensitive Help, at the level of individual operations and controls. However, if the material is more in the nature of a User Manual, this is probably not the form in which we want to author it. And we will probably want to generate different forms of output, such as general HTML pages and PDF.

A few posts ago, I mentioned producing documentation using the XML-based DITA framework. One of the benefits of this is that content is held in a generic XML-based format, from which outputs of different types can be generated: XHTML, PDF and CHM (and also Java Help). DITA allows us to select and combine various topics (chunks of content) from our User Manual – perhaps the lower-level task or reference topics – and compile these into a .chm file. Apart from the compiler, which is invoked automatically, this does not involve the HTML Help Workshop.

Note on tools: There is a DITA Open Toolkit, but I have been using tools from XMLMind – the free XML Editor (Personal Edition) and the open-source DITA Converter. And very good they are too.

The selection, structuring and ordering of topics is done using a DITA Map file. HTML Help is specified as the output format via options to the DITA Converter.

Having got our .chm file, the next step is to access it from a VBA project. I’d been vaguely aware that you could do this, but had never investigated further. More tomorrow…

Digression on Documentation: DITA

As is the way of things, I’ve been diverted from the delights of VBA, on to some other work. This is a migration of the system definition documentation for a major financial system, from traditional Word documents to an XML-based architecture called DITA. (Somewhat fancifully, the ‘D’ stands for Darwin, which is appropriate in this, his bicentenary year).

The production of technical documentation is undergoing something of a revolution at present. This is due to the maturing of a raft of technologies based on the XML markup language. Broadly speaking, these technologies provide a solution that sits between monolithic documents-as-files (such as Word documents), and relational databases, as complex aggregations of fine-grained information records.

With a Word document, the unit of content is the same as the unit of presentation: a file is edited, and the same file is printed. This applies equally to web pages (HTML), but with the page as the unit. In practice, it is difficult and time-consuming to identify, extract and recombine fragments of documents to produce new deliverables.

With a relational database, the information records can be conjoined, aggregated and filtered in very complex ways, using a query language (SQL). However, databases are not really suited to holding large free-text elements, like a section of a document. Also, there is no notion of hierarchical structuring in query output, in contrast to the hierarchy of chapters, sections and sub-sections that we are familiar with in documents.

The XML-based solutions aim to provide a middle course. Content is created and held in a form that is structured enough to identify, extract and recombine fragments of documents to produce new deliverables. At the same time, the content does not carry information about presentation (either the target format or the details of layout). This is provided by transformations of the content to produce deliverables in different formats, such as Word or PDF for printable documents, or hypertext (XHTML) for web presentation or online Help.

The challenge is to come up with an information model that defines and relates appropriate topics (i.e. basic chunks), in ways that allow querying, selection and combination in flexible ways. There’s a trade-off here between flexibility and chunk size. Too fine-grained and it’s impossible to manage; too coarse-grained and you’re back with monolithic documents. There’s a wider trend towards ‘medium-sized’ information chunks: think of blog posts, like this one, or Wiki pages.

More on this anon…

November 2019
« Dec