Spreadsheets in XML – Part 2

In the previous post, I was looking at the ‘spreadsheet extensions’ provided by XMLMind’s XML Editor (XXE). This allows XPath-based formulas to be inserted into XML documents, not only in tabular elements, but also in free text.

As an example, I mocked up some invoices. An invoice is a good example of a hybrid document: we want to print it out (or PDF it) as a nicely formatted document; there are calculated and looked-up elements in the manner of a spreadsheet; we want the whole set of invoices to be queryable subsequently, in the manner of a database.

Here’s an invoice, as a DITA document, shown in XXE:

The little green ‘f’ icons represent the formulas, held as XML Processing Instructions. These are ignored in subsequent transformations (to final formats, such as PDF). You double-click an icon to edit the formula.

The first one (before the table) is today’s date: =today(). The ones in the Cost column are simple arithmetic: =($C2 * $D2), etc. Column and row headers can be displayed optionally:

The Product Description and Unit Price formulas are more interesting, since they are lookups in another document, containing the product catalog. Here’s the formula in B2:

The first thing to notice is that we can have multi-line formulas, with ‘let’ definitions preceding the actual formula. (The “…” is really a full file path – I’ve elided it for compactness). The id of the element with the product description is the product code appended with “_desc”. This is then retrieved from the product catalog by matching the id attribute (@id) with the constructed value ($id). (The back-quotes indicate ‘raw’ XPath, rather than XXE formula language).
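To make the lookup logic concrete outside XXE, here is a minimal Python sketch of the same idea: build the id from the product code plus "_desc", then find the element whose @id matches. It is an illustration only, not the XXE formula itself. The inline catalog, its descriptions and prices, the "_price" suffix and the function name lookup_description are all made up for the example; the real catalog is a separate file.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the product catalog document. The real
# catalog is a separate file whose path is elided in the post.
CATALOG_XML = """
<simpletable>
  <strow>
    <stentry>PR01</stentry>
    <stentry id="PR01_desc">Widget</stentry>
    <stentry id="PR01_price">2.50</stentry>
  </strow>
  <strow>
    <stentry>PR02</stentry>
    <stentry id="PR02_desc">Gadget</stentry>
    <stentry id="PR02_price">4.00</stentry>
  </strow>
</simpletable>
"""


def lookup_description(catalog, product_code):
    """Return the text of the element whose @id is '<code>_desc', or None."""
    wanted_id = product_code + "_desc"  # same construction as the formula's $id
    for elem in catalog.iter():
        if elem.get("id") == wanted_id:
            return elem.text
    return None


catalog = ET.fromstring(CATALOG_XML)
print(lookup_description(catalog, "PR01"))
```

The Unit Price lookup would work the same way with a different suffix.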

Here’s the Product catalog (not very extensive!):

The formulas here are used not to calculate visible values, but to construct values for the id attribute. For example, in B2:

Note the id attribute picked from the drop-down list. In Excel terms, this is rather like having a formula that constructs a Range name. It means that the ids for cells in columns B and C always follow the product codes in column A. I think this is rather neat.
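As a rough sketch of what those id formulas achieve, the following Python derives the ids for the second and third cells of each catalog row from the product code in the first cell. The table content and the function name assign_ids are invented for illustration, and the "_price" suffix is my assumption; only "_desc" is taken from the formula described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog rows: product code, description, unit price.
PLAIN_CATALOG_XML = """
<simpletable>
  <strow><stentry>PR01</stentry><stentry>Widget</stentry><stentry>2.50</stentry></strow>
  <strow><stentry>PR02</stentry><stentry>Gadget</stentry><stentry>4.00</stentry></strow>
</simpletable>
"""


def assign_ids(table):
    """Set @id on the description and price cells from the code in column A."""
    for row in table.iter("strow"):
        code_cell, desc_cell, price_cell = row.findall("stentry")
        code = code_cell.text.strip()
        desc_cell.set("id", code + "_desc")
        price_cell.set("id", code + "_price")  # suffix is an assumption


table = ET.fromstring(PLAIN_CATALOG_XML)
assign_ids(table)
print(ET.tostring(table, encoding="unicode"))
```

In XXE the formula recalculates the id whenever the product code changes; this script would have to be re-run to get the same effect.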

Back in the invoice, the Total Cost formula sums the values in the Cost column (E) – see the first screenshot. We could do this with a table/column reference, but an alternative is to tag the Cost cells with a common attribute value. In DITA, @outputclass allows a kind of informal specialization (we can’t use @id, as this must be unique within a document). Here, we can set @outputclass = ‘cost’. Now, the Total Cost formula sums all elements with this attribute value, wherever they are in the document:

=sum(`//*[@outputclass='cost']`)
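The same "sum everything tagged cost" idea can be tried in plain Python. This is a sketch under assumptions: the invoice fragment below is invented, and the extra shipping line is there only to show that the match does not depend on the element sitting in the table. ElementTree supports just a subset of XPath, but the attribute predicate used here is part of that subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice fragment; only the outputclass tagging matters here.
INVOICE_XML = """
<topic id="invoice_example">
  <body>
    <simpletable>
      <strow>
        <stentry>PR01</stentry><stentry>Widget</stentry>
        <stentry>2</stentry><stentry>2.50</stentry>
        <stentry outputclass="cost">5.00</stentry>
      </strow>
      <strow>
        <stentry>PR02</stentry><stentry>Gadget</stentry>
        <stentry>1</stentry><stentry>4.00</stentry>
        <stentry outputclass="cost">4.00</stentry>
      </strow>
    </simpletable>
    <p>Shipping: <ph outputclass="cost">3.00</ph></p>
  </body>
</topic>
"""


def total_cost(root):
    """Sum the text of every descendant element with outputclass='cost'."""
    tagged = root.findall(".//*[@outputclass='cost']")
    return sum(float(elem.text) for elem in tagged)


invoice = ET.fromstring(INVOICE_XML)
print(total_cost(invoice))
```

For real money values, decimal.Decimal would be a safer choice than float.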

That’s it, in terms of the documents. We can then generate formatted output, as we require.

The database aspect comes if the invoices are put into an XML database, such as XMLMind’s Qizx (Free Engine edition). This provides indexing and querying, using the XQuery language. We can then calculate aggregated values, for example by customer and product. Here’s a simple query to calculate the total invoiced for a given product:

xquery version "1.0";

(: product code to total; edit this :)
let $prod := "PR01"
(: collect the 'cost' cell of every row whose first cell holds that code :)
let $costs :=
  for $row in //strow
  let $cost := $row/stentry[@outputclass='cost']
  where $row/stentry[1]/text() = $prod
  return $cost
return ($prod, sum($costs))

strow is a simple-table row, stentry is a cell. One could, of course, get a lot fancier, and produce proper date-based reports.
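As one small step towards the fancier reports, here is a hedged Python sketch that totals the cost cells per product across several invoices, rather than for one hard-coded product. It does not use Qizx or XQuery; the two inline invoices and the function name totals_by_product are invented for the example, and it assumes, as the query above does, that the first cell of each row holds the product code.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two hypothetical invoices, reduced to their line-item tables.
INVOICES = [
    """<simpletable>
         <strow><stentry>PR01</stentry><stentry outputclass="cost">5.00</stentry></strow>
         <strow><stentry>PR02</stentry><stentry outputclass="cost">4.00</stentry></strow>
       </simpletable>""",
    """<simpletable>
         <strow><stentry>PR01</stentry><stentry outputclass="cost">7.50</stentry></strow>
       </simpletable>""",
]


def totals_by_product(invoice_docs):
    """Map each product code to the sum of its 'cost' cells over all invoices."""
    totals = defaultdict(float)
    for doc in invoice_docs:
        root = ET.fromstring(doc)
        for row in root.iter("strow"):
            cells = row.findall("stentry")
            if not cells:
                continue
            code = (cells[0].text or "").strip()  # product code in column A
            for cell in cells:
                if cell.get("outputclass") == "cost":
                    totals[code] += float(cell.text)
    return dict(totals)


print(totals_by_product(INVOICES))
```

With an XML database the grouping would be done in the query itself; the point here is only that the outputclass tagging makes the data easy to collate.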

There’s an interesting contrast here with how we would do this in Excel. If each invoice is a separate Workbook, we would need to provide some collation mechanism for the data, to get it into a single source for pivot tables, etc. – either in a single workbook, or in Access. I think that where we have a large number of computationally relatively simple documents, the XML approach is quite attractive.
