Archive for February, 2010

Excel – XML Workflow – part 2

In the previous post, we were round-tripping data between Job workbooks and Timesheet workbooks. The final link in the cycle was for each Job workbook to update its cumulative hours-booked value, from a single XML file JobHoursList.xml. Here’s an example of that file:

<?xml version='1.0' encoding='ISO-8859-1'?>
<jobhourslist>
  <jobhoursentry>
    <jobnumber>0001</jobnumber>
    <jobhours>91</jobhours>
  </jobhoursentry>
  <jobhoursentry>
    <jobnumber>0002</jobnumber>
    <jobhours>27</jobhours>
  </jobhoursentry>
</jobhourslist>

A Job workbook needs to find the jobhoursentry for its own jobnumber, and load the associated jobhours value.

I decided to load the file, and search it sequentially, using the MS XML library (I went for Microsoft XML v6.0, which is called MSXML2 – I don’t know whether the version makes any difference).

The Workbook_Open handler calls a Load procedure. The whole file is loaded as a MSXML2.DOMDocument:

    Set doc = New MSXML2.DOMDocument
    With doc
        .async = False
        .validateOnParse = False
    End With
    ok = doc.Load(ActiveWorkbook.Path & "\" & datafile)

The nodes are each of type MSXML2.IXMLDOMNode. The jobhourslist is the document (that is, root) element:

    Set jobhourslist = doc.DocumentElement

Then we iterate through the child (jobhoursentry) nodes:

    For Each jobhoursentry In jobhourslist.ChildNodes
      Set jobnumber = jobhoursentry.FirstChild
      Set jobhours = jobhoursentry.LastChild
      'check jobnumber against the one in this workbook
      'and if a match, update the jobhours cell
      '...
    Next jobhoursentry

Surprisingly straightforward. Incidentally, “DOM” stands for Document Object Model – that is, the node-tree representation of an XML document.
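For what it's worth, here is a minimal sketch of how the whole lookup might hang together. It is not the actual module: the function name LookupJobHours and its parameters are made up for illustration, and it uses late binding (CreateObject) so that no library reference is needed. It assumes that each jobhoursentry has jobnumber as its first child and jobhours as its last, as in the example file above.

```vba
' Sketch only: LookupJobHours, xmlPath and jobNo are hypothetical names.
' Returns the hours for the given job number, or Empty if nothing matches.
Function LookupJobHours(ByVal xmlPath As String, ByVal jobNo As String) As Variant
    Dim doc As Object, entry As Object

    Set doc = CreateObject("MSXML2.DOMDocument.6.0")   ' late-bound MSXML 6.0
    doc.async = False
    doc.validateOnParse = False

    LookupJobHours = Empty
    If Not doc.Load(xmlPath) Then Exit Function        ' missing or malformed file

    For Each entry In doc.DocumentElement.ChildNodes
        If entry.NodeType = 1 Then                     ' 1 = element node
            ' Assumes jobnumber is the first child and jobhours the last
            If entry.FirstChild.Text = jobNo Then
                LookupJobHours = CLng(entry.LastChild.Text)
                Exit Function
            End If
        End If
    Next entry
End Function
```

A call might look like LookupJobHours(ThisWorkbook.Path & "\JobHoursList.xml", "0001"). The job number is compared as text, so a cell holding the number 1 would need formatting as "0001" first.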

Here’s the VBA module, if you’re interested.

Excel – XML Workflow

Following on from a previous suggestion about workbook initialization from XML data, I’ve had a go at implementing a bit of round-trip workflow, based on a time-reporting scenario.

Time is booked against currently active jobs; this data is totalled for each job, and updates the job specifications accordingly. I’m assuming this works on a regular cycle, say weekly. Here’s an overview:

A Job workbook contains:

  • Job Number
  • Job Name
  • Start Date
  • End Date
  • Total Hours booked.

There’s a job_Map, so that the data can be exported to a corresponding XML data file. This is done on the Workbook_BeforeClose event (arrow A).
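As a rough sketch, the export handler might look something like this. The map name job_Map is the one mentioned above; the named range JobNumber and the output file name pattern are assumptions made for illustration.

```vba
' Sketch only: goes in the ThisWorkbook module of a Job workbook.
' "JobNumber" is a hypothetical named range holding the job number.
Private Sub Workbook_BeforeClose(Cancel As Boolean)
    Dim target As String
    Dim jobNo As String

    jobNo = CStr(ThisWorkbook.Names("JobNumber").RefersToRange.Value)
    target = ThisWorkbook.Path & "\Job" & jobNo & ".xml"   ' assumed naming scheme

    With ThisWorkbook.XmlMaps("job_Map")
        ' IsExportable is False if the map's layout cannot be exported
        If .IsExportable Then
            .Export Url:=target, Overwrite:=True
        End If
    End With
End Sub
```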

The Job*.xml files are imported into an XML database (Qizx/Free Engine). We can then run a simple XQuery over the collection to generate the list of all active jobs – those that have a Start Date, but no End Date. This list is saved as joblist.xml (arrow Q1).

When a new Timesheet is created (from a template), the joblist.xml data is imported, and is mapped to an Excel table, which expands to hold the possibly multiple rows (arrow B).
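The import can be sketched along these lines. The map name joblist_Map and the file location are assumptions. A workbook freshly created from a template has no path of its own yet, which is why the sketch uses a fixed location rather than ThisWorkbook.Path.

```vba
' Sketch only: goes in the ThisWorkbook module of the Timesheet template.
' JOBLIST_PATH and "joblist_Map" are hypothetical names.
Private Const JOBLIST_PATH As String = "C:\Workflow\joblist.xml"

Private Sub Workbook_Open()
    Dim res As XlXmlImportResult

    On Error Resume Next            ' a missing file may raise an error
    res = ThisWorkbook.XmlMaps("joblist_Map").Import(JOBLIST_PATH, True)
    If Err.Number <> 0 Or res <> xlXmlImportSuccess Then
        MsgBox "Could not import " & JOBLIST_PATH
    End If
    On Error GoTo 0
End Sub
```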

The hours column is not mapped: that’s where the user enters the time data.

Now, we want to export jobnumber + hours for each timesheet entry. However, we can’t have jobnumber in two maps (input and output). My solution (there might be better ones) is to copy the data to another table on another worksheet.

To get the data across, I’m using the formula:

=Table_tslist[[#ThisRow],[jobnumber]]

where Table_tslist is the input table. The #VALUE! errors are because the table is fixed size.

The output table is mapped within timesheet_Map (along with tsname and tsperiod). We can now export the data on the Workbook_BeforeClose event (arrow C). The empty (#VALUE!) rows generate empty elements, which are just ignored.

The Timesheet*.xml files are imported into the database, and another XQuery is run to generate a list of (all) job numbers + total hours booked (arrow Q2).

The final link in the cycle is for a Job workbook to refresh its total hours data, when it is opened (arrow D). This is different from arrow B, in that it’s not a simple Import. Each Job workbook wants only the hours value for its own job number. There might well be various ways of doing this, but I’m doing a programmatic load-document-and-lookup, in VBA. I’ll leave this to the next post.

And it’s all working pretty neatly. Potentially, we could extract and summarize other information from the timesheets, and present this in the Job workbooks – for example, the latest period with booked time.

Workbook Initialization

Interesting post over at Bacon Bits. The idea here is that a workbook application has a single-field XML Map linked to a shared XML file somewhere. This file contains the current version number of the application, and is updated accordingly by the developer. A handler for the Workbook_Open event refreshes the XML map, and then checks the workbook’s own version number (encoded in the handler procedure) against the refreshed value. If these are different, a MsgBox is displayed prompting the user to get the updated version, and the workbook is closed.
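A minimal sketch of that pattern, as I understand it, might look like the following. This is not the code from the original post: the map name version_Map, the named range LatestVersion and the whole-number version scheme are all assumptions.

```vba
' Sketch only: goes in the ThisWorkbook module.
' APP_VERSION, "version_Map" and "LatestVersion" are hypothetical names.
Private Const APP_VERSION As Long = 3       ' version of this copy

Private Sub Workbook_Open()
    Dim latest As Variant

    ' Re-read the shared XML file into the mapped cell
    ThisWorkbook.XmlMaps("version_Map").DataBinding.Refresh
    latest = ThisWorkbook.Names("LatestVersion").RefersToRange.Value

    ' Assumes the version is stored as a whole number
    If CLng(latest) <> APP_VERSION Then
        MsgBox "A different version of this workbook has been released. " & _
               "Please get the updated copy."
        ThisWorkbook.Close SaveChanges:=False
    End If
End Sub
```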

That’s neat. Furthermore, you could use this pattern to do all sorts of ‘dynamic initialization’. For example, a timesheet workbook could initialize itself with a set of current jobs, obtained from a ‘master’ XML file.

This file could itself be the result of a query over a ‘database’ of XML documents. For example, we could have an XML document per job (mapped into, and editable via an Excel workbook), with start and end dates. Jobs without an end date are Active, and are picked up by an XQuery which generates the current ‘master’ file. This is then imported by our timesheet workbooks when they open, as above. I think I might have a go at implementing this workflow – more anon.

Recursive Functions

I’ve been experimenting with XQuery, which is a ‘pure’ functional language: you just have functions and expressions. You iterate over collections of documents/nodes using a SQL-SELECT-like construct: For-Let-Where-OrderBy-Return (FLWOR). There are also quantified expressions: Some/Every-In-Satisfies.

There are no explicit loops, but you can write a function that is recursive – that is, the function calls itself, until a terminating case is reached (e.g. the empty-list case). Each recursive call adds a frame to the call stack of the implementation; when the terminating case is reached, the calls unwind and the frames are popped off the stack, and the result value is constructed. The amount of stack space then limits how long the chain of recursive calls can be (unless the implementation optimizes recursion to iteration – some functional languages do).

Anyway, I wondered how many recursive calls you can make in VBA before you run out of stack space (this is in Excel 2007). Here’s a trivial recursive function.

Function Recursive(n As Long) As Long
   If n = 0 Then
      Recursive = 1
   Else
      Recursive = 1 + Recursive(n - 1)
   End If
End Function

This works up to n = 6128. When n = 6129, I get Run-time Error 28 – Out of stack space. That’s a rather higher number than I was expecting. If you change the Long to an Integer, it works up to n = 6285, so the number and types of the arguments make a slight difference.

It’s unlikely perhaps that you would write a straightforwardly recursive function like this, where a loop would do. However, it’s just conceivable that you might get a chain of object-to-object calls, within a custom data structure, long enough to cause an error.
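For comparison, here is the loop version of the same function. The name Iterative is just for illustration. It returns the same value as Recursive(n), namely n + 1, but uses a single stack frame however large n gets.

```vba
' Loop equivalent of Recursive: starts at 1 (the n = 0 case)
' and adds 1 for each of the n steps.
Function Iterative(n As Long) As Long
    Dim i As Long
    Dim total As Long

    total = 1
    For i = 1 To n
        total = total + 1
    Next i
    Iterative = total
End Function
```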

WordPress Demo

This is a demo posting, to show a friend how brilliant WordPress is!

Here’s a picture…

XML Export from Excel – part 3

In Part 2, we had some VBA code that exported data from financial-report workbooks into XML files. What we can do now is query these files to produce summary tables – rather like views of a pivot table – and include these in a PDF or XHTML report, to provide an ‘executive summary’. (Note: we’re leaving VBA behind here, but stay tuned).

Here’s what our exported XML data looks like:

<?xml version="1.0" standalone="yes"?>
<bureport xmlns:xsi='http://www.w3.org/...>
 <buname>Middleware</buname>
 <period>4Q2009</period>
 <product>
    <productname>Connectomatic</productname>
    <productsales>456</productsales>
    <productcosts>65</productcosts>
 </product>
 <product>
    <productname>Netotron</productname>
    <productsales>43</productsales>
    <productcosts>34</productcosts>
 </product>
 <totals>
    <totalsales>499</totalsales>
    <totalcosts>99</totalcosts>
 </totals>
</bureport>

And here’s the kind of report we might generate (this is XHTML):

The idea is that this is a presentable/printable report. We could have a lot more in terms of sections, supporting text, fancy layout, etc.

So how do we get from the XML data to the summary report? Here’s one possibility:

  1. import the XML files into an XML database
  2. run queries over the database to generate summary tables as individual DITA topics
  3. include (content reference) the summary tables from our master-report topic, which has the overall structure and additional content
  4. generate the output report in the desired format(s).

Sounds a bit of a performance – why don’t we just collate all our data in a ‘database’ Excel workbook, knock up a few pivot tables, and copy-paste these into a Word document? Basically, flexibility.  Firstly, a pivot table isn’t a general query mechanism: we might want all sorts of summary information, not just dimension-aggregations. Secondly, the individual summary tables might be included in different combinations in different reports. Also, we want the reports to reference fixed locations, where they can pick up the latest (e.g. last month’s) summary tables automatically.

For the XML database, I’m using XMLmind’s Qizx Free Engine. You can use this either from a command-line or via a simple client app. There’s an XQuery query to generate each table (they’re very similar): here’s the one for the totals by Business Unit (BU).

The summary tables are minimal DITA topics containing a <simpletable>:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE topic ...>
<topic id="FR_BU_Totals_Data">
 <title>BU Totals Data</title>
 <body>
    <simpletable id="BU_Totals_Data">
       <sthead>
          <stentry>BU</stentry>
          <stentry>Sales</stentry>
          <stentry>Costs</stentry>
       </sthead>
       <strow>
          <stentry>Applications</stentry>
          <stentry>4707</stentry>
          <stentry>2325</stentry>
       </strow>
       <strow>
          <stentry>Middleware</stentry>
          <stentry>1103</stentry>
          <stentry>421</stentry>
       </strow>
    </simpletable>
 </body>
</topic>

The “FR” prefix on the topic id stands for “Financial Report”. Note the id also on the <simpletable>.

Now, in the DITA topic for our master report, we just have a <simpletable> element that con(tent)ref(erence)s the summary table in its separate topic:

<simpletable conref="..."/>

where the “…” is a reference of the form: filepath#topicid/elementid.

Now we just generate the report output in our chosen format. Multiple reports can be built into a larger structure: a set of XHTML pages with appropriate linking, or a multi-section PDF document, both with tables of contents, etc.

Stages #1, #2 and #4 above can all be done by command-line tools, so it should be possible to automate the entire process (including the VBA bit).

I think that deserves a cup of tea…

XML Export from Excel – part 2

In Part 1, I wanted to export the data in a simple workbook as XML, so I can pick it up with XML query and documentation tools (of which more anon).

The code needs to:

    1. construct the output file name from the BU name and period (e.g. Report_Applications_4Q2009.xml)
    2. open a Scripting.TextStream on this (new) file
    3. get the complete XML string from the bureport map
    4. substitute the basic <bureport> element with the full one
    5. write the modified XML string to the TextStream.

      Items 3 and 4 are done by a simple function:

      Function XMLwithNS() As String
      
          Dim xml As String
          Dim res As XlXmlExportResult
      
          res = _
            ActiveWorkbook.XmlMaps(mapname).ExportXml(xml)
          If res = xlXmlExportSuccess Then
              XMLwithNS = Replace(xml, _
                            basicroot, fullroot, 1, 1)
          Else
              XMLwithNS = ""
          End If
      End Function

      “NS” in the function name stands for “namespace” – that is, the reference to the XML schema, discussed in Part 1. basicroot and fullroot are the before and after versions of the root element.

      Notice that XMLMap.ExportXML returns a success/failure status; the (ByRef) variable xml gets the actual XML string. The export method will fail if the mapped data is invalid with respect to the map (i.e. schema); we return an empty string in this case.

      Here’s the main routine:

      Sub ExportXMLtoFile()
      
          Dim bu As String, period As String
          Dim xmlfilename As String
          Dim xml As String
      
          bu = Range("BU").Value
          period = Range("Period").Value
          xmlfilename = basename & "_" & bu & "_" & _
                                period & ".xml"
      
          Dim fso As New Scripting.FileSystemObject
          Dim ts As Scripting.TextStream
      
          ChDir ActiveWorkbook.Path
          MsgBox "Current directory is " & CurDir
      
          xml = XMLwithNS
          If Len(xml) > 0 Then 'okay
              Set ts = fso.CreateTextFile(xmlfilename)
              ts.Write xml 'reuse the string already exported above
              ts.Close
              MsgBox "Exported XML to file " & xmlfilename
          Else 'XML did not validate
              MsgBox "XML does not validate; export aborted"
          End If
      End Sub

      As mentioned before, we’d probably want to run this as a batch operation from a ‘master’ workbook, which iterated over the data workbooks (maybe looking at File.DateLastModified).
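A batch driver along those lines might be sketched as follows. It assumes the data workbooks sit in the same folder as the master workbook and that ExportXMLtoFile is callable from the master. The name ExportAllReports and the file pattern are made up for illustration.

```vba
' Sketch only: run from a 'master' workbook in the same folder as the data workbooks.
' ExportAllReports and the "*.xls*" pattern are assumptions.
Sub ExportAllReports()
    Dim folder As String
    Dim fname As String
    Dim wb As Workbook

    folder = ThisWorkbook.Path & "\"
    fname = Dir(folder & "*.xls*")              ' first matching file

    Do While Len(fname) > 0
        If fname <> ThisWorkbook.Name Then      ' skip the master itself
            Set wb = Workbooks.Open(folder & fname)   ' becomes ActiveWorkbook
            ExportXMLtoFile                     ' works on ActiveWorkbook
            wb.Close SaveChanges:=False
        End If
        fname = Dir                             ' next matching file
    Loop
End Sub
```

For unattended runs, the MsgBox calls in ExportXMLtoFile would need removing or replacing with logging, since each one waits for a click.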

      So, having now got umpteen XML data files, what can we do with them? I’ll discuss this in Part 3.

