Archive for the 'VBA Projects' Category

Exporting XML From Excel Tables – 3: Self-checking Structure

In the previous post, we were looking at how to export a hierarchical structure as XML, from a set of Excel tables, where the rows are hierarchically numbered (1, 1.2, 1.2.3, etc). The approach in our VBA is to build a hierarchical object model from our worksheet tables, and then ask the root of the model for its XML string. Both the building and the string generation proceed recursively down the structure. Note that this approach works for any hierarchical data – in our case it is software test definitions.

Having built a hierarchical object structure from our worksheet data, there are a number of other things we might want to do:

  • check the test definitions for structural validity
  • generate output in other formats, such as plain text (really plain, not XML)

I’ll talk a bit more about the first of these, here.

The basic idea is that of a class invariant – a property (which can be a conjunction of simpler properties) that always applies to any instance of the class. This idea is, of course, part of Bertrand Meyer’s Design By Contract approach.

Writing good invariants is hard, particularly for objects that are essentially reactive in their behavior. However, in our case here, we have a static hierarchical object structure, and there are some fairly obvious relationships between a node in our hierarchy and its child nodes:

  • there is a least one child node
  • the ID of the current node is a prefix of each child node ID (for example, 2.3 –> 2.3.1)
  • the number of the table rows representing the current node = the sum of the rows for the child nodes + 1 row for the current node

These properties help to ensure that the test data worksheets are ‘well formed’ – with no extra or missing rows, and with proper hierarchical nesting. You can see an example of the last property in this screen shot:


  • test condition 101 is represented by rows 3:9 = 7 rows
  • test scenario 101.1 is represented by rows 4:6 = 3 rows
  • test scenario 101.2 is represented by rows 7:9 = 3 rows
  • so 7 = 3 + 3 + 1 row for the test condition itself

The Eiffel language has full support for invariants and other assertions, and it would be the job of the class constructor to establish the invariant. With VBA, we have to take a less rigorous approach, and have the invariant as a normal Boolean-returning function, which should return True any time after the Build procedure has been called. In Eiffel, failure of an invariant or other assertion raises an exception; in VBA, we’ll just put up an error MsgBox.

Here’s the invariant function for a test condition (level #2 in the hierarchy):

Function Invariant() As Boolean
    Invariant = _
            tcond_tid.IsCondition And _
            tcond_scenarios.Count > 0 And _
            tcond_scenarios(1).ScenNum = "1" And _
    If Not Invariant Then MsgBox "Problem with Condition " & Me.Tid

    Dim ts As TScen
    For Each ts In tcond_scenarios
        Invariant = Invariant And IsChild(ts.Tid, Me.Tid)
        If Not Invariant Then MsgBox ts.Tid & " is not a child number of " & Me.Tid
        Invariant = Invariant And ts.Invariant
End Function

The first part checks:

  • the Condition’s ID is at the right level for a Condition
  • there is at least one child Scenario, and that the numbering starts from 1
  • that the child Scenarios cover all of the Condition’s rows (minus 1)

The second part iterates over the child Scenarios:

  • checking that the child IDs extend the Condition’s ID (there’s an IsChild relationship between the two IDs)
  • recursively evaluating the invariants for the child Scenarios (and thence recursively down the whole structure)

The ChildrenPartitionRows function is a simple iteration over the child Scenarios:

Function ChildrenPartitionRows() As Boolean
        'Do the rows of my children + 1 = my rows?
    Dim childrenrows As Integer
    Dim ts As TScen
    For Each ts In tcond_scens
        childrenrows = childrenrows + ts.Rowcount
    ChildrenPartitionRows = (childrenrows = tcond_tblrowcount - 1)

End Function

It might seem that we’re making extra work in writing these invariant and helper functions, but it’s a useful exercise in understanding how our worksheet tables represent a hierarchical structure, and what rules this implies for the worksheet/table layout. Further, we can make the checking directly accessible to a user via a Command Button: the handler simply builds the structure and calls the top-level invariant (on the Test Definition):

Private Sub CheckCommand_Click()
        'Build the TDefn, then evaluate its Invariant (and recursively down the structure)

    Dim td As TDefn
    Set td = New TDefn
    If td.Invariant Then
        MsgBox "Check Complete"
        MsgBox "Some problems were found"
    End If
End Sub

Note that we’re not being very informative about the precise location of an invariant failure. However, it would be simple enough to record an invariant failure on the object concerned, and then interrogate the structure subsequently to find out the ID of that object.

Finally, as an example of ‘plain text’ output (as opposed to XML), we can have objects provide a simple text representation of themselves – say just ID and Description, with vbTab and vbCrLf. This can be displayed in a TextBox (with property MultiLine = True), to give us a simple Table of Contents. Done within the handler above, this would be:

Worksheets("Front").TocText.value = td.Contents

For a simple one-level ToC, the Test Definition would simply iterate over the child Conditions, concatenating their individual Contents strings.


Exporting XML From Excel Tables – 2: Object Model

In the previous post, we were looking at how to export a hierarchical structure as XML, from a set of Excel tables, where the rows are hierarchically numbered (1, 1.2, 1.2.3, etc). The approach in our VBA is to build a hierarchical object model from our worksheet tables, and then ask the root of the model for its XML string. Both the building and the string generation proceed recursively down the structure. Note that this approach works for any hierarchical data – in our case it is software test definitions.

There are six levels in our hierarchy. #1 is represented by an entire workbook; #2 to #5 are represented by subsets of (contiguous) table rows; #6 is represented by a single cell in a table row (plus its associated header information). There is a VBA Class Module for levels #1-5.

Level #6 corresponds to a test parameter – a single cell, plus its associated column header. The parameters don’t have their own class, as they are really just a couple of values (data, header). The level #5 class holds this information as arrays (corresponding to the relevant data row and the header row).

The multi-level numerical ID is complex enough to warrant a VBA Class Module of its own. This allows us to do things like asking an ID what level it’s at, or asking for a new ID for its Nth child.

Each of the five classes representing a test level has attributes (Private) to record:

  • the chunk of worksheet to which an instance maps: the table (ListObject), the row within the table, and the row count (i.e. the size of the table chunk)
  • its ID (TestID object)
  • data that applies to that level, such as a description, category and mode
  • a Collection of child objects, representing the next level down

Each of the five classes has a Build method (Sub), which is called by its parent object (or the user interface, for the top level). The Build method takes as parameters:

  • its parent’s ID
  • the chunk of worksheet to which an instance maps: table (ListObject), row within the table, and row count (i.e. the size of the table chunk)

The Build method:

  • stores its parameters
  • gets and stores its table-row information
  • creates a new ID for itself (derived from its parent’s ID)
  • calls a method to BuildChildren

The BuildChildren method scans through the table chunk, finding the sub-chunks corresponding to each of its children. For each sub-chunk, it creates a child object of the appropriate class, and asks the child to Build itself, given the appropriate parameters (see above). Determining which level a particular row represents is just a case of seeing which of its first four cells have numbers, and which blanks. There are utility functions for doing this.

A couple of observations regarding the ‘object oriented-ness’ of this. Firstly, in a fully OO language, the Build method would probably be the constructor for an object. The is preferable because it removes the possibility of having ‘not yet built’ objects. However, in VBA, Sub Class_Initialize can not take parameters, so we need a separate method called immediately after the object creation (Set … New …). Secondly, in a fully OO language, one would be tempted to to abstract some of the structure and processing into an abstract class – say, TestNode. However, our structure is not arbitrarily deep – we’re not going to have 20 levels. Also, the top and bottom levels are special cases – actually, only levels #2 and #3 are substantially the same.

Having built an object model, generating the XML is then very easy. Each of the five classes representing a test level has an AsXml function. Here’s the top-level one, on Test Definition:

Function AsXml() As String
        'Element <tdefn>

    AsXml = "<tdefn id='" & tdefn_id & "'>" & vbCrLf & _
                DescAsXml() & _
                ConditionsAsXml() & _

End Function

Function ConditionsAsXml() As String
        'Element <tcond>*

    ConditionsAsXml = ""
    Dim tc As tcond
    For Each tc In tdefn_conds
        ConditionsAsXml = ConditionsAsXml & tc.AsXml & vbCrLf

End Function

The second function shows the recursion of AsXml to the child level (here, Test Condition). At other levels, there’s a little extra complexity to allow for optionally outputting secondary information, such as Descriptions.

The top-level XML String is then written out to a file via a TextStream (MS Scripting library). This is wrapped in a function XmlOut.

There is a simple ‘user interface’ worksheet, with a Command Button:

Private Sub ExportCommand_Click()
        'Build the TDefn and write its AsXml string to file

    Dim td As TDefn
    Set td = New TDefn
    XmlOut td.AsXml

The same kind of recursion down our hierarchical object structure can be used to make the structure self-checking (up to a point). I’ll talk about this in the next post.

Moving Data Out of Outlook – part 2

Happy and code-full 2011!

In an earlier post, I was considering exporting my Outlook contacts as XHTML pages, and accessing these as a repository of free-format, but queryable, documents. After some time out for Christmas and New Year, not to mention my family’s seasonal cough-fest, I can report back.

I haven’t attempted a comprehensive export of all contact attributes, just the standard ones that I typically use. This still leaves information in the Contact’s body section, but I’ll migrate that manually, as and when.

Here’s an example contact page:

I just grabbed a standard style sheet, so the formatting is not up to much (for example, the cell text middle-aligns vertically, as you can see with “Address”).

In terms of the VBA code, all we need is an adapter class, ContactPage. Instances of this are created for each ContactItem that we export. A ContactPage is responsible for building the XHTML string corresponding to the ContactItem, and writing it out to a text file. Here’s the root procedure:

Const outdir As String = "C:\Temp\Contacts"

Sub ExportContacts()
  Dim allcontacts As Items
  Set allcontacts = Session.Folders("Personal Folders"). _

  MsgBox "Exporting " & allcontacts.Count & " Contacts"

  Dim cn As ContactItem  'Outlook class
  Dim cp As ContactPage  'Custom class
  For Each cn In allcontacts
    Set cp = New ContactPage
    Set cp.Contact = cn    'Property Set
    cp.OutputToFile outdir
End Sub

ContactPage has a function for PageTitle, which is taken from the ContactItem‘s Company or Full Name (in that order of preference). Functions PageId and PageFile are based on this (with a bit of cleaning up to remove characters such as ‘/’).

The sub OutputToFile uses the Microsoft Scripting Runtime library, specifically FileSystemObject and TextStream. You need to load the Scripting library manually, via >Tools >References.

The rest of ContactPage’s features are just functions that generate elements in the XHTML hierarchy. For example:

Public Function XhtmlPage() As String
'The XHTML representation of cp_contact
    XhtmlPage = Header_ & vbCrLf & _
End Function


Public Function Html_() As String
    Html_ = "<html id=" & Chr(34) & PageId & Chr(34) '& ...
    Html_ = Html_ & vbCrLf & _
            Head_ & vbCrLf & _
            Body_ & vbCrLf & _
End Function

Public Function Body_() As String
    Body_ = "<body>" & vbCrLf & _
            "<h1>" & PageTitle & "</h1>" & vbCrLf & _
            Company_table_ & vbCrLf & _
            Person_table_(1) & vbCrLf & _
            Person_table_(2) & vbCrLf & _
            Person_table_(3) & vbCrLf & _
            Categories_ul_ & vbCrLf & _
End Function


Public Function LastName_td_() As String
    LastName_td_ = "<td class=" & Chr(34) & "LastName" & Chr(34) & ">" & _
     cp_contact.LastName & "</td>"
End Function

The lowest level functions, typically for <td> elements, access the ContactItem‘s properties. As you can see, there are a lot of quotation characters (Chr(34)) in XHTML. The trailing-underscore convention for the function names was so that I could call the functions after the XHTML tags they generate – without the underscore I got a name clash with something existing.

Minor complications where the Contacts folder contains Distribution Lists (class DistListItem), which is a different class from ContactItem – that is, a MAPIFolder.Items is a heterogeneous Collection. This makes the For Each loop throw a type mismatch error, although you can resume execution (>Debug, >Run).

Also a slight issue with duplicate filenames, since the file output overwrites by default. I used Round(Rnd * 1000, 0) to generate a temporary disambiguating suffiix for the filename.

Here’s the Class Module for ContactPage.


Diff for Access Tables

I haven’t had time to progress the Outlook Contact Export, so here is a minor digression.

Suppose that I have two Access tables each containing records that represent the same entities – Orders, say. These two sets of equivalent records come from different sources: maybe Orders as received, and Orders as dispatched. I want to cross-check certain fields for equivalent records – say, that order-quantity received and order-quantity dispatched are the same – and flag any mismatches.

Here are two tables:

I want to check the equivalence of columns B and C. There are differences in B/C for the second and fourth records, and an extra record in the second table.

Note that the IDs are counters, but are not necessarily the same, so I can not do a Join of the two tables. The tables are assumed to represent equivalent sets of records – maybe for the same period – so they could be queries on underlying cumulative tables.

It’s not apparent to me how you could do this using queries. If we could join the tables together, then for numerical values I suppose that we could subtract one field from the other, and check for non-zero differences.

However, a more generic solution is to suck the relevant columns into two Recordsets, using a simple SELECT query as the Source, and then iterate through records and fields, comparing values. Differences can be logged in some way:

  • If we expect only rare differences, then we could build up a log String, and display this at the end of the check:
  • If we expect substantial differences, we could base an output Recordset on a temporary table, and write message-records into it.

I’ve adopted the first approach, for simplicity.

Here’s a code module, and the database (2003 .mdb) with test tables. There’s no user interface: RunDiff is run directly from within VBA.

Moving Data Out of Outlook

I’ve felt uneasy for a while that important data – for example, my Contacts – is sitting in Outlook, in an enormous and opaque database file (.pst). Okay, it’s backed up, but if it became ‘corrupted’ and wouldn’t open, then I have no way of recovering the data, that I know of. Why should I have to open Outlook, just to find someone’s website, or phone number? The whole idea of organizing your life within a single application seems terribly outmoded. I just want a mail client that handles message send and receipt; all the information I want to be in the ‘outside world’.

Retrieving data can be pretty painful too. I’ve just clicked on the ‘Activities’ tab of a contact, and it’s taken over 2 minutes to come back with about 20 items. While there is free-text search, there is no proper query access.

Outlook’s data model is baroquely large – a futile attempt to cover all possibilities. I’m looking at the class ContactItem, and it has nearly 150 attributes. I can have NickName, TelexNumber (telex?) and YomiFirstName (something Japanese); but I can only have three email addresses and one webiste URL.

So what’s the alternative? What I’d like is for each item (contact, whatever) to be a web page in a local (private) website. New pages could be created, edited and added as necessary. The information is all fully accessible using basic or more sophisticated editors. Website and mailto links work normally – the latter opening a mail client, as usual.

As it happens I’m using a document repository for XML documents, including those in XHTML. This runs on a Jetty server, at a localhost port. The repository can be browsed as a directory hierarchy, free-text searched, and queried (using XQuery). Because the repository is fully indexed, the search and query are very fast. I can create/edit documents in the repository directly, without the need to export/import, using a WebDAV-enabled editor.

So, to populate the repository initially, I would need to export my existing contacts from Outlook in some way. Exporting items as XML or HTML directly is beyond Outlook (up to and including v2007). So I’m thinking that I could write a custom export to XHTML – nothing complicated, just a simple page with tables. Elements, such as the table cells, would be given standard id attributes to indicate the content type – such as ‘company’, ‘address’; these can then be used in queries, analogously to column names in SQL.

In terms of the design, I’m wondering about a custom class, say ContactAdapter, which would wrap a ContactItem, and build the complete XHTML string, and write it to a file (using the Scripting library for the file access). I’ve not done anything like this before, so we’ll see. I suppose that I could export the data to Excel, and work on it there, but dealing with rows and columns is less direct than working with the actual ContactItem objects in Outlook.

More anon…

Period Overlap Function

Over at Daily Dose of Excel, Dick provided a solution to the problem of counting active entities (e.g. customers) within a given period. That is, we want to count a customer if their period of activity overlaps with the given period. The key to the solution is SUMPRODUCT.

An alternative solution, which is a bit more work, but is more flexible, is to write a custom VBA function to compare two periods: that is, start1-finish1 versus start2-finish2.

It is then easy to add an extra column:

PFirst and PLast specify the period of interest. The formula in D6 is:


You might be tempted to write [First] and [Last], rather than B6 and C6. Unfortunately the implicit intersection of formula row with a table column does not work with the custom function – the function gets arrays of values from the whole columns.

However, we could have the OVERLAPS formulas off to the right of the table, say in column H. The formula can then be:

Table1[[#This Row],[First]],
Table1[[#This Row],[Last]])

(Yes, I should have given the table a meaningful name…)

The formula in D3 is just:


We can also use the OVERLAPS function to provide conditional formatting – here in column A (see screenshot above). The formula is exactly the same as in column D:


Finally, here’s the code for the function:

Public Function OVERLAPS(start1, finish1, start2, finish2) As Variant
'Does the period start1-finish1 overlap
'with the period start2-finish2?
'pre: start1 <= finish1 and start2 <= finish2 

  If IsDate(start1) And IsDate(finish1) And _
     IsDate(start2) And IsDate(finish2) Then
        OVERLAPS = _
        (start2 >= start1 And start2 <= finish1) Or _
        (finish2 >= start1 And finish2 <= finish1)
        OVERLAPS = CVErr(2001)
  End If
End Function

The error appears on the worksheet as a #VALUE, rather than a #N/A – I didn’t look into this deeply.

Navigating Part Relationships – 2

In the previous post, I introduced the idea of a cursor object that allows you to navigate around a graph of component-part relationships:

Navigation could be:

  • down the ‘contains’ relationships
  • along the ‘used in’ relationships
  • back through the history of visited records.

We’ll also have a Reset operation, which jumps back to the start of the table, and clears the history.

The navigation is done using keyboard shortcuts (but could be done via a form).

The core of the design is a Class Module GraphCursor. This provides our four operations: CursorDown, CursorNextUse, CursorBack and CursorReset. When an instance of this class initializes, it points itself at ListObjects(1) on the ActiveSheet (there is only one sheet, to keep things simple), and does a CursorReset.

A GraphCursor maintains a history of visted components using a linked List class (a simple chain of ListItem objects – nothing to do with ListObject a.k.a. Table).

CursorDown and CursorNextUse use Range.Find with the currently selected cell value. I assume this is pretty efficient – and in any case is neater in code terms than iterating through rows explicitly. The Range for CursorDown is just the first column (Component); the Range for CursorNextUse is the Part columns below the row of the current selection.

Something needs to create and hold on to an instance of GraphCursor – this is a standard module Client. This also provides procedures that are called via the keyboard shortcuts.

Public gc As New GraphCursor

Sub GCDown()
End Sub

'similarly for the other three operations

The keyboard shortcuts are set up on Workbook_Open:

Private Sub Workbook_Open()
    Application.OnKey "^+d", "GCDown"
    Application.OnKey "^+n", "GCNextUse"
    Application.OnKey "^+r", "GCReset"
    Application.OnKey "^+b", "GCBack"

End Sub

Here’s the workbook (Excel 2007).

Since each navigation step is worked out dynamically, we can insert or delete records from our table as we like. This would not be the case for an indexed solution (maintaining a map of Component to row number).

You could argue that each Component-Part relationship should be a separate record – for example, [A, B], [A, C]. This would allow us to associate quantities or other information with each relationship. In this case, we would also need a CursorNextPart operation.

July 2018
« Dec