Archive for the 'Uncategorized' Category

Back, Briefly: Exporting XML From Excel Tables

It’s been a while since I posted anything to this VBA blog, since my main field of work is XML based documentation (see previous posts). However, I recently did an Excel VBA micro-project, which I think is worth sharing. I’ll describe the problem and outline the solution in this post, and add some details in a couple of further posts.

The overall aim is to provide a way of creating and editing software test definitions, while also making those definitions accessible to a test harness for actually executing the tests. (The solution I’m going to describe is, with appropriate modifications, more widely applicable than just for test definitions).

Since the test definitions fall naturally into a tabular form, Excel is an obvious front-end tool. The problem is how to make the data accessible to a testing harness. One way is to use Apache POI, a Java API for accessing Microsoft Office files – and our developer was keen to use this. However, there are benefits to having a plain text – that is, XML – representation of the data:

  • It can be stored (alongside the Excel workbook) in a version-control repository, and different versions differenced, using a plain text or (better) XML-aware diff tool
  • A body of test definitions can be analyzed and reported on using XQuery
  • Any scripting language with an XML library can access the test data

So, the requirement is for a VBA export facility that takes the test definitions in a workbook, and exports it as a single XML file.

Now, if the test definitions were simply a list of tests, it would be very simple – we might be able to use Excel’s built-in XML facilities. However, our test definitions are quite deeply hierarchical:

  1. Test Definition (for a system component or capability)
  2. Test Condition (a functional area to be tested)
  3. Test Scenario (a particular test configuration)
  4. Test (an atomic test)
  5. Test Step (one or more steps within a Test – typically only one)
  6. Test Parameter (property, value and purpose – in, out, environment)

#1 is represented by an entire workbook; #2 to #5 are represented by subsets of (contiguous) table rows; #6 is represented by a single cell in a table row (plus its associated header information).

#2 to #5 have hierarchical, numeric IDs, spread over the first 4 columns of a table. Where (typically) a Test has only a single Step, they are represented by a single row (that is, the Step is implicit). For example:


  • 101 is a Condition,represented by rows 3:9
  • 101.2 is a Scenario,represented by rows 7:9
  • 101.2.1 is a Test, represented by row 8

(each Test here has a single implicit Step). The columns on the right-hand half of the table represent the Parameters. The greying-out of ID numbers is just conditional formatting.

The XML looks much like any XML:


The point to note is that we have a six-level hierarchical structure, with each node in the hierarchy mapping on to a chunk of Excel table.

Now, the solution, in outline. We could pass through the tables and rows, spitting out XML strings as we go. However, this approach is rather limited, as there are a number of other things we might want to do:

  • check the test definitions for structural validity
  • generate output in other formats, such as plain text (really plain, not XML)

As earlier posts have illustrated, I’m keen on building object models that ‘sit behind’ the worksheets, and provide the application functionality. This seems to be a classic application for this. Each item in our test definition hierarchy is represented by an object (of the appropriate class), and is built from a chunk of worksheet table.

We start by creating a Test Definition object, and asking it to build itself. It goes through the worksheets and tables, finding the chunks that correspond to Conditions (using the ID level). It creates Test Condition objects, tells them to build themselves from the appropriate chunks, and adds them into a Collection.

Each Test Condition object goes through its chunk, finding the sub-chunks that correspond to Scenarios (using the ID level). It creates Test Scenario objects, tells them to build themselves from the appropriate sub-chunks, and adds them into a Collection. And so on, down the levels…

Having built the object hierarchy, we can do various things with it. Primarily, we want an XML string, which we can write out to a file. We simply ask the top-level Test Definition object for its XML string. This asks each of its Test Conditions for their XML strings, concatenates them successively, and wraps the result in its own Definition-level XML element (<tdefn> – see above).

Each Test Condition asks its Test Scenarios for their XML strings, concatenates them successively, and wraps the result in its own Condition-level XML element (<tcond> – see above). And so on, down the levels… A few other elements, such as descriptions, get mixed in, as we go.

Note that the object hierarchy is built anew, each time we invoke an operation – there’s no persistence other than through the table rows. This might seem wasteful, but there’s really no performance issue.

In the next post, I’ll say a bit more about the actual code involved, and I’ll say something about the self-checking functionality in a third post.


Signing Off…

Thanks to all of you who have visited this blog in 2011, and have continued to make comments. However, I need to point out that I’m no longer actively blogging here, so this site is now just for reference. I hope you continue to find it useful.

At the start of the year I started a new  job which has taken me away from Excel, Access and VBA. As some of my later posts indicated (see the tag cloud), I am now in the world of DITA XML documentation, in both a technical and an authoring capacity. My current investigations concern:

  • storing document components in an XML database
  • categorizing document components, on multiple dimensions, using internal meta-data
  • constructing documentation build manifests (DITA maps) using high-level queries (in XQuery)

If that sounds interesting, then contact me via the About page.

Enjoy your coding, in whatever language!

Still here…

Sorry the postings have been a bit thin on the ground since the New Year. Rather unexpectedly some freelance work has developed into a permanent position. I am now in a role which is primarily focused on DITA/XML documentation, for a company that produces a Payments Processing system (Direct Debits and such like). The challenge is to make the documentation scheme fit the highly modular nature of the system, and the core product –> client customization delivery process.

While Excel/VBA will still be in the mix, it won’t be central to what I’m doing. I’ll try to carry on posting as and when, so keep dropping by.

Code Quality

An interesting essay on Code Quality by Hew Wolff, over at the Pragmatic Programmers, in their online magazine. The basic thesis is that ‘Good Code Tells the Truth’ (that’s the title). It resonated with me because a week or two back I was looking at an Access/VBA application (not one of mine!). The code in this application was, if I might be frank, dire:

  • no indentation in a lot of procedures
  • no splitting of long SQL strings over multiple lines (when I did reformat one, it ran to 8 lines)
  • no useful comments at either module/form or procedure level
  • form controls not renamed (one form had buttons called Command27, Command28, Command66, Command139 – see what I mean?)
  • obscure variable names – for example, in one procedure, U3, D3, d4, SN3, FN3
  • using 0 and -1 for False and True (why?)
  • copy-paste-and-tweak repetition (e.g. to re-initialize a form).

The piece de resistance was a SELECT-INTO query that created a column using this expression:


So, arithmetic on Ids? As far as I can tell, this converts an Id value into -1 (presumably treated as True), but leaves Null values as Null, since arithmetic expressions involving Nulls equal Null. Is this a common Access programming technique? Even if it is, couldn’t it be wrapped in a function called something like NonNullToTrue (although this could not have argument or result types more specific than Variant).

This might (or might not) be a clever trick, but not considering the possibility of other people finding it confusing indicates of a poor approach to programming. What the essay mentioned earlier is saying, and what I’m agreeing with here, is that software development is as much about communication of a solution as it is about that solution just working.

WordPress Demo

This is a demo posting, to show a friend how brilliant WordPress is!

Here’s a picture…

Happy Blogday

It’s now a year since I started this blog, and good fun it’s been too (for me at least). There’s something very satisfying about writing a post, particularly when it concludes and summarizes some piece of work. I’ve also found the blog useful for my own reference, when I’ve forgotten the details of something I did some months ago.

A few numbers; 63 posts, 139 comments. Right now, the blog’s had 15,823 visits, which I’m pretty pleased with. Not much, however, compared with John Walkenbach’s 973,794 visitors (we prostrate ourselves before Your Excel-lency). I’m also pleased with my position (which modesty prevents me from mentioning) on Googling “VBA blog”.

Perennial favorite post is ByRef and ByVal – maybe because there’s some desperate Googling going on with those as search terms. The recent posts on Performance issues seem to have attracted some interest too (thanks for the comments, guys).

What Next for Modern Man? Well, I still have to look at Andy Pope’s Ribbon tool. And I’ve been experimenting with getting XML data out of Excel – this ties in with my interest in DITA/XML documentation – and generally using Excel as a front-end. More anon.

Learning VBA

Interesting post (and comments) over at Daily Dose of Excel (referencing another post). Basically saying that people should be learning and using (a bit of) VBA, as part of their basic efficiency/productivity – assuming they use Office apps, of course. For example, you wonder how many people copy-and-paste quantities of data manually (probably making mistakes) on a regular basis, when a few lines of code would do it for them.

Unfortunately, it is part of our culture that programming is regarded as something terribly technical and obscure, not for ordinary people. Microsoft themselves are at fault for putting ever more layers of user interface in front of the application functionality. Lots of PC users do not know what a Command Prompt is. I admit that VBA as a language is a bit quirky and inconsistent (the Set Keyword, for example), but it’s not unusable.

There’s a similar issue over in the domain of XML documentation. There are technical writers (‘technical’, note) who feel that everything should be entirely wrapped up in a GUI. This might be feasible if you are prepared to do what the tools let you do, and nothing else. But I think having some knowledge of the underlying languages is helpful.

It also allows you to use the many free tools, which tend to be light on GUIs. For example, a set of XML document files on disk can be queried (and even updated) using the XQuery language, effectively turning the files into a content database. XQuery is a very clean, functional language, with a SELECT-like construct to do the querying (much easier than VBA!). There are free (versions of) tools that index files and run queries.

So, yes, I think that there should be greater encouragement for ‘serious’ users of tools to roll their sleeves up and get into some (gentle or not so gentle) programming. At least they’d appreciate a bit more what developers do!

July 2018
« Dec