Archive for May, 2010

Sub-sequence Cursor

In the previous post, we were partitioning rows on a worksheet into separate workbooks. If you think of this more generally, we’re identifying disjoint sub-sequences of the data records and performing some operation or calculation over each of these sub-sequences. The end of a sub-sequence is defined by a condition – in this case, a change of the key value (end.key <> endnext.key).

By a complete coincidence, with my XML hat on, I recently came across the spec for XQuery 1.1 (this is a draft spec for the next version, so it’s not implemented yet). XQuery is a functional query language for XML documents – roughly equivalent to SQL, but much cleaner and better structured.

Basically, you identify ‘tuple streams’ using the XPath pattern language. A tuple can be a single value, a list of values, or a node-tree, but we can see these as analogous to rows on a worksheet. Normally, our XQuery uses a ‘For Each’-like construct (called a FLWOR – pronounced ‘flower’) to iterate through the tuples one at a time. However in v1.1 they have introduced a Window Clause, which is essentially a For Each Subsequence iteration, with the start and end defined by test conditions. (So ‘window’ means sub-sequence, here).

In addition, you can specify whether the windows are allowed to overlap or not. In a ‘tumbling’ window, start(n+1) must be after end(n). In a ‘sliding’ window, start(n+1) must be after start(n), but could be before end(n). In our data partitioning case, our windows were ‘tumbling’ (and, as it happens, contiguous).

Anyway, it struck me that we could implement something similar for Excel, but in an object-based, rather than functional, style. That is, we could have an object of class SubsequenceCursor, which provides access to successive sub-sequences of our data records. As with all cursor objects, there would be a MoveNext operation, plus properties to access properties of the current sub-sequence. The start and end conditions would be private functions.

Such a cursor object could be used within a loop to derive totals, or other summary information, for each subsequence. If that summary information is written to another worksheet, then we have a very flexible way of doing subtotals, and such like. Since I’m always up for a bit of OO design, I think I’ll have a go.

I’m aware that I never finished the Worksheet Hierarchies stuff, but as always, something new is more interesting :-). I’m also busy with some DITA work, at present.


Partitioning Data into Multiple Workbooks

I’ll return to the worksheets-to-table collation shortly…

I was diverted by a problem that my wife came up with: how to partition data in a single workbook into multiple workbooks, depending on the value in a particular ‘key’ column. Here’s the kind of thing we’d have as the input worksheet:

The data is sorted on the first, Key, column. There could be multiple data columns.

What we want is to create a workbook AAA.xls, containing the first 7 data values; a workbook BBB.xls, containing the next 3 data values; a workbook CCC.xls, containing the last 5 data values. And so on.

Obviously the numbers of workbooks and values involved could be large – so we’re not going to do this using copy-and-paste (and this is a VBA blog, after all!). Incidentally, this is in v2003, not that it matters.

So we need some way of finding where the Key values change. We could just iterate through the cells, but that’s rather plodding. We have, in the normal user interface, the command >Edit >Go To Special >Column Differences. If we start off with A:A selected, it changes the selection to A2:A16, then to A9:A16, and so on. If we do it on the last block, with A12:A16 selected, we get an error.

The key bit of VBA is:


where ActiveCell is the first cell of the selected Range. Selection.Row is then the start of a chunk. Doing it again gives us the start of the next chunk, and thus the end of the preceding chunk.

So we can just cycle through the chunks in a Do While True loop, exiting when we get an error on the last chunk (remembering to copy that last chunk).

I added a Control worksheet with a command button, and prompted for the source worksheet name (in case there are multiple ones). I didn’t copy over the header row for columns B+, but that wouldn’t be difficult to do.

That seems to do the trick: the code module is here, if you’re interested. There are doubtless other ways of doing this, but my solution got the required Brownie points from Liz.

May 2010
« Apr   Jun »