Sub-sequence Cursor

In the previous post, we were partitioning rows on a worksheet into separate workbooks. If you think of this more generally, we’re identifying disjoint sub-sequences of the data records and performing some operation or calculation over each of these sub-sequences. The end of a sub-sequence is defined by a condition – in this case, a change of the key value (end.key <> endnext.key).

By a complete coincidence, with my XML hat on, I recently came across the spec for XQuery 1.1 (this is a draft spec for the next version, so it’s not implemented yet). XQuery is a functional query language for XML documents – roughly equivalent to SQL, but much cleaner and better structured.

Basically, you identify ‘tuple streams’ using the XPath pattern language. A tuple can be a single value, a list of values, or a node-tree, but we can see these as analogous to rows on a worksheet. Normally, our XQuery uses a ‘For Each’-like construct (called a FLWOR – pronounced ‘flower’) to iterate through the tuples one at a time. However in v1.1 they have introduced a Window Clause, which is essentially a For Each Subsequence iteration, with the start and end defined by test conditions. (So ‘window’ means sub-sequence, here).

In addition, you can specify whether the windows are allowed to overlap or not. In a ‘tumbling’ window, start(n+1) must be after end(n). In a ‘sliding’ window, start(n+1) must be after start(n), but could be before end(n). In our data partitioning case, our windows were ‘tumbling’ (and, as it happens, contiguous).

Anyway, it struck me that we could implement something similar for Excel, but in an object-based, rather than functional, style. That is, we could have an object of class SubsequenceCursor, which provides access to successive sub-sequences of our data records. As with all cursor objects, there would be a MoveNext operation, plus properties to access properties of the current sub-sequence. The start and end conditions would be private functions.

Such a cursor object could be used within a loop to derive totals, or other summary information, for each subsequence. If that summary information is written to another worksheet, then we have a very flexible way of doing subtotals, and such like. Since I’m always up for a bit of OO design, I think I’ll have a go.

I’m aware that I never finished the Worksheet Hierarchies stuff, but as always, something new is more interesting :-). I’m also busy with some DITA work, at present.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

May 2010
« Apr   Jun »

%d bloggers like this: