Sub-sequence Iterator 2

Following on from the previous post, I’ve had a go at implementing a Sub-sequence Iterator (SSI) class, which operates on a named Table (an Excel 2007 table, a.k.a. ListObject). There are two variants, depending on whether the sub-sequences can overlap or not.

The first case is similar to the earlier data partitioning, where we want to break our table rows into contiguous, non-overlapping sub-sequences, and perform some summarizing operation over the values – for example, subtotalling. Here’s the data table:

The Date column is our ‘key’, by which we work out the sub-sequences. In this case it’s just a weekly incrementing date. The Value column contains the data that we want to summarize. For simplicity, let’s say that we want to summarize by month – although it could be something more interesting.

Our SSI object has a MoveNext operation, called by some client code, which makes it iterate through the sub-sequences. For each sub-sequence, the SSI makes available to the client code:

  • a Label that can be used to identify each sub-sequence
  • an array of Values for the sub-sequence.

The client code can then pass the Values to a function Summarized, which in this case calculates a (sub)total. The client code then writes the Label and the Summarized value to another worksheet:

Here, the Label is just “Month ” prepended to the month number (January -> 1, etc.).
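
To make the client side concrete, here is a rough sketch of the MoveNext loop. It is written in Python rather than the VBA of the actual class. The names MonthIterator, move_next and summarized are illustrative stand-ins, and the weekly data is made up, since the original table isn't reproduced here.

```python
from datetime import date, timedelta


class MonthIterator:
    """Hypothetical stand-in for the SSI: one contiguous run of rows per month."""

    def __init__(self, rows):
        self._rows = rows   # list of (date, value) pairs, assumed sorted by date
        self._next = 0      # index of the first row not yet consumed
        self.label = ""
        self.values = []

    @staticmethod
    def _key(row):
        d = row[0]
        return (d.year, d.month)

    def move_next(self):
        """Advance to the next sub-sequence; return False when the rows run out."""
        if self._next >= len(self._rows):
            return False
        start = end = self._next
        key = self._key(self._rows[start])
        # Extend the run while the following row falls in the same month.
        while end + 1 < len(self._rows) and self._key(self._rows[end + 1]) == key:
            end += 1
        self.label = f"Month {key[1]}"
        self.values = [value for _, value in self._rows[start:end + 1]]
        self._next = end + 1
        return True


def summarized(values):
    """The summarizing operation; here just a subtotal."""
    return sum(values)


# Made-up data: nine weekly dates starting 4 Jan 2010, with values 1..9.
rows = [(date(2010, 1, 4) + timedelta(weeks=i), i + 1) for i in range(9)]

ssi = MonthIterator(rows)
results = []
while ssi.move_next():
    results.append((ssi.label, summarized(ssi.values)))
```

With this made-up data, results holds one (Label, subtotal) pair per month, which is what the client would write out to the other worksheet.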

Now, you might be wondering why we don’t just add a Month column to our table, and generate a pivot table, aggregating by month.

Firstly, we might want to partition our dates in many different ways: by month, quarter, Mayan Lunar Year, and so on. It would be cumbersome to have to add a column, with appropriate values, for each of these partitionings. Indeed, our table might be linked to some external data source which has only the raw data.

Secondly, a particular partitioning might depend on the data values themselves (for example, a negative value terminates a sub-sequence), or some dynamic value, such as today’s date.
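
As a small illustration of a data-dependent partitioning, the sketch below (Python, not the VBA class) splits a list of values wherever a negative value occurs. The function name split_on_negative is hypothetical, and keeping the negative value as the last element of its run is an assumption; the post doesn't say which side of the boundary it belongs to.

```python
def split_on_negative(values):
    """Split values into contiguous runs; a negative value closes the current run.

    Assumption: the terminating negative value stays in the run it closes.
    """
    runs, current = [], []
    for v in values:
        current.append(v)
        if v < 0:
            runs.append(current)
            current = []
    if current:                 # trailing run with no terminating negative
        runs.append(current)
    return runs


runs = split_on_negative([5, 3, -2, 4, 1, -7, 6])
```

No helper column on the table could express this boundary in advance without effectively doing the same scan.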

To customize the SSI class for a particular table, we need to write:

  • Function StartCondition(row As Long) As Boolean
  • Function EndCondition(row As Long) As Boolean
  • Function Label() As String

In this example, StartCondition and EndCondition are both looking for changes in Month. When the SSI finds new rows satisfying these conditions, it sets StartRow and EndRow, which then delimit a new array of Values.
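
The division of labour between the generic scanner and the three custom functions might look something like the following Python sketch. It is only a guess at the shape, not the actual VBA: the class names and the scanning logic in move_next are assumptions, and rows are 0-based here, whereas the table rows in Excel would more likely be 1-based.

```python
from datetime import date


class SubSequenceIterator:
    """Generic scanner; subclasses supply start_condition, end_condition, label."""

    def __init__(self, keys, values):
        self.keys = keys            # the 'key' column (dates here)
        self.all_values = values    # the column to be summarized
        self.start_row = -1
        self.end_row = -1

    def start_condition(self, row):
        raise NotImplementedError

    def end_condition(self, row):
        raise NotImplementedError

    def label(self):
        raise NotImplementedError

    def move_next(self):
        """Set start_row and end_row for the next sub-sequence, if there is one."""
        n = len(self.keys)
        row = self.end_row + 1      # non-overlapping: resume after the last end
        while row < n and not self.start_condition(row):
            row += 1
        if row >= n:
            return False
        self.start_row = row
        while row < n - 1 and not self.end_condition(row):
            row += 1
        self.end_row = row
        return True

    @property
    def values(self):
        return self.all_values[self.start_row:self.end_row + 1]


class ByMonth(SubSequenceIterator):
    """Both conditions look for a change in month, as in the example."""

    def _month(self, row):
        return (self.keys[row].year, self.keys[row].month)

    def start_condition(self, row):
        return row == 0 or self._month(row) != self._month(row - 1)

    def end_condition(self, row):
        return row == len(self.keys) - 1 or self._month(row) != self._month(row + 1)

    def label(self):
        return f"Month {self.keys[self.start_row].month}"


# Made-up key and value columns.
keys = [date(2010, 1, 4), date(2010, 1, 25), date(2010, 2, 1),
        date(2010, 2, 22), date(2010, 3, 1)]
ssi = ByMonth(keys, [10, 20, 30, 40, 50])

found = []
while ssi.move_next():
    found.append((ssi.label(), ssi.start_row, ssi.end_row, ssi.values))
```

Under this arrangement, partitioning by quarter or by some data-dependent rule would only mean writing another small subclass; the scanning loop stays the same.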

I’ll talk about the second variant – where we can have overlapping sub-sequences – in the next post.


