Posts Tagged 'VBA'

Multi-Area Ranges

We tend to think of a Range as a contiguous set of cells: a single cell, part (or all) of a column, part (or all) of a row, or a rectangular area. However, a Range is one or more or these. The contiguous areas of a Range are held in its Areas collection. If Areas.Count > 1 then we have a multi-area Range.

To construct a multi-area Range from several single-area ranges, you can use the Union operation, which is provided by the Application (so, effectively, global).

I came across an example of this in customizing a chart to show only the most recent N records in the source data. Here’s an accumulating table of monthly records:

I want a clustered column chart, with Months as the category (X) axis, and In/Out as the series.

If I wanted data from all records, I would set the chart’s source data range to be the entire table, including the headers, which Excel interprets to give the series names (that is, In and Out).

If I want my chart to show only the most recent 3 months (say), then I need to specify the source data range as the union of A1:C1 and A6:C8. Here’s the Source Data Range as it appears in the chart configuration:

=’Data 2010′!$A$1:$C$1,’Data 2010′!$A$6:$C$8

Here’s a procedure that constructs and sets the range for a given number of months:

Sub LastNMonths(nmonths As Integer)
'pre: 0 < nmonths
'  <= Range("A1").CurrentRegion.Rows.Count - 1

    Dim ws As Worksheet
    Dim firstrow As Integer, lastrow As Integer

    Set ws = Worksheets("Data 2010")

    lastrow = ws.Range("A1").CurrentRegion.Rows.Count
    firstrow = lastrow - nmonths + 1

    Dim headers As Range
    Set headers = ws.Range("A1:C1")

    Dim rng As Range
    Set rng = ws.Range("A" & firstrow & ":C" & lastrow)

    Set rng = Union(headers, rng)

    Dim cht As Chart
    Set cht = Charts("Chart 2010")
    With cht
        .SetSourceData rng
        .ChartTitle.Text = _
               MonthsTitle(ws, firstrow, lastrow)
    End With
End Sub

MonthsTitle simply constructs the chart title from the first and last months (from column A) – for example, “Report May-10 to Jul-10”.

As I did this initially in v2003, I have not made use of the ‘table-ness’ of the data – hence the Range(“A1”).CurrentRegion.

The procedure was called from a ‘Refresh Chart’ command button on a worksheet, with the nmonths value provided via an InputBox.

Before anyone points this out, let me say that you can do a certain amount of this kind of thing using formulas in the Chart configuration (using OFFSET, etc) – see Walkenbach’s ‘Formulas’ book. However, I think it’s clearer, and certainly more flexible to do it in VBA.


Formula Builder

Suppose that we have a large number of numerical data sets, which we want to examine using aggregation functions, such as SUM, AVERAGE, STDEV, and possibly some custom functions of our own devising. Each of these functions takes an entire dataset and retuns a single result.

Here’s the data:

where ‘data1’, ‘data2’, etc, name the dataset ranges; the column headers are themselves named ‘datasets’. We could, of course, have larger and more numerous datasets.

Now, we could have another worksheet with formulas such as:

=SUM(data1), =SUM(data2), …

= STDEV(data1), = STDEV(data2), …

and so on, for each combination of function and dataset. However, that’s rather a sledgehammer approach.

Alternatively, we could select a function from a drop-down list, select a dataset from another list, and so build a particular formula of interest:

The list of available functions is defined elsewhere, and picked up by data validation in cell B1:

(SUMDIFFS is a sample custom function). Similarly, the data validation on cell D1 picks up the ‘datasets’ list (the data column headers).

The result cell (F1) contains the formula:


where CALC is a public VBA function:

Public Function CALC() As Variant

  Dim func As String
  Dim dataset As String
  Dim formula As String

  func = Range("function").Value
  dataset = Range("dataset").Value
  formula = func & "(" & dataset & ")"

  CALC = Evaluate(formula)
End Function

The formula String does not need an initial ‘=’.

As CALC does not reference its input cells explicitly, it does not get recalculated automatically if the input cells change. So we need to handle Change events that emanate from the two input cells. In the ‘Analysis’ Sheet module:

Private Sub Worksheet_Change(ByVal Target As Range)

  If Target.Address = Range("function").Address Or _
     Target.Address = Range("dataset").Address Then
  End If
End Sub

I’ve always wanted use the ‘Dirty’ method … This is equivalent to re-entering the =CALC() formula, triggering recalculation.

A constraint here is that the functions must form a family that have the same argument signature: here, a single array of values. Other statistical functions take more arguments, and might form other families. Alternatively, a multi-argument built-in function could be wrapped in a custom function that defaulted all but the main ‘data’ argument.

Handling Worksheet_Change for a Named Value

I was recently thinking about an example of handling the Worksheet_Change event. Unlike, say, opening a Workbook or clicking an ‘OK’ button, this is a ‘background’ event, which happens many times during an editing session. So we need to be quite specific about which changes we’re interested in. An obvious test is to check whether the name of the target Range is relevant.

An example that springs to mind is keeping a log of changes to some key value: something like a tax rate, where the act of changing the value is itself of interest. It is also important that transaction records pick up the appropriate rate for their date, not the latest value.

Here is a simple Orders table:

The various Orders pick up their tax rate from a Rate table:

The Rate column in the Orders table has the formula:


When the Tax value (top-right of the first screenshot) is edited, a new row is added to the Rate table.

We can do this by handling Worksheet_Change on the Orders Worksheet:

Private Sub Worksheet_Change(ByVal Target As Range)
'If Target is the named Range "Tax",
'then extend the change log
'on Worksheet "Tax Rates",
'with today's date and the new value.

The easiest (only?) way of checking the Range Name is to have a go, and handle the error if it is unnamed:

    On Error GoTo finish
            'If Target does not have a Name,
            'then Target.Name raises an error
            'So just ignore the event.
    Dim rngname As String   'Possible name of Target
    rngname = Target.Name.Name

If Target does have a name, and that name is ‘Tax’, then add a row to the Rate table:

    If rngname = "Tax" Then
        Dim newrow As ListRow

        Set newrow = _
        Worksheets("Tax Rates").ListObjects(1).ListRows.Add

        newrow.Range.Item(1).Value = Date
        newrow.Range.Item(2).Value = Target.Value

    End If
End Sub

The use of the Date function ensures that the first column of the Rate table is sorted ascending, which is required by the VLOOKUP. If you make more than one change of Rate on a given date (for example, you correct a mistake), the VLOOKUP picks up the latest value. So if today (24/6) I make a further change from 20% to 21% (I blame the fiscal squeeze), then this rate is picked up by today’s orders.

Sub-sequence Iterator 3

In the previous post, we were looking at an iterator object that bound to successive sub-sequences of records in a table, according to a StartCondition and an EndCondition. In that example, the sub-sequences were contiguous, non-overlapping, and so could be summarized to give sub-totals in a very flexible way.

A slightly different example is where the start and end records for each sub-sequence are all intermixed. Here’s a time-ordered sequence of start and finish actions, for some identified entities:

You can see that B and C both start and finish before A finishes.

Suppose that want to find out the duration of each activity. We then need to find matching start and finish records, and calculate the difference between the two dates. In this case, the sub-sequence is the ‘lifetime’ of the entity. Although we are not interested in anything other than the start and finish for an entity, it’s conceivable in other examples that we might be interested in the intermediate values.

In the previous example, the sub-sequences were non-overlapping, so the MoveNext operation started looking for a new StartRow immediately after the old EndRow. Here, the sub-sequences can overlap, so the MoveNext operation starts looking for a new StartRow immediately after the old StartRow . At present, each entity is assumed to have a finish record.

StartCondition simply looks for a ‘start’ record; when it finds one, it records the Id. EndCondition looks for a ‘finish’ record with the current Id.

The client code creates an iterator on the Actions table (see above), moves it through the table, calculates the duration for each entity (finish – start + 1), and writes a summary record:

Here’s the client procedure:

Sub Run()
    Dim ssi As SSI2
    Set ssi = New SSI2
    With ssi
        .WsName = "Actions"    'Worksheet
        .TableName = "ActionTable"
        .KeyName1 = "Id"       'Key column
        .KeyName2 = "Action"   'there are 2 key columns
        .ValName = "Date"      'Value column
    End With

    Worksheets("Summary2").Activate  'for the output

    Dim row As Long
    row = 1
    ssi.MoveNext     'to first sub-sequence
    Do Until ssi.IsAfter
        Range("A" & row).Value = ssi.Label
        Range("B" & row).Value = Summarized(ssi.values)
        row = row + 1
End Sub

Private Function Summarized(values As Variant) As Variant
'pre: IsArray(values)
    Dim last As Long
    last = UBound(values)
    Summarized = values(last) - values(1) + 1
End Function

I think this is quite an interesting example, as it’s not obvious to me how you would do it with formulas, even array ones.

At present there are two different classes for the two examples (non-overlapping and overlapping) – hence SSI2, above. Maybe they should both implement a common interface. In practice, though I think you’d know which flavour you needed. Here’s the workbook with both examples, if you want to try it out.

Sub-sequence Iterator 2

Following on from the previous post, I’ve had a go at implementing a Sub-sequence Iterator (SSI) class, which operates on a named Table ( a 2007 table, a.k.a. ListObject). There are two variants, depending on whether the sub-sequences can overlap or not.

The first case is similar to the earlier data partitioning, where we want to break our table rows into contiguous, non-overlapping sub-sequences, and perform some summarizing operation over the values – for example, subtotalling. Here’s the data table:

The Date column is our ‘key’, by which we work out the sub-sequences. In this case it’s just a weekly incrementing date. The Value column contains the data that we want to summarize. For simplicity, let’s say that we want to summarize by month – although it could be something more interesting.

Our SSI object has a MoveNext operation, called by some client code, which makes it iterate through the sub-sequences. For each sub-sequence, the SSI makes available to the client code:

  • a Label that can be used to identify each sub-sequence
  • an array of Values for the sub-sequence.

The client code can then pass the Values to a function Summarized, which in this case calculates a (sub)total. The client code then writes the Label and the Summarized value to another worksheet:

Here, the Label is just “Month ” prepended to the month number (January -> 1, etc).

Now, you might be wondering why we don’t just add a Month column to our table, and generate a pivot table, aggregating by month.

Firstly, we might want to partition our dates in many different ways: by month, quarter, Mayan Lunar Year, and so on. It would be cumbersome to have to add a column, with appropriate values, for each of these partitionings. Indeed, our table might be linked to some external data source which has only the raw data.

Secondly, a particular partitioning might depend on the data values themselves (for example, a negative value terminates a sub-sequence), or some dynamic value, such as today’s date.

To customize the SSI class for a particular table, we need to write:

  • Function StartCondition(row As Long) As Boolean
  • Function EndCondition(row As Long) As Boolean
  • Function Label() As String

In this example, StartCondition and EndCondition are both looking for changes in Month. When the SSI finds new rows satisfying these conditions, it sets StartRow and EndRow, which then delimit a new array of Values.

I’ll talk about the second variant – where we can have overlapping subsequences – in the next post.

Sub-sequence Cursor

In the previous post, we were partitioning rows on a worksheet into separate workbooks. If you think of this more generally, we’re identifying disjoint sub-sequences of the data records and performing some operation or calculation over each of these sub-sequences. The end of a sub-sequence is defined by a condition – in this case, a change of the key value (end.key <> endnext.key).

By a complete coincidence, with my XML hat on, I recently came across the spec for XQuery 1.1 (this is a draft spec for the next version, so it’s not implemented yet). XQuery is a functional query language for XML documents – roughly equivalent to SQL, but much cleaner and better structured.

Basically, you identify ‘tuple streams’ using the XPath pattern language. A tuple can be a single value, a list of values, or a node-tree, but we can see these as analogous to rows on a worksheet. Normally, our XQuery uses a ‘For Each’-like construct (called a FLWOR – pronounced ‘flower’) to iterate through the tuples one at a time. However in v1.1 they have introduced a Window Clause, which is essentially a For Each Subsequence iteration, with the start and end defined by test conditions. (So ‘window’ means sub-sequence, here).

In addition, you can specify whether the windows are allowed to overlap or not. In a ‘tumbling’ window, start(n+1) must be after end(n). In a ‘sliding’ window, start(n+1) must be after start(n), but could be before end(n). In our data partitioning case, our windows were ‘tumbling’ (and, as it happens, contiguous).

Anyway, it struck me that we could implement something similar for Excel, but in an object-based, rather than functional, style. That is, we could have an object of class SubsequenceCursor, which provides access to successive sub-sequences of our data records. As with all cursor objects, there would be a MoveNext operation, plus properties to access properties of the current sub-sequence. The start and end conditions would be private functions.

Such a cursor object could be used within a loop to derive totals, or other summary information, for each subsequence. If that summary information is written to another worksheet, then we have a very flexible way of doing subtotals, and such like. Since I’m always up for a bit of OO design, I think I’ll have a go.

I’m aware that I never finished the Worksheet Hierarchies stuff, but as always, something new is more interesting :-). I’m also busy with some DITA work, at present.

Partitioning Data into Multiple Workbooks

I’ll return to the worksheets-to-table collation shortly…

I was diverted by a problem that my wife came up with: how to partition data in a single workbook into multiple workbooks, depending on the value in a particular ‘key’ column. Here’s the kind of thing we’d have as the input worksheet:

The data is sorted on the first, Key, column. There could be multiple data columns.

What we want is to create a workbook AAA.xls, containing the first 7 data values; a workbook BBB.xls, containing the next 3 data values; a workbook CCC.xls, containing the last 5 data values. And so on.

Obviously the numbers of workbooks and values involved could be large – so we’re not going to do this using copy-and-paste (and this is a VBA blog, after all!). Incidentally, this is in v2003, not that it matters.

So we need some way of finding where the Key values change. We could just iterate through the cells, but that’s rather plodding. We have, in the normal user interface, the command >Edit >Go To Special >Column Differences. If we start off with A:A selected, it changes the selection to A2:A16, then to A9:A16, and so on. If we do it on the last block, with A12:A16 selected, we get an error.

The key bit of VBA is:


where ActiveCell is the first cell of the selected Range. Selection.Row is then the start of a chunk. Doing it again gives us the start of the next chunk, and thus the end of the preceding chunk.

So we can just cycle through the chunks in a Do While True loop, exiting when we get an error on the last chunk (remembering to copy that last chunk).

I added a Control worksheet with a command button, and prompted for the source worksheet name (in case there are multiple ones). I didn’t copy over the header row for columns B+, but that wouldn’t be difficult to do.

That seems to do the trick: the code module is here, if you’re interested. There are doubtless other ways of doing this, but my solution got the required Brownie points from Liz.

July 2019
« Dec