Archive for the 'VBA Code' Category

Multi-Area Ranges

We tend to think of a Range as a contiguous set of cells: a single cell, part (or all) of a column, part (or all) of a row, or a rectangular area. However, a Range is one or more or these. The contiguous areas of a Range are held in its Areas collection. If Areas.Count > 1 then we have a multi-area Range.

To construct a multi-area Range from several single-area ranges, you can use the Union operation, which is provided by the Application (so, effectively, global).

I came across an example of this in customizing a chart to show only the most recent N records in the source data. Here’s an accumulating table of monthly records:

I want a clustered column chart, with Months as the category (X) axis, and In/Out as the series.

If I wanted data from all records, I would set the chart’s source data range to be the entire table, including the headers, which Excel interprets to give the series names (that is, In and Out).

If I want my chart to show only the most recent 3 months (say), then I need to specify the source data range as the union of A1:C1 and A6:C8. Here’s the Source Data Range as it appears in the chart configuration:

=’Data 2010′!$A$1:$C$1,’Data 2010′!$A$6:$C$8

Here’s a procedure that constructs and sets the range for a given number of months:

Sub LastNMonths(nmonths As Integer)
'pre: 0 < nmonths
'  <= Range("A1").CurrentRegion.Rows.Count - 1

    Dim ws As Worksheet
    Dim firstrow As Integer, lastrow As Integer

    Set ws = Worksheets("Data 2010")

    lastrow = ws.Range("A1").CurrentRegion.Rows.Count
    firstrow = lastrow - nmonths + 1

    Dim headers As Range
    Set headers = ws.Range("A1:C1")

    Dim rng As Range
    Set rng = ws.Range("A" & firstrow & ":C" & lastrow)

    Set rng = Union(headers, rng)

    Dim cht As Chart
    Set cht = Charts("Chart 2010")
    With cht
        .SetSourceData rng
        .ChartTitle.Text = _
               MonthsTitle(ws, firstrow, lastrow)
    End With
End Sub

MonthsTitle simply constructs the chart title from the first and last months (from column A) – for example, “Report May-10 to Jul-10″.

As I did this initially in v2003, I have not made use of the ‘table-ness’ of the data – hence the Range(“A1″).CurrentRegion.

The procedure was called from a ‘Refresh Chart’ command button on a worksheet, with the nmonths value provided via an InputBox.

Before anyone points this out, let me say that you can do a certain amount of this kind of thing using formulas in the Chart configuration (using OFFSET, etc) – see Walkenbach’s ‘Formulas’ book. However, I think it’s clearer, and certainly more flexible to do it in VBA.

Dynamic Combo Box List

A combo box on a form can pick up the list contents from its RowSource property. This can be the name of a worksheet range – such as “products” (not ever a literal address, please).

However, this is tricky if the size of the list changes (if the list grows, specifically). With a standard defined Name, you need to keep on redefining the Name. Alternatively, you can use the OFFSET-function trick for dynamic ranges (but that is obscure and depends on things like the number of non-data cells in the column).

In 2007 we have the Structured Reference naming into Tables, such as “ProductTable[Product]“, which returns us the data contents of the product column. On a recent course, I was asked if we could use this as a Combo box RowSource property. The good news is that we can:

Any additions to the Table are picked up when the form reopens:

You could get this to work for a multi-column list (e.g. id + name, with the id as BoundColumn). For a 2-column table, you can specify the whole data area: “ProductTable[#data]“. It doesn’t matter if the #data area has more columns than the list needs: the leftmost ones are used.

Initially, I thought I’d try this with an ActiveX combo box dropped directly on to the worksheet. However, such on-sheet controls are actually OLEObjects. Rather than a RowSource property, they have a ListFillRange. An OLEObject could be things other than a List/Combo box, in which case this property would be inapplicable. (Similarly, RowSource is actually defined on the Control class).

That’s okay, but unfortunately, OLEObject.ListFillRange does not understand structured references into tables: it needs either a regular Defined Name or (don’t) a literal address. This seems like an oversight: the OLEObject‘s Parent is the Worksheet, which holds the Table as a ListObject – so it should be possible to resolve the name.

Nevertheless, it works fine for the Form combo boxes, and I’ll try to use this technique in the future.

Recursive Functions

I’ve been experimenting with XQuery, which is a ‘pure’ functional language: you just have functions and expressions. You iterate over collections of documents/nodes using a SQL-SELECT-like construct: For-Let-Where-OrderBy-Return (FLWOR). There are also quantified expressions: Some/Every-In-Satisfies.

There are no explicit loops, but you can write a function that is recursive - that is, the function calls itself, until a terminating case is reached (e.g the empty-list case). Each recursive call adds a frame to the call stack of the implementation; when the terminating case is reached, the calls unwind and the frames are popped off the stack, and the result value is constructed. The amount of stack space then limits how long the chain of recursive calls can be (unless the implementation optimizes recursion to iteration – some functional languages do).

Anyway, I wondered how many recursive calls you can make in VBA before you run out stack space (this is in Excel 2007). Here’s a trivial recursive function.

Function Recursive(n As Long) As Long
   If n = 0 Then
      Recursive = 1
   Else
      Recursive = 1 + Recursive(n - 1)
   End If
End Function

This works up to n = 6128. When n = 6129, I get Run-time Error 28 – Out of stack space. That’s a rather higher number than I was expecting. If you change the Long to an Integer, it works up to n = 6285, so the number and types of the arguments makes a slight difference.

It’s unlikely perhaps that you would write a straightforwardly recursive function like this, where a loop would do. However, it’s just conceivable that you might get a chain of object-to-object calls, within a custom data structure, long enough to cause an error.

Performance – Functions (addendum)

Following on from an earlier post, I thought that I’d quickly (!) try a different function. Not being a statistician, I wanted something with a simple formula, but that worked on potentially very large arrays: CORREL (correlation between two data sets) is something I can get my head around.

So I set up two columns of 1 million rows: column A containing 1,2,3,4,…; column B containing 2 * A * (1+ nnn), where nnn is a small, random, positive or negative quantity. For example, row 10 contains (10, 21.26344). The column ranges are named. A single cell contains a CORREL formula – this calculates in somewhat less than 1 second.

So I thought I’d try:

correlation = WorksheetFunction.Correl(arrX,arrY)

where arrX, arrY are 1-dimensional arrays obtained from the respective column ranges (whose Values are 2-D, N x 1 arrays). The above call is made between calls to Timer, and with all Application updating switched off. So far so good.

Times for 50000 rows are either 0.031 or 0.047 seconds – precisely those values, never anything in between; the mix seems to be random. For 65536 rows, the two alternative times are 0.047 and 0.063, again precisely, and randomly mixed. That’s pretty odd…

So, whack it up to the full 1 million rows, to get a decent elapsed time? Unfortunately not. In fact, 65537 rows causes a Run-time Error 13 – Type Mismatch on the call to Correl (as above). So something is limited to 16 bits, which is of course the storage for a VBA Integer. But my code does not use Integer anywhere. The actual data values are just Variants, indices are Longs. And the worksheet formula works fine for 1 million rows.

What’s going on? Any suggestions welcome. Given that I can’t get a benchmark for a very large number of rows, there doesn’t seem much point in writing a custom VBA Correl function to compare. Hmmmm.

Performance – Functions

In a post some months ago, there was a discussion  of the relative performance of equivalent functions: built-in worksheet versus custom VBA. Dougaj4 reckoned that the VBA function for Transpose is about twice as fast as the WS function. However, in my benchmark, the difference was swamped by Range Access. So I repeated the experiment with an in-memory array, comparing

WorksheetFunction.Transpose(arrin)

with Doug’s double-For-loop implementation. I tried it for a square array of size 1000 to 3000 (so 1 to 9 million elements), with Long values (=i/j).

Results:

  • 1000^2: WS = 0.72, VBA = 0.31
  • 2000^2: WS = 4.08, VBA = 1.27
  • 3000^2: WS = 9.42, VBA = 2.84

Here’s the graph:

So in fact the difference is greater: the custom VBA function is about 3 times faster. I wonder why? The function is pretty simple algorithmically, so it’s not as if there could be wildly different implementations. And there’s no significant initial overhead for the Worksheet function call: its line would pretty much go through (0,0).

I don’t know whether one could expect a similar difference in performance for other functions – say a really meaty statistical function.

Performance – Strings

In the previous post, we looked at the performance hit on accessing worksheet ranges from VBA. As is well known, it is advisable to minimize accesses by reading/writing arrays.

Another area where warnings abound is in operations on large strings. A particular case is building up long strings by concatenation. I expect we’ve all done something like:

mystring = mystring + newpiece

The overhead here seems to be because the old mystring is copied to a new mystring, and then the newpiece appended. The copying time is presumably proportional to Len(mystring).

For ad hoc string building (a range address, or name, or an SQL string in Access) this is not significant. Where problems could occur is when the concatenation is being done inside a high-iteration loop – that is, building up a very large string out of very many pieces.

Since mystring gets longer and longer, each concatenation gets slower and slower. The total time is therefore going to be proportional to n-squared, where n is the number of concatenations (i.e. the number of pieces). This is obviously bad news.

As a simple benchmark, I concatenated a single character string 100,000 times (on my fairly basic laptop), and it took 3.4 seconds; 200,000 times took 12.9 seconds. That’s a factor of 3.8, so roughly 4.

Now, that’s quite a big string to be building – not something you do every day, I’d guess. But you might be building up a string in memory in order to avoid large numbers of file-write operations, which are presumably going to be much slower than one write of a large string.

There are two alternatives. One is is to use the Mid function as a kind of ‘deconstructor’ function on the left-hand side of an assignment (this is a bit odd). This essentially ‘patches’ a substring on top of a base string. The trick here is that you start with an all-spaces base string of the anticipated size of your output string; this can be constructed using the (fast) Space function.

mystring = Space(size)
i = 1
Do While i < size
    substr = "xxx"   'or return from a function
    lensubstr = Len(substr)
    Mid(mystring,i,lensubstr) = substr
    i = i + lensubstr
Loop

Some people advocate using the ‘$’ versions of the string functions – so, Mid$, in this case. These are typed to take and return Strings, rather than Variants, and so are supposed to be more efficient. This does not seem to be the case with my string-building benchmark: the times are the same.

For a 100,000 character string, built one character at a time, it takes 0.05 seconds; for a 200,000 character string, 0.09 seconds. This looks like it might be linear, as you would expect.

Another alternative is to build an array of substrings, and then use the Join function to do a batch-concatenation:

mystring = Join(myarray)

This is about half as fast as the Mid solution (0.09, 0.17 seconds), but is still linear. The advantage of this is that it’s easier to retrieve/modify the substrings prior to the final string-build (with the Mid solution you’d need to keep an array of substring offsets).

Being good (!) object-oriented programmers, we should really encapsulate our chosen mechanism in a StringBuilder class (I expect that there are some out there). If the strings are being written to a file, then our class could handle this as well: a kind of wrapper around a TextStream object.

Another, less significant, optimization is checking for empty strings. Most people would compare with a literal empty string:

If mystring = “”

Since there is supposed to be an overhead in doing this (many times in a loop), alternatives are:

If Len(mystring) = 0

If mystring = vbNullString

The times for 10 million tests were 0.73, 0.36, 0.72 seconds. So the Len function is twice as fast, but it doesn’t really seem something to lose sleep over.

Finally, I’ve seen recommendations that when searching for a substring inside a larger string, you should use the InStr function, rather than Like “*foo*”. Times for 10 million tests were 1.56 and 2.55 seconds. So not hugely significant, and Like is obviously more flexible.

Performance – Range Access

I’ve just been writing up some examples relating to VBA performance issues (in Excel 2007). Although a lot of this is well known, I thought it might be useful to report my results. The topics (over three posts) are:

  • moving data between worksheet ranges and VBA
  • operations on Strings
  • worksheet functions versus VBA functions.

For the first topic, I thought I’d compare ways of reading data from one worksheet, making a trivial arithmetic change (* 2), and writing to another worksheet. The size of the data array depends on what your system will stand, but after some experimentation I went for a 100 x 100 cell range.

The ‘straw man’ is the version that does it one cell at a time. Although this seems idiotic, you might not worry for a small set of data (say 100 cells); then someone comes along and uses 10000 cells.

So I get each value from:

Range(“input”).Item(i).Value

for i = 1 to 10000. And similarly, the Value of an individual cell is set in the output range.

The alternative version reads the data into a VBA array in a single operation:

tempArray = Range(“input”).Value

Then the array elements are updated inside a (double) loop, and then

outRange.Value = tempArray

Surprise, surprise, there’s quite a difference:

Straw Man 22.9 seconds; Array OfHope 0.14 seconds.

That’s a factor of 164. Which suggests that inefficient range access is going to dwarf any other performance issues (of which more tomorrow).

Multiple Selected Worksheets

Here’s a question that came up on a recent course. In Excel, suppose that you have shift-clicked or control-clicked to select multiple worksheets. Are they all Active? If so, what does ActiveSheet return? If not, where is the set of selected sheets?

The answer is that there is only one ActiveSheet: the one on the top of the stack, with the name in bold. The set of selected sheets is held by the SelectedSheets property (reasonably enough) of a Window object. This is a Sheets collection, which can contain both worksheets and (free-standing) Charts. The Workbook holds the full Sheets collection, of which a Window’s SelectedSheets is obviously a subset.

The Window objects are held both by a Workbook (usually just one) and Application (the full set for all open Workbooks). A Window holds appearance-related properties, such as scrolling and splitting. A Window does not raise any events, so there does not seem to be anything corresponding to the shift-click or control-click actions (not that I can think what you would want to do on handling them).

Storing Arrays as Named Constants

I came across something recently that pointed out that you can save arrays of values (or indeed single values) as named constants – that is, using names. We usually think of names as referring to ranges, like this:

=Sheet1!$A$1:$B$4

However, in general, they are named formulas, which can involve functions, as in the ‘dynamic named range’ trick:

=OFFSET(Sheet1!$A$1,0,0,COUNTA(Sheet1!$A:$A),1)

or they can be literal values, such as an array:

={1,2,3;4,5,6;7,8,9}

Note the semicolons, making this a 3 x 3 array.

We could construct the array in our code, or get it from a worksheet range:

    Dim arr As Variant
    arr = rng.Value
    Names.Add somename, arr

It’s not easy to find out the row/column sizes from the array constant, so we might want to save these as named values in their own right:

    Names.Add somename + "Rows", UBound(arr, 1)
    Names.Add somename + "Cols", UBound(arr, 2)

The array constant can then be used to populate a worksheet range using an array formula:

{=somename}

However, getting the array constant back into a VBA array isn’t so easy. Unless there’s some trick that eludes me, the only way is to go via a worksheet range. Here’s a function that does this:

Function NamedArray(arrname As String) As Variant
    Dim nrows As Long
    Dim ncols As Long
    Dim rng As Range
    Dim arr As Variant

    nrows = Mid(Names(arrname + "Rows").Value, 2)
                         'lose the initial '='
    ncols = Mid(Names(arrname + "Cols").Value, 2)

    Set rng = Range("DD1")      'somewhere remote
    Set rng = rng.Resize(nrows, ncols)
    rng.FormulaArray = "=" + arrname
    arr = rng.Value
    rng.Clear

    NamedArray = arr
End Function

This uses the range’s FormulaArray property to suck in the values.

Having said that that’s the only way, an an alternative would be to parse the array constant and build the array in code, but that would be painful, and not easy to do generally for all types of value.

Is this actually worth it? I’m not sure. My preference is that data should be stored out on a worksheet (which could be hidden if desired), from which we can simply grab the Range.Value array. Also, there’s a limit on the size of the array constant (I assume lexically): it didn’t work (in xl2000) with approx. 8K rows.

ArrayedList Class

It’s a bit disappointing that the VBA library contains only one collection class: Collection. Obviously, that one’s there because of its use to hold application objects (Worksheets, Controls, etc). However, it’s common to need other types of collection (with a small ‘c’): various sorts of lists, for example.

One that I needed recently was a list where I could add to the end, and iterate through the entire set. Since I didn’t need to insert/delete within the list, a Linked List was unnecessary. So I went for a simple wrapping of an array: ArrayedList. (Okay, I could probably find one out there on the ‘net, but it’s more fun to write your own). The public features are:

Public Sub Add(x As Variant)

Property Get Count() As Long

Property Get Nth(i As Long) As Variant

The wrapped array is re-sized when I Add beyond the current upper bound. The scale of the re-sizing is determined by a ‘chunk’ constant. Here’s the full code:

Const al_chunk As Long = 10
Private al_arr() As Variant
Private al_lastused As Long
Private al_max As Long

Private Sub Class_Initialize()
    ReDim al_arr(1 To al_chunk)
    al_max = UBound(al_arr)
End Sub

Public Sub Add(x As Variant)
    If al_lastused = al_max Then
        ReDim Preserve al_arr(1 To al_max + al_chunk)
        al_max = UBound(al_arr)
    End If
    al_lastused = al_lastused + 1
    al_arr(al_lastused) = x
End Sub

Property Get Count() As Long
    Count = al_lastused
End Property

Property Get Nth(i As Long) As Variant
    If i <= al_lastused Then
        Nth = al_arr(i)
    End If
End Property

Note the ReDim Preserve in the Add method. Obviously there’s some performance impact for huge numbers of elements.

Note also that in the Nth property, there’s a protecting conditional (i <= al_lastused). This means that it can be called with an out-of-range index, in which case the default empty value will be returned, and the caller might have to check for this. There’s a difference here from having the condition as a pre-condition (see earlier postings). A pre-condition would mean the the caller would have to ensure a valid index, perhaps by an explicit check against the Count property. The choice depends on whether we think that returning empty values is sensible or not for this class.

Talking about assertions (sound of saddle being put on hobby-horse), here’s a class invariant:

Sub Invariant()
    If Not (al_chunk > 0 And _
            LBound(al_arr) = 1 And _
            UBound(al_arr) >= al_chunk And _
            al_max = UBound(al_arr) And _
            al_lastused <= al_max) Then
       Err.Raise vbObjectError + 513,
            "UsingClasses.SmartArray", "Invariant failed"
    End If
End Sub

Rather than having it as a Boolean function, it’s a sub that raises an error. This means that the testing code that calls Invariant will stop with an error, rather than logging the failure and continuing. Given that an invariant failure indicates something pretty fundamentally wrong, that might be the better approach.

Next Page »


 

May 2012
M T W T F S S
« Dec    
 123456
78910111213
14151617181920
21222324252627
28293031  

Follow

Get every new post delivered to your Inbox.