Archive for July, 2009

Domain Specific Languages – Part 1

I’ve long been a bit of a fan of the Ruby language (yes, there is more to life than VBA). I’m currently reading Metaprogramming Ruby, from the excellent Pragmatic Programmers. This is all about taking a very dynamic, data-driven view of programs, customising or extending class/object capabilities at run-time. The language design owes a lot to Smalltalk, which followed the everything-is-an-object philosophy (so classes are objects, and therefore need their own classes: metaclasses).

Anyway, in the Ruby community there’s interest in Domain Specific Languages: problem-specific ‘dialects’ of Ruby that offer higher level expressions for particular domains. Which led me to wonder what a DSL would be like in Excel/VBA.

Now, Excel has a sort of DSL in its formula language. It’s specific to the domain of functional calculation and lookup, with built-in functions for various categories. Some categories, like the financial functions, are focused on a particular domain; other categories are more general.

With VBA, we can clearly add custom functions relevant to a particular domain: financial, engineering, whatever. However, a formula typically produces a single value and puts it in a single cell, so the effect is quite small scale. While we can have functions that return arrays, these can be unwieldy, and can be very dependent on worksheet layout. Also, there are manageability limits to the complexity of formulas. So it could be difficult to use formulas alone to build and manipulate complex structures or models.

For example, there are numerous problems that require a node-link graph structure: workflow, scheduling, resource tracking, etc. Although such a structure could be represented as data on a worksheet, it would be quite cumbersome to manipulate it in this form.

An alternative approach would be treat the worksheet data as instructions, which are interpreted to build an internal (VBA) object model. Other types of instruction would be commands to manipulate the model, and queries of it in various ways. This implies that we have some sort of ‘worksheet interpreter’ (in which we might guess that the Evaluate and Run procedures will come in handy).

The example I’ve been looking at so far is what you might call ‘payroll type’ processing. There is a set of entities (employees, say), each of which needs to be processed, in order to determine a final value (their monthly salary payment). There is a set (sequence, really) of rules, which need to be applied to each entity, possibly modifying the value. For example: if the employee grade > 3 and they’ve worked more than 1 year and it’s December, pay them a bonus of £100. Handling all the exceptions and modifications is a classic problem for payroll systems.

Now, it’s not impossible to do this with formulas alone, but you’d be knee-deep in IF() functions. The intermediate values would be visible (presumably in columns, with the employees in rows), leading to a humungous worksheet. Another issue is that this type of processing needs to be done at a specific date/time (so we can work out things like length-of-service). We don’t want the calculations to be dynamic; we want a nice big ‘Run’ button.

I’ll describe the solution so far in the next posting.

Storing Arrays as Named Constants

I came across something recently that pointed out that you can save arrays of values (or indeed single values) as named constants – that is, using names. We usually think of names as referring to ranges, like this:


However, in general, they are named formulas, which can involve functions, as in the ‘dynamic named range’ trick:


or they can be literal values, such as an array:


Note the semicolons, making this a 3 x 3 array.

We could construct the array in our code, or get it from a worksheet range:

    Dim arr As Variant
    arr = rng.Value
    Names.Add somename, arr

It’s not easy to find out the row/column sizes from the array constant, so we might want to save these as named values in their own right:

    Names.Add somename + "Rows", UBound(arr, 1)
    Names.Add somename + "Cols", UBound(arr, 2)

The array constant can then be used to populate a worksheet range using an array formula:


However, getting the array constant back into a VBA array isn’t so easy. Unless there’s some trick that eludes me, the only way is to go via a worksheet range. Here’s a function that does this:

Function NamedArray(arrname As String) As Variant
    Dim nrows As Long
    Dim ncols As Long
    Dim rng As Range
    Dim arr As Variant

    nrows = Mid(Names(arrname + "Rows").Value, 2)
                         'lose the initial '='
    ncols = Mid(Names(arrname + "Cols").Value, 2)

    Set rng = Range("DD1")      'somewhere remote
    Set rng = rng.Resize(nrows, ncols)
    rng.FormulaArray = "=" + arrname
    arr = rng.Value

    NamedArray = arr
End Function

This uses the range’s FormulaArray property to suck in the values.

Having said that that’s the only way, an an alternative would be to parse the array constant and build the array in code, but that would be painful, and not easy to do generally for all types of value.

Is this actually worth it? I’m not sure. My preference is that data should be stored out on a worksheet (which could be hidden if desired), from which we can simply grab the Range.Value array. Also, there’s a limit on the size of the array constant (I assume lexically): it didn’t work (in xl2000) with approx. 8K rows.

ArrayedList Class

It’s a bit disappointing that the VBA library contains only one collection class: Collection. Obviously, that one’s there because of its use to hold application objects (Worksheets, Controls, etc). However, it’s common to need other types of collection (with a small ‘c’): various sorts of lists, for example.

One that I needed recently was a list where I could add to the end, and iterate through the entire set. Since I didn’t need to insert/delete within the list, a Linked List was unnecessary. So I went for a simple wrapping of an array: ArrayedList. (Okay, I could probably find one out there on the ‘net, but it’s more fun to write your own). The public features are:

Public Sub Add(x As Variant)

Property Get Count() As Long

Property Get Nth(i As Long) As Variant

The wrapped array is re-sized when I Add beyond the current upper bound. The scale of the re-sizing is determined by a ‘chunk’ constant. Here’s the full code:

Const al_chunk As Long = 10
Private al_arr() As Variant
Private al_lastused As Long
Private al_max As Long

Private Sub Class_Initialize()
    ReDim al_arr(1 To al_chunk)
    al_max = UBound(al_arr)
End Sub

Public Sub Add(x As Variant)
    If al_lastused = al_max Then
        ReDim Preserve al_arr(1 To al_max + al_chunk)
        al_max = UBound(al_arr)
    End If
    al_lastused = al_lastused + 1
    al_arr(al_lastused) = x
End Sub

Property Get Count() As Long
    Count = al_lastused
End Property

Property Get Nth(i As Long) As Variant
    If i <= al_lastused Then
        Nth = al_arr(i)
    End If
End Property

Note the ReDim Preserve in the Add method. Obviously there’s some performance impact for huge numbers of elements.

Note also that in the Nth property, there’s a protecting conditional (i <= al_lastused). This means that it can be called with an out-of-range index, in which case the default empty value will be returned, and the caller might have to check for this. There’s a difference here from having the condition as a pre-condition (see earlier postings). A pre-condition would mean the the caller would have to ensure a valid index, perhaps by an explicit check against the Count property. The choice depends on whether we think that returning empty values is sensible or not for this class.

Talking about assertions (sound of saddle being put on hobby-horse), here’s a class invariant:

Sub Invariant()
    If Not (al_chunk > 0 And _
            LBound(al_arr) = 1 And _
            UBound(al_arr) >= al_chunk And _
            al_max = UBound(al_arr) And _
            al_lastused <= al_max) Then
       Err.Raise vbObjectError + 513,
            "UsingClasses.SmartArray", "Invariant failed"
    End If
End Sub

Rather than having it as a Boolean function, it’s a sub that raises an error. This means that the testing code that calls Invariant will stop with an error, rather than logging the failure and continuing. Given that an invariant failure indicates something pretty fundamentally wrong, that might be the better approach.

July 2009
« Jun   Aug »