Domain Specific Languages – Part 2

In the previous posting, I started musing on using Excel to host ‘domain-specific’ languages (DSLs) – that is, ways of tackling particular problem areas or types of problem. For starters, I’m looking at a ‘payroll’ type of problem, where we want to take each member of a set of entities (e.g. employees), and apply a sequence of condition-modification rules to them. There are potentially many rules for various special cases and exceptions. Using formulas alone would embed the rules in the conditional structure (IF() functions), and would be harder to read and modify.

One approach is to support the iteration and rule-application aspects in a simple (VBA) interpreter. The interpreter is generic with respect to the actual data involved. The rules themselves are expressed on a worksheet as Evaluate-able expressions.

Here are the entities (typically, a large number thereof):

Rules_entities

‘grade’ and ‘startdate’ are two (of possibly many) attributes.

Here are the rules:

Rules_rules

Column B is for the Boolean condition expressions.Note that some can be TRUE.

Columns C and D are for entity properties that are updated by some of the rules. These are effectively assignment statements. So C3 is equivalent to ‘payment = base’; C5 to ‘payment = payment + servicebonus’. The expressions can refer to entity properties (e.g. grade) and named parameters (e.g. grdincrement). The first rule adds an additional, calculated property daysemployed.

The outputs are selected properties of an entity, with the final values from application of the rule sequence:

Rules_outputs

Now the VBA side. An Entity is a class module that simply defines a Dictionary to hold the properties (i.e. as key-value pairs). Dictionary is from the Scripting library; it has a few advantages over Collection, such as an Exists(key) method. As it happens, there is only one entity at a time, so the class module isn’t strictly necessary, but it just feels better to have entities as objects.

The Interpreter has a number of iteration levels:

  • initialize and process each entity
  • for each entity, process each rule
  • for each entity-rule:
  • evaluate the condition
  • if true, update each output property
  • and at the end, put the final outputs on the Outputs worksheet.

The entity property names (i.e. the keys into the Dictionary) come from the header rows, which are retrieved at the start.

The condition and update expressions are evaluated using the Application.Evaluate function. Here’s the condition evaluation for a rule:

Sub ProcessEntityRule(ent As Entity, r As Integer)
'r is the rule number

    rulefields = _
    Range("Rules!A1").Offset(r, 0).Resize(1, _
                                    rulefieldcount).Value

    Dim condexpr As String  'condition expression
    Dim condexpr2 As String 'condition with substitutions
    Dim cond As Boolean     'condition result

    condexpr = rulefields(1, 2)
    condexpr2 = Substitute(ent, condexpr)
    cond = Evaluate(condexpr2)

    If cond Then
        Dim op As Integer
        Dim outname As String      'e.g. "payment"
        Dim outexpr As String        'e.g. "base"
        For op = 3 To rulefieldcount
            outname = ruleheaders(1, op)
            outexpr = rulefields(1, op)
            UpdateEntity ent, outname, outexpr
        Next
    End If
End Sub

Substitute replaces any property names in condexpr with the corresponding property values. This means that the names have to be distinct from each other, and from parameter names. So, for example, ‘grade’ and ‘gradeincrement’ would clash (hence ‘grdincrement’).

Here’s the entity update:

Sub UpdateEntity(ent As Entity, outname As String, _
                                 expr As String)

    Dim expr2 As String 'with substitutions
    Dim newval As Variant

    If Len(expr) > 0 Then
        expr2 = Substitute(ent, expr)
        newval = Evaluate(expr2)

        If ent.Properties.Exists(outname) Then
            ent.Properties.item(outname) = newval
        Else
            ent.Properties.Add outname, newval
        End If
    End If
End Sub

So the rules can contain any expressions that can be handled by Evaluate. However, there seem to be a few strangenesses here. For example, on a worksheet, the formula:

=TODAY() – DATEVALUE(“01/09/08”)

evaluates to 337, treating the date as UK-style. However, in the VBA, the local date format is not applied: the date is treated as US style, giving 573. So entity #3 erroneously gets the service bonus of £50. Odd.

This example has gone only a little way beyond formulas, because it deals with one object at a time. What I’d like to try next is to have configuration instructions that construct or modify a multi-object model of some kind, which is then manipulated and queried. I need to find a good example – any suggestions?

Advertisements

9 Responses to “Domain Specific Languages – Part 2”


  1. 1 John Tolle August 4, 2009 at 3:10 pm

    Charles Williams at Decision Models has a very good page that covers some of the limitations of Excel’s Evaluate() method, including the date format thing:

    http://www.decisionmodels.com/calcsecretsh.htm

    There are more oddities than those, and pretty much none are oficially documented. And a general Evaluate() is an odd thing for Excel to expose at all. I assume Microsoft just wanted to give people the ability to evaluate literal range references and named formulas within VBA code, and the easiest way was to just open up a portal directly to their expression evaluator.

  2. 2 Roy MacLean August 4, 2009 at 3:35 pm

    Thanks, John, that’s a useful link.

    Regarding:
    “If the string formula contains a reference to a UDF it seems to be evaluated twice.”
    I came across this recently (see my last couple of comments to ‘Storing Arrays as Named Constants’).
    Serves me right for mucking around with side-effecting functions…

    So it looks like Evaluate is a bit dodgy. We also have Run, for VBA procedures, but this requires discrete arguments (so would need either to be parsed out of a string, or obtained from multiple cells).

  3. 3 John Tolle August 5, 2009 at 2:30 pm

    I came across all that as well, since the “formula-as-function” stuff I mentioned earlier also uses (abuses?) Evaluate().

    In that system, we define a function body as just a normal Excel formula with some values replaced by bracketed parameter names, like this: PI() * [r]^2. The complete lambda function looks like [~ r | PI() * [r]^2 ~]. (The bracket, tilde, and pipe are just to ease “parsing” in VBA without writing a real parser.)

    So we can call a function directly:

    =fcall(“[~ r | PI() * [r]^2 ~]”,10)

    or we can name it and call it by name:

    =fname(“circ-area”,”[~ r | PI() * [r]^2 ~]”)
    =fcall(“circ-area”,$A$42)

    or we can bind it into a cell and call it by cell ref:

    $Q$99=fbind(“[~ r | PI() * [r]^2 ~]”)
    =fcall($Q$99,$A$42)

    That last example is an indication of something else I mentioned, which is attempting to make functions work like other Excel values, so they can be stored in cells, passed as arguments (map() is a UDF which applies a function to each element in an array and returns the array of results):

    =SUM(map(“[~ x | [x]^2 ~]”,{1,2,3}))

    or returned as values. That lets us do something like:

    =fname(“make-lookup”,”[~ keys,values | [~ key | LOOKUP([key],[keys],[values] ~] ~]”)

    which returns a function that does a lookup. If we bind the result of calling it with some keys and values:

    $Q$99=fbind(fcall(“make-lookup”,$E$1:$E$10,$F$1:$F$10))

    we can call that with just the desired key:

    =fcall($Q$99,42)

    and remain ignorant or the actual lookup mechanism.

    This all started because we had some nasty formulas that were repeated multiple places in what was becoming an ever-more-complex model that had to stay in Excel. I’ve continued to work on it partly for fun and partly because we often work with domain experts who know Excel but aren’t conventional programmers. The models we build are early-stage, constantly moving targets, where spreadsheets in general and Excel in particular have lots of advantages, except for the part about not supporting good software development practice with functions, etc. A complex (spreadsheet or otherwise) model really needs to be treated as “real” software, though.

    From the links here, it looks like there are several other people who are interested in the same basic thing, so I’m definitely going to release it. (It includes a little Excel/VBA unit test framework and some VBA debugging tools, too.) Where would be the best place to put an open source Excel/VBA project? It seems like there are lots of (very useful and appreciated) “guru” sites but not really a strong tradition of open source projects in this community.

  4. 4 John Tolle August 7, 2009 at 1:23 am

    Earlier I also mentioned a utility we built on top of the above system to do inherently procedural logic but keep it in Excel-centric terms

    This is a simple example. There are some state variables (pump, lvl), some actions (pump on, fill, etc.), a grid of transition function bodies that tell how to update the state variables on each action, and the last column that contains function bodies that tell what action should come next. (Blanks default to “leave that state variable the same” and “go to next action”.)

    |pump | lvl |
    pump on |TRUE | |
    fill | | [lvl]+1 | IF([lvl]>=3,”pump off”,”fill”)
    pump off |FALSE | |
    drain | | [lvl]-1 | IF([lvl]<=0,"pump on","drain")

    If the initial state is {TRUE,1}, and the inital action is "fill", the calcSM() UDF returns:

    TRUE 2 fill
    TRUE 3 fill
    TRUE 4 pump off
    FALSE 4 drain
    FALSE 3 drain
    FALSE 2 drain
    FALSE 1 drain
    FALSE 0 drain
    FALSE -1 pump on
    TRUE -1 fill
    TRUE 0 fill
    TRUE 1 fill
    TRUE 2 fill
    TRUE 3 fill
    TRUE 4 pump off

    It works by taking the function bodies and building functions that look like [~ pump,lvl | TRUE ~] and [~ pump,lvl | [lvl] + 1 ~], etc., and then at each action calling them with the state variables as parameters.

    We used this to embed a very lightweight discrete event simulation in our spreadsheet. It seems ridiculous on one level, because it's a terrible simulator judged as a simulator. But sometimes "worse is better". In this case, the engineers we're working on the model with can understand, manipulate, and tweak it with almost no training, no extra software, and tight integration with the rest of the model which supplies the inputs and uses the results.

    (I tried posting this earlier, but it never showed up. Aplogies if it is dupliacted.)

  5. 5 Roy MacLean August 7, 2009 at 10:33 am

    Sorry, John, your comment had been dumped in the Spam folder – must be all those vertical bars 🙂 I’ll read it now…

  6. 6 John Tolle August 7, 2009 at 1:58 pm

    P.S. The vertical bars were just to separate out what would appear in different cells in the sheet, not part of the definitions.

  7. 7 Roy MacLean August 7, 2009 at 4:31 pm

    John,
    I take your point about formulas being first-class citizens – like blocks/lambdas in languages such as Ruby.

    Your pump state-transition model is definitely the kind of thing I have in mind, where the excel worksheets are used for (a) configuration, (b) instructions and (c) output. Being an OO kind of guy, I’m thinking of an object model behind the scenes. I’ve made good progress today (it is Friday!) with a PERT-type planning tool, which builds and interrogates a network of Activity objects. I’ll describe this in a proper posting, soonish.

    /R

  8. 8 John Tolle August 8, 2009 at 12:25 am

    I’ll be looking forward to it. Our little engine is meant to run as a UDF, so it’s triggered by changes in it’s inputs, and it’s output can trigger other stuff, etc. “Programming” in it is a little like programming in a very high-level assembly language, with all of Excel’s richness at computing values, but no control structures other than conditional jumps. But as silly as it sounds, “Excel with GOTOs” is actually kind of useful.


  1. 1 Domain Specific Languages – Part 3 « Roy MacLean’s VBA Blog Trackback on August 10, 2009 at 9:49 am

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




August 2009
M T W T F S S
« Jul   Sep »
 12
3456789
10111213141516
17181920212223
24252627282930
31  

%d bloggers like this: