Archive for August, 2009

Digression on Documentation: DITA

As is the way of things, I’ve been diverted from the delights of VBA, on to some other work. This is a migration of the system definition documentation for a major financial system, from traditional Word documents to an XML-based architecture called DITA. (Somewhat fancifully, the ‘D’ stands for Darwin, which is appropriate in this, his bicentenary year).

The production of technical documentation is undergoing something of a revolution at present. This is due to the maturing of a raft of technologies based on the XML markup language. Broadly speaking, these technologies provide a solution that sits between monolithic documents-as-files (such as Word documents), and relational databases, as complex aggregations of fine-grained information records.

With a Word document, the unit of content is the same as the unit of presentation: a file is edited, and the same file is printed. This applies equally to web pages (HTML), but with the page as the unit. In practice, it is difficult and time-consuming to identify, extract and recombine fragments of documents to produce new deliverables.

With a relational database, the information records can be conjoined, aggregated and filtered in very complex ways, using a query language (SQL). However, databases are not really suited to holding large free-text elements, like a section of a document. Also, there is no notion of hierarchical structuring in query output, in contrast to the hierarchy of chapters, sections and sub-sections that we are familiar with in documents.

The XML-based solutions aim to provide a middle course. Content is created and held in a form that is structured enough to identify, extract and recombine fragments of documents to produce new deliverables. At the same time, the content does not carry information about presentation (either the target format or the details of layout). This is provided by transformations of the content to produce deliverables in different formats, such as Word or PDF for printable documents, or hypertext (XHTML) for web presentation or online Help.

The challenge is to come up with an information model that defines and relates appropriate topics (i.e. basic chunks), in ways that allow querying, selection and combination in flexible ways. There’s a trade-off here between flexibility and chunk size. Too fine-grained and it’s impossible to manage; too coarse-grained and you’re back with monolithic documents. There’s a wider trend towards ‘medium-sized’ information chunks: think of blog posts, like this one, or Wiki pages.

More on this anon…


Domain Specific Languages – Part 3

In the previous posting, I was trying out a simple rule-based DSL: in effect, taking control of expression evaluation away from the formula mechanism. Perhaps more interesting is to have a proper object model behind the scenes, where the objects ‘collaborate’ to calculate properties of interest, in ways that would be difficult to do with formulas alone.

In this example, the object model is a network of activities, in the manner of PERT-style planning. An activity has a duration, and some number of predecessor activities, on which it depends (for inputs or  resources):


There is a single initial pseudo-activity, which is given an actual finish date, as a baseline for the analysis. A1 and A3 can then start on the following day.

An activity starts at start-of-day on its start date, and finishes at end-of-day on its finish date. So a 2-day activity might have start date 12/08/09 and finish date 13/08/09.

As time passes, activities get initiated and completed, and thus get actual start and finish dates.

The aim of the analysis is to work out the the earliest start and end dates for activities, back to the point where activities have actual dates. A subsequent version also calculates latest start/finish dates, and thus ‘slack’, but that’s omitted here for simplicity.

All the work is done by a class (module) Activity. This has properties for duration and the various dates, and Collections for predecessor and successor Activities (only the predecessors are used in this version). If an Activity does not have a stored value for EarliestStart, it calculates it as the latest of the EarliestFinishes of its predecessors. The recursion is terminated by an ActualFinish. Here’s the procedure:

Private Function CalcEarliestStart() As Date
    If obj_actual_start > 0 Then
        CalcEarliestStart = obj_actual_start
        Dim pred As Activity
        For Each pred In obj_preds
            If pred.EarliestFinish > CalcEarliestStart Then
                CalcEarliestStart = pred.EarliestFinish
            End If
        CalcEarliestStart = CalcEarliestStart + 1
                             'start of next day
    End If
End Function

The activities (with durations and possibly actual dates) are set up on a worksheet. The outputs are generated to this worksheet, but could be elsewhere:


The Predecessor relationships are set up on a separate worksheet:


A general module Objects contains a Collection for the Activity objects, plus the procedures to run the analysis:

Public activities As Collection

Sub RunPert()
End Sub

LoadActivities creates the objects and puts them in the Collection; ConfigureNetwork sets the predecessor relationships; CalcPert queries the Earliest dates for each Activity.

Since each Activity caches its Earliest dates, once calculated, it does not matter in what order the activities appear on the worksheet. It’s easy to add or change activities, and enter Actual dates as they happen (the second version is a bit more sophisticated about this – e.g. updating the duration).

Obviously this isn’t an all-singing, all-dancing planning tool, but I think there’s enough here to suggest that the approach could be quite useful for this type of problem.

Domain Specific Languages – Part 2

In the previous posting, I started musing on using Excel to host ‘domain-specific’ languages (DSLs) – that is, ways of tackling particular problem areas or types of problem. For starters, I’m looking at a ‘payroll’ type of problem, where we want to take each member of a set of entities (e.g. employees), and apply a sequence of condition-modification rules to them. There are potentially many rules for various special cases and exceptions. Using formulas alone would embed the rules in the conditional structure (IF() functions), and would be harder to read and modify.

One approach is to support the iteration and rule-application aspects in a simple (VBA) interpreter. The interpreter is generic with respect to the actual data involved. The rules themselves are expressed on a worksheet as Evaluate-able expressions.

Here are the entities (typically, a large number thereof):


‘grade’ and ‘startdate’ are two (of possibly many) attributes.

Here are the rules:


Column B is for the Boolean condition expressions.Note that some can be TRUE.

Columns C and D are for entity properties that are updated by some of the rules. These are effectively assignment statements. So C3 is equivalent to ‘payment = base’; C5 to ‘payment = payment + servicebonus’. The expressions can refer to entity properties (e.g. grade) and named parameters (e.g. grdincrement). The first rule adds an additional, calculated property daysemployed.

The outputs are selected properties of an entity, with the final values from application of the rule sequence:


Now the VBA side. An Entity is a class module that simply defines a Dictionary to hold the properties (i.e. as key-value pairs). Dictionary is from the Scripting library; it has a few advantages over Collection, such as an Exists(key) method. As it happens, there is only one entity at a time, so the class module isn’t strictly necessary, but it just feels better to have entities as objects.

The Interpreter has a number of iteration levels:

  • initialize and process each entity
  • for each entity, process each rule
  • for each entity-rule:
  • evaluate the condition
  • if true, update each output property
  • and at the end, put the final outputs on the Outputs worksheet.

The entity property names (i.e. the keys into the Dictionary) come from the header rows, which are retrieved at the start.

The condition and update expressions are evaluated using the Application.Evaluate function. Here’s the condition evaluation for a rule:

Sub ProcessEntityRule(ent As Entity, r As Integer)
'r is the rule number

    rulefields = _
    Range("Rules!A1").Offset(r, 0).Resize(1, _

    Dim condexpr As String  'condition expression
    Dim condexpr2 As String 'condition with substitutions
    Dim cond As Boolean     'condition result

    condexpr = rulefields(1, 2)
    condexpr2 = Substitute(ent, condexpr)
    cond = Evaluate(condexpr2)

    If cond Then
        Dim op As Integer
        Dim outname As String      'e.g. "payment"
        Dim outexpr As String        'e.g. "base"
        For op = 3 To rulefieldcount
            outname = ruleheaders(1, op)
            outexpr = rulefields(1, op)
            UpdateEntity ent, outname, outexpr
    End If
End Sub

Substitute replaces any property names in condexpr with the corresponding property values. This means that the names have to be distinct from each other, and from parameter names. So, for example, ‘grade’ and ‘gradeincrement’ would clash (hence ‘grdincrement’).

Here’s the entity update:

Sub UpdateEntity(ent As Entity, outname As String, _
                                 expr As String)

    Dim expr2 As String 'with substitutions
    Dim newval As Variant

    If Len(expr) > 0 Then
        expr2 = Substitute(ent, expr)
        newval = Evaluate(expr2)

        If ent.Properties.Exists(outname) Then
            ent.Properties.item(outname) = newval
            ent.Properties.Add outname, newval
        End If
    End If
End Sub

So the rules can contain any expressions that can be handled by Evaluate. However, there seem to be a few strangenesses here. For example, on a worksheet, the formula:

=TODAY() – DATEVALUE(“01/09/08”)

evaluates to 337, treating the date as UK-style. However, in the VBA, the local date format is not applied: the date is treated as US style, giving 573. So entity #3 erroneously gets the service bonus of £50. Odd.

This example has gone only a little way beyond formulas, because it deals with one object at a time. What I’d like to try next is to have configuration instructions that construct or modify a multi-object model of some kind, which is then manipulated and queried. I need to find a good example – any suggestions?

August 2009
« Jul   Sep »