Adding HTML Help – 2

I was previously wondering how to link a VBA project to HTML Help (that is, linking to a page in a .chm file). I’m thinking of the Help as pages in a User Manual, not control-level Help (“The Delete button deletes the current item”, etc).

Having got some compiled Help (a CHM file), we need to link into it. The simplest way seems to be to use the Application.Help method. This takes a path to a CHM file, and a numerical key – for example:

Application.Help “C:\…….\myhelp.chm”, 1002

There are a couple of issues with this. The first is that each call to this method (on the same file) opens a new Help Viewer window. I haven’t found a way round this. The second issue is that the numerical keys have to be defined in a header file prior to compilation – something like this:

#define VBA_Creating_a_Widget     1001
#define VBA_Filtering_Widgets        1002

where the names are those of the topic files in the HTML Help project (minus the .html suffix). This mapping is compiled into the help project to produce the CHM file. However, generation from DITA (XML) source goes straight to the compiled CHM, so there’s no chance of adding in a map. (You could decompile the CHM and then add the map before recompiling).

Fortunately, it doesn’t matter, because there’s another way of linking to the Help from VBA. This is to use a more general function, HtmlHelp, available from the Help OCX (HHCtrl.ocx).

You need to declare this function in a public module:

Declare Function HtmlHelp Lib "HHCtrl.ocx" _
   Alias "HtmlHelpA" _
   (ByVal hwndCaller As Long, _
   ByVal pszFile As String, _
   ByVal uCommand As Long, _
   dwData As Any) As Long

You also need a couple of constants (just to give us meaningful names):

Const HH_DISPLAY_TOPIC = &H0
Const HH_HELP_CONTEXT = &HF

The constant declarations  need to be in each module that calls the function.

We can now call the function from a Command button handler, or whatever:

Private Sub HelpCommand_Click()
  Call HtmlHelp (0, Dotslash("myhelp.chm"), _
      HH_DISPLAY_TOPIC, ByVal "VBA_Filtering_Widgets.html")
End Sub

The zero argument is to do with the windowing – the Help opens in a sibling window. Note the ByVal passing of the string (because it’s to an external library). Dotslash is just my shorthand for an absolute path (which is recommended):

Public Function Dotslash(filename As String) As String
  Dotslash = ActiveWorkbook.Path + "\" + filename
End Function

The other mode argument, HH_HELP_CONTEXT, is for the mapped-id lookup, which is what Application.Help does.

This is just scratching the surface. There’s more information on MSDN.

Adding HTML Help

We’re all familiar with HTML Help for applications – usually in the form of .chm files (with the yellow question-mark icon). Typically, constructing these this involves getting to grips with the Microsoft HTML Help Workshop. This includes a tool for building Help projects, and compiling them into .chm files.

This is okay for fine-grained context-sensitive Help, at the level of individual operations and controls. However, if the material is more in the nature of a User Manual, this is probably not the form in which we want to author it. And we will probably want to generate different forms of output, such as general HTML pages and PDF.

A few posts ago, I mentioned producing documentation using the XML-based DITA framework. One of the benefits of this is that content is held in a generic XML-based format, from which outputs of different types can be generated: XHTML, PDF and CHM (and also Java Help). DITA allows us to select and combine various topics (chunks of content) from our User Manual – perhaps the lower-level task or reference topics – and compile these into a .chm file. Apart from the compiler, which is invoked automatically, this does not involve the HTML Help Workshop.

Note on tools: There is a DITA Open Toolkit, but I have been using tools from XMLMind – the free XML Editor (Personal Edition) and the open-source DITA Converter. And very good they are too.

The selection, structuring and ordering of topics is done using a DITA Map file. HTML Help is specified as the output format via options to the DITA Converter.

Having got our .chm file, the next step is to access it from a VBA project. I’d been vaguely aware that you could do this, but had never investigated further. More tomorrow…

Managing Email Attachments – Design

In the previous posting, I was thinking about how to go through email messages, saving attchments and then removing them, and possibly doing other things to the messages, like moving them. While you can do a certain amount using Outlook rules, and yes there are third-party utilities out there, it’s interesting to think about how to do this in VBA.

As is often the way, you start with some specific requirements, think about it a bit, and come up with a rather more generic mechanism – something with a bit of Design Pattern about it. It seemed to me that I might want to apply multiple rules to each message, depending on properties of the message or attachment.

So I would give a message to a chain of rules, each of which would be applied to the message in turn. When a rule is applied, it evaluates some test condition, and if true performs some actions on the message (like saving and removing its attachment). So here’s the outline design:

Rule design

The Dispatcher object (these are Class modules) creates and holds on to a RuleChain. When Run is called on the Dispatcher, it goes through the current folder (and potentially its subfolders), getting MailItems, and giving them to the RuleChain:

Sub Run()
    Dim folder As MAPIFolder
    Set folder = Application.ActiveExplorer.CurrentFolder

    Dim item As Object
    For Each item In folder.Items
        Select Case item.Class
            Case olMail
                di_rulechain.Apply item
            Case olFolder
                'recurse ...
        End Select
    Next
End Sub

The RuleChain just iterates over its individual Rules:

Sub Apply(mi As MailItem)
    Dim i As Integer
    For i = 1 To UBound(rc_rules)
        rc_rules(i).Apply mi
    Next
End Sub

A Rule applies itself to a MailItem:

Sub Apply(mi As MailItem)
    If RuleTest(mi) Then RuleAction mi
End Sub

Simples! Now, we want different rules, but they have to be objects of the same class. So the class has to cover all the tests and actions that we might want, with switches/settings to make them specific. For example:

Dim ru_saveattachment As Boolean       'save attachment?
Dim ru_saveattachmentfolder As String  'where?
Dim ru_removeattachment As Boolean     'remove attachment?

So where do the settings for a particular Rule come from? It seems like a good idea if a Rule is represented by something that is (a) persistent between sessions, and (b) editable by a user, without going into the VBA code. One candidate is to use an Outlook Note. We can have a very simple syntax, where a test starts with ‘?’, and an action starts with ‘!’. For example:

RuleNote

This says that if a message is in (has Parent) ‘Inbox’, then save the attachment to /Temp and remove it. Multiple tests are ANDed (multiple Rules provide disjunction). Since we might have ‘ordinary’ Notes, and also so we can disable individual Rules, only pink notes are treated as Rules.

The RuleChain is responsible for going through the Notes folder, creating the Rules:

Private Sub Class_Initialize()    'for RuleChain
    Dim notesfolder As MAPIFolder
    Set notesfolder = ...
    ReDim rc_rules(1 To notesfolder.Items.Count)

    Dim note As NoteItem
    Dim ru As Rule
    Dim i As Integer
    i = 1
    For Each note In notesfolder.Items
        If note.Color = olPink Then  'it is a rule
            Set ru = New Rule
            ru.Load note
            Set rc_rules(i) = ru
            i = i + 1
        End If
    Next

    ReDim Preserve rc_rules(1 To (i - 1))
End Sub

The Rule.Load method is responsible for parsing NoteItem.Body, and setting up the Rule accordingly. Clearly you could get quite sophisticated here.

At present the only actions are saving and removing attachments, and I haven’t thought about adding links/ids, to allow traceability. More (maybe) anon.

Managing Email Attachments

Following on from the previous posting, I’ve been wondering what else to do with Outlook.

One thing that’s I’ve meant to sort out for a long time is managing email attachments (Word documents, PDFs, etc). My outlook.pst file is not far short of half a gigabyte, and presumably this is largely due to lots and lots of  message attachments. Nearly all of these will have been Saved As, but almost never Removed. It’s annoying that there isn’t a Detach operation that does both.

So what I’d like is an Attachments Manager. Yes, there are some out there, but it’s more fun to roll your own, and learn something in the process.

One issue to decide is whether we want the saving/removal to be done automatically, on receipt of the message, or manually initiated as a housekeeping task (or both). I think I’ll start with the latter, as it gives an opportunity to check an incoming message.

We’ll want to scope the messages being processed, for example by:

  • Folder (e.g. Inbox, Saved, all folders)
  • Date (e.g. older than)
  • File types for the attachments (e.g. only MS Office, PDFs)
  • Message title (or content)
  • Message sender.

We’ll also need to specify target folders for the saved files.

It would be good to maintain a link from the message to the saved file. I’m thinking this can just be a file:/// URL. We get a sort of back-link just by searching for the filename (I use the Lookout search facility).

Given that we want a degree of flexibility in setting up the processing parameters, a front-end user-form would be good. I can see this turning into a bit of a mini-project…

Email Auto-Reply – Solution

In the previous posting, I was wondering how to generate an email auto-reply that contains your next free day, based on Calendar appointments. I had been thinking of using Rules, but a comment by Jon Peltier pointed me in the right direction.

The trigger is to handle the event Application_NewMail (the handler goes in the ThisOutlookSession module):

Private Sub Application_NewMail()
    Dim ib As MAPIFolder
    Dim mi As MailItem

    Set ib = _
       Application.Session.GetDefaultFolder(olFolderInbox)
    Set mi = ib.Items.GetLast

    RespondTo mi
End Sub

The incoming message needs to be inspected for information about the sender, and responded to, if appropriate:

Sub RespondTo(mi_in As MailItem)

    If IsAutoRespond(mi_in) Then
        MakeResponse mi_in
    End If

End Sub

An incoming message has a SenderName property. Oddly, it has a To property, but not a From property (why? am I missing something?). The SenderName is the ‘display’ name, that you typically see in the ‘From’ column of your Inbox, in preference to the actual email address. As far as I can see, this comes from the alias set up for the email account of the sender; it doesn’t correspond to any name field in the corresponding Contact (Full Name, E-mail, etc).

So we need to record the SenderName, as it appears in your Inbox, in a field of the corresponding Contact. We could add a user-defined field, but on the Details tab there’s a Nickname field, which I doubt is ever used, and so can be hi-jacked. Use of this field could be taken as enabling our auto-reply, but to be on the safe side, let’s add a user-defined field AutoRespond, set to True.

The IsAutoRespond function uses the Find function on the Contacts:

Set contact = contactsfolder.Items.Find(filter)

where

filter = _
     "[Nickname] = " + Chr(34) + mi.SenderName + Chr(34) + _
     " And [AutoRespond] = " + Chr(34) + "True" + Chr(34)

When a filter string is given as a literal, the double-quotes are doubled (e.g. “”True”"). I discovered that this isn’t necessary when using chr(34). If the Find fails, then the result variable is Nothing.

MakeResponse replies to the incoming message, using the next free date in the reply message’s body:

Sub MakeResponse(mi_in As MailItem)

    Dim mi_out As MailItem
    Dim nextfree As Date

    Set mi_out = mi_in.Reply

    With mi_out
        .Body = responsetext & NextFreeDay
    End With
    mi_out.Send
End Sub

The NextFreeDay function goes through the Calendar looking for All-Day-Event appointments (these could be fake ones, used just to block out days; I have mine as Free time, so that they don’t clash with ‘real’ appointments). The result starts as tomorrow’s date, and gets pushed into the future, as appointments are found. Weekends and other unavailable dates should be taken into account.

The initial version is pretty basic. It would seem possible to use Calendar information in more sophisticated ways, or indeed to analyse the incoming message content.

Email Auto-Reply with Availability

My wife’s a Business Intelligence (sic) consultant, writing reports on hideously complicated databases. As is the way of things, she does reports for different customers and projects. The difficulty lies in getting across the idea that when working for customer X, she’s not available to fire-fight problems for customer Y. So she responds to emails, saying “I’m sorry, my next availability to look at your problem is on <date>”, where <date> is found by looking in her diary for the next free day.

Assuming that you’re using Outlook as both email client and calendar, is it possible to generate such replies automatically? Presumably incoming emails (with certain properties, say Senders) can be detected using Rules, and a reply sent. But can we intercept the rule-processing, and insert information derived from the Calendar (or indeed elsewhere) into the reply message?

Having not done a lot with Outlook (compared with Excel and Access), I have no idea at this stage, but I’ll investigate, and report back. Obviously, if anyone out there has done this already, let me know!

Digression on Documentation: DITA

As is the way of things, I’ve been diverted from the delights of VBA, on to some other work. This is a migration of the system definition documentation for a major financial system, from traditional Word documents to an XML-based architecture called DITA. (Somewhat fancifully, the ‘D’ stands for Darwin, which is appropriate in this, his bicentenary year).

The production of technical documentation is undergoing something of a revolution at present. This is due to the maturing of a raft of technologies based on the XML markup language. Broadly speaking, these technologies provide a solution that sits between monolithic documents-as-files (such as Word documents), and relational databases, as complex aggregations of fine-grained information records.

With a Word document, the unit of content is the same as the unit of presentation: a file is edited, and the same file is printed. This applies equally to web pages (HTML), but with the page as the unit. In practice, it is difficult and time-consuming to identify, extract and recombine fragments of documents to produce new deliverables.

With a relational database, the information records can be conjoined, aggregated and filtered in very complex ways, using a query language (SQL). However, databases are not really suited to holding large free-text elements, like a section of a document. Also, there is no notion of hierarchical structuring in query output, in contrast to the hierarchy of chapters, sections and sub-sections that we are familiar with in documents.

The XML-based solutions aim to provide a middle course. Content is created and held in a form that is structured enough to identify, extract and recombine fragments of documents to produce new deliverables. At the same time, the content does not carry information about presentation (either the target format or the details of layout). This is provided by transformations of the content to produce deliverables in different formats, such as Word or PDF for printable documents, or hypertext (XHTML) for web presentation or online Help.

The challenge is to come up with an information model that defines and relates appropriate topics (i.e. basic chunks), in ways that allow querying, selection and combination in flexible ways. There’s a trade-off here between flexibility and chunk size. Too fine-grained and it’s impossible to manage; too coarse-grained and you’re back with monolithic documents. There’s a wider trend towards ‘medium-sized’ information chunks: think of blog posts, like this one, or Wiki pages.

More on this anon…

Domain Specific Languages – Part 3

In the previous posting, I was trying out a simple rule-based DSL: in effect, taking control of expression evaluation away from the formula mechanism. Perhaps more interesting is to have a proper object model behind the scenes, where the objects ‘collaborate’ to calculate properties of interest, in ways that would be difficult to do with formulas alone.

In this example, the object model is a network of activities, in the manner of PERT-style planning. An activity has a duration, and some number of predecessor activities, on which it depends (for inputs or  resources):

network

There is a single initial pseudo-activity, which is given an actual finish date, as a baseline for the analysis. A1 and A3 can then start on the following day.

An activity starts at start-of-day on its start date, and finishes at end-of-day on its finish date. So a 2-day activity might have start date 12/08/09 and finish date 13/08/09.

As time passes, activities get initiated and completed, and thus get actual start and finish dates.

The aim of the analysis is to work out the the earliest start and end dates for activities, back to the point where activities have actual dates. A subsequent version also calculates latest start/finish dates, and thus ’slack’, but that’s omitted here for simplicity.

All the work is done by a class (module) Activity. This has properties for duration and the various dates, and Collections for predecessor and successor Activities (only the predecessors are used in this version). If an Activity does not have a stored value for EarliestStart, it calculates it as the latest of the EarliestFinishes of its predecessors. The recursion is terminated by an ActualFinish. Here’s the procedure:

Private Function CalcEarliestStart() As Date
    If obj_actual_start > 0 Then
        CalcEarliestStart = obj_actual_start
    Else
        Dim pred As Activity
        For Each pred In obj_preds
            If pred.EarliestFinish > CalcEarliestStart Then
                CalcEarliestStart = pred.EarliestFinish
            End If
        Next
        CalcEarliestStart = CalcEarliestStart + 1
                             'start of next day
    End If
End Function

The activities (with durations and possibly actual dates) are set up on a worksheet. The outputs are generated to this worksheet, but could be elsewhere:

Pert_main

The Predecessor relationships are set up on a separate worksheet:

Pert_network

A general module Objects contains a Collection for the Activity objects, plus the procedures to run the analysis:

Public activities As Collection

Sub RunPert()
 Range("outputs").Clear
 LoadActivities
 ConfigureNetwork
 CalcPert
End Sub

LoadActivities creates the objects and puts them in the Collection; ConfigureNetwork sets the predecessor relationships; CalcPert queries the Earliest dates for each Activity.

Since each Activity caches its Earliest dates, once calculated, it does not matter in what order the activities appear on the worksheet. It’s easy to add or change activities, and enter Actual dates as they happen (the second version is a bit more sophisticated about this – e.g. updating the duration).

Obviously this isn’t an all-singing, all-dancing planning tool, but I think there’s enough here to suggest that the approach could be quite useful for this type of problem.

Domain Specific Languages – Part 2

In the previous posting, I started musing on using Excel to host ‘domain-specific’ languages (DSLs) – that is, ways of tackling particular problem areas or types of problem. For starters, I’m looking at a ‘payroll’ type of problem, where we want to take each member of a set of entities (e.g. employees), and apply a sequence of condition-modification rules to them. There are potentially many rules for various special cases and exceptions. Using formulas alone would embed the rules in the conditional structure (IF() functions), and would be harder to read and modify.

One approach is to support the iteration and rule-application aspects in a simple (VBA) interpreter. The interpreter is generic with respect to the actual data involved. The rules themselves are expressed on a worksheet as Evaluate-able expressions.

Here are the entities (typically, a large number thereof):

Rules_entities

‘grade’ and ’startdate’ are two (of possibly many) attributes.

Here are the rules:

Rules_rules

Column B is for the Boolean condition expressions.Note that some can be TRUE.

Columns C and D are for entity properties that are updated by some of the rules. These are effectively assignment statements. So C3 is equivalent to ‘payment = base’; C5 to ‘payment = payment + servicebonus’. The expressions can refer to entity properties (e.g. grade) and named parameters (e.g. grdincrement). The first rule adds an additional, calculated property daysemployed.

The outputs are selected properties of an entity, with the final values from application of the rule sequence:

Rules_outputs

Now the VBA side. An Entity is a class module that simply defines a Dictionary to hold the properties (i.e. as key-value pairs). Dictionary is from the Scripting library; it has a few advantages over Collection, such as an Exists(key) method. As it happens, there is only one entity at a time, so the class module isn’t strictly necessary, but it just feels better to have entities as objects.

The Interpreter has a number of iteration levels:

  • initialize and process each entity
  • for each entity, process each rule
  • for each entity-rule:
  • evaluate the condition
  • if true, update each output property
  • and at the end, put the final outputs on the Outputs worksheet.

The entity property names (i.e. the keys into the Dictionary) come from the header rows, which are retrieved at the start.

The condition and update expressions are evaluated using the Application.Evaluate function. Here’s the condition evaluation for a rule:

Sub ProcessEntityRule(ent As Entity, r As Integer)
'r is the rule number

    rulefields = _
    Range("Rules!A1").Offset(r, 0).Resize(1, _
                                    rulefieldcount).Value

    Dim condexpr As String  'condition expression
    Dim condexpr2 As String 'condition with substitutions
    Dim cond As Boolean     'condition result

    condexpr = rulefields(1, 2)
    condexpr2 = Substitute(ent, condexpr)
    cond = Evaluate(condexpr2)

    If cond Then
        Dim op As Integer
        Dim outname As String      'e.g. "payment"
        Dim outexpr As String        'e.g. "base"
        For op = 3 To rulefieldcount
            outname = ruleheaders(1, op)
            outexpr = rulefields(1, op)
            UpdateEntity ent, outname, outexpr
        Next
    End If
End Sub

Substitute replaces any property names in condexpr with the corresponding property values. This means that the names have to be distinct from each other, and from parameter names. So, for example, ‘grade’ and ‘gradeincrement’ would clash (hence ‘grdincrement’).

Here’s the entity update:

Sub UpdateEntity(ent As Entity, outname As String, _
                                 expr As String)

    Dim expr2 As String 'with substitutions
    Dim newval As Variant

    If Len(expr) > 0 Then
        expr2 = Substitute(ent, expr)
        newval = Evaluate(expr2)

        If ent.Properties.Exists(outname) Then
            ent.Properties.item(outname) = newval
        Else
            ent.Properties.Add outname, newval
        End If
    End If
End Sub

So the rules can contain any expressions that can be handled by Evaluate. However, there seem to be a few strangenesses here. For example, on a worksheet, the formula:

=TODAY() – DATEVALUE(“01/09/08″)

evaluates to 337, treating the date as UK-style. However, in the VBA, the local date format is not applied: the date is treated as US style, giving 573. So entity #3 erroneously gets the service bonus of £50. Odd.

This example has gone only a little way beyond formulas, because it deals with one object at a time. What I’d like to try next is to have configuration instructions that construct or modify a multi-object model of some kind, which is then manipulated and queried. I need to find a good example – any suggestions?

Domain Specific Languages – Part 1

I’ve long been a bit of a fan of the Ruby language (yes, there is more to life than VBA). I’m currently reading Metaprogramming Ruby, from the excellent Pragmatic Programmers. This is all about taking a very dynamic, data-driven view of programs, customising or extending class/object capabilities at run-time. The language design owes a lot to Smalltalk, which followed the everything-is-an-object philosophy (so classes are objects, and therefore need their own classes: metaclasses).

Anyway, in the Ruby community there’s interest in Domain Specific Languages: problem-specific ‘dialects’ of Ruby that offer higher level expressions for particular domains. Which led me to wonder what a DSL would be like in Excel/VBA.

Now, Excel has a sort of DSL in its formula language. It’s specific to the domain of functional calculation and lookup, with built-in functions for various categories. Some categories, like the financial functions, are focused on a particular domain; other categories are more general.

With VBA, we can clearly add custom functions relevant to a particular domain: financial, engineering, whatever. However, a formula typically produces a single value and puts it in a single cell, so the effect is quite small scale. While we can have functions that return arrays, these can be unwieldy, and can be very dependent on worksheet layout. Also, there are manageability limits to the complexity of formulas. So it could be difficult to use formulas alone to build and manipulate complex structures or models.

For example, there are numerous problems that require a node-link graph structure: workflow, scheduling, resource tracking, etc. Although such a structure could be represented as data on a worksheet, it would be quite cumbersome to manipulate it in this form.

An alternative approach would be treat the worksheet data as instructions, which are interpreted to build an internal (VBA) object model. Other types of instruction would be commands to manipulate the model, and queries of it in various ways. This implies that we have some sort of ‘worksheet interpreter’ (in which we might guess that the Evaluate and Run procedures will come in handy).

The example I’ve been looking at so far is what you might call ‘payroll type’ processing. There is a set of entities (employees, say), each of which needs to be processed, in order to determine a final value (their monthly salary payment). There is a set (sequence, really) of rules, which need to be applied to each entity, possibly modifying the value. For example: if the employee grade > 3 and they’ve worked more than 1 year and it’s December, pay them a bonus of £100. Handling all the exceptions and modifications is a classic problem for payroll systems.

Now, it’s not impossible to do this with formulas alone, but you’d be knee-deep in IF() functions. The intermediate values would be visible (presumably in columns, with the employees in rows), leading to a humungous worksheet. Another issue is that this type of processing needs to be done at a specific date/time (so we can work out things like length-of-service). We don’t want the calculations to be dynamic; we want a nice big ‘Run’ button.

I’ll describe the solution so far in the next posting.

Next Page »