Posts Tagged 'VBA'

Moving Data Out of Outlook – part 2

Happy and code-full 2011!

In an earlier post, I was considering exporting my Outlook contacts as XHTML pages, and accessing these as a repository of free-format, but queryable, documents. After some time out for Christmas and New Year, not to mention my family’s seasonal cough-fest, I can report back.

I haven’t attempted a comprehensive export of all contact attributes, just the standard ones that I typically use. This still leaves information in the Contact’s body section, but I’ll migrate that manually, as and when.

Here’s an example contact page:

I just grabbed a standard style sheet, so the formatting is not up to much (for example, the cell text middle-aligns vertically, as you can see with “Address”).

In terms of the VBA code, all we need is an adapter class, ContactPage. Instances of this are created for each ContactItem that we export. A ContactPage is responsible for building the XHTML string corresponding to the ContactItem, and writing it out to a text file. Here’s the root procedure:

Const outdir As String = "C:\Temp\Contacts"

Sub ExportContacts()
  Dim allcontacts As Items
  Set allcontacts = Session.Folders("Personal Folders"). _
                             Folders("Contacts").Items

  MsgBox "Exporting " & allcontacts.Count & " Contacts"

  Dim cn As ContactItem  'Outlook class
  Dim cp As ContactPage  'Custom class
  For Each cn In allcontacts
    Set cp = New ContactPage
    Set cp.Contact = cn    'Property Set
    cp.OutputToFile outdir
  Next
End Sub

ContactPage has a function for PageTitle, which is taken from the ContactItem‘s Company or Full Name (in that order of preference). Functions PageId and PageFile are based on this (with a bit of cleaning up to remove characters such as ‘/’).

The sub OutputToFile uses the Microsoft Scripting Runtime library, specifically FileSystemObject and TextStream. You need to load the Scripting library manually, via >Tools >References.

The rest of ContactPage’s features are just functions that generate elements in the XHTML hierarchy. For example:

Public Function XhtmlPage() As String
'The XHTML representation of cp_contact
    XhtmlPage = Header_ & vbCrLf & _
                Html_
End Function

'...

Public Function Html_() As String
    Html_ = "<html id=" & Chr(34) & PageId & Chr(34) '& ...
    Html_ = Html_ & vbCrLf & _
            Head_ & vbCrLf & _
            Body_ & vbCrLf & _
            "</html>"
End Function

Public Function Body_() As String
    Body_ = "<body>" & vbCrLf & _
            "<h1>" & PageTitle & "</h1>" & vbCrLf & _
            Company_table_ & vbCrLf & _
            Person_table_(1) & vbCrLf & _
            Person_table_(2) & vbCrLf & _
            Person_table_(3) & vbCrLf & _
            Categories_ul_ & vbCrLf & _
            "</body>"
End Function

'...

Public Function LastName_td_() As String
    LastName_td_ = "<td class=" & Chr(34) & "LastName" & Chr(34) & ">" & _
     cp_contact.LastName & "</td>"
End Function

The lowest level functions, typically for <td> elements, access the ContactItem‘s properties. As you can see, there are a lot of quotation characters (Chr(34)) in XHTML. The trailing-underscore convention for the function names was so that I could call the functions after the XHTML tags they generate – without the underscore I got a name clash with something existing.

Minor complications where the Contacts folder contains Distribution Lists (class DistListItem), which is a different class from ContactItem – that is, a MAPIFolder.Items is a heterogeneous Collection. This makes the For Each loop throw a type mismatch error, although you can resume execution (>Debug, >Run).

Also a slight issue with duplicate filenames, since the file output overwrites by default. I used Round(Rnd * 1000, 0) to generate a temporary disambiguating suffiix for the filename.

Here’s the Class Module for ContactPage.

 

Diff for Access Tables

I haven’t had time to progress the Outlook Contact Export, so here is a minor digression.

Suppose that I have two Access tables each containing records that represent the same entities – Orders, say. These two sets of equivalent records come from different sources: maybe Orders as received, and Orders as dispatched. I want to cross-check certain fields for equivalent records – say, that order-quantity received and order-quantity dispatched are the same – and flag any mismatches.

Here are two tables:

I want to check the equivalence of columns B and C. There are differences in B/C for the second and fourth records, and an extra record in the second table.

Note that the IDs are counters, but are not necessarily the same, so I can not do a Join of the two tables. The tables are assumed to represent equivalent sets of records – maybe for the same period – so they could be queries on underlying cumulative tables.

It’s not apparent to me how you could do this using queries. If we could join the tables together, then for numerical values I suppose that we could subtract one field from the other, and check for non-zero differences.

However, a more generic solution is to suck the relevant columns into two Recordsets, using a simple SELECT query as the Source, and then iterate through records and fields, comparing values. Differences can be logged in some way:

  • If we expect only rare differences, then we could build up a log String, and display this at the end of the check:
  • If we expect substantial differences, we could base an output Recordset on a temporary table, and write message-records into it.

I’ve adopted the first approach, for simplicity.

Here’s a code module, and the database (2003 .mdb) with test tables. There’s no user interface: RunDiff is run directly from within VBA.

Moving Data Out of Outlook

I’ve felt uneasy for a while that important data – for example, my Contacts – is sitting in Outlook, in an enormous and opaque database file (.pst). Okay, it’s backed up, but if it became ‘corrupted’ and wouldn’t open, then I have no way of recovering the data, that I know of. Why should I have to open Outlook, just to find someone’s website, or phone number? The whole idea of organizing your life within a single application seems terribly outmoded. I just want a mail client that handles message send and receipt; all the information I want to be in the ‘outside world’.

Retrieving data can be pretty painful too. I’ve just clicked on the ‘Activities’ tab of a contact, and it’s taken over 2 minutes to come back with about 20 items. While there is free-text search, there is no proper query access.

Outlook’s data model is baroquely large – a futile attempt to cover all possibilities. I’m looking at the class ContactItem, and it has nearly 150 attributes. I can have NickName, TelexNumber (telex?) and YomiFirstName (something Japanese); but I can only have three email addresses and one webiste URL.

So what’s the alternative? What I’d like is for each item (contact, whatever) to be a web page in a local (private) website. New pages could be created, edited and added as necessary. The information is all fully accessible using basic or more sophisticated editors. Website and mailto links work normally – the latter opening a mail client, as usual.

As it happens I’m using a document repository for XML documents, including those in XHTML. This runs on a Jetty server, at a localhost port. The repository can be browsed as a directory hierarchy, free-text searched, and queried (using XQuery). Because the repository is fully indexed, the search and query are very fast. I can create/edit documents in the repository directly, without the need to export/import, using a WebDAV-enabled editor.

So, to populate the repository initially, I would need to export my existing contacts from Outlook in some way. Exporting items as XML or HTML directly is beyond Outlook (up to and including v2007). So I’m thinking that I could write a custom export to XHTML – nothing complicated, just a simple page with tables. Elements, such as the table cells, would be given standard id attributes to indicate the content type – such as ‘company’, ‘address’; these can then be used in queries, analogously to column names in SQL.

In terms of the design, I’m wondering about a custom class, say ContactAdapter, which would wrap a ContactItem, and build the complete XHTML string, and write it to a file (using the Scripting library for the file access). I’ve not done anything like this before, so we’ll see. I suppose that I could export the data to Excel, and work on it there, but dealing with rows and columns is less direct than working with the actual ContactItem objects in Outlook.

More anon…

‘Database’ Functions in Excel

In Excel there is a category of functions called “Database”. This has always been a confusing term, as now (in 2007+) they relate to Tables (a.k.a. ListObjects, in VBA), or equivalent Ranges. The function names all start with ‘D’ (for database): DCOUNT, DSUM, etc. These are equivalents of the standard aggregation functions COUNT, SUM, etc (by aggregation, I mean a function that takes a set of values, and returns a single value).

What these D-functions allow is selective aggregation of data from a table, given a set of criteria – in other words, the combination of an advanced filter with an aggregation, without the need for the filtered data to reside on a worksheet.

Here’s an example:

The simple case is where the criteria are directly related to the data in the table. So, to count Bob’s records, the criteria range is B2:B3, and the DCOUNT formula is in B5:

=DCOUNT(Database,SalesTable[[#Headers],[Contacts]],B2:B3)

The reference to “SalesTable[[#Headers],[Contacts]]” is just because I clicked on D9 – I could have just put “Contacts” (but I hate typing :-)). Note that the blank headers in row 2 are required as part of the criteria range (row 1 is just labels). Unsurpisingly, there are 12 records – one for each month of the year.

However, this kind of subtotalling by ‘dimension member’ (to use the OLAP term) is what pivot tables do. More interesting is when the criteria involve a Boolean-returning formula applied to each record.

For example, suppose that we want to know how many reps had more than average number of contacts (yes, I know it will be about half). So in D7 we have:

=AVERAGE(SalesTable[Contacts])

and in D3:

=D10>$D$7

The criterion is a formula referring to the first data record, which returns a Boolean. I think of this formula as being filled down through the table records, as if in an additional column (note the absolute/relative addresses).

Even more interesting is when the criteria involve functions on multiple fields in a record. For example, suppose that we are interested in records where Contacts2 is greater than Contacts - these are the guys who are improving.

So in E3, we have:

=E10>D10

(again, referring relatively to the first record).

This is fine, but more generally, our Boolean function operates on the entire record. So in F3, we have:

=TestRecord(A10:E10)

where TestRecord is a pure VBA function:

Public Function TestRecord(rec As Range) As Boolean

    TestRecord = rec.Item(4) < rec.Item(5)

End Function

Clearly, the body of this function can be as complicated as we wish, using the cells in rec. However, it depends on a particular ordering of the columns within the table. It is possible that we would want to use this function on tables that have the Contacts columns in different positions. So, an improved version intersects the named columns with the given rec.

Public Function TestRecord2(rec As Range) As Boolean

    Dim table As ListObject
    Set table = rec.ListObject

    Dim arg1 As Range
    Set arg1 = Intersect(table.ListColumns("Contacts").Range, _
                             rec)

    Dim arg2 As Range
    Set arg2 = Intersect(table.ListColumns("Contacts2").Range, _
                             rec)

    TestRecord2 = arg1.Value &lt; arg2.Value
End Function

Since the supplied Range is the first record in the table, we could simply pass the Table name to the function and derive from that the Range for the first record (but it’s getting late…).

Constraints on the Database functions are:

  • the criteria have to be Ranges (and thus on a Worksheet), not in-code arrays
  • they have to be vertically-oriented, contiguous Ranges (so can not be filled down).

A somewhat more esoteric limitation is that you can not plug a custom aggregation function into the basic D-function mechanism – DWEIRDSUM, perhaps.

Period Overlap Function

Over at Daily Dose of Excel, Dick provided a solution to the problem of counting active entities (e.g. customers) within a given period. That is, we want to count a customer if their period of activity overlaps with the given period. The key to the solution is SUMPRODUCT.

An alternative solution, which is a bit more work, but is more flexible, is to write a custom VBA function to compare two periods: that is, start1-finish1 versus start2-finish2.

It is then easy to add an extra column:

PFirst and PLast specify the period of interest. The formula in D6 is:

=OVERLAPS(PFirst,PLast,B6,C6)

You might be tempted to write [First] and [Last], rather than B6 and C6. Unfortunately the implicit intersection of formula row with a table column does not work with the custom function – the function gets arrays of values from the whole columns.

However, we could have the OVERLAPS formulas off to the right of the table, say in column H. The formula can then be:

=OVERLAPS(PFirst,PLast,
Table1[[#This Row],[First]],
Table1[[#This Row],[Last]])

(Yes, I should have given the table a meaningful name…)

The formula in D3 is just:

=COUNTIF(Table1[Overlaps],TRUE)

We can also use the OVERLAPS function to provide conditional formatting – here in column A (see screenshot above). The formula is exactly the same as in column D:

=OVERLAPS(PFirst,PLast,B6,C6)

Finally, here’s the code for the function:

Public Function OVERLAPS(start1, finish1, start2, finish2) As Variant
'Does the period start1-finish1 overlap
'with the period start2-finish2?
'pre: start1 <= finish1 and start2 <= finish2 

  If IsDate(start1) And IsDate(finish1) And _
     IsDate(start2) And IsDate(finish2) Then
        OVERLAPS = _
        (start2 >= start1 And start2 <= finish1) Or _
        (finish2 >= start1 And finish2 <= finish1)
  Else
        OVERLAPS = CVErr(2001)
  End If
End Function

The error appears on the worksheet as a #VALUE, rather than a #N/A – I didn’t look into this deeply.

Navigating Part Relationships – 2

In the previous post, I introduced the idea of a cursor object that allows you to navigate around a graph of component-part relationships:

Navigation could be:

  • down the ‘contains’ relationships
  • along the ‘used in’ relationships
  • back through the history of visited records.

We’ll also have a Reset operation, which jumps back to the start of the table, and clears the history.

The navigation is done using keyboard shortcuts (but could be done via a form).

The core of the design is a Class Module GraphCursor. This provides our four operations: CursorDown, CursorNextUse, CursorBack and CursorReset. When an instance of this class initializes, it points itself at ListObjects(1) on the ActiveSheet (there is only one sheet, to keep things simple), and does a CursorReset.

A GraphCursor maintains a history of visted components using a linked List class (a simple chain of ListItem objects – nothing to do with ListObject a.k.a. Table).

CursorDown and CursorNextUse use Range.Find with the currently selected cell value. I assume this is pretty efficient – and in any case is neater in code terms than iterating through rows explicitly. The Range for CursorDown is just the first column (Component); the Range for CursorNextUse is the Part columns below the row of the current selection.

Something needs to create and hold on to an instance of GraphCursor – this is a standard module Client. This also provides procedures that are called via the keyboard shortcuts.

Public gc As New GraphCursor

Sub GCDown()
    gc.CursorDown
End Sub

'similarly for the other three operations

The keyboard shortcuts are set up on Workbook_Open:

Private Sub Workbook_Open()
    Application.OnKey "^+d", "GCDown"
    Application.OnKey "^+n", "GCNextUse"
    Application.OnKey "^+r", "GCReset"
    Application.OnKey "^+b", "GCBack"

    GCReset
End Sub

Here’s the workbook (Excel 2007).

Since each navigation step is worked out dynamically, we can insert or delete records from our table as we like. This would not be the case for an indexed solution (maintaining a map of Component to row number).

You could argue that each Component-Part relationship should be a separate record – for example, [A, B], [A, C]. This would allow us to associate quantities or other information with each relationship. In this case, we would also need a CursorNextPart operation.

Navigating Part Relationships

This is something I was thinking about before a short holiday break (Strasbourg – very pretty) …

Suppose that we have an inventory of Components, perhaps in a manufacturing or engineering context. For each Component, we record its Parts (that is, sub-components). For example:

Sorry, my creativity did not extend to dreaming up meaningful data – think of car parts, or something.

So, an A is a top-level component, and consists of Bs and Cs (we do not specify quantities – an issue deferred); a B consists of Fs and Gs; E, G and H are elementary components (indicated by underscore). A component, such as B or H, can be used by multiple higher-level components. The restriction to 3 part types per component is just for simplicity.

These relationships form a directed, acyclic graph (there are top-level and bottom-level components, and no component can contain itself directly or indirectly).

Even with a very simple example, like above, it is quite difficult to see what contains what. With a realistically large inventory, say hundreds or thousands of records, it would be next to impossible.

One solution would be to support navigation around the graph of relationships:

  • Drill-down the ‘contains’ relationship: for example, [A contains B, C] >>> [B contains F, G] >>> [F contains H] >>> [H is elementary]
  • Move along the ‘used in’ relationship: for example, [B used in A] >>> [B used in D] >>> [no more uses of B]

Moving along the ‘used in’ relationship for ‘_’ (underscore) takes us through all the elementary components.

When you get the end of a ‘used in’ chain, you could have the option to start again from the beginning (as in textual searches):

In addition, we could have navigation ‘back’ through the sequence of visited Components.

We could offer these operations and the resulting ‘current record’ to a user via a form. However, it seems more lightweight to use keyboard shortcuts.

So, how might we do this? As previous postings have illustrated, I like the idea of a cursor object, which provides move operations of some kind, and access to the ‘current record’. In this case, the cursor position will be indicated visibly to the user by explicitly Selecting the relevant cell (and thus scrolling to the record).

An instance of the cursor object can be created on Workbook_Open, and linked on Class_Initialize to a the table. For simplicity, let’s assume that we have a single worksheet with a single table. As usual, there are advantages to using a v2007 Table (in VBA a ListObject), but you could get it to work in v2003.

More on the design/code in the next posting.

Multi-Area Ranges

We tend to think of a Range as a contiguous set of cells: a single cell, part (or all) of a column, part (or all) of a row, or a rectangular area. However, a Range is one or more or these. The contiguous areas of a Range are held in its Areas collection. If Areas.Count > 1 then we have a multi-area Range.

To construct a multi-area Range from several single-area ranges, you can use the Union operation, which is provided by the Application (so, effectively, global).

I came across an example of this in customizing a chart to show only the most recent N records in the source data. Here’s an accumulating table of monthly records:

I want a clustered column chart, with Months as the category (X) axis, and In/Out as the series.

If I wanted data from all records, I would set the chart’s source data range to be the entire table, including the headers, which Excel interprets to give the series names (that is, In and Out).

If I want my chart to show only the most recent 3 months (say), then I need to specify the source data range as the union of A1:C1 and A6:C8. Here’s the Source Data Range as it appears in the chart configuration:

=’Data 2010′!$A$1:$C$1,’Data 2010′!$A$6:$C$8

Here’s a procedure that constructs and sets the range for a given number of months:

Sub LastNMonths(nmonths As Integer)
'pre: 0 < nmonths
'  <= Range("A1").CurrentRegion.Rows.Count - 1

    Dim ws As Worksheet
    Dim firstrow As Integer, lastrow As Integer

    Set ws = Worksheets("Data 2010")

    lastrow = ws.Range("A1").CurrentRegion.Rows.Count
    firstrow = lastrow - nmonths + 1

    Dim headers As Range
    Set headers = ws.Range("A1:C1")

    Dim rng As Range
    Set rng = ws.Range("A" & firstrow & ":C" & lastrow)

    Set rng = Union(headers, rng)

    Dim cht As Chart
    Set cht = Charts("Chart 2010")
    With cht
        .SetSourceData rng
        .ChartTitle.Text = _
               MonthsTitle(ws, firstrow, lastrow)
    End With
End Sub

MonthsTitle simply constructs the chart title from the first and last months (from column A) – for example, “Report May-10 to Jul-10″.

As I did this initially in v2003, I have not made use of the ‘table-ness’ of the data – hence the Range(“A1″).CurrentRegion.

The procedure was called from a ‘Refresh Chart’ command button on a worksheet, with the nmonths value provided via an InputBox.

Before anyone points this out, let me say that you can do a certain amount of this kind of thing using formulas in the Chart configuration (using OFFSET, etc) – see Walkenbach’s ‘Formulas’ book. However, I think it’s clearer, and certainly more flexible to do it in VBA.

Formula Builder

Suppose that we have a large number of numerical data sets, which we want to examine using aggregation functions, such as SUM, AVERAGE, STDEV, and possibly some custom functions of our own devising. Each of these functions takes an entire dataset and retuns a single result.

Here’s the data:

where ‘data1′, ‘data2′, etc, name the dataset ranges; the column headers are themselves named ‘datasets’. We could, of course, have larger and more numerous datasets.

Now, we could have another worksheet with formulas such as:

=SUM(data1), =SUM(data2), …

= STDEV(data1), = STDEV(data2), …

and so on, for each combination of function and dataset. However, that’s rather a sledgehammer approach.

Alternatively, we could select a function from a drop-down list, select a dataset from another list, and so build a particular formula of interest:

The list of available functions is defined elsewhere, and picked up by data validation in cell B1:

(SUMDIFFS is a sample custom function). Similarly, the data validation on cell D1 picks up the ‘datasets’ list (the data column headers).

The result cell (F1) contains the formula:

=CALC()

where CALC is a public VBA function:

Public Function CALC() As Variant

  Dim func As String
  Dim dataset As String
  Dim formula As String

  func = Range("function").Value
  dataset = Range("dataset").Value
  formula = func & "(" & dataset & ")"

  CALC = Evaluate(formula)
End Function

The formula String does not need an initial ‘=’.

As CALC does not reference its input cells explicitly, it does not get recalculated automatically if the input cells change. So we need to handle Change events that emanate from the two input cells. In the ‘Analysis’ Sheet module:

Private Sub Worksheet_Change(ByVal Target As Range)

  If Target.Address = Range("function").Address Or _
     Target.Address = Range("dataset").Address Then
       Range("result").Dirty
  End If
End Sub

I’ve always wanted use the ‘Dirty’ method … This is equivalent to re-entering the =CALC() formula, triggering recalculation.

A constraint here is that the functions must form a family that have the same argument signature: here, a single array of values. Other statistical functions take more arguments, and might form other families. Alternatively, a multi-argument built-in function could be wrapped in a custom function that defaulted all but the main ‘data’ argument.

Handling Worksheet_Change for a Named Value

I was recently thinking about an example of handling the Worksheet_Change event. Unlike, say, opening a Workbook or clicking an ‘OK’ button, this is a ‘background’ event, which happens many times during an editing session. So we need to be quite specific about which changes we’re interested in. An obvious test is to check whether the name of the target Range is relevant.

An example that springs to mind is keeping a log of changes to some key value: something like a tax rate, where the act of changing the value is itself of interest. It is also important that transaction records pick up the appropriate rate for their date, not the latest value.

Here is a simple Orders table:

The various Orders pick up their tax rate from a Rate table:

The Rate column in the Orders table has the formula:

=VLOOKUP([Date],RateTable[#Data],2,TRUE)

When the Tax value (top-right of the first screenshot) is edited, a new row is added to the Rate table.

We can do this by handling Worksheet_Change on the Orders Worksheet:

Private Sub Worksheet_Change(ByVal Target As Range)
'If Target is the named Range "Tax",
'then extend the change log
'on Worksheet "Tax Rates",
'with today's date and the new value.

The easiest (only?) way of checking the Range Name is to have a go, and handle the error if it is unnamed:

    On Error GoTo finish
            'If Target does not have a Name,
            'then Target.Name raises an error
            'So just ignore the event.
    Dim rngname As String   'Possible name of Target
    rngname = Target.Name.Name

If Target does have a name, and that name is ‘Tax’, then add a row to the Rate table:

    If rngname = "Tax" Then
        Dim newrow As ListRow

        Set newrow = _
        Worksheets("Tax Rates").ListObjects(1).ListRows.Add

        newrow.Range.Item(1).Value = Date
        newrow.Range.Item(2).Value = Target.Value

    End If
finish:
End Sub

The use of the Date function ensures that the first column of the Rate table is sorted ascending, which is required by the VLOOKUP. If you make more than one change of Rate on a given date (for example, you correct a mistake), the VLOOKUP picks up the latest value. So if today (24/6) I make a further change from 20% to 21% (I blame the fiscal squeeze), then this rate is picked up by today’s orders.


July 2014
M T W T F S S
« Dec    
 123456
78910111213
14151617181920
21222324252627
28293031  

Follow

Get every new post delivered to your Inbox.