Posts Tagged 'Software'

Moving Data Out of Outlook – part 2

Happy and code-full 2011!

In an earlier post, I was considering exporting my Outlook contacts as XHTML pages, and accessing these as a repository of free-format, but queryable, documents. After some time out for Christmas and New Year, not to mention my family’s seasonal cough-fest, I can report back.

I haven’t attempted a comprehensive export of all contact attributes, just the standard ones that I typically use. This still leaves information in the Contact’s body section, but I’ll migrate that manually, as and when.

Here’s an example contact page:

I just grabbed a standard style sheet, so the formatting is not up to much (for example, the cell text middle-aligns vertically, as you can see with “Address”).

In terms of the VBA code, all we need is an adapter class, ContactPage. Instances of this are created for each ContactItem that we export. A ContactPage is responsible for building the XHTML string corresponding to the ContactItem, and writing it out to a text file. Here’s the root procedure:

Const outdir As String = "C:\Temp\Contacts"

Sub ExportContacts()
  Dim allcontacts As Items
  Set allcontacts = Session.Folders("Personal Folders"). _
                             Folders("Contacts").Items

  MsgBox "Exporting " & allcontacts.Count & " Contacts"

  Dim cn As ContactItem  'Outlook class
  Dim cp As ContactPage  'Custom class
  For Each cn In allcontacts
    Set cp = New ContactPage
    Set cp.Contact = cn    'Property Set
    cp.OutputToFile outdir
  Next
End Sub

ContactPage has a function for PageTitle, which is taken from the ContactItem‘s Company or Full Name (in that order of preference). Functions PageId and PageFile are based on this (with a bit of cleaning up to remove characters such as ‘/’).

The sub OutputToFile uses the Microsoft Scripting Runtime library, specifically FileSystemObject and TextStream. You need to load the Scripting library manually, via >Tools >References.

The rest of ContactPage’s features are just functions that generate elements in the XHTML hierarchy. For example:

Public Function XhtmlPage() As String
'The XHTML representation of cp_contact
    XhtmlPage = Header_ & vbCrLf & _
                Html_
End Function

'...

Public Function Html_() As String
    Html_ = "<html id=" & Chr(34) & PageId & Chr(34) '& ...
    Html_ = Html_ & vbCrLf & _
            Head_ & vbCrLf & _
            Body_ & vbCrLf & _
            "</html>"
End Function

Public Function Body_() As String
    Body_ = "<body>" & vbCrLf & _
            "<h1>" & PageTitle & "</h1>" & vbCrLf & _
            Company_table_ & vbCrLf & _
            Person_table_(1) & vbCrLf & _
            Person_table_(2) & vbCrLf & _
            Person_table_(3) & vbCrLf & _
            Categories_ul_ & vbCrLf & _
            "</body>"
End Function

'...

Public Function LastName_td_() As String
    LastName_td_ = "<td class=" & Chr(34) & "LastName" & Chr(34) & ">" & _
     cp_contact.LastName & "</td>"
End Function

The lowest level functions, typically for <td> elements, access the ContactItem‘s properties. As you can see, there are a lot of quotation characters (Chr(34)) in XHTML. The trailing-underscore convention for the function names was so that I could call the functions after the XHTML tags they generate – without the underscore I got a name clash with something existing.

Minor complications where the Contacts folder contains Distribution Lists (class DistListItem), which is a different class from ContactItem – that is, a MAPIFolder.Items is a heterogeneous Collection. This makes the For Each loop throw a type mismatch error, although you can resume execution (>Debug, >Run).

Also a slight issue with duplicate filenames, since the file output overwrites by default. I used Round(Rnd * 1000, 0) to generate a temporary disambiguating suffiix for the filename.

Here’s the Class Module for ContactPage.

 

Navigating Part Relationships – 2

In the previous post, I introduced the idea of a cursor object that allows you to navigate around a graph of component-part relationships:

Navigation could be:

  • down the ‘contains’ relationships
  • along the ‘used in’ relationships
  • back through the history of visited records.

We’ll also have a Reset operation, which jumps back to the start of the table, and clears the history.

The navigation is done using keyboard shortcuts (but could be done via a form).

The core of the design is a Class Module GraphCursor. This provides our four operations: CursorDown, CursorNextUse, CursorBack and CursorReset. When an instance of this class initializes, it points itself at ListObjects(1) on the ActiveSheet (there is only one sheet, to keep things simple), and does a CursorReset.

A GraphCursor maintains a history of visted components using a linked List class (a simple chain of ListItem objects – nothing to do with ListObject a.k.a. Table).

CursorDown and CursorNextUse use Range.Find with the currently selected cell value. I assume this is pretty efficient – and in any case is neater in code terms than iterating through rows explicitly. The Range for CursorDown is just the first column (Component); the Range for CursorNextUse is the Part columns below the row of the current selection.

Something needs to create and hold on to an instance of GraphCursor – this is a standard module Client. This also provides procedures that are called via the keyboard shortcuts.

Public gc As New GraphCursor

Sub GCDown()
    gc.CursorDown
End Sub

'similarly for the other three operations

The keyboard shortcuts are set up on Workbook_Open:

Private Sub Workbook_Open()
    Application.OnKey "^+d", "GCDown"
    Application.OnKey "^+n", "GCNextUse"
    Application.OnKey "^+r", "GCReset"
    Application.OnKey "^+b", "GCBack"

    GCReset
End Sub

Here’s the workbook (Excel 2007).

Since each navigation step is worked out dynamically, we can insert or delete records from our table as we like. This would not be the case for an indexed solution (maintaining a map of Component to row number).

You could argue that each Component-Part relationship should be a separate record – for example, [A, B], [A, C]. This would allow us to associate quantities or other information with each relationship. In this case, we would also need a CursorNextPart operation.

Multi-Area Ranges

We tend to think of a Range as a contiguous set of cells: a single cell, part (or all) of a column, part (or all) of a row, or a rectangular area. However, a Range is one or more or these. The contiguous areas of a Range are held in its Areas collection. If Areas.Count > 1 then we have a multi-area Range.

To construct a multi-area Range from several single-area ranges, you can use the Union operation, which is provided by the Application (so, effectively, global).

I came across an example of this in customizing a chart to show only the most recent N records in the source data. Here’s an accumulating table of monthly records:

I want a clustered column chart, with Months as the category (X) axis, and In/Out as the series.

If I wanted data from all records, I would set the chart’s source data range to be the entire table, including the headers, which Excel interprets to give the series names (that is, In and Out).

If I want my chart to show only the most recent 3 months (say), then I need to specify the source data range as the union of A1:C1 and A6:C8. Here’s the Source Data Range as it appears in the chart configuration:

=’Data 2010′!$A$1:$C$1,’Data 2010′!$A$6:$C$8

Here’s a procedure that constructs and sets the range for a given number of months:

Sub LastNMonths(nmonths As Integer)
'pre: 0 < nmonths
'  <= Range("A1").CurrentRegion.Rows.Count - 1

    Dim ws As Worksheet
    Dim firstrow As Integer, lastrow As Integer

    Set ws = Worksheets("Data 2010")

    lastrow = ws.Range("A1").CurrentRegion.Rows.Count
    firstrow = lastrow - nmonths + 1

    Dim headers As Range
    Set headers = ws.Range("A1:C1")

    Dim rng As Range
    Set rng = ws.Range("A" & firstrow & ":C" & lastrow)

    Set rng = Union(headers, rng)

    Dim cht As Chart
    Set cht = Charts("Chart 2010")
    With cht
        .SetSourceData rng
        .ChartTitle.Text = _
               MonthsTitle(ws, firstrow, lastrow)
    End With
End Sub

MonthsTitle simply constructs the chart title from the first and last months (from column A) – for example, “Report May-10 to Jul-10″.

As I did this initially in v2003, I have not made use of the ‘table-ness’ of the data – hence the Range(“A1″).CurrentRegion.

The procedure was called from a ‘Refresh Chart’ command button on a worksheet, with the nmonths value provided via an InputBox.

Before anyone points this out, let me say that you can do a certain amount of this kind of thing using formulas in the Chart configuration (using OFFSET, etc) – see Walkenbach’s ‘Formulas’ book. However, I think it’s clearer, and certainly more flexible to do it in VBA.

Formula Builder

Suppose that we have a large number of numerical data sets, which we want to examine using aggregation functions, such as SUM, AVERAGE, STDEV, and possibly some custom functions of our own devising. Each of these functions takes an entire dataset and retuns a single result.

Here’s the data:

where ‘data1′, ‘data2′, etc, name the dataset ranges; the column headers are themselves named ‘datasets’. We could, of course, have larger and more numerous datasets.

Now, we could have another worksheet with formulas such as:

=SUM(data1), =SUM(data2), …

= STDEV(data1), = STDEV(data2), …

and so on, for each combination of function and dataset. However, that’s rather a sledgehammer approach.

Alternatively, we could select a function from a drop-down list, select a dataset from another list, and so build a particular formula of interest:

The list of available functions is defined elsewhere, and picked up by data validation in cell B1:

(SUMDIFFS is a sample custom function). Similarly, the data validation on cell D1 picks up the ‘datasets’ list (the data column headers).

The result cell (F1) contains the formula:

=CALC()

where CALC is a public VBA function:

Public Function CALC() As Variant

  Dim func As String
  Dim dataset As String
  Dim formula As String

  func = Range("function").Value
  dataset = Range("dataset").Value
  formula = func & "(" & dataset & ")"

  CALC = Evaluate(formula)
End Function

The formula String does not need an initial ‘=’.

As CALC does not reference its input cells explicitly, it does not get recalculated automatically if the input cells change. So we need to handle Change events that emanate from the two input cells. In the ‘Analysis’ Sheet module:

Private Sub Worksheet_Change(ByVal Target As Range)

  If Target.Address = Range("function").Address Or _
     Target.Address = Range("dataset").Address Then
       Range("result").Dirty
  End If
End Sub

I’ve always wanted use the ‘Dirty’ method … This is equivalent to re-entering the =CALC() formula, triggering recalculation.

A constraint here is that the functions must form a family that have the same argument signature: here, a single array of values. Other statistical functions take more arguments, and might form other families. Alternatively, a multi-argument built-in function could be wrapped in a custom function that defaulted all but the main ‘data’ argument.

Sub-sequence Iterator 3

In the previous post, we were looking at an iterator object that bound to successive sub-sequences of records in a table, according to a StartCondition and an EndCondition. In that example, the sub-sequences were contiguous, non-overlapping, and so could be summarized to give sub-totals in a very flexible way.

A slightly different example is where the start and end records for each sub-sequence are all intermixed. Here’s a time-ordered sequence of start and finish actions, for some identified entities:

You can see that B and C both start and finish before A finishes.

Suppose that want to find out the duration of each activity. We then need to find matching start and finish records, and calculate the difference between the two dates. In this case, the sub-sequence is the ‘lifetime’ of the entity. Although we are not interested in anything other than the start and finish for an entity, it’s conceivable in other examples that we might be interested in the intermediate values.

In the previous example, the sub-sequences were non-overlapping, so the MoveNext operation started looking for a new StartRow immediately after the old EndRow. Here, the sub-sequences can overlap, so the MoveNext operation starts looking for a new StartRow immediately after the old StartRow . At present, each entity is assumed to have a finish record.

StartCondition simply looks for a ‘start’ record; when it finds one, it records the Id. EndCondition looks for a ‘finish’ record with the current Id.

The client code creates an iterator on the Actions table (see above), moves it through the table, calculates the duration for each entity (finish – start + 1), and writes a summary record:

Here’s the client procedure:

Sub Run()
    Dim ssi As SSI2
    Set ssi = New SSI2
    With ssi
        .WsName = "Actions"    'Worksheet
        .TableName = "ActionTable"
        .KeyName1 = "Id"       'Key column
        .KeyName2 = "Action"   'there are 2 key columns
        .ValName = "Date"      'Value column
    End With

    Worksheets("Summary2").Activate  'for the output
    ActiveSheet.Cells.Clear

    Dim row As Long
    row = 1
    ssi.MoveNext     'to first sub-sequence
    Do Until ssi.IsAfter
        Range("A" & row).Value = ssi.Label
        Range("B" & row).Value = Summarized(ssi.values)
        row = row + 1
        ssi.MoveNext
    Loop
End Sub

Private Function Summarized(values As Variant) As Variant
'pre: IsArray(values)
    Dim last As Long
    last = UBound(values)
    Summarized = values(last) - values(1) + 1
End Function

I think this is quite an interesting example, as it’s not obvious to me how you would do it with formulas, even array ones.

At present there are two different classes for the two examples (non-overlapping and overlapping) – hence SSI2, above. Maybe they should both implement a common interface. In practice, though I think you’d know which flavour you needed. Here’s the workbook with both examples, if you want to try it out.

Sub-sequence Iterator 2

Following on from the previous post, I’ve had a go at implementing a Sub-sequence Iterator (SSI) class, which operates on a named Table ( a 2007 table, a.k.a. ListObject). There are two variants, depending on whether the sub-sequences can overlap or not.

The first case is similar to the earlier data partitioning, where we want to break our table rows into contiguous, non-overlapping sub-sequences, and perform some summarizing operation over the values – for example, subtotalling. Here’s the data table:

The Date column is our ‘key’, by which we work out the sub-sequences. In this case it’s just a weekly incrementing date. The Value column contains the data that we want to summarize. For simplicity, let’s say that we want to summarize by month – although it could be something more interesting.

Our SSI object has a MoveNext operation, called by some client code, which makes it iterate through the sub-sequences. For each sub-sequence, the SSI makes available to the client code:

  • a Label that can be used to identify each sub-sequence
  • an array of Values for the sub-sequence.

The client code can then pass the Values to a function Summarized, which in this case calculates a (sub)total. The client code then writes the Label and the Summarized value to another worksheet:

Here, the Label is just “Month ” prepended to the month number (January -> 1, etc).

Now, you might be wondering why we don’t just add a Month column to our table, and generate a pivot table, aggregating by month.

Firstly, we might want to partition our dates in many different ways: by month, quarter, Mayan Lunar Year, and so on. It would be cumbersome to have to add a column, with appropriate values, for each of these partitionings. Indeed, our table might be linked to some external data source which has only the raw data.

Secondly, a particular partitioning might depend on the data values themselves (for example, a negative value terminates a sub-sequence), or some dynamic value, such as today’s date.

To customize the SSI class for a particular table, we need to write:

  • Function StartCondition(row As Long) As Boolean
  • Function EndCondition(row As Long) As Boolean
  • Function Label() As String

In this example, StartCondition and EndCondition are both looking for changes in Month. When the SSI finds new rows satisfying these conditions, it sets StartRow and EndRow, which then delimit a new array of Values.

I’ll talk about the second variant – where we can have overlapping subsequences – in the next post.

Partitioning Data into Multiple Workbooks

I’ll return to the worksheets-to-table collation shortly…

I was diverted by a problem that my wife came up with: how to partition data in a single workbook into multiple workbooks, depending on the value in a particular ‘key’ column. Here’s the kind of thing we’d have as the input worksheet:

The data is sorted on the first, Key, column. There could be multiple data columns.

What we want is to create a workbook AAA.xls, containing the first 7 data values; a workbook BBB.xls, containing the next 3 data values; a workbook CCC.xls, containing the last 5 data values. And so on.

Obviously the numbers of workbooks and values involved could be large – so we’re not going to do this using copy-and-paste (and this is a VBA blog, after all!). Incidentally, this is in v2003, not that it matters.

So we need some way of finding where the Key values change. We could just iterate through the cells, but that’s rather plodding. We have, in the normal user interface, the command >Edit >Go To Special >Column Differences. If we start off with A:A selected, it changes the selection to A2:A16, then to A9:A16, and so on. If we do it on the last block, with A12:A16 selected, we get an error.

The key bit of VBA is:

Selection.ColumnDifferences(ActiveCell).Select

where ActiveCell is the first cell of the selected Range. Selection.Row is then the start of a chunk. Doing it again gives us the start of the next chunk, and thus the end of the preceding chunk.

So we can just cycle through the chunks in a Do While True loop, exiting when we get an error on the last chunk (remembering to copy that last chunk).

I added a Control worksheet with a command button, and prompted for the source worksheet name (in case there are multiple ones). I didn’t copy over the header row for columns B+, but that wouldn’t be difficult to do.

That seems to do the trick: the code module is here, if you’re interested. There are doubtless other ways of doing this, but my solution got the required Brownie points from Liz.

Dynamic Combo Box List

A combo box on a form can pick up the list contents from its RowSource property. This can be the name of a worksheet range – such as “products” (not ever a literal address, please).

However, this is tricky if the size of the list changes (if the list grows, specifically). With a standard defined Name, you need to keep on redefining the Name. Alternatively, you can use the OFFSET-function trick for dynamic ranges (but that is obscure and depends on things like the number of non-data cells in the column).

In 2007 we have the Structured Reference naming into Tables, such as “ProductTable[Product]“, which returns us the data contents of the product column. On a recent course, I was asked if we could use this as a Combo box RowSource property. The good news is that we can:

Any additions to the Table are picked up when the form reopens:

You could get this to work for a multi-column list (e.g. id + name, with the id as BoundColumn). For a 2-column table, you can specify the whole data area: “ProductTable[#data]“. It doesn’t matter if the #data area has more columns than the list needs: the leftmost ones are used.

Initially, I thought I’d try this with an ActiveX combo box dropped directly on to the worksheet. However, such on-sheet controls are actually OLEObjects. Rather than a RowSource property, they have a ListFillRange. An OLEObject could be things other than a List/Combo box, in which case this property would be inapplicable. (Similarly, RowSource is actually defined on the Control class).

That’s okay, but unfortunately, OLEObject.ListFillRange does not understand structured references into tables: it needs either a regular Defined Name or (don’t) a literal address. This seems like an oversight: the OLEObject‘s Parent is the Worksheet, which holds the Table as a ListObject – so it should be possible to resolve the name.

Nevertheless, it works fine for the Form combo boxes, and I’ll try to use this technique in the future.

Excel – XML Workflow – part 2

In the previous post, we were round-tripping data between Job workbooks and Timesheet workbooks. The final link in the cycle was for each Job workbook to update its cumulative hours-booked value, from a single XML file JobHoursList.xml. Here’s an example of that file:

<?xml version='1.0' encoding='ISO-8859-1'?>
<jobhourslist>
  <jobhoursentry>
    <jobnumber>0001</jobnumber>
    <jobhours>91</jobhours>
  </jobhoursentry>
  <jobhoursentry>
    <jobnumber>0002</jobnumber>
    <jobhours>27</jobhours>
  </jobhoursentry>
</jobhourslist>

A Job workbook needs to find the jobhoursentry for its own jobnumber, and load the associated jobhours value.

I decided to load the file, and search it sequentially, using the MS XML library (I went for Microsoft XML v6.0, which is called MSXML2 – I don’t know whether the version makes any difference).

The Workbook_Open handler calls a Load procedure. The whole file is loaded as a MSXML2.DOMDocument:

    Set doc = New MSXML2.DOMDocument
    With doc
        .async = False
        .validateOnParse = False
    End With
    ok = doc.Load(ActiveWorkbook.Path + "\" + datafile)

The nodes are each of type MSXML2.IXMLDOMNode. The jobhourslist is the document (that is, root) element:

    Set jobhourslist = doc.DocumentElement

Then we iterate through the child (jobhoursentry) nodes:

    For Each jobhoursentry In jobhourslist.ChildNodes
      Set jobnumber = jobhoursentry.FirstChild
      Set jobhours = jobhoursentry.LastChild
      'check jobnumber against the one in this workbook
      'and if a match, update the jobhours cell
      '...

Surprisingly straightforward. Incidentally, “DOM” stands for Document Object Model – that is, the node-tree representation of an XML document.

Here’s the VBA module, if you’re interested.

Excel – XML Workflow

Following on from a previous suggestion about workbook initialization from XML data, I’ve had a go at implementing a bit of round-trip workflow, based on a time-reporting scenario.

Time is booked against currently active jobs; this data is totalled for each job, and updates the job specifications accordingly. I’m assuming this works on a regular cycle, say weekly. Here’s an overview:

A Job workbook contains:

  • Job Number
  • Job Name
  • Start Date
  • End Date
  • Total Hours booked.

There’s a job_Map, so that the data can be exported to a corresponding XML data file. This is done on the Workbook_BeforeClose event (arrow A).

The Job*.xml files are imported into an XML database (Qizx/Free Engine). We can then run a simple XQuery over the collection to generate the list of all active jobs – those that have a Start Date, but no End Date. This list is saved as joblist.xml (arrow Q1).

When a new Timesheet is created (from a template), the joblist.xml data is imported, and is mapped to an Excel table, which expands to hold the possibly multiple rows (arrow B).

The hours column is not mapped: that’s where the user enters the time data.

Now, we want to export jobnumber + hours for each timesheet entry. However, we can’t have jobnumber in two maps (input and output). My solution (there might be better ones) is to copy the data to another table on another worksheet.

To get the data across, I’m using the formula:

=Table_tslist[[#ThisRow],[jobnumber]]

where Table_tslist is the input table. The #VALUE! errors are because the table is fixed size.

The output table is mapped within timesheet_Map (along with tsname and tsperiod). We can now export the data on the Workbook_BeforeClose event (arrow C). The empty (#VALUE!) rows generate empty elements, which are just ignored.

The Timesheet*.xml files are imported into the database, and another XQuery is run to generate a list of (all) job numbers + total hours booked (arrow Q2).

The final link in the cycle is for a Job workbook to refresh its total hours data, when it is opened (arrow D). This is different from arrow B, in that it’s not a simple Import. Each Job workbook wants only the hours value for its own job number. There might well be various ways of doing this, but I’m doing a programmatic load-document-and-lookup, in VBA. I’ll leave this to the next post.

And it’s all working pretty neatly. Potentially, we could extract and summarize other information from the timesheets, and present this in the Job workbooks – for example, the latest period with booked time.


July 2014
M T W T F S S
« Dec    
 123456
78910111213
14151617181920
21222324252627
28293031  

Follow

Get every new post delivered to your Inbox.