Archive for January, 2011

Analysing Cross-references Between Documents

I have recently been working on a set of about 200 Use Case documents, converting them into DITA - an XML-based structured document format. In the body text of a Use Case, there can be cross-references to other Use cases – so we have a many-to-many relationship between Use Cases (just as we could between any type of document). It’s interesting to analyse this relationship, to see which Use Cases have lots of outgoing references, and which Use Cases are referenced by many other Use Cases.

This kind of totalling is Pivot Table territory. So the question arises of how to contruct the basic data about the cross-references – a table of A-references-B records – and how to get it into Excel.

A key advantage of XML-based documents is that you can import them into an XML database and run queries over them, where the queries are written in XQuery. Such queries can generate output as XML, HTML or plain text. For our purpose, a simple route is to output an HTML table. This can then be imported by Excel via >Get External Data >From Web. The URL for the import points to our XQuery file, within the database server:

http://roy-laptop:18080/qizx/xqs/ucrefs.xq

Here, the database is XMLMind’s Qizx, running in server mode (locally on my laptop, in this case). This URL can be used like any other in a browser interface, with the results of the query being displayed within the browser. In our case, we are going to give the URL to Excel.

Here’s the code of the query:

xquery version "1.0";
declare option output:method "html";

declare variable $ucs := collection("/UC");

declare function local:uc_refs()
{
  for $ucref in $ucs//uc-ref
  let $file := document-uri(root($ucref))
  let $ucreftext := $ucref/text()
  order by $file
  return
  <tr><td>{$file}</td><td>{$ucreftext}</td><td>1</td></tr>
};

let $ucrefs := local:uc_refs()
return
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>UC References</title>
  </head>
  <body>
    <table>
      <thead>
	    <tr><td>Source</td><td>Target</td><td>Token</td></tr>
      </thead>
      <tbody>
	    {$ucrefs}
      </tbody>
    </table>
  </body>
</html>

Variables start with $. The function collection (line 4) scopes the query to a particular directory tree of document files. The expression ‘$ucs//uc-ref’ binds to each occurrence of the <uc-ref> element, anywhere in the collection of documents. $file is the document in which a <uc-ref> is found. The third column of the table (Token) is just the number 1 – this gives me some data for the pivot table to sum.

Now we can do the import into Excel:

Finally, I can insert a pivot table based on this data, subtotalling by either Source or Target, or putting both dimensions on the same axis. I love it when stuff just works…

Out of curiosity, I ran the import with the macro recorder on, to see what the VBA looks like:

Sub Import()
    With ActiveSheet.QueryTables.Add( _
        Connection:= _
            "URL;http://roy-laptop:18080/qizx/xq/ucrefs.xq",
            Destination:=Range ("$A$1"))
        .Name = "ucrefs.xq"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .WebSelectionType = xlAllTables
        .WebFormatting = xlWebFormattingNone
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = False
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With
End Sub

So it’s just creation of a QueryTable, with the URL given as a Connection string. The WebSelectionType property is ‘all tables’, although I thought I clicked the lower of the two yellow arrows in the Import dialog. Clearly, we could wrap this up a bit more: selecting from a list of queries, creating a new Worksheet, etc.

Having established the basic XML –> XQuery –> HTML Table –> Excel/VBA architecture, I think I’ll experiment more with this in due course.

Incidentally, this is my 100th post (Rah! Rah!), having taken almost exactly two years. And most enjoyable it’s been – at least for me :-)

An Alternative to VBA in Excel?

Back last year, Gareth Hayter of Slyce Software emailed me about VScript, an alternative to VBA for writing functions and macros in Excel. Unfortunately, for various reasons, I have not been able to look into it in any detail, but it certainly sounds interesting.

VScript is based on Excel-Dna, which is a project to integrate .NET into Excel. The aim is thus to allow you to write functions and macros in C# or VB.Net – so, presumably it is aimed at developers familiar with those languages, and preferring them over VBA. Code is claimed to run considerably faster than VBA (not hard to believe).  VScript provides an IDE integrated with Excel, as with VBA. You can use VScript to create stand-alone XLL add-ins, and digitally sign these.

What about VSTO? Gareth says:

VScript is different from VSTO (Visual Studio Tools for Office) in many ways:

  • VSTO is for professional programmers: It can be complex and confusing to use and requires a lot of time, effort and money to learn.
  • VSTO is expensive: You need to buy Visual Studio® which starts at $799.
  • VSTO is not built into Excel®: It is an external program and works in a very different way from VBA.
  • VSTO projects are complicated to deploy: VSTO is not integrated into Excel, which means that it’s difficult to make a few changes and test them quickly. It requires ClickOnce deployment.
  • VSTO cannot create User-Defined Functions (UDFs): With VSTO, you can’t create functions that you can use in a similar way to SUM() and AVG().

One to watch …

Moving Data Out of Outlook – part 2

Happy and code-full 2011!

In an earlier post, I was considering exporting my Outlook contacts as XHTML pages, and accessing these as a repository of free-format, but queryable, documents. After some time out for Christmas and New Year, not to mention my family’s seasonal cough-fest, I can report back.

I haven’t attempted a comprehensive export of all contact attributes, just the standard ones that I typically use. This still leaves information in the Contact’s body section, but I’ll migrate that manually, as and when.

Here’s an example contact page:

I just grabbed a standard style sheet, so the formatting is not up to much (for example, the cell text middle-aligns vertically, as you can see with “Address”).

In terms of the VBA code, all we need is an adapter class, ContactPage. Instances of this are created for each ContactItem that we export. A ContactPage is responsible for building the XHTML string corresponding to the ContactItem, and writing it out to a text file. Here’s the root procedure:

Const outdir As String = "C:\Temp\Contacts"

Sub ExportContacts()
  Dim allcontacts As Items
  Set allcontacts = Session.Folders("Personal Folders"). _
                             Folders("Contacts").Items

  MsgBox "Exporting " & allcontacts.Count & " Contacts"

  Dim cn As ContactItem  'Outlook class
  Dim cp As ContactPage  'Custom class
  For Each cn In allcontacts
    Set cp = New ContactPage
    Set cp.Contact = cn    'Property Set
    cp.OutputToFile outdir
  Next
End Sub

ContactPage has a function for PageTitle, which is taken from the ContactItem‘s Company or Full Name (in that order of preference). Functions PageId and PageFile are based on this (with a bit of cleaning up to remove characters such as ‘/’).

The sub OutputToFile uses the Microsoft Scripting Runtime library, specifically FileSystemObject and TextStream. You need to load the Scripting library manually, via >Tools >References.

The rest of ContactPage’s features are just functions that generate elements in the XHTML hierarchy. For example:

Public Function XhtmlPage() As String
'The XHTML representation of cp_contact
    XhtmlPage = Header_ & vbCrLf & _
                Html_
End Function

'...

Public Function Html_() As String
    Html_ = "<html id=" & Chr(34) & PageId & Chr(34) '& ...
    Html_ = Html_ & vbCrLf & _
            Head_ & vbCrLf & _
            Body_ & vbCrLf & _
            "</html>"
End Function

Public Function Body_() As String
    Body_ = "<body>" & vbCrLf & _
            "<h1>" & PageTitle & "</h1>" & vbCrLf & _
            Company_table_ & vbCrLf & _
            Person_table_(1) & vbCrLf & _
            Person_table_(2) & vbCrLf & _
            Person_table_(3) & vbCrLf & _
            Categories_ul_ & vbCrLf & _
            "</body>"
End Function

'...

Public Function LastName_td_() As String
    LastName_td_ = "<td class=" & Chr(34) & "LastName" & Chr(34) & ">" & _
     cp_contact.LastName & "</td>"
End Function

The lowest level functions, typically for <td> elements, access the ContactItem‘s properties. As you can see, there are a lot of quotation characters (Chr(34)) in XHTML. The trailing-underscore convention for the function names was so that I could call the functions after the XHTML tags they generate – without the underscore I got a name clash with something existing.

Minor complications where the Contacts folder contains Distribution Lists (class DistListItem), which is a different class from ContactItem – that is, a MAPIFolder.Items is a heterogeneous Collection. This makes the For Each loop throw a type mismatch error, although you can resume execution (>Debug, >Run).

Also a slight issue with duplicate filenames, since the file output overwrites by default. I used Round(Rnd * 1000, 0) to generate a temporary disambiguating suffiix for the filename.

Here’s the Class Module for ContactPage.

 


January 2011
M T W T F S S
« Dec   Mar »
 12
3456789
10111213141516
17181920212223
24252627282930
31  

Follow

Get every new post delivered to your Inbox.