Analysing Cross-references Between Documents

I have recently been working on a set of about 200 Use Case documents, converting them into DITA - an XML-based structured document format. In the body text of a Use Case, there can be cross-references to other Use cases – so we have a many-to-many relationship between Use Cases (just as we could between any type of document). It’s interesting to analyse this relationship, to see which Use Cases have lots of outgoing references, and which Use Cases are referenced by many other Use Cases.

This kind of totalling is Pivot Table territory. So the question arises of how to contruct the basic data about the cross-references – a table of A-references-B records – and how to get it into Excel.

A key advantage of XML-based documents is that you can import them into an XML database and run queries over them, where the queries are written in XQuery. Such queries can generate output as XML, HTML or plain text. For our purpose, a simple route is to output an HTML table. This can then be imported by Excel via >Get External Data >From Web. The URL for the import points to our XQuery file, within the database server:

http://roy-laptop:18080/qizx/xqs/ucrefs.xq

Here, the database is XMLMind’s Qizx, running in server mode (locally on my laptop, in this case). This URL can be used like any other in a browser interface, with the results of the query being displayed within the browser. In our case, we are going to give the URL to Excel.

Here’s the code of the query:

xquery version "1.0";
declare option output:method "html";

declare variable $ucs := collection("/UC");

declare function local:uc_refs()
{
  for $ucref in $ucs//uc-ref
  let $file := document-uri(root($ucref))
  let $ucreftext := $ucref/text()
  order by $file
  return
  <tr><td>{$file}</td><td>{$ucreftext}</td><td>1</td></tr>
};

let $ucrefs := local:uc_refs()
return
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>UC References</title>
  </head>
  <body>
    <table>
      <thead>
	    <tr><td>Source</td><td>Target</td><td>Token</td></tr>
      </thead>
      <tbody>
	    {$ucrefs}
      </tbody>
    </table>
  </body>
</html>

Variables start with $. The function collection (line 4) scopes the query to a particular directory tree of document files. The expression ‘$ucs//uc-ref’ binds to each occurrence of the <uc-ref> element, anywhere in the collection of documents. $file is the document in which a <uc-ref> is found. The third column of the table (Token) is just the number 1 – this gives me some data for the pivot table to sum.

Now we can do the import into Excel:

Finally, I can insert a pivot table based on this data, subtotalling by either Source or Target, or putting both dimensions on the same axis. I love it when stuff just works…

Out of curiosity, I ran the import with the macro recorder on, to see what the VBA looks like:

Sub Import()
    With ActiveSheet.QueryTables.Add( _
        Connection:= _
            "URL;http://roy-laptop:18080/qizx/xq/ucrefs.xq",
            Destination:=Range ("$A$1"))
        .Name = "ucrefs.xq"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .WebSelectionType = xlAllTables
        .WebFormatting = xlWebFormattingNone
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = False
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With
End Sub

So it’s just creation of a QueryTable, with the URL given as a Connection string. The WebSelectionType property is ‘all tables’, although I thought I clicked the lower of the two yellow arrows in the Import dialog. Clearly, we could wrap this up a bit more: selecting from a list of queries, creating a new Worksheet, etc.

Having established the basic XML –> XQuery –> HTML Table –> Excel/VBA architecture, I think I’ll experiment more with this in due course.

Incidentally, this is my 100th post (Rah! Rah!), having taken almost exactly two years. And most enjoyable it’s been – at least for me :-)

About these ads

2 Responses to “Analysing Cross-references Between Documents”


  1. 1 Dick Kusleika January 28, 2011 at 3:15 pm

    Congrats on 100. It’s been enjoyable for us too!

  2. 2 Roy MacLean January 31, 2011 at 10:27 am

    Thanks, Dick – always a pleasure when you drop by.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




January 2011
M T W T F S S
« Dec   Mar »
 12
3456789
10111213141516
17181920212223
24252627282930
31  

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: