Archive for April, 2010

Worksheet Hierarchies – part 2

In Part 1, I was toying with the idea of worksheets that are hierarchically related. In this example, the hierarchy reflects geographical areas: Country, Area, City. Data is held by the leaf-node worksheets, and aggregated by the higher-level worksheets. Using the INDIRECT function means that the aggregation formulas are generic, and so can be used as-is from a template worksheet.

This is clearly pivot table territory, so what’s the difference with this approach?

Firstly, the scenario I have in mind is where you are collating worksheets sent to you in separate workbooks (for example, the London office sends you London.xlsx, containing a single London worksheet). You have already developed a bit of VBA to copy each of the worksheets into a single collated workbook (UK.xlsx, say). However, the incoming worksheets are effectively reports, and might contain information in addition to the raw Sales and Costs figures. So I can’t immediately collate the Sales/Costs information into a single table: if I need to, that’s a subsequent step (see later).

Secondly, each leaf-node worksheet only needs to know about its parent node. For example, London knows it’s in the South area, but not about its place in the overall hierarchy. Similarly, a non-leaf node knows only about its parent and its children. This makes it easy to change the hierarchy – for example, breaking London into Central London and Outer London.

In contrast, if the basic data were in table form, the levels of the hierarchy would need to be explicit, as columns:

If I decided to break London into Central and Outer zones, then I’d need an additional  ‘Zone’ column, with values (a lot ‘n/a’) for each record.

However, the tabular data is clearly useful for generating pivot table summaries – here’s one at the Area level:

or with multiple fields on the Row axis, to get a hierarchical summary:

So, I’m thinking that it would be useful have some VBA to do the second stage of collation, of the leaf-node Sales/Costs data into a single table. This sounds like it will involve a tree walk around the worksheet hierarchy. Perhaps in two passes: once to find out the maximum depth (and thus the columns of the table), and again to copy the data into the table.

Worksheet Hierarchies

Back from the Easter break now…

It occurred to me that a common requirement is for a workbook to reflect a hierarchical structure – for example:

  • divisions/units within an organization
  • geographical areas (country, region, city)
  • product/part breakdown
  • reporting period (year, quarter, month).

An obvious approach is to have a worksheet for each node in the hierarchy, with the leaf nodes holding the data, and the non-leaf nodes aggregating the data using formulas. (Note that I’m only considering single-dimension aggregation here, not hypercubes).

So I thought it would be useful to have a template worksheet that can be copied for each node in a hierarchy, and linked in to the hierarchy using the worksheet names. The aggregation formulas are already in place, and pick up the names of child nodes. And to aid navigation, we can have some hyperlinks to root, parent and child nodes.

My example is the start of a geographical hierarchy:

UK is the root; London and Brighton are in the South region. Each node worksheet has been copied from Template. Here’s the UK sheet:

The child nodes (i.e. the regions) are represented in a table, with the names entered in the Region column. The Sales and Costs columns pick up these names, to get the Sales/Costs values from the child worksheets. So, for example, cells E9 and F9 contain the formulas:

=INDIRECT($D9 & “!Sales”)

=INDIRECT($D9 & “!Costs”)

The UK node’s owns Sales and Costs values are just totals of the relevant columns:



The Sales/Costs cells are named ranges, needless to say.

A leaf node worksheet, such as London, has an empty child table, and literal values for Sales and Costs:

I’ve left the formulas in the table in case this node gets decomposed further, but it is in practice empty.

Here’s an intermediate node, for South region:

There are links to the root node (UK), the parent node (also UK here) and the child nodes. Each of these is a HYPERLINK formula that picks up the relevant node (i.e. worksheet ) name. For example, C2 contains the formula:

=HYPERLINK(“[” & Root & “.xlsm”]” & B2 & “!$A$1”, “link”)

Note that HYPERLINK needs a filename, even if the link is internal to this workbook. By convention, the workbook has the same name as the root node. So ‘Root’ is just a range name for B1 on the UK worksheet. The links all go to A1 on the target sheet. Similarly, G9 contains:

=HYPERLINK(“[” & Root & “.xlsm”]” & D9 & “!$A$1”, “link”)

So, to create a hierarchy node, you need to:

  • copy the Template worksheet
  • set Parent, Name, and Children (if any)
  • set Sales and Costs values for leaf nodes.

No VBA so far, but I think it would be useful to build:

  • a hyperlinked Table of Contents just to the leaf nodes – that is, where the data entry happens
  • hierarchical summaries (i.e. each one on a single worksheet), for individual data categories (e.g. Sales).

More in due course.

April 2010
« Mar   May »