### ‘Database’ Functions in Excel

In Excel there is a category of functions called “Database”. This has always been a confusing term, as now (in 2007+) they relate to Tables (a.k.a. ListObjects, in VBA), or equivalent Ranges. The function names all start with ‘D’ (for database): DCOUNT, DSUM, etc. These are equivalents of the standard aggregation functions COUNT, SUM, etc (by aggregation, I mean a function that takes a set of values, and returns a single value).

What these D-functions allow is selective aggregation of data from a table, given a set of criteria – in other words, the combination of an advanced filter with an aggregation, without the need for the filtered data to reside on a worksheet.

Here’s an example:

The simple case is where the criteria are directly related to the data in the table. So, to count Bob’s records, the criteria range is B2:B3, and the DCOUNT formula is in B5:

The reference to “SalesTable[[#Headers],[Contacts]]” is just because I clicked on D9 – I could have just put “Contacts” (but I hate typing :-)). Note that the blank headers in row 2 are required as part of the criteria range (row 1 is just labels). Unsurpisingly, there are 12 records – one for each month of the year.

However, this kind of subtotalling by ‘dimension member’ (to use the OLAP term) is what pivot tables do. More interesting is when the criteria involve a Boolean-returning formula applied to each record.

For example, suppose that we want to know how many reps had more than average number of contacts (yes, I know it will be about half). So in D7 we have:

=AVERAGE(SalesTable[Contacts])

and in D3:

=D10>\$D\$7

The criterion is a formula referring to the first data record, which returns a Boolean. I think of this formula as being filled down through the table records, as if in an additional column (note the absolute/relative addresses).

Even more interesting is when the criteria involve functions on multiple fields in a record. For example, suppose that we are interested in records where Contacts2 is greater than Contacts - these are the guys who are improving.

So in E3, we have:

=E10>D10

(again, referring relatively to the first record).

This is fine, but more generally, our Boolean function operates on the entire record. So in F3, we have:

=TestRecord(A10:E10)

where TestRecord is a pure VBA function:

```Public Function TestRecord(rec As Range) As Boolean

TestRecord = rec.Item(4) < rec.Item(5)

End Function
```

Clearly, the body of this function can be as complicated as we wish, using the cells in rec. However, it depends on a particular ordering of the columns within the table. It is possible that we would want to use this function on tables that have the Contacts columns in different positions. So, an improved version intersects the named columns with the given rec.

```Public Function TestRecord2(rec As Range) As Boolean

Dim table As ListObject
Set table = rec.ListObject

Dim arg1 As Range
Set arg1 = Intersect(table.ListColumns("Contacts").Range, _
rec)

Dim arg2 As Range
Set arg2 = Intersect(table.ListColumns("Contacts2").Range, _
rec)

TestRecord2 = arg1.Value &lt; arg2.Value
End Function
```

Since the supplied Range is the first record in the table, we could simply pass the Table name to the function and derive from that the Range for the first record (but it’s getting late…).

Constraints on the Database functions are:

• the criteria have to be Ranges (and thus on a Worksheet), not in-code arrays
• they have to be vertically-oriented, contiguous Ranges (so can not be filled down).

A somewhat more esoteric limitation is that you can not plug a custom aggregation function into the basic D-function mechanism – DWEIRDSUM, perhaps.