### Excel Value Distribution

I recently wanted to take a long list of values and find the distribution of these. So if my data consists of integers between 1 and 9 (or could be converted into such), then I want to see the count of 1s, the count of 2s, …, the count of 9s. Obviously, this would lend itself to presentation as a column chart:

So how do we get the count values in column C? A quick skim through Walkenbach’s Formulas didn’t reveal the answer, though I might have missed it.

As you might guess, we can use a single-cell array formula. In C2, we want to handle the boolean array {B2=data}, where data is the name of the whole set of data values in column A.

Note, incidentally, that we can’t use named-range-intersection inside an array formula: {value=data}, where value names the column B values, does not work.

So we need to OR the boolean array {B2=data}. However, we now hit the problem that AND and OR always produce a single result value, even inside an array formula. The trick is to convert TRUE to 1 and FALSE to 0; then add for disjunction, multiply for conjunction. So the formula we have is:

{=SUM((B2=data)*1)}

(with the braces indicating array-formula entry – Ctrl-Shift-Enter). This then fills down column C. The type conversion is effected by: FALSE * 1 = 0, TRUE * 1 = 1.

A more general version would allow us to set a value interval for the counts:

The min and max are generated from the interval. In E2 (for example), the formula now has two comparisons:

{=SUM((data >=C2)*(data<=D2))}

Here, the pairwise multiplication of the two boolean arrays effects the type conversion. SUM then operates on a single array of 0s and 1s.

The need for this type conversion from booleans to integers is a bit nasty, and comes from the implementation of AND and OR, which flattens arrays into a single set of values. So:

{=AND({TRUE;FALSE},{TRUE;FALSE})}

returns a single FALSE value, not an array {TRUE;FALSE}.

It’s instructive to think about how you might do this in VBA: with a function that takes the data range and returns an array of count values. There are clearly two levels of iteration, the one over data, which is implicit in the array formula, and one over the values/intervals, which is represented by filling the formula down the column.

#### 6 Responses to “Excel Value Distribution”

1. December 24, 2009 at 3:33 pm

You could select your output range, and enter this array formula (so it spans the output range):

=FREQUENCY(data,value)

2. December 25, 2009 at 2:06 pm

Thanks, Jon.
I should have guessed there’d be a function for this.
Although the DIY form is open to fancier variations – perhaps a comparison function other than straight equality.

3. December 27, 2009 at 6:01 am

Actually, frequency is like an inequality, compiling counts of values between items in the second range; it doesn’t just count values equal to these items. I always have to look up how exactly the inequalities work.

4. 4 AlexJ January 1, 2010 at 2:25 pm

I kinda like the pivot table solution to this. As for intervals, pivot table grouping works as long as the intervals are regular.

5. 5 Bram September 30, 2010 at 11:20 am

If for some reason you still would go on using the approach described in the post rather than the frequency table, I personally would prefer a formula using sumproduct i.s.o. a array formula with sum.

Basically {=Sum(yourRange=condition)*1} you can use =sumproduct(1*(yourRange=condition)) as without the need for an array formula

6. September 30, 2010 at 11:42 am

Bram,
Yes, ‘=sumproduct(1*(yourRange=condition))’ does work, but it is still essentially the same array formula – you just don’t have to enter it as an array formula. Personally, I find the notion of SUMPRODUCT with only a single argument unclear (product of what?).