Excel Value Distribution

I recently wanted to take a long list of values and find the distribution of these. So if my data consists of integers between 1 and 9 (or could be converted into such), then I want to see the count of 1s, the count of 2s, …, the count of 9s. Obviously, this would lend itself to presentation as a column chart:

So how do we get the count values in column C? A quick skim through Walkenbach’s Formulas didn’t reveal the answer, though I might have missed it.

As you might guess, we can use a single-cell array formula. In C2, we want to handle the boolean array {B2=data}, where data is the name of the whole set of data values in column A.

Note, incidentally, that we can’t use named-range-intersection inside an array formula: {value=data}, where value names the column B values, does not work.

So we need to OR the boolean array {B2=data}. However, we now hit the problem that AND and OR always produce a single result value, even inside an array formula. The trick is to convert TRUE to 1 and FALSE to 0; then add for disjunction, multiply for conjunction. So the formula we have is:

{=SUM((B2=data)*1)}

(with the braces indicating array-formula entry – Ctrl-Shift-Enter). This then fills down column C. The type conversion is effected by: FALSE * 1 = 0, TRUE * 1 = 1.

A more general version would allow us to set a value interval for the counts:

The min and max are generated from the interval. In E2 (for example), the formula now has two comparisons:

{=SUM((data >=C2)*(data<=D2))}

Here, the pairwise multiplication of the two boolean arrays effects the type conversion. SUM then operates on a single array of 0s and 1s.

The need for this type conversion from booleans to integers is a bit nasty, and comes from the implementation of AND and OR, which flattens arrays into a single set of values. So:

{=AND({TRUE;FALSE},{TRUE;FALSE})}

returns a single FALSE value, not an array {TRUE;FALSE}.

It’s instructive to think about how you might do this in VBA: with a function that takes the data range and returns an array of count values. There are clearly two levels of iteration, the one over data, which is implicit in the array formula, and one over the values/intervals, which is represented by filling the formula down the column.

Advertisements

6 Responses to “Excel Value Distribution”


  1. 1 Jon Peltier December 24, 2009 at 3:33 pm

    You could select your output range, and enter this array formula (so it spans the output range):

    =FREQUENCY(data,value)

  2. 2 Roy MacLean December 25, 2009 at 2:06 pm

    Thanks, Jon.
    I should have guessed there’d be a function for this.
    Although the DIY form is open to fancier variations – perhaps a comparison function other than straight equality.

  3. 3 Jon Peltier December 27, 2009 at 6:01 am

    Actually, frequency is like an inequality, compiling counts of values between items in the second range; it doesn’t just count values equal to these items. I always have to look up how exactly the inequalities work.

  4. 4 AlexJ January 1, 2010 at 2:25 pm

    I kinda like the pivot table solution to this. As for intervals, pivot table grouping works as long as the intervals are regular.

  5. 5 Bram September 30, 2010 at 11:20 am

    If for some reason you still would go on using the approach described in the post rather than the frequency table, I personally would prefer a formula using sumproduct i.s.o. a array formula with sum.

    Basically {=Sum(yourRange=condition)*1} you can use =sumproduct(1*(yourRange=condition)) as without the need for an array formula

  6. 6 Roy MacLean September 30, 2010 at 11:42 am

    Bram,
    Yes, ‘=sumproduct(1*(yourRange=condition))’ does work, but it is still essentially the same array formula – you just don’t have to enter it as an array formula. Personally, I find the notion of SUMPRODUCT with only a single argument unclear (product of what?).


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




December 2009
M T W T F S S
« Nov   Jan »
 123456
78910111213
14151617181920
21222324252627
28293031  

%d bloggers like this: