Performance – Functions (addendum)

Following on from an earlier post, I thought that I’d quickly (!) try a different function. Not being a statistician, I wanted something with a simple formula, but that worked on potentially very large arrays: CORREL (correlation between two data sets) is something I can get my head around.

So I set up two columns of 1 million rows: column A containing 1,2,3,4,…; column B containing 2 * A * (1+ nnn), where nnn is a small, random, positive or negative quantity. For example, row 10 contains (10, 21.26344). The column ranges are named. A single cell contains a CORREL formula – this calculates in somewhat less than 1 second.

So I thought I’d try:

correlation = WorksheetFunction.Correl(arrX,arrY)

where arrX, arrY are 1-dimensional arrays obtained from the respective column ranges (whose Values are 2-D, N x 1 arrays). The above call is made between calls to Timer, and with all Application updating switched off. So far so good.

Times for 50000 rows are either 0.031 or 0.047 seconds – precisely those values, never anything in between; the mix seems to be random. For 65536 rows, the two alternative times are 0.047 and 0.063, again precisely, and randomly mixed. That’s pretty odd…

So, whack it up to the full 1 million rows, to get a decent elapsed time? Unfortunately not. In fact, 65537 rows causes a Run-time Error 13 – Type Mismatch on the call to Correl (as above). So something is limited to 16 bits, which is of course the storage for a VBA Integer. But my code does not use Integer anywhere. The actual data values are just Variants, indices are Longs. And the worksheet formula works fine for 1 million rows.

What’s going on? Any suggestions welcome. Given that I can’t get a benchmark for a very large number of rows, there doesn’t seem much point in writing a custom VBA Correl function to compare. Hmmmm.


4 Responses to “Performance – Functions (addendum)”

  1. 1 John Walkenbach January 22, 2010 at 4:51 pm

    I think the problem is the TRANSPOSE function (which I assume you’re using when you transfer the range to the variant array). TRANSPOSE can’t handle more than 65,536 elements in a dimension. If you transpose a range that has more than 65,536 rows, the array isn’t truncated at 65,536. Rather, you end up with a smaller array than you think (and no error message).

    After transposing a single column of cells, the actual array length is determined by:

    NumRows Mod 65536

    So when you start with a range that has 65,537 rows, the array has only one element after using TRANSPOSE (use UBound to check this). The CORREL function needs at least two elements in each array, so that’s the source of the error. Assigning 65,538 elements won’t cause an error in CORREL, because then the arrays have two elements each.

    You didn’t get an error when you tried 1,000,000 elements because the arrays actually had 16,960 elements each.

    At least I *think* that’s what’s going on.

  2. 2 dougaj4 January 23, 2010 at 11:58 pm

    Roy – it seems the maximum size of array that can be passed by a VBA UDF in XL 2007 is 65536 elements (the same would apply to earlier versions of course). I don’t know about 2010 (I’ve downloaded the beta, but not installed it).

    The problem is in transferring the array back to the spreadsheet. If you put a breakpoint in the UDF after it has completed the calculations but before it assigns the array to the return value you should find that there is a perfectly valid array of results in VBA, just waiting to sent back to the spreadsheet.

    For testing purposes you could return the time taken for the UDF to do its stuff, rather than the results of the calculations, that works on 1 million rows.

  3. 3 dougaj4 January 24, 2010 at 12:07 am

    Correction – I was stepping through an array one row at a time, then assigning the resulting array to the UDF return value.

    If the workshhetfunction operates on the whole array in one go (as Correl does) then it looks like you are stuck with the 65536 row limit.

  4. 4 Roy MacLean January 25, 2010 at 9:56 am

    The TRANSPOSE and CORREL benchmarks are separate. The TRANSPOSE one only went up to a 3000 x 3000 array, so nowhere near the 64K limit for a single dimension.

    For the CORREL benchmark, you seem to be wrong about the mod business: I get the Type Mismatch error for 65537, 65538, 65539, 65540, … For size = 1000000, I get a different error: #9 Subscript out of range.

    For size = 65540, say, if I stop execution just before the call to Correl, the two VBA arrays obtained from the worksheet columns are indeed the correct size, with correct data. Then, presumably, these two arrays are passed back to the Excel worksheet function. So why can an array of this size come from a Range to a VBA array, but not from a VBA array to an Excel function? And why “Type Mismatch” if it’s an indexing issue? I get the feeling that this is a bit dodgy…

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

January 2010
« Dec   Feb »

%d bloggers like this: