Following on from an earlier post, I thought that I’d quickly (!) try a different function. Not being a statistician, I wanted something with a simple formula, but that worked on potentially very large arrays: CORREL (correlation between two data sets) is something I can get my head around.
So I set up two columns of 1 million rows: column A containing 1,2,3,4,…; column B containing 2 * A * (1+ nnn), where nnn is a small, random, positive or negative quantity. For example, row 10 contains (10, 21.26344). The column ranges are named. A single cell contains a CORREL formula – this calculates in somewhat less than 1 second.
So I thought I’d try:
correlation = WorksheetFunction.Correl(arrX,arrY)
where arrX, arrY are 1-dimensional arrays obtained from the respective column ranges (whose Values are 2-D, N x 1 arrays). The above call is made between calls to Timer, and with all Application updating switched off. So far so good.
Times for 50000 rows are either 0.031 or 0.047 seconds – precisely those values, never anything in between; the mix seems to be random. For 65536 rows, the two alternative times are 0.047 and 0.063, again precisely, and randomly mixed. That’s pretty odd…
So, whack it up to the full 1 million rows, to get a decent elapsed time? Unfortunately not. In fact, 65537 rows causes a Run-time Error 13 – Type Mismatch on the call to Correl (as above). So something is limited to 16 bits, which is of course the storage for a VBA Integer. But my code does not use Integer anywhere. The actual data values are just Variants, indices are Longs. And the worksheet formula works fine for 1 million rows.
What’s going on? Any suggestions welcome. Given that I can’t get a benchmark for a very large number of rows, there doesn’t seem much point in writing a custom VBA Correl function to compare. Hmmmm.