Performance – Strings

In the previous post, we looked at the performance hit on accessing worksheet ranges from VBA. As is well known, it is advisable to minimize accesses by reading/writing arrays.

Another area where warnings abound is in operations on large strings. A particular case is building up long strings by concatenation. I expect we’ve all done something like:

mystring = mystring + newpiece

The overhead here seems to be because the old mystring is copied to a new mystring, and then the newpiece appended. The copying time is presumably proportional to Len(mystring).

For ad hoc string building (a range address, or name, or an SQL string in Access) this is not significant. Where problems could occur is when the concatenation is being done inside a high-iteration loop – that is, building up a very large string out of very many pieces.

Since mystring gets longer and longer, each concatenation gets slower and slower. The total time is therefore going to be proportional to n-squared, where n is the number of concatenations (i.e. the number of pieces). This is obviously bad news.

As a simple benchmark, I concatenated a single character string 100,000 times (on my fairly basic laptop), and it took 3.4 seconds; 200,000 times took 12.9 seconds. That’s a factor of 3.8, so roughly 4.

Now, that’s quite a big string to be building – not something you do every day, I’d guess. But you might be building up a string in memory in order to avoid large numbers of file-write operations, which are presumably going to be much slower than one write of a large string.

There are two alternatives. One is is to use the Mid function as a kind of ‘deconstructor’ function on the left-hand side of an assignment (this is a bit odd). This essentially ‘patches’ a substring on top of a base string. The trick here is that you start with an all-spaces base string of the anticipated size of your output string; this can be constructed using the (fast) Space function.

mystring = Space(size)
i = 1
Do While i < size
    substr = "xxx"   'or return from a function
    lensubstr = Len(substr)
    Mid(mystring,i,lensubstr) = substr
    i = i + lensubstr
Loop

Some people advocate using the ‘$’ versions of the string functions – so, Mid$, in this case. These are typed to take and return Strings, rather than Variants, and so are supposed to be more efficient. This does not seem to be the case with my string-building benchmark: the times are the same.

For a 100,000 character string, built one character at a time, it takes 0.05 seconds; for a 200,000 character string, 0.09 seconds. This looks like it might be linear, as you would expect.

Another alternative is to build an array of substrings, and then use the Join function to do a batch-concatenation:

mystring = Join(myarray)

This is about half as fast as the Mid solution (0.09, 0.17 seconds), but is still linear. The advantage of this is that it’s easier to retrieve/modify the substrings prior to the final string-build (with the Mid solution you’d need to keep an array of substring offsets).

Being good (!) object-oriented programmers, we should really encapsulate our chosen mechanism in a StringBuilder class (I expect that there are some out there). If the strings are being written to a file, then our class could handle this as well: a kind of wrapper around a TextStream object.

Another, less significant, optimization is checking for empty strings. Most people would compare with a literal empty string:

If mystring = “”

Since there is supposed to be an overhead in doing this (many times in a loop), alternatives are:

If Len(mystring) = 0

If mystring = vbNullString

The times for 10 million tests were 0.73, 0.36, 0.72 seconds. So the Len function is twice as fast, but it doesn’t really seem something to lose sleep over.

Finally, I’ve seen recommendations that when searching for a substring inside a larger string, you should use the InStr function, rather than Like “*foo*”. Times for 10 million tests were 1.56 and 2.55 seconds. So not hugely significant, and Like is obviously more flexible.

Advertisements

8 Responses to “Performance – Strings”


  1. 1 Roy MacLean January 14, 2010 at 4:35 pm

    A quick Google reveals that there’s a VB.NET StringBuilder class. For VBA, the only one I found is on this page:
    http://stackoverflow.com/questions/1070863/hidden-features-of-vba
    (this one is Mid-based).

    I think I might incline to an Array-based version – it just seems slightly higher-level. For example, it could offer operations to UnAppend, or switch two adjacent substrings. You could have a 2-dimensional array of substrings within lines. This would make TextStream.Writeline easy……

  2. 2 Dave January 14, 2010 at 4:51 pm

    I wrote one a while ago and was looking for a reference on the rationale for using Mid and lo! I found this page. Good writeup.

    Microsoft has an example and some test cases showing some timing comparisons at http://support.microsoft.com/kb/170964

    If you’re interested, my StringBulder is here: http://sites.google.com/site/daveparillo/software-development/vba/stringbuilder

  3. 3 Roy MacLean January 14, 2010 at 5:51 pm

    Thanks for the links, Dave.

  4. 5 Dick Kusleika January 14, 2010 at 8:43 pm

    I’m surprised fixed-length strings aren’t fast, but his resulted in 2244 vs 5 for you mid solution

    Sub test2()

    Dim mystring As String * 65526
    Dim clsTimer As CTimer
    Dim i As Long

    Set clsTimer = New CTimer
    clsTimer.StartCounter

    For i = 1 To 65535 Step 3
    mystring = mystring & “xxx”
    Next i

    Debug.Print clsTimer.TimeElapsed

    End Sub

    And of course only good up to 65526 characters.

  5. 6 Roy MacLean January 15, 2010 at 10:57 am

    @JP:
    “CopyMemory – A VB Hacker’s Dream” – says it all, really 🙂

  6. 7 Dave January 20, 2010 at 9:14 pm

    A quick google search turned up several different flavors of array-based versions, but most of them chunk each “append” call as a separate substring. IMHO this completely defeats the purpose of taking all the effort of managing the arrays in the first place.

    One notable example is the code written by Francesco Balena. You can find it here: http://vbcode.com/Asp/showsn.asp?theID=415 – there’s a few copies floating about is you google his name and ‘cstring’. It’s good, but still not as fast as the much simpler version you originally found on stack overflow.

    It uses byte arrays and includes CString variants of several common string functions.

  7. 8 Roy MacLean January 21, 2010 at 9:07 am

    Dave,
    Not sure what you mean by your IMHO comment. If the array-element substrings are something meaningful (lines, records, fields) then I might well want to keep them discrete until the last possible moment (i.e. the Join operation). It’s also possible that I might want to perform specific validity checks over the substrings – e.g. format, ordering. So this would be a custom string-builder, rather than a general substitute for the built-in String type, which is what Balena’s class is (looks impressive).

    /R


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




January 2010
M T W T F S S
« Dec   Feb »
 123
45678910
11121314151617
18192021222324
25262728293031

%d bloggers like this: