StringBuilder vs. Concatenation in .NET

Nothing very earth-shattering, but I’ve noticed that there are some misunderstandings related to the trade-offs of using the StringBuilder vs. concatenating strings, and thought I would pitch in my 3 cents.

First off, I will assume that you already know not to build massive pages of text (say an HTML page) by joining dozens or hundreds of strings together. I do think, however, that there should be a compiler feature that runs a couple of thousand volts through any programmer who does that.

This is more about the edge cases. First of all, how many strings should you be joining together before it makes sense to use a StringBuilder, vs. just straight concatenation? With two strings, it is easy – concatenate the strings! This is clear if you just count the number of objects:


string str3 = str1 + str2;

Vs.


StringBuilder sb = new StringBuilder();
sb.Append(str1);
sb.Append(str2);
string str3 = sb.ToString();

Forget anything going on behind the scenes – using the StringBuilder is adding an object that has to be cleaned up, so it is a net loss. When you consider the actual work being done, it is even more of a net loss – StringBuilder has to allocate and maintain a buffer.

Beyond two strings, though, things get a little bit more complicated. Take 3 strings:


string str4 = str1 + str2 + str3;

Written as a single expression like this, the C# compiler turns the whole thing into one String.Concat(str1, str2, str3) call, which allocates the result once at exactly the right size, so there is no intermediate object. The intermediate object shows up when the concatenation is split across statements (say, str4 = str1 + str2; followed by str4 += str3;). In that case you have added an object (str1 + str2), so on object count alone you are level with the StringBuilder code, and you have to think about what is going on behind the scenes. With the StringBuilder, a buffer is set aside (16 characters, if you don’t pass a larger value). When each string is added, if the buffer is large enough, the string is copied into the buffer (which is really fast). If the buffer is not large enough, the builder has to allocate more space first (older runtimes allocate a bigger buffer and copy the existing data over). This is repeated for each string.

For the step-by-step concatenation, a new buffer is allocated based on the size of the first two strings, then the data is copied into that buffer, and that buffer represents the new string. Next, a new buffer is created at the combined size of strings 1 and 2 plus string 3, and the data is copied over again.

So which is more efficient? Depends on the strings – the StringBuilder is a tad more efficient if the original buffer was big enough. Otherwise, the step-by-step concatenation is either as efficient or slightly more efficient, depending on whether each append overflows the buffer.
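To watch the buffer behaviour described above, a small sketch like the following can help. The class name CapacityDemo, the helper TrackCapacity and the sample strings are made up for illustration; Capacity and Length are real StringBuilder properties. How the capacity grows is an implementation detail that differs between runtime versions, so the code prints the values rather than assuming them.

```csharp
using System;
using System.Text;

public static class CapacityDemo
{
    // Hypothetical helper: appends each part and records the builder's
    // capacity after every append.
    public static int[] TrackCapacity(StringBuilder sb, params string[] parts)
    {
        int[] capacities = new int[parts.Length];
        for (int i = 0; i < parts.Length; i++)
        {
            sb.Append(parts[i]);
            capacities[i] = sb.Capacity;
        }
        return capacities;
    }

    public static void Main()
    {
        StringBuilder sb = new StringBuilder();
        Console.WriteLine("Initial capacity: " + sb.Capacity);

        int[] caps = TrackCapacity(sb, "Hello, ", "this is a longer string, ", "and one more.");
        foreach (int c in caps)
        {
            Console.WriteLine("Capacity after append: " + c);
        }
        Console.WriteLine("Final length: " + sb.Length);
    }
}
```

Whenever the printed capacity jumps, the builder had to go and find more space, which is exactly the extra work a well-chosen starting size avoids.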

One thing from this, at least, should be obvious:


StringBuilder sb = new StringBuilder(str1.Length + str2.Length + str3.Length);
sb.Append(str1);
sb.Append(str2);
sb.Append(str3);
string str4 = sb.ToString();

Is definitely more efficient than:


StringBuilder sb = new StringBuilder();
sb.Append(str1);
sb.Append(str2);
sb.Append(str3);
string str4 = sb.ToString();
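If you find yourself presizing in more than one place, the calculation can be wrapped up once. This is only a sketch: PresizedJoin and its Join method are hypothetical names, not part of the framework, and null entries are treated as empty (which is also how StringBuilder.Append(string) treats them).

```csharp
using System;
using System.Text;

public static class PresizedJoin
{
    // Hypothetical helper: concatenates the parts using a StringBuilder whose
    // capacity is set up front to the total length of all parts.
    public static string Join(params string[] parts)
    {
        int total = 0;
        foreach (string part in parts)
        {
            // Skip nulls when counting; Append(string) ignores them as well.
            if (part != null) total += part.Length;
        }

        StringBuilder sb = new StringBuilder(total);
        foreach (string part in parts)
        {
            sb.Append(part);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Join("abc", "def", "ghi"));
    }
}
```

For a fixed handful of strings known up front, String.Concat sizes its result in much the same way, so a helper like this mainly earns its keep when the pieces arrive in an array or list.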

I have also seen people just pass an arbitrary size to the StringBuilder that will probably be big enough:


StringBuilder sb = new StringBuilder(1024);

This bugs me, but so long as the number is not way oversized, it is probably okay.

So, with 3 strings joined in separate steps, if you get the size ahead of time, the StringBuilder is probably a little bit more efficient, but whether it is worth the extra code depends on usage (e.g. whether you are in a tight loop vs. an occasionally called method). The other thing to think about is performance.

Performance

It may surprise you, but for a small number of strings, concatenation tends to be faster than using a StringBuilder! That is because of how garbage collection works: allocating is cheap, and the cleanup is put off until later. StringBuilder does all sorts of nice, clean, buffer management. Concatenation just drags out a new hunk of memory and uses it. Assuming that you have an infinite amount of memory, this is quicker, because there is no management work to do.

Of course, most of us don’t have an infinite amount of memory, so eventually this catches up with us, and the garbage collector has to step in and clean up all of those abandoned strings. It is kind of like memory-management via credit-card – you don’t have to pay the bill immediately, but you do have to pay the bill (plus interest) in the future.

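That said, there are reasons to use credit cards, and there are reasons to not worry too much about this all the time. After all, short-lived strings are typically reclaimed in generation 0 collections, which are cheap, so the impact is usually small. (Don’t count on the collector running quietly in a low-priority background thread, though: depending on the GC mode, a collection can briefly pause your own threads.)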

Rules of thumb

So, after all this, what are some good ways of deciding when to concatenate vs. use a StringBuilder? Here are my rules of thumb, in order of precedence:

1. Two strings: I always concatenate.
2. Otherwise, if the code is in any sort of tight loop, or is likely to be called a lot, I generally use a StringBuilder.
3. Otherwise, for more than 5 strings, or for fewer strings that I expect to be large, I generally use a StringBuilder.
4. Otherwise (5 or fewer modestly sized strings), I generally concatenate.

Also, if it is easy to calculate the size, I do so. If it is not (e.g. if I am getting a lot of values from methods, or passing around the StringBuilder) I just leave the default size. I trust that the default buffer handling will do a better job than my random guesses.
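Rule 2 is the one that matters most in practice, so here is a rough illustration of it. The class LoopJoin and both method names are invented for this example. The two methods produce the same text, but the first creates a brand new string on every pass through the loop, while the second accumulates everything in one builder.

```csharp
using System;
using System.Text;

public static class LoopJoin
{
    // Repeated += builds a new, longer string on every iteration.
    public static string WithConcatenation(string[] items)
    {
        string result = "";
        foreach (string item in items)
        {
            result += item + ";";
        }
        return result;
    }

    // Same output, but the pieces are accumulated in a single StringBuilder.
    public static string WithStringBuilder(string[] items)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string item in items)
        {
            sb.Append(item).Append(';');
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string[] items = { "a", "b", "c" };
        Console.WriteLine(WithConcatenation(items));
        Console.WriteLine(WithStringBuilder(items));
    }
}
```

With three items the difference is noise. With thousands of items, or thousands of calls, the first version copies the ever-growing result over and over, which is why the loop case sits so high in the list.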

How much does this really matter?

Most of the time, not a whole lot! The days when we had to worry about every byte and every allocation are mostly behind us, although there are obviously cases (high throughput, major text-handling, etc.) where it matters a great deal!

However, it makes me feel better to have a consistent plan that will scale no matter what, and this way I don’t have to worry about it too much.

If you do have a situation where it is potentially an issue, you can always write some test code – but don’t forget that the real cost is a little bit hidden because of the garbage collector.
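As a starting point for that kind of test code, something along these lines might do. It is a sketch under assumptions, not a proper benchmark: ConcatTiming and Measure are hypothetical names, the iteration count is arbitrary, and the numbers will vary by machine and runtime. Stopwatch and GC.CollectionCount are real framework APIs. Counting generation 0 collections alongside the elapsed time is one way to make the hidden garbage collection bill at least partly visible.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ConcatTiming
{
    // Hypothetical helper: runs the action, returns elapsed milliseconds and
    // reports how many generation 0 collections happened while it ran.
    public static long Measure(Action action, out int gen0Collections)
    {
        // Collect first so garbage from earlier work is not billed to this run.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        int before = GC.CollectionCount(0);
        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        gen0Collections = GC.CollectionCount(0) - before;
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int iterations = 20000; // arbitrary; adjust to your own workload

        int gcConcat;
        long msConcat = Measure(() =>
        {
            string s = "";
            for (int i = 0; i < iterations; i++) s += "x";
        }, out gcConcat);

        int gcBuilder;
        long msBuilder = Measure(() =>
        {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < iterations; i++) sb.Append("x");
            sb.ToString();
        }, out gcBuilder);

        Console.WriteLine("Concatenation: " + msConcat + " ms, gen 0 collections: " + gcConcat);
        Console.WriteLine("StringBuilder: " + msBuilder + " ms, gen 0 collections: " + gcBuilder);
    }
}
```

Run it a few times and with a few different iteration counts before drawing conclusions; a single run of a micro-test like this is easy to misread.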
