Concatentation - the & and + operators
Concatenation is very slow, because it requires reallocation.
[Todo: outline the steps involved. I could "steal" back
the 6-step list in an MSKB article on this topic, since they ripped off one of
my posts to the fastcode newsgroup...]
The time taken to concatenate N strings of the same length, one
at a time, will in general be proportional to N*N.
In point of fact timings are even worse than this for very long
strings. As the overall string length climbs above about 200K,
some or all of the memory used for the string may be paged out by the
Operating System (this appears to happen about one time in ten or twelve,
and results in 60% slower performance, roughly - no precise figures yet!).
In VB (but not in VBScript) it is possible to implement faster
string concatenation using additional variables and steps, or
by replacing strings with a class that makes concatenation more
efficient.
[The classes I've posted to the fastcode and other
VB component newsgroups weren't the full quid; I only showed how to optimize
concatenation for performance, not how to speed up other string operations (and,
in particular, how to write *out* long strings to an ASP Response without undue
performance overhead... Here's a transcript of a post I made to the aspxml
newsgroup back in July 2000... What I didn't know when I wrote it was that there's
probably a dirty trick you can use to Response.Write long strings very rapidly. You
loop an index through them sending out about 1000 characters at a time, as shown here...
Haven't tried this yet, but it'll work.]
Public Sub WriteBufferToResponse(byref strBuffer AS String, byval lngLength AS Long _
, byref Response As ASP.Response)
'Imagine strBuffer is a buffer (any allocation length), and
'lngLength is how many chars are currently in use
Dim strWindow AS String
Dim lngIndex AS Long
strWindow = space(1000)
for lngIndex=1 to lngLength-999 Step 1000
Mid(strWindow,1,1000)=mid(strBuffer,lngIndex,1000)
Response.Write strWindow
Next
if lngIndex<=lngLength then
Response.Write mid(strBuffer,lngIndex,lngLength-lngIndex+1)
end if
End Sub
But there are things you can do to make ASP string performance a little better,
without making calls to COM components.
One option is to add the string fragments to the end of a dynamically allocated
array, and then iterate through the array, writing out the individual elements
(you should not use Join to short-cut this, as Join does not perform well
for arrays containing many strings.
See below).
If you don't like that idea (nor do I!), another way to speed up
string concatenation
of large numbers of short strings is to use intermediate variables. For example,
if you are writing out the HTML for an entire table, have separate strings for
the current table cell, the current row, and the table thus far. This will help.
If you possibly can, though, you are better off issuing individual Response.Write
calls to write out the short strings.
Some people recommend first concatenating all the strings on the right hand side of
an assignment bar the first one, like so:
strLong = strLong & (strShort & strShorter & strShorterYet)
While this will improve performance it does not aid readability.
Assigning into an intermediate variable would perform almost as well, and the
meaning will be a lot clearer:
'
'draft code to calculate a HTML string for a table
'
'strTable -- string for the table as a whole
'strRow -- string for the row under construction
'strCell -- string for the current cell
'
strTable = "<TABLE>"
'...for each row
strRow = " <TR>" & vbCrlf
'...for each cell
strCell = & " <TD>" & strCell & "</TD>" & vbCrlf
strRow = strRow & strCell
'...after all cells in current row
strRow = strRow " </TR>" & vbCrlf
strTable = strTable & strRow
'...after all rows
strTable = strTable & "</TABLE>" & vbCrlf
See also, Format Conversions below.
Which operator? Should you use & or +
If you know in advance that the left and right hand sides of a concatenation are
both going to be strings (before you ask to concatenate them), you can use + instead
of &. There is a performance advantage.
The Join() function
Both VB (version 6 and up) and VBScript provide a Join()
function. Join takes an array (of Strings or of Variants) and a separator as parameters,
concatenates all of the strings in the array, and adds a copy of
the separator between each pair of strings. For example,
Join(Array(1,2,"rabbit"),"--")
returns
"1--2--rabbit".
Join is faster than straight concatenation,
but does not scale well when called to concatenate arrays with a large number of
elements.
The execution time for Join() also increases with the square of
the number of strings being concatenated. Note that, although
many people will tell you that the average lengths of the strings
do not factor into the overall execution time for Join, this is not so.
It *is* important (but, I grant, less so than the number of strings).
The n-squared execution time of the function is due to the fact that
Join(), although it makes more efficient (and more direct) calls
to the BSTR functions provided by the operating system, does not
determine how long the resulting string will need to be before it
builds it, so it must reallocate over and over again as it
concatenates the individual strings from the array to build the result.
In VB (but not in VBScript) it is possible to write
implementations of Join() which scale better (even with
the overhead of going through extra layers of API calls to do
individual string operations).
[Todo: Insert link to LinearJoin() for VB
and NLogNJoin() for VBScript, once I have performance stats and a graph of same]
[Unfortunately the LinearJoin implementation I already
have doesn't scale down well. The standard Join() works better for
relatively short strings. To beat Join() even for short strings I would need
to implement in C++ (or possibly in assembler! Yuck!)]
[In the meantime, hint: if you looked at every string
before starting to build the result you would know exactly how long the result
is going to be. Then you could use the Mid statement instead of the &
or + operators...]
The Len() function
Determining the length of a string is very fast. Because the length of a string
is stored in the BSTR structure itself, it is not necessary to look at the content
of a string to determine how long it is. If the length of a string is changing,
it is not worth your while to keep track of it with a string length variable.
In VBScript and ASP it usually isn't worth assigning the length of a string into
a variable at all, even if the length of the string isn't changing, as the
time it takes for the script engine to make the assignment will probably outweigh
the cost of asking for the length!
The UCase() and LCase() functions
The Ucase() function takes time proportional to the length of the string that is
being converted (as does the LCase() function). You might think that means you
shouldn't use it, and that, if you need case-insensitive comparisons, strComp()
is better, but that might not be so.
If you are using the same string a lot of times (in case-insensitive operations)
it might well be worth converting it to upper or lower case once, and then doing
case sensitive comparisons. Indeed, Ucase() and LCase() could just about be
considered secret weapons, as we shall see when we come to the Instr(), InstrRev()
and Replace() functions.
The Instr() and InstrRev() Search functions
When used in case-sensitive mode Instr is quick (this is the default; if you are
calling Instr with two or three parameters, this is what you are using).
When used in case-insensitive mode it is slow. About 50 times slower than
the case-sensitive version, for long strings [Well, it
was in VB5, but I haven't re-run the performance analysis for VB6, or for ASP
and VBScript, but I'd be surprised if that has changed much. At best it will
be about 10 times slower, I expect]
This means that if you are searching the same string many times, it pays to
convert it to upper or lower case with UCase() or LCase(). If you need to keep
a cased version of the string, consider having two copies of the string,
one which has been converted to uniform case, and one which has not (the first
is used when you are "asking questions" about the string content,
the second whenever you wish to "do something to it"). AS we
shall see in the discussion of the Replace() function, this apparently daft
approach can pay major dividends.
The Replace() function
The Replace() function works very well when there aren't too many copies of the
string to be replaced in the target string. Indeed, it is probably one of the
best-written of the standard string operations built into VB and VBScript. However,
like Instr() and InstrRev() its performance is nowhere near as good when it is
run in case-insensitive mode. When you are replacing the same substring hundreds or
thousands of times, Replace() pays such a steep price for case insensitive
comparison that it is worth your while to replace it altogether (pardon the pun!).
I have never done performance analysis on the case-sensitive
version of Replace() to see how well it scales for replacing the same string hundreds
or thousands of times. I suspect its execution time will, in the end, increase with the
square of the length of the string. But I'm not sure.
[Todo: insert a link to my own implementation of
Replace for VB4/5 - it's 1998 vintage but still does the trick - which is faster
for case-insensitive Replace (well, it is when the search pattern is > about 5
chars and the searched string > about 200), and will outperform the case-sensitive
version of Replace when the searched string length (or the number of replacements)
gets large enough... but before I do this I'll want detailed performance info,
showing when it becomes faster]
The Split() function
Split() is your friend. It is very fast (except when used in case-insensitive mode).
If, for example, you wish to carry out an operation on each line of a file, a fast
way to do it is to load the entire content of a file into a string, and then
use Split() to break it into an array of lines. You can then carry out your
operation (whatever it is), even using the Mid(), Left() and Right() functions to
concatenate substrings if you like, without paying too much of a performance cost.
If you are writing back to a file, you might as well (unless it's megabytes in
length!) call Join() to stick the lines back together when you are finished processing
them.
The Mid(), Left() and Right() functions
Each of these functions creates a new string. This means that their execution
time is approximately proportional to the length of their result. The length
of the string they are extracting from appears to be immaterial.
[Todo: back this up with performance statistics]
However, you should be wary of using them at all, if you can find a way to avoid it.
A common method, for example, of parsing command-line parameters is to start with
a string containing the entire command line, and to snap off bits of it like this:
strNextBit = left(strCommandLine, lngCutPoint)
strCommandLine = mid(strCommandLine, lngCutPoint + 1)
However, this wastes considerable time, as both operations require new strings.
When you are parsing a long string, the second statement, in particular, becomes
expensive when you are executing it over and over again. Indeed, you're paying
the same performance price that you pay when you concatenate large numbers of
small strings to build a big one. Carving a string up like this is just like
concatenation, only in reverse.
Instead, you should look at ways to parse strings using the Mid() function, and
keeping track of where you are up to. If you are, for example, searching through
a file of several hundred kilobytes, this latter approach will work far better.
Not just a few times faster, but hundreds of times faster, in many cases.
[Unfortunately the examples I have available to illustrate
this point are rather complex. For example, an ASP->VB-Component translator!
Once I have a nice simple example I'll put in a link... I suppose I could put in
a command-line splitter, that's only about 70 lines of code...]
The Space() and String() functions
The Space() and String() functions are both very fast. If you need a string
of a given length and you don't much care what's in it (say you're checking how
fast you can read back a string from a COM component), Space() is the function
to call. It seems a pity that there isn't a function, say, StringRepeat(), to
return a given multiple-character string repeat an arbitrary number of times.
But you can get this! Instead of StringRepeat(pattern,count), which won't
work because there's no such animal, you can use
Replace(Space(count)," ",pattern). Although Replace() won't scale up
well for a really large number of repeats, doing it this way will be a
lot faster than concatenating it yourself. If you define a StringRepeat() function
of your own it will even be more readable.
The Mid assignment statement
If VB didn't have this it's string operations would be unfixable (without API
calls, anyway!). Since VBScript doesn't have it, you can't exploit it
to solve most of your string performance problems the way you can in VB. The
advantage of using the Mid assignment statement is that it does not change
the length of the string that you use it on. That means that the execution
time (for large strings) is roughly proportional to the number of characters
copied rather than the number of characters in the resulting string.
It turns out that, by exploiting this, you can build classes and routines that
have much better performance than you might expect.
The basic idea is to allocate more string length than you'll need, using the Space()
function, and keeping track of how much you have actually used. For example,
consider the following example implementation of URLDecode (
which I haven't fully tested yet!):
Function URLDecode(ByVal DecodeMe As String) As String
Dim result As String 'The output string (may be shorter than the input)
Dim i As Long 'Index for next character in the input string
Dim NextChar As String 'Next character to go to output string (possibly converted)
Dim outputLoc As Long 'Output position in constructed HTML encoded string
outputLoc = 1
result = Space(Len(DecodeMe))
On Error GoTo BadHexCode
For i = 1 To Len(DecodeMe)
NextChar = Mid(DecodeMe, i, 1)
If NextChar = "%" Then
NextChar = Chr(Val("&H" & Mid(DecodeMe, i + 1, 2))) 'hex to character
i = i + 2
ElseIf Mid(DecodeMe, i, 1) = "+" Then
NextChar = " "
End If
Mid(result, outputLoc, 1) = NextChar
outputLoc = outputLoc + 1
Next
URLDecode = Left(result, outputLoc - 1)
Exit Function
BadHexCode:
Err.Raise 5, "URLDecode", "Could not interpret % sequence starting at character " & i _
& " of the URL encoded string " & Chr(34) & DecodeMe & Chr(34)
End Function
Okay, so it's not as readable as a naive version of URLDecode (by a long shot!), but
I assure you it performs much better than this (particularly if you call it to translate
very long strings [Todo: include performance stats showing just
how big a difference there is!]):
Function URLDecode(ByVal DecodeMe As String) As String
Dim result As String 'The output string (may be shorter than the input)
Dim i As Long 'Index for next character in the input string
Dim NextChar As String 'Next character to go to output string (possibly converted)
result = ""
On Error GoTo BadHexCode
For i = 1 To Len(DecodeMe)
NextChar = Mid(DecodeMe, i, 1)
If NextChar = "%" Then
NextChar = Chr(Val("&H" & Mid(DecodeMe, i + 1, 2))) 'hex to character
i = i + 2
ElseIf Mid(DecodeMe, i, 1) = "+" Then
NextChar = " "
End If
result = result & nextChar
Next
URLDecode = result
Exit Function
BadHexCode:
Err.Raise 5, "URLDecode", "Could not interpret % sequence starting at character " & i _
& " of the URL encoded string " & Chr(34) & DecodeMe & Chr(34)
End Function
The StrComp function
[In a nutshell, don't use this!]
Format Conversions
[Todo: I should provide links to my latest VB implementations of
QuoteStringForVB, QuoteStringForSQL, QuoteStringForJavascript, and indicate why
they beat the tar out of most such.]