[Figure: two sequences of the same table - "Sorted by name", "User sorts by number", "User sorts by name again". With an unstable sort this might happen (pink shows the shuffling); with a stable sort the user will always see the same order after the final re-sort by name.]
Switching to Insertion

[Chart]
The chart at left shows how different values of the M parameter (the cutoff below which we switch to insertion sort; it appears as minSubFile in the code below) affect sort performance, for an array of 5000 distinct long integers. These are NOT running averages; I ran the test for each M only once, but the trend lines are clear. The different colours indicate how much order (and what sort of order) there is in the input data.
Middle Element

[Chart]
The chart at left shows timings for a conventional Quicksort algorithm (VB implementation, sorting long integers via an index array, running under the VB IDE on a PII/350) which always chooses the middle element of a range as the partition element, when supplied with the same varieties of input ordering described above.
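For what it's worth, the middle-element choice is a one-liner. A small sketch (MiddlePivot is a made-up name for illustration; note that VB's \ operator is integer division, which is what you want here - writing Int(l + r) / 2 applies the Int to the sum before the division happens):

Private Function MiddlePivot(ByVal l As Long, ByVal r As Long) As Long
    MiddlePivot = (l + r) \ 2   'index of the middle element of the range l..r
End Function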
Random Element

[Chart]
The chart at left shows timings for the same Quicksort algorithm, modified to choose a random partition element (again, these aren't averages; I only ran each test once, which is why there are spikes here and there). There's not much difference. Arrays that are already in ascending order tend to sort faster than ones in reverse order because fewer swaps are necessary. Arrays mostly in reverse order (usually) sort faster than ones mostly in random order, because early partitions will swap them once, and then they'll be in order and won't need swapping again. For N below 1000 the extra cost of calling Rnd() seems to outweigh the advantages.
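The change is a one-liner. A minimal sketch, assuming the same l and r bounds used in the code below (RandomPivot is just a name I've picked for illustration; call Randomize once before the first sort so Rnd() isn't seeded the same way every run):

Private Function RandomPivot(ByVal l As Long, ByVal r As Long) As Long
    RandomPivot = l + Int(Rnd() * (r - l + 1))   'Rnd() returns a value in [0, 1), so the result is in l..r
End Function

Inside Quicksort you'd then write p = RandomPivot(l, r) instead of the middle-element choice.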
Avoiding Recursion

[Chart]
I'll give you the numbers, because the differences aren't that clear from the chart.
When we compare for 20480 records, we get:

[Table: timings at 20480 records]
Const minSubFile = 9

Sub ISortAlgorithm_Sort(d As Variant, ByVal l As Long, ByVal r As Long, a As Variant)
    If l + minSubFile <= r Then
        Quicksort d, l, r, a        'Don't bother quicksorting small arrays
    End If
    InsertionSort d, l, r, a        'Straight insertion sort finishes the job
End Sub
Sub Quicksort(d As Variant, ByVal l As Long, ByVal r As Long, a As Variant)
    'Description: Two-sided recursion version (may blow the stack limit!)
    Dim p As Long           'index of index of partition element
    Dim v As Variant        'partition value
    Dim rr As Long          'original right boundary
    Dim temp As Long        'for swapping indices
    rr = r                                      'remember the index of the end of the partition
    p = (l + r) \ 2                             'pick partition element (\ is integer division)
    v = d(a(p))                                 'remember partitioning value
    temp = a(p): a(p) = a(l): a(l) = temp       'swap index of partition element to the bottom
    p = l                                       'remember where we put it
    Do
        Do
            For l = l + 1 To r - 1              'scan up for an element >= the partition value
                If v <= d(a(l)) Then Exit For
            Next
            For r = r To l + 1 Step -1          'scan down for an element <= the partition value
                If d(a(r)) <= v Then Exit Do
            Next
            'the scans have met: fix up l, drop the partition element into place, and recurse
            If rr < l Then                      'if we tacked an infinity on the right we could do without this
                l = l - 1
            ElseIf v < d(a(l)) Then
                l = l - 1
            End If
            temp = a(p): a(p) = a(l): a(l) = temp   'swap the partition element into its final position
            If p <= l - minSubFile Then Quicksort d, p, l - 1, a
            If rr - l >= minSubFile Then Quicksort d, l + 1, rr, a
            Exit Sub
        Loop
        temp = a(l): a(l) = a(r): a(r) = temp   'swap the out-of-place pair found by the two scans
        r = r - 1
    Loop
End Sub
Private Sub InsertionSort(d As Variant, ByVal l As Long, ByVal r As Long, a As Variant)
    Dim j As Long       'index when searching for inversions
    Dim i As Long       'index when correcting inversions
    Dim v As Variant    'value of element on the move
    Dim oldI As Long    'index of element being moved
    For j = l + 1 To r
        If d(a(j)) < d(a(j - 1)) Then
            i = j - 1
            oldI = a(j)
            v = d(oldI)
            Do
                a(i + 1) = a(i)
                i = i - 1
                If i < l Then Exit Do   'because we don't have a magic low value at the bottom
            Loop Until d(a(i)) <= v
            a(i + 1) = oldI
        End If
    Next
End Sub
Sub randomizeArrayOrder(ByRef a As Variant)
    Dim i As Long       'index of element to swap
    Dim j As Long       'index of element to swap it with
    Dim m As Long       'number of elements remaining that might be swapped
    Dim temp As Long    'for swapping elements (declare as Variant if a isn't an array of Longs)
    m = UBound(a) - LBound(a) + 1
    For i = LBound(a) To UBound(a) - 1
        j = i + Int(m * Rnd())
        temp = a(i)
        a(i) = a(j)
        a(j) = temp
        m = m - 1
    Next
End Sub
When you are randomizing order by exchanging, it takes three copies to exchange two
elements. If you've got more room to work in you can go a bit quicker.
Sub randomizeArrayOrder(ByRef aIn As Variant, ByRef aOut As Variant)   'aIn gets trashed
    Dim i As Long   'index of element to replace
    Dim j As Long   'index of element to replace it with
    Dim m As Long   'number of elements remaining that might be swapped
    m = UBound(aIn) - LBound(aIn) + 1
    For i = LBound(aIn) To UBound(aIn)
        j = i + Int(m * Rnd())
        aOut(i) = aIn(j)
        aIn(j) = aIn(i)     '<--two copies, not three, but aIn is trashed
        m = m - 1
    Next
End Sub
'Note that for this to work aIn must not be the same array as aOut!
'You can trick VB into getting that right by calling it like this: randomizeArrayOrder (a), a
'The brackets force VB to make a temporary copy of a. VB can copy an entire array
'in one hit, saving a lot of time.
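To make that call concrete, here's a small sketch of a caller (ShuffleDemo and the fill values are made up for illustration; the bracketed first argument is the copy-forcing trick described above):

Sub ShuffleDemo()
    Dim a() As Long
    Dim i As Long
    ReDim a(1 To 10)
    For i = 1 To 10
        a(i) = i                        'fill with something recognisable
    Next
    Randomize                           'seed Rnd() so each run differs
    randomizeArrayOrder (a), a          'the brackets pass a copy of a as aIn; a itself is aOut
End Sub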
Duplicate Values...

[Chart]
It's the red and pink lines that are really scary, though the purple line doesn't look too good either. What this graph shows is that using Quicksort to sort on a field which has only a few possible values (or where most of the records will have the same value, say a comment field that isn't used much, or the third line of an address field, or the "reasons you should cut my pay" field...) is not a good idea. Or at least, the usual version of Quicksort isn't. The trouble is in the partitioning scans, which in a typical implementation look something like this:
while l < r and element(l) <= pivot
    l = l + 1
wend
while l < r and pivot <= element(r)     'technically we don't need to check l<r here
    r = r - 1                           'in most implementations, as the pivot element itself
wend                                    'will have been swapped with the leftmost element in the
                                        'current partition
'if we get here we may have two records to swap
Both scans skip straight past elements that are equal to the pivot, so when most of the values are the same, one scan runs nearly the whole length of the partition and the split is hopelessly lopsided. It shouldn't work that way. What we should be doing, if we want better performance when there are lots of duplicated values, is swapping more aggressively: whenever we find an element equal to the pivot, or belonging on the other side, we swap it. Yes, we're deliberately swapping when we don't have to. This idea is from R. M. Sedgewick, and I'm glad he wrote it in to Donald Knuth! It's counterintuitive, but it turns out that the extra swaps matter less than the fact that swapping aggressively keeps the left and right subpartitions about the same size whenever most of the values we find are equal to the pivot. The modified scans look something like the sketch below. With that change in place, running the same tests gets us these results:
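Here's a minimal sketch of the aggressive-swapping scans, in the same style as the fragment above (this is my paraphrase of the idea, not code lifted from the original listing):

while l < r and element(l) < pivot      'strict <, so the scan now STOPS on elements
    l = l + 1                           'equal to the pivot instead of skipping past them
wend
while l < r and pivot < element(r)
    r = r - 1
wend
'if l < r, swap element(l) and element(r) even when both equal the pivot;
'equal values end up spread across both sides, so the two subpartitions
'stay about the same size.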
Duplicate Values Part 2

[Chart]
As you can see, this version works much better for arrays with many duplicates. I also put in two tests against ascending and randomly ordered arrays (to show it works okay for data without all those duplicates, too).
Sub rekeySortOrder(ByRef d As Variant, ByRef a As Variant)
    ' a = array of indices resulting from an unstable sort pass,
    '     containing one copy of each of the values between lbound(a)
    '     and ubound(a).
    ' d = the column input array supplied to the sorting pass.
    '
    Dim class       'indicates the equivalence class for each element
    Dim classCount  'count of elements in each equivalence class
    Dim thisClass   'the current equivalence class number
    Dim i           'index into array of indices
    Dim j           'index into array of equivalence counts
    ReDim class(LBound(a) To UBound(a))
    ReDim classCount(0 To UBound(a) - LBound(a))
    class(a(LBound(a))) = 0     'the first element in sorted order starts class 0
    classCount(0) = 1
    thisClass = 0
    For i = LBound(a) + 1 To UBound(a)
        If d(a(i - 1)) < d(a(i)) Then
            thisClass = thisClass + 1
        End If
        class(a(i)) = thisClass
        classCount(thisClass) = classCount(thisClass) + 1
    Next
    '
    'We now know which equivalence class each element belongs in,
    'and we know how many elements there were in each class.
    'So now we just need to know where the rows in each equivalence
    'class will start in the output array, and we can re-order.
    '
    Dim offset
    Dim classOffset
    ReDim classOffset(0 To thisClass)
    offset = LBound(a)
    For j = 0 To thisClass
        classOffset(j) = offset
        offset = offset + classCount(j)
    Next
    '
    'At this point we know (a) where each equivalence class should
    'start in the output array (classOffset gives us this),
    'and (b) which class each element is in (class gives us this).
    'So, we can iterate through the class array in original-index order,
    'writing each index into the next free slot of its equivalence
    'class - which is what makes the result stable.
    '
    For i = LBound(a) To UBound(a)
        j = class(i)
        a(classOffset(j)) = i
        classOffset(j) = classOffset(j) + 1
    Next
End Sub
So far as I know, nobody has published an algorithm for making an unstable sort stable that executes in time proportional to the number of records, as this one does, although list insertion and (particularly) distribution sorts hint at the technique.
With this little beast in your arsenal, you can use an unstable sort where a stable sort is required, and then run the above code against the resulting sort order.
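A sketch of how that fits together (StableSortDemo and UnstableSortByColumn are names I've made up here; the latter stands in for any unstable index sort with the same argument convention as the sorts above):

Sub StableSortDemo(d As Variant, a As Variant)
    'd = the column to sort on; a = an index array covering LBound(a)..UBound(a)
    UnstableSortByColumn d, LBound(a), UBound(a), a   'any unstable sort of the index array
    rekeySortOrder d, a                               'ties go back into ascending row order,
                                                      'which is what a stable sort would give
End Sub

The listing below takes a different route: it's the stable three-way Quicksort discussed in the next section, which splits each range into less-than, equal and greater-than groups in a single pass.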
Const minSubFile As Long = 9
Private initialized As Boolean  'True once the scratch arrays below have been allocated
Private partition2 As Variant   'Indices of elements equal to the partition value (in the order found)
Private partition3 As Variant   'Indices of elements greater than the partition value (in the order found)

Sub ISortAlgorithm_Sort(d As Variant, ByVal l As Long, ByVal r As Long, a As Variant)
    If Not initialized Then
        ReDim partition2(l To r) As Long
        ReDim partition3(l To r) As Long
        initialized = True
    End If
    CheckBoundsEnough 0, r - l, partition2  'helper (not shown here) that makes sure the scratch array covers 0 to r - l
    CheckBoundsEnough 0, r - l, partition3
    If l + minSubFile <= r Then Quicksort d, l, r, a    'Don't bother quicksorting small arrays
    InsertionSort d, l, r, a                            'Straight insertion sort finishes the job
End Sub
Private Sub InsertionSort(d As Variant, ByVal l As Long, ByVal r As Long, a As Variant)
    Dim j As Long       'index when searching for inversions
    Dim i As Long       'index when correcting inversions
    Dim v As Variant    'value of element on the move
    Dim oldI As Long    'index of element being moved
    For j = l + 1 To r
        If d(a(j)) < d(a(j - 1)) Then
            i = j - 1
            oldI = a(j)
            v = d(oldI)
            Do
                a(i + 1) = a(i)
                i = i - 1
                If i < l Then Exit Do   'because we don't have a magic low value at the bottom
            Loop Until d(a(i)) <= v
            a(i + 1) = oldI
        End If
    Next
End Sub
Private Sub Quicksort(d As Variant, ByVal l As Long, ByVal r As Long, a As Variant)
    Dim n As Long           'Number of elements in area to sort
    Dim m As Long           'Used to calculate stack size required
    Dim p As Long           'index of index of partition element
    Dim v As Variant        'partition value
    Dim i As Long           'Read point in initial index array
    Dim x As Long           'Original index at that point
    Dim vv As Variant       'Value of array element being compared to partition
    Dim p2Count As Long     'Number of elements found so far that belong in the centre
    Dim p3Count As Long     'Number of elements found so far that belong at the right
    Dim w As Long           'Write point (when writing back new order for the partition)
    Dim lStack() As Long    'Stack of left boundaries
    Dim rStack() As Long    'Stack of right boundaries
    Dim stackSize As Long   'Initially, needed stack size. Later, current stack size.
    Dim cl As Long          'Destination of first element matching current partition value
    Dim cr As Long          'Destination of last element matching current partition value
    'cl and cr are used to determine the next partitions to sort.
    n = r - l + 1
    m = 1
    While m < n * 2                     'work out the largest stack we could possibly need (about log2(2n) entries)
        m = m * 2
        stackSize = stackSize + 1
    Wend
    ReDim lStack(stackSize)             'We use a stack so we can avoid the overhead of
    ReDim rStack(stackSize)             'recursive procedure calls. We only need to stack l and r!
    stackSize = -1                      'Nothing in the stack
    Do
        p = (l + r) \ 2                 'pick partition element (could be random, doesn't really matter)
        v = d(a(p))                     'remember partitioning value
        p2Count = 0
        p3Count = 0
        w = l
        For i = l To r                  'one stable pass: smaller elements slide left in place,
            x = a(i)                    'equal and greater elements are copied out in the order found
            vv = d(x)
            If vv < v Then
                a(w) = x
                w = w + 1
            ElseIf v < vv Then
                partition3(p3Count) = x
                p3Count = p3Count + 1
            Else
                partition2(p2Count) = x
                p2Count = p2Count + 1
            End If
        Next
        cl = w                          'first point in centre
        For i = 0 To p2Count - 1
            a(w) = partition2(i)
            w = w + 1
        Next
        cr = w - 1                      'last point in centre
        For i = 0 To p3Count - 1
            a(w) = partition3(i)
            w = w + 1
        Next
        If cl - l < r - cr Then         'stack the larger side, keep working on the smaller
            If minSubFile < r - cr Then
                stackSize = stackSize + 1
                lStack(stackSize) = cr + 1
                rStack(stackSize) = r
            End If
            r = cl - 1
        Else
            If minSubFile < cl - l Then
                stackSize = stackSize + 1
                lStack(stackSize) = l
                rStack(stackSize) = cl - 1
            End If
            l = cr + 1
        End If
        If r - l < minSubFile Then      'current piece is small enough: fetch the next one, if any
            If stackSize = -1 Then Exit Do
            l = lStack(stackSize)
            r = rStack(stackSize)
            stackSize = stackSize - 1
        End If
    Loop
End Sub
Stable 3-Way Quicksort

[Chart]
This algorithm performs very well when there aren't many different values in the column being sorted; the fewer there are, the better it does. However, the modified version from the last graph handles ordered unique values a lot better, and random unique values a little better, so this one is probably only worth using when you can be certain there will be a lot of duplicates in the data. If you're sure there won't be, use the standard one.