# Quicksort is broken. Is it worth fixing?

Quicksort is one of the most popular sorting algorithms. It requires relatively little code and, in general, performs much better than the other short sorting routines, such as the Insertion, Bubble and Shell sorts. If you're not familiar with it, I've added an appendix that illustrates how Quicksort works. Let me clarify. Most versions of Quicksort source code available over the net are broken. Some are badly broken. If you're using Quicksort, it's probably one of those. So... How are they broken? When do they perform badly or fail to work at all? Why? What's needed to fix them? I'll try to answer that.

## Introduction

1. We are told quicksort is good. We are told it usually takes time proportional to N*logN (where N is the number of input records), and indeed it does, for randomly ordered data.
2. Quicksort's worst case takes time proportional to N*N, though that doesn't happen at all often in practice. It's also reputed to do badly on already sorted data. This isn't fair. For some of the really early versions of Quicksort the worst case was an array in sorted order. And the mud stuck.
My own experiments have shown that it in fact does better for mostly sorted data (we shall see what kinds of ordering upset it later; uniformly ascending or descending data speed it up, if we always choose the middle element of a partition to divide it into smaller partitions!). It's just that other sort algorithms do even better.
3. Duplicated records are the real worry (at least... as Quicksort is usually written - but there are two separate fixes for that given later in this article, one a minor change, the other a near-total rewrite that gears Quicksort for handling duplicated data... at a price).
4. No matter how we choose the pivot element, some input orders will make Quicksort perform very badly indeed. I'll provide an "adversarial" algorithm that will shuffle pre-sorted data to upset most modern Quicksort implementations later in the article (at first glance it's more of curiosity value than of practical use, but there's method to my madness, as you shall see...).
5. Most quicksort implementations use recursion to sort both partitions. This is incorrect. Recursion should only be used for the smaller partition, if at all. If it is used for both partitions, the maximum stack depth that might be required is N-1. In other words, if you call a Quicksort implementation that calls itself recursively to sort both partitions, and supply it a large input dataset, it may run out of stack and crash. [X] [link to an example which illustrates the stack problem for a 16-element array, with "middle element pivot" selection.] [second link to a corrected quicksort algorithm that does NOT have this fault]
6. Procedural recursion is not necessary at all, in Quicksort. If we are prepared to set aside a little memory to store indices of the partition elements used thus far, we can do without recursion altogether. But it turns out this slows the sort down.[X] [link to algorithm]
7. Let's get lateral. Is it worth randomizing the input array order?
Possibly. Sometimes. But the overhead required to randomize it may outweigh the benefits.
8. But Quicksort has another problem. It is not a stable sort. An explanation is in order. When, say, a user has a sorted array, and sorts by another column, does the sort algorithm always preserve the relative order of the elements that differ for the previous sort key but have the same value in the new sort key? Most users would expect this! A sort algorithm which always preserves the order of elements (which do not differ in the new sort key) is called a "stable sort". One which does not is called an "unstable sort". This can be illustrated with a table containing four records!

• With an unstable sort
this might happen
(pink shows the shuffling)
Sorted by name
numbername
1Cat
1Fox
3Rabbit
2Rabbit
User sorts by number
numbername
1Fox
1Cat
2Rabbit
3Rabbit
User sorts by name again
numbername
1Cat
1Fox
3Rabbit
2Rabbit
With a stable sort
the user will always see this
Sorted by name
numbername
1Cat
1Fox
3Rabbit
2Rabbit
User sorts by number
numbername
1Cat
1Fox
2Rabbit
3Rabbit
User sorts by name again
numbername
1Cat
1Fox
2Rabbit
3Rabbit

## Sorting small partitions

Quicksort is *not* good at sorting small sections of an array (or small arrays, for that matter). Surprisingly, a Straight Insertion sort will do better. For this reason, in Knuth's "The Art of Computer Programming", volume 2, it is recommended that you not sort partitions of the array of size below about 9 elements, but that you instead "hand over" to an insertion sort after you've used Quicksort to divide the array into partitions of that size or smaller. Knuth calls the "minimum partition length worth sorting" parameter M. Yeah, right, I thought when I read this. But it's true. Though 9 mightn't be right if there are a lot of duplications in the data (that is, records that have a matching sort key). We'll cover that problem later.

<--Switching to Insertion<--
The chart at left shows how different values of the M parameter affect sort performance (for an array of 5000 distinct long integers). These are NOT running averages, I ran the test for each M only once - but the trend lines are clear. The different colours indicate how much (and what sort of order) is found in the input data:
• ascending order (green)
• random order (blue)
• descending order (black)
• sawtooth order (red)
Insertion Sort does well for data already in ascending order because it's a "find the problem and fix it" sort algorithm. Indeed, for 5000 records already in order Insertion Sort *by itself* takes under 0.02 seconds (at least 5 times faster) . However, for 5000 records in random order, it takes over 20 seconds (at least 50 times slower).

So, it is worth "farming out" small partitions to Straight Insertion Sort! The textbooks promise about 8% but (if you're using indices) in VB, it seems to be a little better than that. Perhaps 15%. And m=9 is about right (if there aren't any duplicated sort keys...).

## Partition Selection

The older textbook Quicksort routines chose the first element in the current partition as the pivot. It saves (a little) time if the input data are randomly ordered. But if they're not, it's not a good choice! If all the records are in sorted order (or reverse order), it degrades to either a Straight Insertion or a Bubblesort.
Most recently published implementations either choose the middle element of the partition or a randomly selected element. It doesn't seem to make much difference.

<--Middle Element<--
The chart at left shows timings for a conventional Quicksort algorithm (VB implementation, sorting long integers via an index array, running under the VB IDE on a PII/350), which always chooses the middle element of a range as the partition element, when supplied with
• strictly ascending values (green)
• strictly descending values (blue)
• random values (black)
• sawtooth pattern (red)
for different values of N. As you can see choosing the middle partition works well for data in strictly ascending and descending order. It's difficult to cook up data that will "upset" it.
<--Random Element<--
The chart at left shows timings for the same Quicksort algorithm, modified to choose a random partition element (these aren't averages, I only ran each test once. That's why there are spikes here and there). There's not much difference. Arrays that are already in ascending order tend to sort faster than ones in reverse order because less swaps are necessary. Arrays mostly in reverse order (usually) sort faster than ones mostly in random order because early partitions will swap them - once -and then they'll be in order and won't need swapping again. For N below 1000 the extra cost of calling Rnd() seems to outweigh the advantages.
I haven't shown the data here, but it's much the same story out to N=10000.

## Reducing Recursion

As outlined earlier, most published Quicksort algorithms use recursion to sort both partitions. That is, their english definitions read like this:
1. If current partition size=2 then swap if necessary: exit (Note: this can be dispensed with if we have M=3 or more, see above).
2. Otherwise, choose a partition element, p
3. For each *other* element in the partition, check it against the value of element p.
4. As we find pairs of elements (on each side of the partition) that are on the wrong side, swap them. When we have finished this process we will have two sub-partitions.
5. Recursively sort the left-hand sub-partition
6. Recursively sort the right-hand sub-partition
Now, the problem here is that one of the partitions might have more elements that the other. Indeed, it could in theory have all of the elements other than the last partition element. This means that the recursion depth required to sort an array of n elements could - at least theoretically - be n. Ouch! With a large array we might run out of stack. You might think it won't happen if you choose your partition element "carefully". Ha! As if! It's happened to me, when I used Quicksort routines written to the above spec. Not often, but it happened. It was embarassing. It happened (about once a month) for three months before I figured out why. The last two points of the algorithm should be replaced with:
• Recursively sort the smaller sub-partition
• Set the current partition to the larger partition
• Loop back to the first step
This will guarantee that the maximum stack depth needed is never greater than Ceiling(log2(N)). It will degrade performance only a little (perhaps 1%). But theory isn't going to persuade anyone, is it? How about raw performance data? And I'll include what you can get if you dispense with procedural recursion altogether, implement your own stack, and commit the sin of using Goto (in about six places!).

<--Avoiding Recursion<--
I'll give you the numbers because the differences aren't that clear from the chart. When we compare for 20480 records, we get:
• Evil goto'd version (red) takes 1.1646 seconds
• One-sided recursion (green) takes 1.141
• Double recursion (blue) takes 1.1395
So the goto version with the do-it-yourself stack is actually slower (or, at least the version I wrote was. I may have a play and see if I can improve it. But I think it will lose out). Still, the slightly better performance of double-sided recursion shouldn't tempt you to use it! You could still run out of stack! Okay, it's very, very unlikely! But a 1% performance gain?! Is that worth it?

If you're still in doubt, and think the 1% worth it, it turns out that if we run Quicksort against more records (about 30,000 or more) that turns around, and the double-side recursive version gets slower and slower (relative to the no-recursion version). Why I'm not sure.

## Randomizing the Input Order

If you expect that, generally, you'll be getting records in random order, but you might occasionally be asked to sort records in ascending or descending order, you can use a version of Quicksort geared for sorting records in random order, and make sure they are in random order by shuffling them first.

## Duplicated Values

Now for the horror story! There is a kind of data for which most Quicksort implementations will take time proportional to the square of the number of records. It's "everything the same", and to a lesser extent, data where there are only a few different sort key values. As this chart shows:

<--Duplicate Values...<--
It's the red and pink lines that are really scary! Though the purple line doesn't look too good either. What this graph shows is that using Quicksort to sort on a field which only has a few possible values (or where most of the values of the records will be the same, say for a comment field that isn't used much, or the third line of an address field, or the "reasons you should cut my pay" field...)... is not a good idea. Or at least, not the usual version of Quicksort.
• all records the same (red)
• 2 different values only(pink)
• 5 different values only(purple)
• 10 copies of each value(blue)
• 50 copies of each value(green)

There are two fixes available (that I know of). The first is merely two changes to two comparisons. When we're looking for elements to swap, we usually look for elements that belong "on the other side of the pivot". Well and good. We have to do that. However: we usually don't come in towards the centre from both sides at the same time. We search on the left first and then on the right. The code might look something like this:
```  while l<r and element(l)<=pivot
l = l + 1
wend
while l<r and pivot<=element(r)           'technically we don't need to check l<r here
r = r - 1                               'in most implementations, as the pivot element itself
wend                                      'will have been swapped with the leftmost element in the
'current partition.
'if we get here we may have two records to swap
```
It shouldn't. What we should be doing, if we want better performance when there are lots of duplicated values, is swapping more agressively. In other words, whenever we have an element equal to the pivot or belonging on the other side, we swap it. Yeah, we're deliberately swapping when we don't have to. This idea is from R.M.Sedgewick, and I'm glad he wrote it in to Donald Knuth! It's counterintuitive! It turns out that the extra swaps are less important than the fact that swapping aggressively keeps the left and right subpartitions about the same size whenever most of the values we find are the same as the pivot. With that change in place, running the same tests gets us these results:

<--Duplicate Values Part 2<--
Aa you can see, this version works much better for arrays with many duplicates. I also put in two tests against ascending and randomly ordered arrays (to show it works okay for data without all those duplicates, too).
• all records the same (red)
• 2 different values only(pink)
• 5 different values only(purple)
• randomly sorted data (blue)
• ordered data (every record different)(green)

## Making Quicksort Stable

The comments in this section (and the algorithm it outlines) apply to any unstable sorting algorithm.
If you've just run an unstable sort algorithm (that is, one that may shuffle the pre-existing order of any elements which match according to the sort column), and output an array of indices indicating the output array order, you can make a linear pass over that array to "rekey" it (that is, to restore any pre-existing order between rows which match according to the sort column).

### Late-breaking news...

I was looking at Quicksort, thinking about ways to speed it up for arrays where there are a lot of duplicate rows (for example, when you are sorting by an optional text field there may be many missing values), when I - almost inadvertantly - came up with a way to speed it for that sort of input and make it stable. I'll be publishing that routine just as soon as I'm sure I've got all the bugs out of it. It uses no procedural recursion, and it runs much faster for arrays containing large numbers of duplicate records. My preliminary testing pegs it at about twice the speed of a conventional Quicksort (when there's a lot of record duplication - say about 4 copies of each value on average). But it's more complicated, and performs a little worse when there aren't duplicates (Ironically, it also does worse at exploiting existing order in the input array!). The algorithm is as follows:
1. Start with two scratch arrays big enough to hold N longs, and a fixed size stack big enough to take Log2(N)+1 left and right partition boundaries. Set the current partition equal to the whole input array
2. Choose a partioning value, v, at random, from the current partition. Don't worry about where it was (no funny swapping required). Only the value matters.
3. Search through all the records in the current partition, comparing each to the partitioning element, and
• Write each element less than v into the left of the current partition (keep track of how many you've written)
• Write any element equal to v to the first scratch array
• Write any element greater than v to the second scratch array
4. Now, copy all the elements in the first scratch array next to the elements you've already moved to the left
5. Lastly, copy all the elements in the second scratch array in the room left over (they'll fit exactly)
6. At this point you have a 3-way partition, with elements less than v in the first, elements equal to v in the middle, elements greater than v in the right.
7. If neither partition is big enough to bother with (I stop processing at size 9, as per Donald Knuth's recommendation, though I find with heavily duplicated data the right threshhold is about 13), pop the boundaries of the next partition to sort from the stack and go back to step 2. If there's nothing left in the stack, run a Straight Insertion sort over the array to catch the smaller partitions that weren't worth running Quicksort against.
8. Push the edges of the larger partition onto a stack.
9. Sort the smaller partition, by looping back to step 2
At first glance it ought to be slower, because the elements in the second and third partitions move two times (on average each element in the current partition moves 0.75 times - half the left and half the right elements have to be swapped, and a swap takes 3 moves to swap 2 elements - in each partitioning run of a conventional quicksort, except for the pivot, which moves twice). However, when there are a lot of duplicates in the part of the array being considered, separating out the elements that match the partition element saves a lot of partitioning! So much so that it more than pays for the extra moves. And the routine has less of a rocket science feel (except maybe for its internal stack and the straight insertion phase).
<--Stable 3-Way Quicksort<--
This algorithm performs very well when there aren't many different values in the column of the array being sorted. The fewer there are the better it does. However, the modified version from the last graph works with ordered unique values a lot better, and unique random values a little better, so this one is probably worth using only when you can be certain there will be a lot of duplicates in the area. If you're sure there won't be, use the standard one.
• all records the same (red)
• 2 different values only(pink)
• 5 different values only(purple)
• randomly sorted data (blue)
• ordered data (every record different)(green)

...though I think the 30% performance penalty if there aren't lots of duplicates might be a bit too steep a price to pay. Perhaps you're best off sampling, say, six records, chosen at random, and using a standard Quicksort if there aren't any duplicates among them, using my modified Quicksort algorithm if there are between three and five different values found, and switching to something else again (a unique value sort?) if you found only one or two.