Site hosted by Angelfire.com: Build your free website today!

Sorting Overview

Introduction

Which is fastest?

Well, I said there wasn't an easy answer to this question. But... here's a page of "smart charts" showing what I've found so far, trying out several different sort engines with different sorts of data... but be warned it loads very slowly because it's displaying a lot of graphs. And it doesn't work in Netscape at all!

Programming Techniques


The Articles

  1. Quicksort is broken. Is it worth fixing?
    Quicksort is one of the most touted sorting algorithms. Usually it is compared with Bubblesort (which is like comparing your looks with those of Quasimodo. So what if you look good next to the hunchback?). I am more critical, both of the algorithm itself, and the way it is usually implemented, but I suggest a drastic method for ensuring its worst case is very unlikely to arise, and provide a workaround for one of its more annoying defects.
    Code Examples:
    1. A typical (incorrect) implementation of Quicksort
    2. An implementation that doesn't risk stack overflow
    3. A record order randomizing routine
    4. A routine that makes unstable sort orders stable (I'll cover what is meant by "a stable sort" in the article)
    5. A routine for applying a sort order to an array

  2. Merge sorts
    The Bubblesort and Quicksort algorithms work by determining the correct location of individual elements. Merging sorts work by dividing the array into smaller pieces, sorting them, and then merging. Merge sorts require fewer comparisons but more row swaps than Quicksorts. I discuss the Straight and Natural Merge algorithms. The latter works better for arrays that are almost sorted. Code examples:
    1. Straight merge on a 1D array (typical recursive implementation)
    2. Straight merge on a 1D array (faster, non-recursive implementation)
    3. Merging two lists with a minimal number of comparisons (using Binary Merging).
    4. Natural merge on a 1D array (recursive implementation)

  3. Radix and bucket sorts.
    Radix and bucket sorts provide a radical alternative to both the insertion and merging sort algorithm families. Radix and bucket sorts do not compare elements with one another at all. And they do NOT have the K*N*log(N) performance of Quicksort (N is # of elements, K is average # of bits in the key); they have K*N performance. They're very good at sorting very large arrays, but they are lousy at sorting small arrays, or arrays with a few widely scattered values.
    Code examples:
    1. Radix sort of 1D Array
    2. Bucket sort of 1D Array
    3. A routine for determining which bits of key columns actually need to be sorted, to speed up radix sorts
    4. A routine suitable for sorting data with *very* little variety

  4. Hybrid and adaptive sorts
    This article discussses methods for using calls to simple sorting routines (including those outlined in earlier articles) to implement more powerful and more flexible sorting routines.
    Code examples:
    1. Sorting of 2D Arrays using 1D sorting routines
    2. Sorting of Arrays of Arrays using 1D sorting routines
    3. An adaptive sort routine (for 2D arrays)

  5. Other Sort Algorithms
    This article discusses some other sorting algorithms. As well as the familiar Bubble and Shell sorts, it discusses the Heapsort, and the straight insertion sort.
    The fastest algorithms for sorting very small array subsets are already known (and they often aren't instances of any of the standard sort algorithms). A few are shown here as examples.
    Code examples:
    1. Bubble Sort of 1D Array
    2. Shell Sort of 1D Array
    3. Heapsort of 1D Array
    4. Straight Insertion Sort (and Binary Insertion)
    5. Tree Sort
    6. Straight Selection Sort
    7. Hand-Tailored (optimal) sorts for 3 to 11 elements
    8. Merge Insertion

    Some of these sort algorithms have spin-offs. A Straight Selection Sort, for example, while it is almost as slow as Bubble Sort, hints at a good way to answer questions like "how many people earn more than me" without having to sort the data to get the answer.
    Heapsort is nearly as fast as Quicksort and doesn't run the risk of long running times the way Quicksort does.
    Merge Insertion is very good at cutting down the number of comparisons required to get the right answer.
     
  6. Pipelines in Sorts
    Earlier articles have outlined reasons for delaying actually moving records until late in any sorting algorithm. This article discusses the "wrinkles" (such as descending sort orders, correctly sorting Null values, implementing case insensitive sorts), outlines how these are normally dealt with in published sort algorithms (and, I hate to admit it, most of mine), and explains why they - and I - do it wrong and how it ought to be done. The code examples from article#4 are revisited, and the routines updated to provide support across the board for these features, without paying the performance penalties normally associated with them. The point is that it is faster to convert strings to upper case ONCE for each string than it is to convert each character, one at a time, whenever you are comparing two strings (as, in most cases, in most sort routines, each string is likely to be compared to several others).
    Code examples:
    1. A routine for removing nulls from a 1D array before determining its sort order
    2. A routine for putting the nulls *back*
    3. A routine for converting the elements of a 1D array of strings to upper case
    4. An updated adaptive sort routine that handles Nulls, descending orders, and case insensitive sorts.
    5. Routines for hashing strings so that the hash values can be sorted rather than the strings themselves
    6. A routine that sorts strings according to their hash values

    Links to other articles:
    1. String Article#1 (Speeding up string comparisons and searches)

  7. Keeping things Sorted
    Everything so far has been a discussion of how to get records sorted. In this article, the focus shifts to the more important (but less frequently discussed) problem of keeping things sorted. This is where the Natural Merge algorithm really comes into its own!
    Code examples:
    1. A routine to track changes made to a 2D array; including row updates, inserts, and deletions
    2. A routine to issue batch updates to a 2D array that is already sorted, in such a way that it remains so after the updates, without having to be sorted again(!)
    3. A routine to record ordering information as a string, so it can be read back in from the string (or from a file).
    4. A class which encapsulates all of the functionality discussed in this and earlier articles

  8. Hashes and equivalence sets
    A hash is a function that maps data in one format into another more compact format. An equivalence set is a subset of an array where all of the values are equivalent (according to a chosen sort key). This article discusses ways that hashes and equivalence sets can be used in sorting pipelines, and outlines the (sometimes surprising) benefits to be gained. Using equivalence classes may, in some situations, speed things up AND reduce the memory requirements for storing data.
    Code Examples:
    1. Changes to allow and exploit the use of hashing functions in sorts of string columns
    2. Changes to allow and exploit the use of equivalence sets

  9. The Window Sort - trading throughput for better reponse time.
    Overall execution time is not always the most important parameter for assessing which sort algorithm to use. Sometimes response time (such as, the time to return the top ten scores in a table of game scores) is more important than the time it takes to get *all* of the answers. This is why the TOP keyword helps so much, in Access and SQL Server queries.
    Code Examples:
    1. ...

  10. VB Tricks for Sorting
    There are features of Visual Basic that can be exploited to greatly increase the performance of sorting operations. These are particularly effective if you are storing data in arrays of arrays rather than 2D arrays.
    These include (i) specifying the exact types of the values to be compared with one another, and (ii) making API calls to swap data within variant variables and variant arrays.
    Code Examples:
    1. A routine to swap two Variant variables, quickly. To give you the taste of the article, here's the code (NT/Wintel only, I'm afraid!):
       
      Public Declare Sub CopyMemory Lib "kernel32" Alias _
          "RtlMoveMemory" (Dest As Any, Source As Any, ByVal bytes As Long)
      
      Public Const VARIANT_SIZE = 16
             
      Public Function SwapVariantsByMemoryCopy(ByRef Var1 As Variant, ByRef Var2 As Variant)
          '
          'Description:   Swap two variants by copying their data to & from each other
          '               using memory copy calls
          'Explanation:   This is a *very* fast way to swap two arrays (4 sixteen byte moves
          '               rather than recreating arrays).  It's even
          '               worthwhile for swapping strings (longer than about length 10).
          '
          Dim temp As Variant
          Dim Temp2 As Variant
          CopyMemory temp, Var1, VARIANT_SIZE
          CopyMemory Var1, Var2, VARIANT_SIZE
          CopyMemory Var2, temp, VARIANT_SIZE
          CopyMemory temp, Temp2, VARIANT_SIZE
      End Function
      
      
      ...You may be wondering why there are *four* copies rather than the three it normally takes to swap two variables... Well, it's all to do with the icky details of how VB does garbage collection on Variant variables. If you leave out the last CopyMemory call, VB will probably crash.
    2. A routine to reorder a 2D array using memory copies
    3. A routine to reorder an array of arrays using memory copies.
    4. A routine which, given a sort template, can generate the source code for a strongly-typed routine
    5. An updated version of the classes from articles 7 and 8 which incorporates strongly-typed versions of the sort algorithms that have featured in articles 1,2,3 and 5.