The laboratory exercise 2 for CS2012 consist of **TWO** laboratory
sessions, Sessions 2 and 3 of your laboratory times.

The exercises aim to develop your insight into the performance of algorithms. We shall look at both the actual running time of algorithms (in milliseconds of CPU usage) and the complexity of algorithms (in terms of operations performed), and compare and contrast these.

The main emphasis is not so much on coding but is on understanding the
performance of the algorithms and why they perform well and poorly on different
inputs. Marks are allocated not just for running code but for your understanding
and explanation of the performance of algorithms. So *be prepared to give
clear explanations to the demonstrator*. To understand the performances you
may need to consult textbooks in the area.

We shall deal with the process of sorting of lists of integers into ascending
order and shall compare the performance of **three** sorting algorithms in
Java: *Insertsort*, *Mergesort* and *Quicksort*.

We are interested in two aspects of performance:

- How the performance varies as the size of the input increases, i.e. the rate of growth of the performance measures.
- How the algorithms perform on special kinds of lists, e.g. those already ascending, or descending order, and those in which there are few items repeated many times.

For the rate of growth we will display the results as graphs of performance as input size varies.

You should produce your methods in a file `Complexity.java` in class
`Complexity`.

**Task 1.**

In order to assess the performance as the size of the input lists increases, you are asked to provide a routine for generating lists of numbers of arbitrary size whose items are randomly generated.

To do so you need to generate random numbers. Random number generators can be
built using arithmetic operations. Strictly speaking we should call these
pseudo-random number generators. There is a considerable literature on them: You
can consult the course textbook*Data Structures and Problem Solving using
Java* by Weiss, Chapter 9. Other accounts are, for example, Sedgewick's
*Algorithms* or Knuth's books, where techniques for generating
pseudo-random numbers and their properties are discussed.

In Java, to generate random numbers you may either use the class
`Random` in the API and its methods, or the method `random` in
package `Math` (see API documentation for details). The latter is a
wrapper for instantiating `Random`. You call it as

double d = Math.random();and this returns a (pseudo)random number greater than or equal to 0.0 and less than 1.0.

Your task is to write a method to generate a list (represented as an array) of random integers (in the range 0...9999 say, ie 4-digit integers). The length of the list should be a parameter that we can change to assess the running time of the various sorting algorithms.

**Task 2.**

We shall also need particular types of lists (as arrays). Write a method to generate ascending and descending lists of integers (simply consecutive integers will do).

Also write a method to generate lists which consists of a few items many times repeated. To do this you may use the random list generator from Task 1, but restrict the possible integers to a small range, say 0-9.

**Task 3.**

Provide code for the three sorting methods *Insertsort*,
*Mergesort* and *Quicksort*. You may write these from scratch, or
get them from textbooks, or from Java libraries or from websites dealing with
algorithms.

Versions of these sorting algorithms may be found in the directory

/opt/info/courses/CS2012/labs/lab2.However you come by them, you must understand how they work. For each method, there are various ways of coding it, for example

**Task 4.**

Perform *timing analysis* on the three sorting algorithms on the
various types of lists above (random, ascending, descending and few items).

To find the running time of a program you should use one of the Java timing
classes. One suggestion is to use the class `Duration` from the CS1092
laboratory exercise on timing. This provides a timer for algorithms returning
the CPU time in milliseconds. You should be aware that the actual CPU time can
be affected not just by the algorithm running, but by other processes which may
interrupt, so try to minimize other processes running at the same time.

The class `Duration` is available in
/opt/info/courses/CS2012/labs/lab2.

You can access it (as usual) by

javac -classpath /opt/info/courses/CS2012/labs/lab2 Complexity.javaThe documentation for this class is in this directory in file

You should run each sorting algorithm on the various types of lists of increasing lengths. The actual lengths of the lists for realistic timing in milliseconds depend (of course) on the speed of your processor. A maximum length of 5,000 or 10,000 may be appropriate, but on slower processors this may be too large. Experiment! Create about 10 lists of lengths evenly spaced up to the maximum.

Store each set of results in a file which consists of pairs of lengths and running times, one pair per line, such as:

400 830 800 1510 ... etc

Since you need to run various algorithms on various types of list with various sizes, I suggest (for no extra marks), that you provide command-line arguments to determine which of these cases to run.

**Task 5.**

Plot graphs of the timing of each algorithm against the size of the input list.

You may use any plotting method. A simple one is to use the Linux utility
`gnuplot`. Type `gnuplot` and at the prompt, type `plot
"datafile"` where `datafile` is a file that you have created with a
list length and running time on each line (as above). To obtain a printed copy
of such a plot, do *not* just grab a screen image of the `gnuplot`
output, this creates enormous files which block up the printer queue. You should
(in `gnuplot`)

set terminal postscript set output "myfile.ps" plot .....and then print the file using lpr, after using

**Task 6.**

This task measures the performance of algorithms by counting how many operations are executed during processing. This is the time complexity of a algorithm. We compare this to the running time as computed in the previous tasks.

Can we know simply by looking at an algorithm how long it is going to take to
run? If so then we may be able to design algorithms to have better performances.
The answer is ``yes'' and the idea is to choose operations to count, operations
which we consider will take alot of processing time, and count them! That is,
count how many times they will be executed whilst the algorithm runs. In the
lectures, we show how we can count them *without running the algorithm*!
Here we support this analysis by explicitly counting operations whilst the
algorithm runs and comparing it to the running time of the previous exercise. We
use the three sorting algorithms of the previous exercise.

We count the number of operations in the sorting algorithms. Which operations? There are several possibilities (and it turns out it doesn't matter much). The standard is to count the order operation by which we compare elements.

Modify your sorting algorithms to count the number of comparison operations performed.

Run the modified algorithms on the same lists as for the timing analysis and plot the count of the number of operations against the length of the lists. How does it compare to the timing analysis?

**Task 7.**

Be prepared to explain to the demonstrator all the graphs of timing and complexity that you produce.

For marking, you need to show the demonstrator your code for the algorithms, show that they work and explain why. You will also need to explain the shape of the graphs: Look at the timing graphs for random lists. Clearly some algorithms grow slower than others as the size of the input increases. Simple algorithms are not always the best! Can you see what sorts of rates of growth are involved? For example: are they linear - a straight line, quadratic - grow as where is the length of the list, or cubic, or linear-logarithmic , etc.? Hint: a linear algorithm doubles its running time as the size of the input doubles. A quadratic one quadruples (times 4) its running time as the size of the input doubles, and a cubic is times 8 etc. Also note that for short lists the rate of growth may differ from that of longer lists, and note that some graphs are more regular than others. Why is this? Consider also the behaviour on the special forms of lists. Explain each of these and compare them to the behaviour on the random lists.

Examine the graphs of time complexity and explain how they compare to the timing analysis.

You need not write down your explanations but you should be able to explain clearly to the laboratory demonstrator.

Marking, out of 15 marks:

- Code for generating lists and for timing: 6 marks
- Code for time complexity via counting operations: 1 mark
- Explanations of timing and complexity on random lists: 4 marks
- Explanation of timing and complexity on special lists: 4 marks

You should labmail your file `Complexity.java`.

2004-04-30