Laboratory exercise 2 for CS2012 consists of TWO laboratory sessions, Sessions 2 and 3 of your laboratory times.
The exercises aim to develop your insight into the performance of algorithms. We shall look at both the actual running time of algorithms (in milliseconds of CPU usage) and the complexity of algorithms (in terms of operations performed), and compare and contrast these.
The main emphasis is not so much on coding as on understanding the performance of the algorithms and why they perform well or poorly on different inputs. Marks are allocated not just for running code but for your understanding and explanation of the performance of the algorithms, so be prepared to give clear explanations to the demonstrator. To understand the performances you may need to consult textbooks in the area.
We shall deal with sorting lists of integers into ascending order, and shall compare the performance of three sorting algorithms in Java: Insertsort, Mergesort and Quicksort.
We are interested in two aspects of performance: the actual running time (in milliseconds of CPU usage) and the time complexity (in terms of operations performed).
For the rate of growth we will display the results as graphs of performance as input size varies.
You should produce your methods in a file Complexity.java in class Complexity.
In order to assess the performance as the size of the input lists increases, you are asked to provide a routine for generating lists of numbers of arbitrary size whose items are randomly generated.
To do so you need to generate random numbers. Random number generators can be built using arithmetic operations; strictly speaking we should call these pseudo-random number generators. There is a considerable literature on them: you can consult the course textbook, Data Structures and Problem Solving Using Java by Weiss, Chapter 9. Other accounts are, for example, Sedgewick's Algorithms or Knuth's books, where techniques for generating pseudo-random numbers and their properties are discussed.
In Java, to generate random numbers you may either use the class Random in the API and its methods, or the static method random in the class Math (see the API documentation for details). The latter is a wrapper that instantiates Random. You call it as

double d = Math.random();

and this returns a (pseudo-)random number greater than or equal to 0.0 and less than 1.0.
Your task is to write a method to generate a list (represented as an array) of random integers (in the range 0...9999, say, i.e. integers of at most four digits). The length of the list should be a parameter that we can change to assess the running time of the various sorting algorithms.
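One possible shape for such a generator is sketched below (the class and method names here are illustrative, not prescribed; you may prefer to put the method directly in your Complexity class):

```java
import java.util.Random;

public class RandomLists {
    private static final Random RAND = new Random();

    // Returns an array of 'length' random integers in the range 0..9999.
    public static int[] randomList(int length) {
        int[] list = new int[length];
        for (int i = 0; i < length; i++) {
            list[i] = RAND.nextInt(10000); // 0 (inclusive) up to 10000 (exclusive)
        }
        return list;
    }
}
```

Using Random.nextInt(10000) gives each integer in 0...9999 directly; the same effect can be had from (int) (Math.random() * 10000).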
We shall also need particular types of lists (as arrays). Write a method to generate ascending and descending lists of integers (simply consecutive integers will do).
Also write a method to generate lists which consist of a few items repeated many times. To do this you may use the random list generator from Task 1, but restrict the possible integers to a small range, say 0-9.
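These special-case generators might look something like the following sketch (again, names are illustrative):

```java
import java.util.Random;

public class SpecialLists {
    private static final Random RAND = new Random();

    // Ascending list of consecutive integers: 0, 1, ..., length-1.
    public static int[] ascendingList(int length) {
        int[] list = new int[length];
        for (int i = 0; i < length; i++) list[i] = i;
        return list;
    }

    // Descending list of consecutive integers: length-1, ..., 1, 0.
    public static int[] descendingList(int length) {
        int[] list = new int[length];
        for (int i = 0; i < length; i++) list[i] = length - 1 - i;
        return list;
    }

    // Few distinct items, many times repeated: random integers in 0..9.
    public static int[] fewItemsList(int length) {
        int[] list = new int[length];
        for (int i = 0; i < length; i++) list[i] = RAND.nextInt(10);
        return list;
    }
}
```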
Provide code for the three sorting methods Insertsort, Mergesort and Quicksort. You may write these from scratch, or get them from textbooks, or from Java libraries or from websites dealing with algorithms.
Versions of these sorting algorithms may be found in the directory

/opt/info/courses/CS2012/labs/lab2

However you come by them, you must understand how they work. For each method there are various ways of coding it: for example, Insertsort may be iterative or recursive, and there are various ways of choosing the pivot element in Quicksort. For Quicksort your choice of pivot should include at least the first item in the list (you may wish to implement other choices too).
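As an illustration of the required pivot choice, here is one sketch of Quicksort using the first item of each sublist as the pivot (one of many correct formulations; the partition scheme shown is a simple one, not the only option):

```java
public class Quick {
    public static void quicksort(int[] a) {
        quicksort(a, 0, a.length - 1);
    }

    private static void quicksort(int[] a, int lo, int hi) {
        if (lo >= hi) return;               // 0 or 1 elements: already sorted
        int p = partition(a, lo, hi);       // pivot ends up at index p
        quicksort(a, lo, p - 1);
        quicksort(a, p + 1, hi);
    }

    // Partitions a[lo..hi] around the FIRST item; returns the pivot's final index.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int i = lo;                         // last index holding an item < pivot
        for (int j = lo + 1; j <= hi; j++) {
            if (a[j] < pivot) {
                i++;
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        }
        int t = a[lo]; a[lo] = a[i]; a[i] = t;  // move pivot into place
        return i;
    }
}
```

Note that with the first item as pivot, already-sorted (ascending or descending) input is a worst case: each partition is maximally unbalanced. Keep this in mind when interpreting your graphs.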
Perform timing analysis on the three sorting algorithms on the various types of lists above (random, ascending, descending and few items).
To find the running time of a program you should use one of the Java timing classes. One suggestion is to use the class Duration from the CS1092 laboratory exercise on timing. This provides a timer for algorithms returning the CPU time in milliseconds. You should be aware that the actual CPU time can be affected not just by the algorithm running, but by other processes which may interrupt, so try to minimize other processes running at the same time.
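If you cannot use Duration (its exact interface is described in Duration.html, mentioned below), the standard library offers a simple alternative. The sketch below uses System.nanoTime; note this measures elapsed wall-clock time rather than pure CPU time, which is one more reason to minimize other running processes:

```java
public class TimingDemo {
    // Runs the given task once and returns its elapsed time in milliseconds.
    public static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000; // nanoseconds -> ms
    }
}
```

A typical use would be timeMillis(() -> Quick.quicksort(list)) for each list size, recording the result alongside the list length.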
The class Duration is available in /opt/info/courses/CS2012/labs/lab2.
You can access it (as usual) by
javac -classpath /opt/info/courses/CS2012/labs/lab2 Complexity.java

The documentation for this class is in this directory in the file Duration.html.
You should run each sorting algorithm on the various types of lists of increasing lengths. The actual lengths of the lists for realistic timing in milliseconds depend (of course) on the speed of your processor. A maximum length of 5,000 or 10,000 may be appropriate, but on slower processors this may be too large. Experiment! Create about 10 lists of lengths evenly spaced up to the maximum.
Store each set of results in a file which consists of pairs of lengths and running times, one pair per line, such as:
400 830
800 1510
... etc
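One way to write such a results file is sketched below (the class name and file format helper are illustrative; any method producing one "length time" pair per line will do):

```java
import java.io.IOException;
import java.io.PrintWriter;

public class ResultsFile {
    // Writes "length time" pairs, one per line, to the named file.
    public static void write(String filename, int[] lengths, long[] times)
            throws IOException {
        try (PrintWriter out = new PrintWriter(filename)) {
            for (int i = 0; i < lengths.length; i++) {
                out.println(lengths[i] + " " + times[i]);
            }
        }
    }
}
```

A file in this format can be plotted directly by gnuplot, as described below.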
Since you need to run various algorithms on various types of list with various sizes, I suggest (for no extra marks) that you provide command-line arguments to determine which of these cases to run.
Plot graphs of the timing of each algorithm against the size of the input list.
You may use any plotting method. A simple one is to use the Linux utility gnuplot. Type gnuplot and, at the prompt, type plot "datafile" where datafile is a file that you have created with a list length and running time on each line (as above). To obtain a printed copy of such a plot, do not just grab a screen image of the gnuplot output; this creates enormous files which block up the printer queue. Instead you should (in gnuplot)
set terminal postscript
set output "myfile.ps"
plot .....

and then print the file using lpr, after using gv to check that it looks right.
This task measures the performance of algorithms by counting how many operations are executed during processing. This is the time complexity of an algorithm. We compare this to the running time as computed in the previous tasks.
Can we know, simply by looking at an algorithm, how long it is going to take to run? If so, then we may be able to design algorithms to have better performance. The answer is ``yes'', and the idea is to choose operations which we consider will take a lot of processing time, and count them! That is, count how many times they are executed whilst the algorithm runs. In the lectures we show how we can count them without running the algorithm. Here we support this analysis by explicitly counting operations whilst the algorithm runs, and comparing the counts to the running times of the previous exercise. We use the three sorting algorithms of the previous exercise.
We count the number of operations in the sorting algorithms. Which operations? There are several possibilities (and it turns out it does not matter much). The standard choice is to count the comparison operations by which we order elements.
Modify your sorting algorithms to count the number of comparison operations performed.
Run the modified algorithms on the same lists as for the timing analysis and plot the count of the number of operations against the length of the lists. How does it compare to the timing analysis?
Be prepared to explain to the demonstrator all the graphs of timing and complexity that you produce.
For marking, you need to show the demonstrator your code for the algorithms, show that they work, and explain why. You will also need to explain the shape of the graphs. Look at the timing graphs for random lists. Clearly some algorithms grow more slowly than others as the size of the input increases. Simple algorithms are not always the best! Can you see what sorts of rates of growth are involved? For example: are they linear (a straight line), quadratic (growing as n^2, where n is the length of the list), cubic (n^3), or linear-logarithmic (n log n), etc.? Hint: a linear algorithm doubles its running time as the size of the input doubles; a quadratic one quadruples (times 4) its running time as the size of the input doubles; and a cubic one multiplies it by 8, etc. Also note that for short lists the rate of growth may differ from that of longer lists, and that some graphs are more regular than others. Why is this? Consider also the behaviour on the special forms of lists. Explain each of these and compare them to the behaviour on the random lists.
Examine the graphs of time complexity and explain how they compare to the timing analysis.
You need not write down your explanations but you should be able to explain clearly to the laboratory demonstrator.
Marking, out of 15 marks:
You should labmail your file Complexity.java.