
COSC 242 Notes
Algorithms and Data Structures

Menu:
  1. Complexity Classes
  2. Induction
  3. Sorting
  4. Hashing
  5. Binary Search Trees
  6. Red-black Trees
  7. B-trees
  8. Graphs
  9. Dynamic Programming
  10. P and NP

  1. Complexity Classes
    1. Asymptotic Notation
      1. f = O(g) : g is an upper bound on f
         iff there are positive constants c and n₀ such that for all n ≥ n₀, f(n) ≤ c*g(n)

      2. f = Ω(g) : g is a lower bound on f
         iff there are positive constants c and n₀ such that for all n ≥ n₀, f(n) ≥ c*g(n)

      3. f = Θ(g) : f and g have equivalent rates of growth
         iff f = O(g) and f = Ω(g)

      4. f = o(g) : g is a strict upper bound, ie. f = O(g) but f ≠ Θ(g)
         iff f = O(g) and lim (n→∞) f(n)/g(n) = 0
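
      A quick worked example (my own, not from the lectures): 3n² + 5n = O(n²), taking c = 4 and n₀ = 5, since for all n ≥ 5 we have 5n ≤ n², and so 3n² + 5n ≤ 4n².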

    2. O(1) - Constant
       insertion into a hash table
       rotation in a red-black tree

    3. O(log n) - Logarithmic
       binary search
       insertion into / deletion from BSTs and RBTs, and searching a red-black tree

    4. O(n) - Linear
       the merge step in merge sort, and the partition step in quicksort
       non-comparing sorts: counting, radix and bucket
       dynamic programming approach to assembly-line scheduling

    5. O(n log n)
       merge sort, and quicksort (best and average case)

    6. O(n²) - Quadratic
       insertion sort

    7. O(2ⁿ) - Exponential
       brute-force (trial and error) method for assembly-line scheduling

    8. O(n!) - Factorial
       generate-and-test algorithms - eg. the brute-force method of finding a Hamiltonian cycle


  2. Induction
    (Basis) prove the statement for some n₀
    (Step) suppose it is true for k (or, if you prefer, nᵢ); this is the Induction Hypothesis. Then prove it is true for k+1 (using the IH)
    (Conclusion) thus it is true for every n such that n ≥ n₀
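
    A standard worked example (mine, not from the lectures): prove 1 + 2 + ... + n = n(n+1)/2.
    (Basis) for n₀ = 1, the sum is 1 = 1(1+1)/2
    (Step) assume 1 + 2 + ... + k = k(k+1)/2 (the IH); then
      1 + 2 + ... + k + (k+1) = k(k+1)/2 + (k+1) = (k+1)(k+2)/2
    (Conclusion) the formula holds for every n ≥ 1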


  3. Sorting
    1. Insertion Sort
      Algorithm:
      1. starting from the second element from the left and moving to the right of the array:
      2.    make a copy of the current value (temp)
      3.    while the value to the left is > temp, do
      4.      move that value up one place in the array
      5.    then place temp in the last vacated position
      Analysis: f(n) = n(n-1)/2 → O(n²)
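
      A minimal sketch of the above in Python (names are my own):

        def insertion_sort(a):
            """Sort list a in place, as per the algorithm above."""
            for i in range(1, len(a)):        # start from the second element
                temp = a[i]                   # copy of the current value
                j = i
                while j > 0 and a[j - 1] > temp:
                    a[j] = a[j - 1]           # move larger values up one place
                    j -= 1
                a[j] = temp                   # place temp in the vacated slot
            return a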

    2. Merge Sort
      Algorithm (recursive):
      1. if only one element left in array to sort, return
      2. mergesort left half
      3. mergesort right half
      4. merge sorted halves together
      Algorithm for the merge part:
      1. merge array A and array B into array Z
      2. initialize a, b, z to 0
      3. while elements are still left in both arrays A and B
      4.    if A[a] < B[b], copy A[a] into Z[z], incrementing a and z
      5.    else copy B[b] into Z[z], incrementing b and z
      6. copy the rest of the unfinished array into Z
      Analysis: T(n) = 2T(n/2) + n → O(n log n)
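
      A sketch of both parts in Python (returning a new list rather than sorting in place):

        def merge_sort(a):
            """Return a sorted copy of list a."""
            if len(a) <= 1:                   # one element (or none): sorted
                return a
            mid = len(a) // 2
            left = merge_sort(a[:mid])        # mergesort left half
            right = merge_sort(a[mid:])       # mergesort right half
            # merge the sorted halves
            z, i, j = [], 0, 0
            while i < len(left) and j < len(right):
                if left[i] < right[j]:
                    z.append(left[i]); i += 1
                else:
                    z.append(right[j]); j += 1
            z.extend(left[i:])                # copy rest of unfinished array
            z.extend(right[j:])
            return z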

    3. Quicksort
      Algorithm:
      1. if array has only one element (or fewer) in it then return (sorted)
      2. middle_index = partition(array, low_index, high_index)
      3. quicksort(array, low_index, middle_index)
      4. quicksort(array, middle_index + 1, high_index)
      The Partition Algorithm (Hoare's scheme):
      1. let pivot_value = array[begin], and set i and j just outside the bounds of the subarray
      2. loop:
        1. decrement j till array[j] ≤ pivot_value
        2. increment i till array[i] ≥ pivot_value
        3. if i is still less than j, simply swap the values array[i] and array[j]
        4. else return j
      Analysis: O(n log n) in the best and average case, O(n²) in the worst case
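
      A sketch in Python (the partition is Hoare-style, which is what the steps above describe):

        def quicksort(a, lo=0, hi=None):
            """In-place quicksort using the partition scheme above."""
            if hi is None:
                hi = len(a) - 1
            if lo >= hi:                      # one element or fewer: sorted
                return a
            mid = partition(a, lo, hi)
            quicksort(a, lo, mid)             # mid is included, as in the notes
            quicksort(a, mid + 1, hi)
            return a

        def partition(a, lo, hi):
            pivot = a[lo]                     # pivot_value = array[begin]
            i, j = lo - 1, hi + 1             # i and j start outside the subarray
            while True:
                j -= 1
                while a[j] > pivot:           # decrement j till a[j] <= pivot
                    j -= 1
                i += 1
                while a[i] < pivot:           # increment i till a[i] >= pivot
                    i += 1
                if i < j:
                    a[i], a[j] = a[j], a[i]   # swap and continue
                else:
                    return j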

    4. Counting Sort
      the first of the non-comparing sorts - sorting in linear time
      Algorithm:
      1. initialize all values of the count array (of size k) to 0
      2. walking through (j++) the array to sort, count_array[array[j]]++
      3. from i = 1 to k-1, count_array[i] += count_array[i-1]
      4. working backwards from the end of the original array (j = n-1; j--):
        decrement the count, ie. count_array[array[j]]--, then sorted_array[count_array[array[j]]] = array[j]
      Analysis: about 2(k+n) steps, ie. O(n + k), so O(n) when k is small
      assumes input are integers in a small range (0 to k-1)
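
      A sketch in Python (0-indexed, which is why the count is decremented before placing):

        def counting_sort(a, k):
            """Stable counting sort: elements of a must be ints in range(k)."""
            count = [0] * k                       # step 1: counts start at 0
            for x in a:                           # step 2: tally each value
                count[x] += 1
            for i in range(1, k):                 # step 3: running totals
                count[i] += count[i - 1]
            out = [None] * len(a)
            for x in reversed(a):                 # step 4: backwards keeps it stable
                count[x] -= 1                     # decrement, then place
                out[count[x]] = x
            return out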

    5. Radix Sort
      use a stable sort (eg. counting sort) to sort on each digit in turn, least significant first
      eg. sorting 64-bit numbers at 16 bits per pass needs 4 passes (a constant), so still O(n)
      disadvantage: requires keys of uniform width (the same number of digits)
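
      A sketch in Python, applying a stable counting sort to one base-256 digit (8 bits) per pass rather than the notes' 16 bits, purely to keep the count array small:

        def radix_sort(a, key_bytes=4):
            """LSD radix sort of non-negative ints, one byte per pass."""
            for p in range(key_bytes):            # least significant digit first
                shift = 8 * p
                count = [0] * 256
                for x in a:                       # stable counting sort on digit p
                    count[(x >> shift) & 0xFF] += 1
                for i in range(1, 256):
                    count[i] += count[i - 1]
                out = [None] * len(a)
                for x in reversed(a):
                    d = (x >> shift) & 0xFF
                    count[d] -= 1
                    out[count[d]] = x
                a = out
            return a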

    6. Bucket Sort
      assumes keys are uniformly distributed over a known range
      1. for n keys, create n buckets covering even subintervals of that range
      2. calculate which bucket a key goes into: first scale, f(key) → [0,1), then bucket = n*f(key), rounded down
      3. sort each bucket with insertion sort, then concatenate the buckets (see the sketch below)
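
      A sketch in Python, reusing the insertion_sort above (lo and hi bound the key range and are assumed known):

        def bucket_sort(a, lo, hi):
            """Bucket sort for n keys assumed uniform on [lo, hi)."""
            n = len(a)
            buckets = [[] for _ in range(n)]      # n buckets, even subintervals
            for x in a:
                f = (x - lo) / (hi - lo)          # scale the key into [0, 1)
                buckets[int(n * f)].append(x)
            out = []
            for b in buckets:                     # insertion-sort each bucket,
                out.extend(insertion_sort(b))     # then concatenate
            return out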


  4. Hashing
    1. Key Transformation
      first convert the key (eg. a word) into a natural number, then apply the hashing function

    2. Universal Hashing (to avoid collisions)
      1. choose a prime, p, such that p > all keys
      2. then h_{a,b}(key) = ((a*key + b) % p) % size_table
      3. choose a and b randomly, such that both are > 0 and < p
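
      A sketch in Python (the prime and table size are illustrative choices of mine):

        import random

        def make_universal_hash(p, table_size):
            """Return a random h(key) = ((a*key + b) % p) % table_size."""
            a = random.randrange(1, p)        # a, b random with 0 < a, b < p
            b = random.randrange(1, p)
            return lambda key: ((a * key + b) % p) % table_size

        # p must be a prime greater than every key; 2**31 - 1 is a known prime
        h = make_universal_hash(p=2**31 - 1, table_size=101)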

    3. Perfect Hashing (to avoid/deal with collisions)
      1. choose the primary hash function as above, using trial and error to get good values of a and b
      2. keep a count, nᵢ, of the number of items in each slot i
      3. give each slot its own secondary table and hash function: p is as for the primary hash function, size_table = nᵢ², and aᵢ and bᵢ are again chosen randomly and checked to ensure NO collisions

    4. Chaining (to deal with collisions)
      each table slot holds a linked list of all the items that hash to it
    5. Linear Probing
      on a collision, step through slots h(key)+1, h(key)+2, ... (mod size_table) until a free slot is found
    6. Quadratic Probing
      as for linear probing, but step by successive offsets 1², 2², 3², ...
    7. Double Hashing
      as for linear probing, but the step size comes from a second hash function of the key
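
    A sketch of linear probing in Python (open addressing; None marks a free slot):

      def probe_insert(table, key, hash_fn):
          """Insert key into an open-addressed table using linear probing."""
          m = len(table)
          i = hash_fn(key) % m
          for step in range(m):                 # try at most m slots
              slot = (i + step) % m             # linear probe: h(key) + step
              if table[slot] is None:
                  table[slot] = key
                  return slot
          raise RuntimeError("hash table is full")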


  5. Binary Search Trees
    1. Binary Trees to Binary Search Trees
      trees are better than hash tables for ordered operations (min/max, predecessor/successor, printing keys in order)
      definition: for every node, all keys in its TL are ≤ its key, and all keys in its TR are ≥ its key

    2. BST Search
      O(log n) for a balanced tree
      1. if T is empty → return "item not found"
      2. compare x to the key value at the root of T
        • if equal, return T
        • if x < key, search TL, else search TR

    3. BST Insertion
      O(log n) for a balanced tree
      1. if T is empty, make a root node with key = key_insert
      2. else compare key_insert to the key value at the root of T
        • if key_insert < key, insert into TL
        • else insert into TR
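
      A minimal Python sketch of a BST node with search and insertion (names are mine):

        class Node:
            def __init__(self, key):
                self.key = key
                self.left = None      # TL
                self.right = None     # TR

        def bst_search(t, x):
            if t is None:
                return None           # "item not found"
            if x == t.key:
                return t
            return bst_search(t.left, x) if x < t.key else bst_search(t.right, x)

        def bst_insert(t, key):
            """Return the (possibly new) root after inserting key."""
            if t is None:
                return Node(key)      # empty tree: make the root node
            if key < t.key:
                t.left = bst_insert(t.left, key)
            else:
                t.right = bst_insert(t.right, key)
            return t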

    4. Min and Max Keys
      min = find the left-most node; max = find the right-most node

    5. Predecessor and Successor
      pred = the next smallest key
      1. if T has a TL, find the max of that
      2. else find the first ancestor whose right subtree contains T
      succ = the next largest key
      1. if T has a TR, find the min of that
      2. else find the first ancestor whose left subtree contains T

    6. BST Deletion
      1. search until the item to delete, k, is at the root of the current subtree, then
      2. delete that root
        • case 1: the root has an empty TL or TR, so just replace it with its other child
        • case 2: both children are non-empty: replace the root by its successor, then recursively delete the successor from the TR

    7. Tree Traversal
      1. Inorder: TL, root, TR
         uses: sorting (similar to quicksort - we have pivots), printing keys in order

      2. Preorder: root, TL, TR
         uses: capturing the tree structure (write keys to a file in preorder; inserting them in that order rebuilds the exact same BST), incremental depth

      3. Postorder: TL, TR, root
         uses: compilers for arithmetic (gives the equation in reverse Polish notation - operators at internal nodes, variables/numbers at leaves)
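
      The three traversals in Python, reusing the Node class from the BST sketch above:

        def inorder(t, visit):
            """TL, root, TR - visits keys in sorted order."""
            if t is not None:
                inorder(t.left, visit)
                visit(t.key)
                inorder(t.right, visit)

        def preorder(t, visit):
            """root, TL, TR - reinserting in this order rebuilds the same BST."""
            if t is not None:
                visit(t.key)
                preorder(t.left, visit)
                preorder(t.right, visit)

        def postorder(t, visit):
            """TL, TR, root - reverse Polish order for expression trees."""
            if t is not None:
                postorder(t.left, visit)
                postorder(t.right, visit)
                visit(t.key)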


  6. Red-black Trees
    1. Properties of RBTs
      every node is red or black; the root and the (dummy) leaves are black; a red node has two black children; every path from a node down to a leaf contains the same number of black nodes
    2. Black-height and Height
      black-height = the number of black nodes on the path from a node, n, down to a leaf (including the leaf but not n itself)
      height = count up from the (dummy) leaves

    3. RBT Rotation
      [figure: rotation in a red-black tree]
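
      A sketch of a left rotation in Python, assuming nodes carry parent pointers and the tree object records its root (field names are mine; right rotation is symmetric):

        def left_rotate(tree, x):
            """Rotate x down-left; its right child y takes x's place."""
            y = x.right
            x.right = y.left              # y's left subtree becomes x's right
            if y.left is not None:
                y.left.parent = x
            y.parent = x.parent           # splice y into x's old position
            if x.parent is None:
                tree.root = y
            elif x is x.parent.left:
                x.parent.left = y
            else:
                x.parent.right = y
            y.left = x                    # x becomes y's left child
            x.parent = y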

    4. RBT Insertion
      1. insert as per BST insertion
      2. colour the new node, x, red
      3. if parent[x] is also red, we have a violation, so fix:
        1. case 1: x's uncle is red
           solution: push black down from the grandparent to the parent and uncle

           [figure: RBT insert - case 1]

        2. case 2: x's uncle is black, x is an inside child
           solution: rotate x up and x's parent down to get case 3

           [figure: RBT insert - case 2]

        3. case 3: x's uncle is black, x is an outside child
           solution: swap the colours of the parent and grandparent, rotate the grandparent down and the parent up

           [figure: RBT insert - case 3]

    5. RBT Deletion
      let z be the node to be deleted
      y = the node that gets spliced out
        = z (if z has at most one child) or succ[z]
      x = the node that replaces the spliced-out node, y
      1. carry out deletion as per BST deletion, only leave the colour field of z the same
      2. if the spliced-out node, y (z or succ[z]), is black, give x an extra black and fix (first take care of the trivial case where x is red: simply colour it black); when x is black:
        • case 1: x's sibling is red
          solution: swap colours between x's sibling and x's parent, rotate the sibling up and the parent down

          [figure: RBT delete - case 1]

        • case 2: x's sibling is black, and the sibling's children are both black
          solution: push the extra black from x, and the black from the sibling (so the sibling is coloured red), up to the parent

          [figure: RBT delete - case 2]

        • case 3: x's sibling is black, the sibling's closest child is red (and its furthest child black)
          solution: swap colours between the sibling and its closest child, rotate the closest child up and the sibling down

          [figure: RBT delete - case 3]

        • case 4: x's sibling is black, the sibling's furthest child is red
          solution: swap the colours of the sibling and its furthest child, rotate the sibling up and the parent down, then push the black from x to the parent

          [figure: RBT delete - case 4]


  7. B-trees
    1. B-tree Properties
      minimum degree, t, such that every node except the root has at least t-1 keys, and every node has at most 2t-1 keys (so an internal node has between t and 2t children)

    2. B-tree Insertion
      1. find the appropriate leaf node
      2. if not full, simply insert the new key
      3. else if full (ie. already has 2t-1 keys in it), split:
        1. put the middle key into the parent (if the node was the root, create a new root as parent)
        2. leave the smallest t-1 keys in the existing node, put the biggest t-1 keys into a new sibling node
        3. insert the new key into the appropriate child leaf

    3. B-tree Deletion
      1. on the way down to the node containing the key to delete (and at that node too), strengthen any node with num_keys < t (ie. nodes that have only t-1 keys), by either:
        1. borrowing from a sibling, via the parent
        2. or merging with a sibling and the intermediate parent key
      2. once we've found our (strengthened, so num_keys > t-1) node from which to delete the key:
        • case 1: the node is a leaf node → simply delete the key
        • case 2: the node is internal, and there are at least t keys in the node containing the key's pred
          → swap the key with its pred and recursively delete
        • case 3: the node is internal, and there are at least t keys in the node containing the key's succ
          → swap the key with its succ and recursively delete
        • case 4: otherwise → merge the key down and out with its preceding and succeeding child nodes to give one large node from which to delete the key


  8. Graphs
    1. Implementing Graphs
      1. Adjacency Matrix
        • an n×n Boolean array A such that A[i][j] = 1 if there is an edge from i to j, and 0 if not
        • mirrored across the diagonal for undirected graphs
        • used when the graph is dense (num_edges close to n²)
        • the most efficient implementation for determining whether an edge between two given vertices exists (O(1), cf. O(n) for an adjacency list)

      2. Adjacency List
        • an array of linked lists, where list i contains all the vertices that vertex i has edges to
        • used when the graph is sparse (num_edges far fewer than n²)
        • the most efficient implementation for graph traversal, and for finding all vertices adjacent to some vertex j
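
      A quick sketch of both representations in Python (the example edges are mine):

        # a tiny directed graph on vertices 0..3
        edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
        n = 4

        # adjacency matrix: O(1) edge test, O(n^2) space
        matrix = [[0] * n for _ in range(n)]
        for i, j in edges:
            matrix[i][j] = 1

        # adjacency list: O(n + num_edges) space, fast traversal
        adj = [[] for _ in range(n)]
        for i, j in edges:
            adj[i].append(j)

        print(matrix[0][2] == 1)   # is there an edge 0 -> 2?
        print(adj[0])              # all vertices adjacent to 0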

    2. Breadth-first Search
      visit vertices in order of their distance from the source, using a queue of discovered-but-unfinished vertices
    3. Depth-first Search
      1. Algorithm
        use timestamps: d = discovered time, f = finished time
        DFS leads to backtracking
        1. colour all vertices white, and set time = 0
        2. for each vertex, u, that is still white, do DFS_visit(u)
        DFS_visit(u) algorithm (there is a global variable, time):
        1. colour u grey
        2. increment time, then set d[u] = this new time
        3. for each vertex, v, adjacent to u that is white:
          • DFS_visit(v)
        4. colour u black, increment time again, and set f[u] = the current time
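
        A sketch of DFS with timestamps in Python (adjacency lists as in the earlier sketch):

          WHITE, GREY, BLACK = 0, 1, 2

          def dfs(adj):
              """adj[u] = list of neighbours; returns (d, f) timestamp lists."""
              n = len(adj)
              colour = [WHITE] * n
              d, f = [0] * n, [0] * n
              time = 0

              def visit(u):
                  nonlocal time
                  colour[u] = GREY              # discovered
                  time += 1
                  d[u] = time
                  for v in adj[u]:
                      if colour[v] == WHITE:
                          visit(v)              # tree edge u -> v
                  colour[u] = BLACK             # finished
                  time += 1
                  f[u] = time

              for u in range(n):
                  if colour[u] == WHITE:
                      visit(u)
              return d, f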

      2. Applications
        • edge classification
          • T = tree edges (grey to white - the edges we actually follow)
          • B = back edges (grey to grey - a newly discovered vertex pointing back to its own ancestor)
          • F = forward edges (grey to black - a vertex pointing to an already-finished descendant)
          • C = cross edges (grey to black - the black vertex was coloured black via another route, not through the grey vertex)
          a directed, or undirected, graph is acyclic iff DFS produces no back edges
        • topological sort of a dag
          dag = directed acyclic graph (has no directed cycles)
          Algorithm (equivalent to sorting in order of decreasing finishing times):
          • while doing DFS, as each vertex finishes (ie. is coloured black), add it to the front of the list
        • connected components of an undirected graph
          before calling a new DFS_visit from the main DFS loop, set a new connected_component_number
        • strongly connected components
          sets of vertices in which any two are mutually reachable
          Algorithm (see the sketch below):
          1. call DFS on the graph to calculate finishing times for each vertex
          2. find the transpose of the graph (it has the same strongly connected components)
          3. call DFS on the transposed graph, visiting vertices in order of decreasing finishing time
          4. each tree of the second DFS is a strongly connected component
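
        A sketch of the above (this is Kosaraju's algorithm) in Python:

          def strongly_connected_components(adj):
              """Two-pass DFS; adj[u] = list of neighbours of u."""
              n = len(adj)
              visited = [False] * n
              order = []                        # vertices in finishing order

              def dfs1(u):
                  visited[u] = True
                  for v in adj[u]:
                      if not visited[v]:
                          dfs1(v)
                  order.append(u)               # record u as it finishes

              for u in range(n):
                  if not visited[u]:
                      dfs1(u)

              radj = [[] for _ in range(n)]     # the transpose graph
              for u in range(n):
                  for v in adj[u]:
                      radj[v].append(u)

              visited = [False] * n
              comps = []

              def dfs2(u, comp):
                  visited[u] = True
                  comp.append(u)
                  for v in radj[u]:
                      if not visited[v]:
                          dfs2(v, comp)

              for u in reversed(order):         # decreasing finishing time
                  if not visited[u]:
                      comps.append([])
                      dfs2(u, comps[-1])
              return comps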

    4. Heaps
      use as a priority queue, where the value at the top/front of the queue is used next
      min-heaps and max-heaps - like a nearly complete binary tree, implementable as an array such that for a node in cell i (cell 0 being left empty): the parent is in cell i/2 and the children are in cells 2i and 2i+1
      Insertion into a heap:
      1. add the item to the end of the heap/array
      2. heapify: compare the value with its parent and percolate up (swapping with the parent) as necessary
      Extraction from the top of a heap:
      1. remove the root
      2. replace it by the rightmost leaf, val
      3. heapify: compare val with its children and percolate down (swapping with the largest child in a max-heap, and the smallest child in a min-heap)
      insert and extract are both O(log n) in the worst case
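
      A max-heap sketch in Python, with cell 0 left empty as above (start with heap = [None]):

        def heap_insert(heap, val):
            """Append val, then percolate it up towards the root at cell 1."""
            heap.append(val)
            i = len(heap) - 1
            while i > 1 and heap[i // 2] < heap[i]:
                heap[i], heap[i // 2] = heap[i // 2], heap[i]
                i //= 2

        def heap_extract(heap):
            """Remove and return the maximum (the root at cell 1)."""
            top = heap[1]
            heap[1] = heap[-1]                    # replace root by rightmost leaf
            heap.pop()
            i, n = 1, len(heap) - 1
            while 2 * i <= n:                     # percolate down
                c = 2 * i                         # left child
                if c + 1 <= n and heap[c + 1] > heap[c]:
                    c += 1                        # pick the larger child
                if heap[i] >= heap[c]:
                    break
                heap[i], heap[c] = heap[c], heap[i]
                i = c
            return top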

    5. Prim's Algorithm
      for finding a minimum spanning tree of an undirected weighted graph, growing from some source vertex, s
      1. initialize all vertices' priority values to ∞ (except that of s, which is set to 0), and set all pred[vertex] = -1
      2. insert all these vertices into a priority queue (a min-heap), keyed on priority
      3. while the priority queue is non-empty:
        • extract the minimum vertex, u, and for each of its adjacent vertices, v, still in the queue:
          • if weight(u, v) < priority[v] then set priority[v] = weight(u, v) and pred[v] = u
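
      A sketch in Python using the standard-library heapq; since heapq has no decrease-key, stale queue entries are simply skipped (a small deviation from the notes' description):

        import heapq

        def prim(adj, s=0):
            """adj[u] = list of (v, weight); returns pred[] defining the MST."""
            n = len(adj)
            priority = [float("inf")] * n
            pred = [-1] * n
            priority[s] = 0
            in_tree = [False] * n
            pq = [(0, s)]
            while pq:
                _, u = heapq.heappop(pq)
                if in_tree[u]:
                    continue                  # stale entry: u already extracted
                in_tree[u] = True
                for v, w in adj[u]:
                    if not in_tree[v] and w < priority[v]:
                        priority[v] = w       # found a cheaper edge into v
                        pred[v] = u
                        heapq.heappush(pq, (w, v))
            return pred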

    6. Dijkstra's Algorithm
      for finding the single-source shortest paths from a source vertex, s, to every other vertex in a directed weighted graph
      1. initialize all vertices' distance_from_start, d, to ∞ (except that of s, which is set to 0)
      2. insert all these vertices into a priority queue (a min-heap), keyed on d
      3. while the priority queue is non-empty:
        • extract the minimum vertex, u, and for each of its adjacent vertices, v:
          • RELAX: if d[u] + weight(u, v) < d[v] then set d[v] = d[u] + weight(u, v)
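
      The same skeleton works for Dijkstra (again a sketch, with the lazy-heap trick as in the Prim sketch above):

        import heapq

        def dijkstra(adj, s=0):
            """adj[u] = list of (v, weight); returns shortest distances from s."""
            n = len(adj)
            d = [float("inf")] * n
            d[s] = 0
            done = [False] * n
            pq = [(0, s)]
            while pq:
                _, u = heapq.heappop(pq)
                if done[u]:
                    continue                  # stale entry
                done[u] = True
                for v, w in adj[u]:
                    if d[u] + w < d[v]:       # RELAX
                        d[v] = d[u] + w
                        heapq.heappush(pq, (d[v], v))
            return d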


  9. Dynamic Programming
    1. Elements of Dynamic Programming
      1. subproblem optimality
      2. recursion
      3. memoising (to avoid the inefficiency of subproblem overlap, eg. Sage)
      4. bottom-up

    2. The Fractional Knapsack Problem
      easy - just use a greedy algorithm: first put in as much as possible of the item with the best value/weight ratio, then the next best, etc.
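
      A sketch of the greedy algorithm in Python (items are (value, weight) pairs; names mine):

        def fractional_knapsack(items, capacity):
            """Return the best achievable value for the given weight capacity."""
            total = 0.0
            # best value/weight ratio first
            for value, weight in sorted(items, key=lambda it: it[0] / it[1],
                                        reverse=True):
                take = min(weight, capacity)      # as much of this item as fits
                total += value * (take / weight)
                capacity -= take
                if capacity == 0:
                    break
            return total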

    3. The 0-1 Knapsack Problem
      1. create a num_items × increm_weight value matrix, V[k,w], and initialize the first row to 0
      2. for each item, k:
        • initialize V[k,0] = 0 (ie. this eventually sets the first column to zero)
        • for each value of the incrementing weight limit, w:
          1. if w_k > w (ie. item k can't fit) then set V[k,w] = V[k-1,w]
          2. else if V[k-1,w] > value_k + V[k-1,w-w_k] (better value without item k) then set V[k,w] = V[k-1,w]
          3. else (better value with item k) V[k,w] = value_k + V[k-1,w-w_k]
      brute force over all 2ⁿ subsets is exponential, but here we've got it running in O(nW) time
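
      A sketch of the table-filling in Python (0-indexed lists, so item k lives at values[k-1]):

        def knapsack_01(values, weights, W):
            """Max value with total weight <= W, via the O(n*W) table above."""
            n = len(values)
            # V[k][w]: best value using the first k items with capacity w
            V = [[0] * (W + 1) for _ in range(n + 1)]
            for k in range(1, n + 1):
                vk, wk = values[k - 1], weights[k - 1]
                for w in range(1, W + 1):
                    if wk > w:                    # item k can't fit
                        V[k][w] = V[k - 1][w]
                    else:                         # better with or without k?
                        V[k][w] = max(V[k - 1][w], vk + V[k - 1][w - wk])
            return V[n][W]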

    4. Assembly-line Scheduling
      the brute-force trial-and-error approach is O(2ⁿ) (as there are 2ⁿ ways in which the stations on line 1 can be chosen)
      1. calculate the optimum time_line1[j] = min(time_line1[j-1] + a_line1,j , time_line2[j-1] + transfer_time_line2,j-1 + a_line1,j)
      2. repeat for line 2
      3. find the minimum overall time and the path taken to get it
      O(n)
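
      A sketch in Python; entry and exit times are omitted for brevity, and t1[j]/t2[j] (my names) are the transfer costs after station j:

        def assembly_line(a1, a2, t1, t2):
            """a1, a2: station times on each line; returns the fastest total."""
            f1, f2 = a1[0], a2[0]
            for j in range(1, len(a1)):
                # stay on the same line, or transfer in from the other line
                nf1 = a1[j] + min(f1, f2 + t2[j - 1])
                nf2 = a2[j] + min(f2, f1 + t1[j - 1])
                f1, f2 = nf1, nf2
            return min(f1, f2)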


  10. P and NP
    P = polynomial: a solution can be found in O(nᵏ) time
    NP = non-deterministic polynomial: a solution can be checked in polynomial time
    NP-complete: the set of problems in NP to which every problem in NP can be reduced in polynomial time; no polynomial-time solution has been found for any of them (and we believe none exists - but this hasn't been proven yet)
    NP-complete problems: eg. deciding whether a graph has a Hamiltonian cycle (cf. the brute-force O(n!) method above)
