INFS 519 - November 2, 2010

Reading

This lecture covers material from 10.1 and 12.

Heaps

A heap is a binary tree whose elements can be compared with total order semantics. That is, all elements in the heap can be listed in order from smallest to largest.

Heap Storage Rules

  1. The element contained by each node is greater than or equal to the elements of that node's children.
  2. The tree is a complete binary tree so that every level except the deepest must contain as many nodes as possible; at the deepest level, all the nodes are as far left as possible.

Just like complete binary trees, heaps can be implemented with an array.
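
Because the tree is complete, the parent/child structure is implicit in the array positions; a sketch (assuming 0-based indexing, with hypothetical helper names):

```python
# For a heap stored in a 0-based array, the tree shape is implicit:
# the children of the node at index i live at 2*i + 1 and 2*i + 2,
# and its parent lives at (i - 1) // 2.

def parent(i):
    return (i - 1) // 2

def left_child(i):
    return 2 * i + 1

def right_child(i):
    return 2 * i + 2
```

No pointers are needed; moving between a node and its parent or children is just index arithmetic.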

Priority Queues with Heaps

If we use the priority of an enqueued element to determine its order, then a heap can be used to store prioritized elements: every node in the heap contains an element with a priority higher than or equal to that of its children.

Adding an Element

  1. Place the new element in the heap in the first available location. This keeps the structure a complete binary tree, but it might violate the heap constraint that each parent node holds an element with a value higher than or equal to its children's.
  2. While the new element has a priority higher than its parent, swap the new element with its parent.
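
The two steps above can be sketched in Python for a max-heap stored in a 0-based list (the function name `heap_add` is hypothetical):

```python
def heap_add(heap, value):
    """Add value to a max-heap stored in a 0-based list."""
    heap.append(value)                 # step 1: first available location
    i = len(heap) - 1
    # Step 2: while the new element beats its parent, swap upward.
    while i > 0 and heap[(i - 1) // 2] < heap[i]:
        parent = (i - 1) // 2
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent
```

This upward swapping is sometimes called reheapification upward; it does at most one swap per level, so an add is O(log n).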

Removing an Element

  1. Copy the root element of the heap; this is the return value.
  2. Move the last element in the deepest level to the root.
  3. The new root element is likely out of place. While the out-of-place element has a value lower than one of its children, swap it with its larger child.
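
The three removal steps might look like this sketch for the same max-heap-in-a-list representation (function name `heap_remove` is hypothetical):

```python
def heap_remove(heap):
    """Remove and return the largest element of a max-heap list."""
    top = heap[0]                      # step 1: the return value
    heap[0] = heap[-1]                 # step 2: last element to the root
    heap.pop()
    # Step 3: push the out-of-place root down, swapping with the
    # larger child until the heap property is restored.
    i = 0
    while 2 * i + 1 < len(heap):
        child = 2 * i + 1
        if child + 1 < len(heap) and heap[child + 1] > heap[child]:
            child += 1                 # pick the larger child
        if heap[i] >= heap[child]:
            break
        heap[i], heap[child] = heap[child], heap[i]
        i = child
    return top
```

Like adding, removal does at most one swap per level of the tree, so it is O(log n).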

Sorting

We'll be covering selection sort, insertion sort, merge sort, quicksort, and heapsort.

Selection Sort

Select the largest (or smallest) element in the list and then move it to the appropriate place in the list by swapping it with the value already at that location.

Using the list, [8, 3, 5], as an example:

  1. Scan entire list and see that 8 is the largest.
  2. Since it's the largest, it belongs in the last position of the list.
  3. Swap 8 with 5: [5, 3, 8].
  4. Now look for the next largest element and repeat.
for (i = n-1; i > 0; i--)

    // Find largest value
    idxBig = 0
    for (j = 1; j <= i; j++)
        if (data[idxBig] < data[j])
            idxBig = j

    swap data[i] with data[idxBig]
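
The pseudocode above translates almost directly into Python; a sketch (function name `selection_sort` is hypothetical):

```python
def selection_sort(data):
    """Sort data in place by repeatedly selecting the largest value."""
    n = len(data)
    for i in range(n - 1, 0, -1):
        # Find the index of the largest value in data[0..i].
        idx_big = 0
        for j in range(1, i + 1):
            if data[idx_big] < data[j]:
                idx_big = j
        # Move it into its final position.
        data[i], data[idx_big] = data[idx_big], data[i]
```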

Run-time Analysis

Nested for-loops, both dependent on n, the size of data. An O(n) operation done n times is O(n^2).

O(n^2)

Insertion Sort

Build up a "new" list by inserting elements from the original list. At each iteration, ensure the list will remain sorted by choosing to insert the next element in the correct position.

Note that we can simply reuse the existing list for this sort as we are simultaneously removing an element from the old list and adding it to the new one.

for (i = 1; i < n; i++)
    entry = data[i]
    // Insert entry into the portion of the array from data[0]
    // through data[i-1] so that data[0] through data[i] is sorted.
    for (j = i; isWrongSpot(entry, j); j--)
        data[j] = data[j-1]

    // Put entry in place
    data[j] = entry

boolean isWrongSpot(entry, j)
    return (j > 0) && data[j-1] > entry
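
A Python sketch of the same idea, shifting larger elements right until the entry's spot opens up (function name `insertion_sort` is hypothetical):

```python
def insertion_sort(data):
    """Sort data in place by inserting each element into the sorted prefix."""
    for i in range(1, len(data)):
        entry = data[i]
        j = i
        # Shift larger elements one spot to the right.
        while j > 0 and data[j - 1] > entry:
            data[j] = data[j - 1]
            j -= 1
        data[j] = entry                # put entry in place
```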
        

Run-time Analysis

Also quadratic, O(n^2), for both the worst case and the average case.

But the best case is linear, O(n), when the list is already sorted.

Merge Sort

In merge sort, the array is divided into two equally-sized groups. The sub-arrays are recursively sorted by dividing the array into smaller pieces and then reassembling the halves in sorted order once the recursion ends.

Assuming there's more than one element, then merge sort runs as follows:

  1. Calculate array bounds for two halves. The first half is [0, n/2 - 1], and the second half is the rest.
  2. Recursively sort the two halves.
  3. Merge two sorted halves as recursive calls return.

The merge happens by looking at the two sorted arrays and copying the smallest value to the next spot in the temporary array.

  1. Allocate the temp array and set copied, copied1, and copied2 to zero.
  2. Copy items from both halves to temp, while preserving order.
  3. Copy any remaining items from left or right subarray.
  4. Copy the elements from temp back to data.
while (both halves of the array have more elements to copy)
    if (the next element of 1st half is <= the next element of the 2nd half)
        Copy the next element of 1st half to the next spot in temp.
        Add 1 to both copied and copied1.
    else
        Copy the next element of 2nd half to the next spot in temp.
        Add 1 to both copied and copied2.
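
The whole procedure can be sketched in Python; this version returns a new list rather than copying temp back into data, and the name `merge_sort` is hypothetical:

```python
def merge_sort(data):
    """Return a sorted copy of data using recursive merge sort."""
    if len(data) <= 1:
        return list(data)
    mid = len(data) // 2                 # first half is [0, n/2 - 1]
    left = merge_sort(data[:mid])        # recursively sort the halves
    right = merge_sort(data[mid:])
    # Merge: repeatedly copy the smaller front element into temp.
    temp = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            temp.append(left[i])
            i += 1
        else:
            temp.append(right[j])
            j += 1
    temp.extend(left[i:])                # copy any remaining items
    temp.extend(right[j:])
    return temp
```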

Run-time Analysis

The number of elements to be sorted is n.

Since we keep halving the size of the subarrays, there are log2 n levels of recursive calls to merge sort. The total work done by the merge at each level is O(n).

So, we have a run-time of O(n log n).

While the run-time is good, merge sort does require extra storage for temporary arrays.

See figure 12.4, p. 624 for comparison with quadratic run-time.

Quicksort

Similar to merge sort, but the dividing of the array into two subarrays is more complicated. This is a trade-off to make the combining, or merging, of sorted subarrays easier.

In quicksort, we select a pivot element and then partition the array into two halves. The left side contains all values less than or equal to the pivot and the right side holds all values greater than the pivot. The pivot itself goes in its appropriate location in the array to maintain the partitioning.

def quicksort(array)
    var list less, greater
    if length(array) ≤ 1
        return array
    pivot, less, greater = partition(array)
    return concatenate(quicksort(less), pivot, quicksort(greater))

The partition method provides the pivot and puts values of the array in the corresponding less or greater partitions which are defined by the value of the pivot. A simple selection algorithm for the pivot is to choose the first element in the array.

Partitioning occurs by scanning the less group for a value greater than the pivot, starting at the beginning of that partition, just after where the pivot is stowed. Then we scan the greater partition, starting at its end, for a value less than or equal to the pivot. Once we have found such a pair, we swap the two elements that are on the incorrect sides. Once the indexes we use to scan the partitions have crossed, we are done and know that all lesser values are on the less side and all greater values are on the greater side. Finally, we swap the pivot element with the lesser element on the border of the partition.

See p. 628-629 for example.
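
An in-place sketch of the scheme described above, using the first element of each range as the pivot (function name `quicksort` and the index handling are one plausible rendering, not the textbook's exact code):

```python
def quicksort(data, first=0, last=None):
    """Sort data[first..last] in place; pivot is the first element."""
    if last is None:
        last = len(data) - 1
    if first >= last:
        return
    pivot = data[first]
    left, right = first + 1, last
    while left <= right:
        # Scan the "less" side for a value greater than the pivot.
        while left <= right and data[left] <= pivot:
            left += 1
        # Scan the "greater" side for a value <= the pivot.
        while left <= right and data[right] > pivot:
            right -= 1
        if left < right:
            data[left], data[right] = data[right], data[left]
    # The indexes have crossed; right marks the border of the less side.
    data[first], data[right] = data[right], data[first]
    quicksort(data, first, right - 1)
    quicksort(data, right + 1, last)
```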

Run-time Analysis

Again, in the typical case there are log2 n levels of recursive calls. The total work of the partitioning done at each level is O(n).

The average running time is O(n log n).

Sometimes quicksort has a much worse run-time. This happens when an array is already sorted: the pivot element is the smallest value in the array and therefore does a horrible job of dividing the array into nearly equal parts. Quicksort must then recursively process n + (n-1) + (n-2) + … + 1 elements. In big-O terms, this is O(n^2).

Choosing a good pivot value is important in quicksort. There are many schemes, one of which is to select three values from the array and choose the median element as the pivot.
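
A sketch of that median-of-three scheme (the function name and the choice of first/middle/last as the three sampled positions are assumptions; other samplings are possible):

```python
def median_of_three(data, first, last):
    """Return the index of the median of data[first], data[mid], data[last]."""
    mid = (first + last) // 2
    # Pair each candidate value with its index, sort, take the middle one.
    candidates = sorted([(data[first], first), (data[mid], mid), (data[last], last)])
    return candidates[1][1]
```

The returned index would then be swapped to the front of the range so the partitioning code can proceed unchanged.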

Heapsort

Merge sort is O(n log n) in all cases, but requires extra storage. Quicksort doesn't need extra storage, but it can be O(n^2) in its worst case. Heapsort has the best of both worlds! No extra storage required and no O(n^2) worst case.

Heapsort works by turning the input array into a heap, ordered so that the largest values are at the top.

We can then easily find the next-largest item, since the largest element is always the root of the heap. We swap the root with the element at the end of the unsorted portion of the array, and the (n-i)th largest element is now in sorted order.

Reheapification Downward

As we swap the root of the heap, we'll need to reheapify by pushing the out-of-place element down the heap and into its proper position.

current = 0
heapOkay = false

while ((!heapOkay) && [the current node is not a leaf node])
    bigChildIndex = [index of larger child of current node]

    if (data[current] < data[bigChildIndex])
        [Swap data[current] with data[bigChildIndex]]
        current = bigChildIndex
    else
        heapOkay = true
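
Putting the pieces together, a heapsort sketch in Python (function names are hypothetical; this version builds the initial heap bottom-up using reheapification downward, a common variation on adding the items one at a time):

```python
def heapsort(data):
    """Sort data in place: build a max-heap, then repeatedly extract the root."""
    n = len(data)

    def reheapify_down(end, current=0):
        # Push data[current] down within data[0..end-1] until heap is okay.
        while 2 * current + 1 < end:
            child = 2 * current + 1
            if child + 1 < end and data[child + 1] > data[child]:
                child += 1                    # index of the larger child
            if data[current] >= data[child]:
                break                         # heap is okay
            data[current], data[child] = data[child], data[current]
            current = child

    # Turn the input array into a max-heap.
    for i in range(n // 2 - 1, -1, -1):
        reheapify_down(n, i)
    # Repeatedly move the largest element to the end of the unsorted part.
    for end in range(n - 1, 0, -1):
        data[0], data[end] = data[end], data[0]
        reheapify_down(end)
```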

Run-time Analysis

O(n log n)

(the number of items added to the heap × the number of ops for one reheapification upward) + (the number of items pulled out of the heap × the number of ops for one reheapification downward)

See p. 642-643 for breakdown.
