Class 6: Sorting in Linear Time

 

We have seen several algorithms that can sort n numbers in O(nlgn) time. Merge sort and heapsort achieve this upper bound in the worst case; quicksort achieves it on average. These algorithms have one common property: the sorted order they determine is based only on comparisons between the input elements. Such sorting algorithms are called comparison sorts.

 

We prove that any comparison sort must make W(nlgn) comparisons in the worst case to sort n elements. We also examine a counting sort algorithm that runs in linear time. This algorithm uses non-comparison operations to determine the sorted order.

 

Lower bounds for sorting

 

We assume without loss of generality that all input elements are distinct. In this case, we can simply make comparisons of the form ai <= aj.

 

Comparison sorts can be viewed in terms of decision tree. For example, sorting three elements using insertion sort will look as follows:

 

                       

                                                            1:2

                        <=                                                                    >

                        2:3                                                                   1:3

<=                                >

<1,2,3>                        1:3

                        <=                    >

                        <1,3,2>            <3,1,2>          ...

 

 

In a decision tree each internal node is annotated by i:j for some i and j in the range 1 <= i,j <= n. Each leaf is annotated by a permutation <p(1), ... , p(n)>. The execution of the sorting algorithm corresponds to tracing a path from the root to a leaf. Any correct sorting algorithm must be able to produce each permutation of its input, hence all n! leaves of the decision tree must be "reachable".

 

Theorem 8.1

Any comparison sort algorithm requires W(nlgn) comparisons in the worst case.

 

Proof. The length of the longest path from the root of a decision tree to any of its reachable trees represents the worst-case number of comparisons that a sorting algorithm performs. Hence, we need to determine the height of the decision tree, where each permutation is a reachable leaf.

 

Consider a tree of height h and l leaves. Since each permutation appears as a leaf, we have n! <= l. A binary tree of height h has no more than 2^h leaves:

 

n! <= l <= 2^h

 

taking logarithm:

 

h >= lg(n!)

 

/* Since lg(n!) = q(n lgn) according to (3.18) */

 

h >= q(nlgn) =>

 

h = W(nlgn)

 

Counting sort

 

This algorithm assumes that each of the n input elements is an integer in the range 0 to k. When k = O(n), the sort runs in q(n) time.

 

The basic idea is to determine for each input element x, the number of elements less than x. This information can be used to place x directly into its position in the output array. The algorithm requires an input array A, and the output array B.

 

Counting-Sort (A,B,n,k)

1     for i = 0 to k

2           C[i] = 0

3     for i = 0 to n

4           C[A[i]]++

5     // C[i] now contains the number of elements equal to i

6     for i = 1 to k

7           C[i] = C[i] + C[i-1]

8     // C[i] now contains number of elements <= i

9     for i = n to 1

10          B[C[A[i]]] = A[i]

11          C[A[i]]--

12    return

 

Example

 

 

2          1            0            2            2

 

 

0          1            2

1          1            3

 

 

0          1            2

1          2            5

 

 

1          2            3            4            5            index

                                                2

2

0

            1

                        2

 

After loop 6,the array C contains the first position of an element with value i, which is the same as the number of elements <= i. At each iteration, when the i element is placed into the output array, the position of the next element i will be before the current one.

 

To calculate the running time, observe that the number of operations is k+1(loop 1) + n+k+n = Q(k+n). When using k=O(n), we have the running time Q(n).

 

An important quality of the counting sort is that it is stable, numbers with the same value appear in the output array in the same order as they do in the input array. The property of stability is important when satellite date are carried around with the key.