Homework 3 (due 4/21)
CSC 321

We talked about quicksort (Section 6.2), randomized selection (Section 6.5) and heapsort (Section 3.5). I also showed you one linear-time sort (counting sort), and mentioned another: radix sort (Section 6.4). Next week, we will start on graph theory (see Sections 2.5 and 2.6 for review), and graph algorithms.

: Submit to the d2l dropbox. Use some standard format for your homework (Word or pdf, for example), and make sure to include your name and assignment number. Include screenshots for code and testruns. You do not need to include the code (or any executables).

1. (Reading Assignment) Read Sections 3.5, 6.2, 6.5, and 6.4. Also, if you think your graph theory is a bit rusty, review Section 2.5.

2. (Warming up with Quicksort, 6pt).

  1. [3] Show that quicksort (with Lomuto partitioning, which is what we saw in class) is not stable. It'll be sufficient to give the example and show how quicksort destroys the original order of duplicates (there is a counterexample with three elements).

  2. [3] The median-of-3 heuristic uses as the pivot the median of the first, the middle and the last element of the array. What is the asymptotic running time of quicksort with this heuristic on a list that is already sorted in increasing order? (Feel free to implement and test-run this.)

3. (Randomized Selection, 10pt)

  1. [6pt] Implement the randomized selection algorithm (based on randomized quicksort, see Section 6.5; you can modify the class quick-sort implementation), and test it on some small lists.

  2. [4pt] The file GMATscores.txt contains GMAT scores, listed in the format Lastname, tab ('\t'), score, newline ('\n'). Using your algorithm from a), and not using any sorting, determine the GMAT score at the 50% percentile, the 90% percentile, and the 95% percentile. Hint: read the file as a list of lines (readlines), use the split('\t') to extract name and score. You can use eval to convert string to integer.

4. (Median, 10pt) A previous intern in your group implemented a kick-ass median algorithm which runs in O(n) time. Unfortunately, she left, and you can only find the executable of her code. So while you can run the median algorithm, you have no idea how it works. It is a black box (and you have no time to reverse-engineer it). Even more unfortunately, you need to solve the general selection problem (given list lst, number k, find kth element of sorted list). Explain how you can still use the median algorithm you have access to, to solve that problem. You do not need to implement this problem (though you can, problem 3 gives you access to a median algorithm). A lucid explanation, possibly with some pseudocode, or a well-worked, small example will do the trick. Hint: as usual, think about a small example. You may not make any assumptions on how the median algorithm works, it is a true black box.

5. (Heaps, 14pt) You are given a min-heap (in class we saw max-heaps, min-heaps are like max-heaps, but their parent values are always smaller than their children values, see Definition 3.5.4) stored in an array as a heap. You are also given a value x and you want to compute the rank of x in the heap, that is, you want to calculate how many elements in the heap are smaller than x (in a way, that's the reverse of the selection problem, where you are given the rank k and find x).

a) [5pt] Implement the straight-foward strategy: one by one extract the smallest element (and re-heapify), as long as that is less than x, and count. That approach will take time O(k log n), where n is the size of the array.

b) [9pt] Find and implement a strategy that takes time O(k) only.

To test your algorithms from a) and b), take the GMATscores.txt data and convert it into a heap (you can build on the heapsort implementation in sort4.pt which is on d2l, you need to strip out the names) and test with some values: x = 540, x = 550, etc.

Marcus Schaefer
Last updated: April 16th, 2015.