In class we covered more dynamic programming; in particular we saw how to solve the longest common subsequence problem, the all-pairs shortest path problem (using the Floyd-Warshall algorithm, see Section 25.2), and the optimal binary search tree (15.5), also see optimal binary search tree example (the formula in class was incorrect: the parentheses were wrong). Next week we will finish dynamic programming (0/1 knapsack) and start talking about greedy algorithms (16.1, 16.3).
Midterm will be in sixth week, first half of class. There will be a lecture during the second half. Exam is open book, open notes. Coverage includes everything up to and including this week.
Submission: you can submit the homework to me either by hardcopy in class or email it to me.
1. [Line-breaking] You want to write an algorithm that typesets a set of words forming a paragraph. Your input is an array w[1..n] of the lengths of the words. Punctuation is considered part of the preceding word. For example,
"For example, we might have to typeset this sentence. Optimally."
we would have w[1] = 3, w[2] = 8 (including the ","), w[3] = 2, ... w[10] = 10 (including the final period). You are also given a maximum number k of symbols in a line. For k = 15, the earlier example could be written as
For example, we // 15 symbols might // 5 symbols have to typeset // 15 symbols this sentence. // 14 symbols Optimally. // 10 symbols
(with symbol count at the end). For this exercise we assume a typewriter-like font, i.e. the font is mono-spaced: every symbol has the same width. The particular typesetting shown for the example is not very good. Better would be:
For example, we // 15 symbols might have to // 13 symbols typeset this // 12 symbols sentence. // 9 symbols Optimally. // 10 symbols
Still, not very nice, but with a width of at most 15 letters, we might not be able to do much better. How do we measure the badness of a solution: we sum up the cubes of the number of letters missing to a full line (excluding the last line of the paragraph; last lines may be short); so the first example has badness 03 + 103 + 03 + 13 = 1001. The second, in comparison, had badness 03 + 23 + 33 + 63 = 251, which is much better.
a) [25pt] Write an algorithm in pseudocode (and include an explanation of your algorithm) that computes the minimal badness of a paragraph of text, given w[1..n] and k. Be careful in dealing with spaces.
b) [5pt] What is the running time of your algorithm?
c) [15pt] Implement your algorithm and modify it so it actually produces the optimal layout. Test it with one or two texts, include printouts.
2. [LCS, 10pt] Using the LCS algorithm, find the longest common subsequence of "SLWOVNNDK" and "ALWGQVNBBK". Compute the full table.
3. [Edit-Distance, 15pt] One notion of edit distance between two strings is defined as follows: given a string you can perform the following operations on it: (D)elete a letter, and (I)nsert a letter. The edit distance is the smallest number of operations that turns one word into the other word. For example,
ALGORITHM (delete 2nd letter) AGORITHM (delete 8th letter) AGORITH (delete 7th letter) AORITH (delete 2nd letter) AORTH (delete 4th letter) AORTHA (insert A after 5th letter) AORTA (delete 5th letter)
Showing that the edit distance between ALGORITHM and AORTA is at most 6. (Can you make do with less operations?) Show that with the edit distance ED(x,y) of two words x and y defined like this it is true that
n + m - 2 LCS(x,y) = ED(x,y)
where n = |x| and m = |y|. For example, ED(ALGORITHM, AORTA) = 6, while LCS(ALGORITHM, AORTA) = 4, n = 9, m = 5. So 9 + 5 - 2*4 = 6. Hint: show <= and >= separately.
4. [Floyd-Warshall, 20pt] Trace the Floyd-Warshall algorithm for the weighted digraph G = (V,E, w), where V = {1,2,3,4} and E = {12, 13, 14, 21, 24, 31, 34, 42, 43} and w(12) = 2, w(13) = 4, w(14) = 3, w(21) = 3, w(24) = 3, w(31) = 5, w(34) = -3, w(42) = -1, w(43) = 4. Edges that are not present are of infinite weight.
5. [Optimal binary search tree, 15pt] Complete the example on the handout. (Double-check that the values I computed are correct.)
6. [Extra Credit] The farthest distance problem from the last homework is open as extra credit.