CSC 321

We've talked more about dynamic programming, including edit distance (Section 9.5) and fast modular exponentiation via repeated squaring (see the wikipedia entry for various implementations and variants). We also saw the Floyd Warshall algorithm for all-pairs shortest path (Section 8.5), and started talking about greedy and dynamic solutions to coin changing.

Next, we will discuss various text searching algorithms, including naive text-searching (9.1), and Rabin-Karp Search (9.2). Knuth-Morris-Pratt (9.3), and Boyer-Moore-Horspool (9.4). After that, we'll move on to NP-completeness (Section 10.2).

Submission

1. (Reading Assignment) Read Sections 9.5, 8.5.

2. (Best Approximate Match) In this problem you will implemen a variant of the edit distance algorithm, namely the best approximate match. As explained below (and on pages 402-404 in the book), the two differ only in the initialization. The rest of the dynamic program (and algorithm) is the same.

- [15pt] Implement the Best Approximate Match algorithm (Algorithm
9.5.10 in the book) as a function bap(p,t). It calculates the smallest
edit distance between p and
*any*substring of t. The calculation is the same as for the edit distance problem (compare formula on page 400 for edit distance to formula on page 403 for best approximate match), the only difference is in the initialization phase. Test-run this with several small examples (e.g. Example 9.5.12 in the book [there is one incorrect entry in that table, see whether you can find it]). You do not need to find the best approximate match, just its distance. - [5pt] You recall a phrase 'I wandered the streets alone and
happy' from your
favorite Mark Twain book, but you don't recall which one your favorite
Mark Twain book is. It could have been any of the following four:
Innocents
Abroad,
Mysterious
Stranger,
Roughing
It, or
Tramp Abroad. Use your implementation for Best Approximate Match to
find out which book it is (calculate, and show, the distances of the phrase to each
of the four books using your algorithm in a). To save space you may want to consider not creating
the whole table, but just two lines.
*Hint*: my program takes about 2-3 minutes to run through all four texts. - [Extra Credit, 6pt] Extend your algorithm so it finds the exact phrase that you misremembered.

*Hint*: you may want to start with implementing the edit distance
algorithm (Algorithm 9.5.7) and test that with some examples before moving
on to Algorithm 9.5.10 if you think you'll benefit from additional examples
and testing.

3. (Transitive Closure, 15pt) Given a directed graph G = (V,E), the transitive closure tc(G) is a graph on the same vertex set V as G, and an edge (u,v) if there is a directed path from u to v in G. For example, the transitive closure tc(({1,2,3}, {(1,2),(2,3)})) is ({1,2,3}, {(1,2),(1,3), (2,3)}).

a) [3pt] What is the transitive closure of a directed path of length n, i.e. if you have vertices V = {1, ..., n} and edges E = {(1,2), (2,3), (3,4), ..., (n-1, n)} ?

b) [2pt] What is the transitive closure of a *star graph*, that
is, one special (center) vertex 1 with edges to all other vertices, that is,
E = {(1,2), (1,3), ..., (1,n)}, and no other edges ? *Hint*: draw it,
with 1 in the center to see why it's called a star graph.

b) [7pt] We know that we can compute the transitive closure tc(G) of a
graph using Floyd-Warshall, but that takes time O(|V|^{3}).
Suppose we know that G is a dag (directed acyclic graph). Sketch (precise
description, or pseudocode) the fastest algorithm for calculating the
transitive closure tc(G) of the dag G.

c) [3pt] What is the running time of your algoritm in b)? *Hint*:
be careful.

Marcus Schaefer

Last updated: May 20th, 2015.