Midwest Theory Day
DePaul University
Autumn 2002


Abstracts

How to model the Web?
Gopal Pandurangan (Purdue University)
There has been considerable work in developing increasingly sophisticated models of the structure of the Web; the primary applications of such modeling include understanding the evolution of the Web, better tools for optimizing Web-scale algorithms, mining communities and other structures on the Web, and studying the behavior of content creators on the Web. Prior modeling has dwelt on fitting models to the observed degree distribution of the Web. (One of the remarkable properties of the Web graph is that the degree distribution follows a power law with a fixed exponent, regardless of the size of the graph.) However, modeling just the degree distribution relies on only a single set of parameters and, moreover, on a purely "local" property of the Web.

We develop a new approach by studying the PageRank (a "global" parameter, used in the Google search engine to rank pages) distribution of the Web; we show that it follows a power law remarkably similar to the in-degree distribution. We then develop detailed models to explain this observation, and moreover remain faithful to the observed degree distributions. We analyze these models, and compare the analysis to both snapshots from the Web and to graphs generated by simulations on the new models. To our knowledge, this is the first model that goes beyond fitting degree distributions; further, the study of the correlations between the PageRank and in-degree values of both the Web snapshots and our models can shed new light on the evolution of the Web. We hope this will open up avenues for further research in understanding the structure of the Web.
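
For readers unfamiliar with the two quantities being compared, the sketch below (a minimal illustration, not one of the models analyzed in the talk; the uniform random link generator and the damping factor 0.85 are our assumptions) computes PageRank by power iteration on a toy directed graph and tabulates in-degrees alongside it.

```python
import random
from collections import defaultdict

def pagerank(out_links, damping=0.85, iters=50):
    """Power-iteration PageRank on a directed graph given as {node: [targets]}."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, targets in out_links.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for w in targets:
                    new[w] += share
            else:                      # dangling node: spread its rank uniformly
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank

# Toy "Web": each page links to 3 random pages (uniform, unlike real Web growth).
random.seed(0)
pages = range(100)
links = {p: random.sample(pages, 3) for p in pages}
pr = pagerank(links)
indeg = defaultdict(int)
for targets in links.values():
    for w in targets:
        indeg[w] += 1
for p in sorted(pages, key=lambda q: -pr[q])[:5]:
    print(p, round(pr[p], 4), indeg[p])       # highest-PageRank pages and their in-degrees
```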


Counting Lattice Vectors
Denis Charles (University of Wisconsin, Madison)
We consider the problem of counting the number of lattice vectors of a given length. We show that the problem is #P-complete. We also give a deterministic algorithm for counting lattice vectors of length d in time 2^{O(ns + log d)}, where n is the dimension of the lattice and s is the total number of bits needed to encode the basis of the lattice. The algorithm is based on the theory of modular forms.
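
As a point of reference only (not the modular-forms algorithm of the talk), here is a brute-force counter: it enumerates integer coefficient vectors inside a box that provably contains all solutions and counts lattice vectors of a given squared length. It is exponential in the dimension and merely illustrates the problem statement; counting by squared length is a convention chosen here for convenience.

```python
import itertools
import numpy as np

def count_lattice_vectors(basis, d):
    """Count lattice vectors with squared length exactly d, by brute force.

    The lattice is {c @ B : c integer}, where the rows of B are the basis vectors.
    Since c = v @ inv(B), each |c_i| <= ||column i of inv(B)|| * sqrt(d), so the
    search box below contains every solution.  Illustration only.
    """
    B = np.array(basis, dtype=float)
    Binv = np.linalg.inv(B)
    n = len(basis)
    bounds = [int(np.floor(np.linalg.norm(Binv[:, i]) * np.sqrt(d))) for i in range(n)]
    count = 0
    for c in itertools.product(*[range(-b, b + 1) for b in bounds]):
        v = np.array(c) @ B
        if abs(v @ v - d) < 1e-9:
            count += 1
    return count

# The integer lattice Z^2 has 4 vectors of squared length 1 and 4 of squared length 2.
print(count_lattice_vectors([[1, 0], [0, 1]], 1))   # 4
print(count_lattice_vectors([[1, 0], [0, 1]], 2))   # 4
```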

Efficient Data Mappings for Parity-Declustered Data Layouts
Eric Schwabe (DePaul University)

The joint demands of high performance and fault tolerance in large disk arrays can be satisfied through the use of parity-declustered data layouts -- arrangements of data and redundant information that allow the rapid reconstruction of lost data while the arrays continue to operate.

The data mapping problem for a data layout is the problem of translating a data address in a linear address space (the file system's view) into a disk identifier and an offset on the disk where the corresponding data is stored. This problem is usually solved using lookups in tables that explicitly store a layout's structure, but recent work has yielded mappings that compute disk identifiers and offsets directly from data addresses without using such tables.
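
As a toy illustration of a computed (table-free) mapping, the sketch below stripes data and parity round-robin across the disks of an array. The stripe geometry and function names are invented for illustration; this is a simple rotated-parity layout, not the parity-declustered, ring-based layouts discussed in the talk.

```python
def map_address(addr, num_disks):
    """Map a linear data address to (disk, offset) for a simple rotated layout.

    Each stripe holds (num_disks - 1) data units plus one parity unit, and the
    parity disk rotates from stripe to stripe.  This is not parity declustering;
    it only shows that (disk, offset) can be computed arithmetically from the
    address, with no lookup table.
    """
    data_per_stripe = num_disks - 1
    stripe = addr // data_per_stripe
    pos = addr % data_per_stripe
    parity_disk = stripe % num_disks            # rotate parity across disks
    disk = (parity_disk + 1 + pos) % num_disks  # data units fill the remaining disks
    offset = stripe                             # one stripe unit per disk per stripe
    return disk, offset

def parity_location(stripe, num_disks):
    """Disk and offset holding the parity unit of the given stripe."""
    return stripe % num_disks, stripe

if __name__ == "__main__":
    for addr in range(12):
        print(addr, map_address(addr, num_disks=5))
```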

In this talk, we will discuss how data layouts based on commutative rings yield mappings with better computational efficiency than previous layouts that do not use table lookup. These ring-based layouts are also more suitable for use in large disk arrays.

(Joint work with Ian Sutherland, Oracle Mobile.)


Listing the Minimal Vertex Separators of a Graph Efficiently
Ken Takata (University of Illinois at Chicago)
A vertex separator is a set of vertices whose removal from a connected graph will disconnect it. (An a-b separator specifically disconnects two vertices a and b.)

Since a graph may have an exponential number of minimal a-b separators, we cannot hope to list all such objects in polynomial time. However, we can ask whether it is possible to list all minimal a-b separators using at most a polynomial amount of time per object (that is, polynomial in the number of vertices). We can also ask for a stricter standard in which there is at most a polynomial delay between outputting one object and the next.

In addition to time complexity, we can ask about space complexity: can we list the objects using polynomial space even when the number of objects may be exponential?

These are the basic sorts of questions that arise for listing problems. In the talk, we will discuss algorithms to list minimal a-b separators efficiently.
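
For concreteness, here is a brute-force baseline (exponential time, so nowhere near the efficient algorithms of the talk) that lists all minimal a-b separators of a small graph by checking every vertex subset; the adjacency-list format and helper names are ours.

```python
from itertools import combinations

def reachable(adj, src, removed):
    """Vertices reachable from src when the vertices in 'removed' are deleted."""
    seen, stack = {src}, [src]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def minimal_ab_separators(adj, a, b):
    """All minimal a-b separators, by brute force over vertex subsets."""
    others = [v for v in adj if v not in (a, b)]
    separators = []
    for r in range(len(others) + 1):
        for subset in combinations(others, r):
            s = set(subset)
            if b in reachable(adj, a, s):
                continue                      # does not separate a from b
            # Separation is monotone, so checking single-vertex deletions
            # suffices to certify minimality.
            if all(b in reachable(adj, a, s - {v}) for v in s):
                separators.append(s)
    return separators

# A 4-cycle a-1-b-2-a has exactly one minimal a-b separator: {'1', '2'}.
adj = {'a': ['1', '2'], '1': ['a', 'b'], '2': ['a', 'b'], 'b': ['1', '2']}
print(minimal_ab_separators(adj, 'a', 'b'))
```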

Time permitting, we will mention some other listing problems as well.


Critical Transmission Range for Connectivity with Bernoulli Nodes
Chih-Wei Yi (Illinois Institute of Technology)
Select n Bernoulli nodes with transmission radius r_n, placed at random on the disc of unit area. A Bernoulli node is active with probability p and inactive with probability 1-p, independently of all other nodes. There is an edge between two nodes if and only if both nodes are active and their distance is less than r_n. Let H(n, r_n, p) denote the graph induced by the active nodes. We show that the probability that H(n, r_n, p) is connected is exp(-p e^{-a}), where pi r_n^2 = (ln n + a)/(np). We also show that the graphs H(n, r_n, p) that contain no isolated vertices are a.s. connected as n approaches infinity. This is joint work with Peng-Jun Wan.
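
A Monte Carlo sketch (ours, not part of the talk) that estimates the connectivity probability of H(n, r_n, p): it scatters n points on a unit-area disc, keeps each active with probability p, links active pairs within distance r_n, and checks connectivity; r_n is set from pi r_n^2 = (ln n + a)/(np) so the estimate can be compared against exp(-p e^{-a}).

```python
import math
import random

def sample_connected(n, p, a):
    """One trial: is the graph on active nodes connected (vacuously true if empty)?"""
    r = math.sqrt((math.log(n) + a) / (n * p * math.pi))   # pi r^2 = (ln n + a)/(n p)
    R = 1.0 / math.sqrt(math.pi)                           # disc of unit area
    pts = []
    while len(pts) < n:                                    # rejection-sample the disc
        x, y = random.uniform(-R, R), random.uniform(-R, R)
        if x * x + y * y <= R * R:
            pts.append((x, y))
    active = [q for q in pts if random.random() < p]
    if not active:
        return True
    parent = list(range(len(active)))                      # union-find over active nodes
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(len(active)):
        for j in range(i + 1, len(active)):
            if math.dist(active[i], active[j]) < r:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(active))}) == 1

n, p, a, trials = 1000, 0.5, 2.0, 40
hits = sum(sample_connected(n, p, a) for _ in range(trials))
print(hits / trials, "vs. exp(-p e^-a) =", round(math.exp(-p * math.exp(-a)), 3))
```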

Reducing girth requirements for sampling graph colorings
Thomas Hayes (University of Chicago)

Fix a graph G and a parameter k. We would like to sample legal k-colorings of G almost uniformly at random. We will focus on the case when the number of colors, k, is between one and two times the maximum degree D. The following simple Markov-chain Monte Carlo algorithm, known as "Glauber Dynamics", has been conjectured to perform efficiently in this range.

Glauber Dynamics:   Given the coloring at time t, select a random vertex, v, and a random color not assigned to any neighbors of v. Define the coloring at time t+1 to be the coloring at time t, except using the new color for v.
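
A direct transcription of the chain into code, as a minimal sketch; the adjacency-dict input format and the greedy initial coloring are our choices, not part of the talk.

```python
import random

def glauber_step(adj, coloring, k):
    """One step: pick a random vertex and recolor it with a random color
    not currently used by any of its neighbors."""
    v = random.choice(list(adj))
    forbidden = {coloring[u] for u in adj[v]}
    allowed = [c for c in range(k) if c not in forbidden]
    coloring[v] = random.choice(allowed)      # nonempty whenever k > max degree

def sample_coloring(adj, k, steps):
    """Run the chain for 'steps' steps from a greedy initial proper coloring."""
    coloring = {}
    for v in adj:                             # greedy initialization works since k > D
        used = {coloring[u] for u in adj[v] if u in coloring}
        coloring[v] = next(c for c in range(k) if c not in used)
    for _ in range(steps):
        glauber_step(adj, coloring, k)
    return coloring

# A 5-cycle (D = 2) with k = 4 > D + 1 colors, so the chain is ergodic.
adj = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
print(sample_coloring(adj, k=4, steps=1000))
```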

It is easy to check that, when k > D+1, this defines an ergodic Markov chain, whose stationary distribution is the uniform distribution on all legal k-colorings. The main problem is to estimate the "mixing time"--how many steps do we need to run the chain to sample nearly uniformly?

It has been conjectured that, for every k > D+1, O(n log n) steps suffice. However, we are very far from a proof. M. Jerrum [1] showed that, for k > 2D, rapid mixing occurs. E. Vigoda [2] improved this to k > 11D/6.

More recently, M. Dyer and A. Frieze [3] proved that, under the additional assumptions D = Omega(log n) and girth(G) = Omega(log n), rapid mixing occurs down to k > 1.763 D. M. Molloy [4] proved rapid mixing for k > 1.489 D under the same assumptions. This was the state of the art this spring.

We relax the girth requirement for the Dyer and Frieze approach, proving:

Theorem: If D = Omega(log n) and either (girth(G) > 4 and k > 1.763 D) or (girth(G) > 5 and k > 1.489 D), then the mixing time for Glauber dynamics is O(n log n).

The proof hinges on answering the following question: Suppose we run the dynamics with the vertices in a subset U of V covered up. Although we can't see the colors on U, we might be able to deduce a lot about them by watching the rest of the graph. What can we say about the distribution, conditioned on all this extra information?

A preliminary draft is available from the author.


Dimension and the Inapproximability of NP Optimization Problems
John Hitchcock (Iowa State University)

The inapproximability of the MAX3SAT and MAXCLIQUE optimization problems has been well studied: MAX3SAT cannot be approximated within a ratio of (8/7)-epsilon if P != NP (Hastad, 1997), and MAXCLIQUE cannot be approximated in polynomial time within a ratio of n^{1-epsilon} if NP != ZPP (Hastad, 1999).

We use plausible hypotheses involving resource-bounded dimension (Lutz, 2000) and pseudorandom generators to investigate the frequency with which approximation algorithms cannot achieve these ratios. Our results include the following for any epsilon > 0.

(1) If NP has positive p-dimension, then for some delta > 0, any 2^{n^delta}-time approximation algorithm for MAX3SAT fails to approximate within (8/7)-epsilon on an exponentially dense set of instances.

(2) If NP has positive p-dimension and strong pseudorandom generators exist, then for some delta > 0, any 2^{n^delta}-time approximation algorithm for MAXCLIQUE fails to approximate within n^{1-epsilon} on an exponentially dense set of instances.

As a corollary, (1) solves one of Lutz and Mayordomo's "Twelve Problems in Resource-Bounded Measure" (1999).


Time-Space Tradeoff In Derandomizing Probabilistic Logspace
Venkat Chakaravarthy (University of Wisconsin, Madison)
Nisan showed that any randomized logarithmic-space algorithm (running in polynomial time and with two-sided error) can be simulated by a deterministic algorithm that runs simultaneously in polynomial time and O(log^2 n) space. Subsequently, Saks and Zhou improved the space complexity and showed that the deterministic simulation can be carried out in space O(log^{1.5} n). However, their simulation runs in time n^{O(log^{0.5} n)}. We prove a time-space tradeoff that interpolates between these two simulations. Specifically, we prove that, for any 0 <= c <= 0.5, any randomized logarithmic-space algorithm (running in polynomial time and with two-sided error) can be simulated deterministically in time n^{O(log^{0.5-c} n)} and space O(log^{1.5+c} n). That is, we prove that BPL is contained in DTISP[n^{O(log^{0.5-c} n)}, O(log^{1.5+c} n)].

Joint work with Jin-Yi Cai and Dieter van Melkebeek.


SPT optimally approximates uniprocessor flow
David Bunde  (University of Illinois at Urbana-Champaign)

We show that the Shortest Processing Time (SPT) algorithm is (Δ+1)/2-competitive for nonpreemptive uniprocessor total flow time, where Δ is the ratio between the longest and shortest job lengths. This is best possible for a deterministic algorithm and improves on the Δ+1 approximation ratio shown by Epstein and van Stee using different methods. We also show a structural similarity between the Shortest Remaining Processing Time (SRPT) and SPT schedules.
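
A small simulation of the rule whose competitive ratio is analyzed (the job format of (release time, processing time) pairs is our assumption; this illustrates the algorithm, not the proof): nonpreemptive SPT starts, whenever the machine is idle, the shortest job released so far, and the total flow time is the sum of completion time minus release time over all jobs.

```python
import heapq

def spt_total_flow(jobs):
    """Nonpreemptive SPT on one machine; jobs are (release_time, processing_time)."""
    jobs = sorted(jobs)                       # by release time
    available = []                            # heap of (length, release)
    t, i, total_flow = 0.0, 0, 0.0
    while i < len(jobs) or available:
        while i < len(jobs) and jobs[i][0] <= t:
            heapq.heappush(available, (jobs[i][1], jobs[i][0]))
            i += 1
        if not available:                     # machine idle: jump to next release
            t = jobs[i][0]
            continue
        length, release = heapq.heappop(available)
        t += length                           # run the shortest available job to completion
        total_flow += t - release
    return total_flow

# A long job released first, then two short ones: flows 10, 10, and 11.
print(spt_total_flow([(0, 10), (1, 1), (1, 1)]))   # -> 31.0
```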


Linear-Time Algorithms for Computing Maximum-Density Sequence Segments with Bioinformatics Applications
Michael Goldwasser (Loyola University)
We study an abstract optimization problem arising from biomolecular sequence analysis. One motivation is the desire to identify GC-rich segments of DNA sequences. For a sequence A = (a_1, a_2, ..., a_n) of real numbers, a segment A(i,j) is a consecutive subsequence of A starting with index i and ending with index j. The maximum-density segment problem takes A and two values L and U as input and asks for a segment of A with the largest possible density among those of width at least L and at most U. We provide a relatively simple, O(n)-time algorithm for this problem. This improves upon the O(n log L)-time algorithm by Lin, Jiang, and Chao for the case where U is unbounded, and upon a trivial O(n(U-L+1))-time algorithm when both L and U are specified. This is joint work with Ming-Yang Kao and Hsueh-I Lu.
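
For reference, here is the trivial O(n(U-L+1))-time baseline mentioned above (not the O(n) algorithm of the talk), using prefix sums; taking "density" to mean the average of the segment's entries is our simplifying assumption for the uniform-width case.

```python
from itertools import accumulate

def max_density_segment(a, L, U):
    """Best (i, j, density) with L <= j - i + 1 <= U, density = average of a[i..j].

    Straightforward O(n * (U - L + 1)) scan over all allowed widths via prefix sums.
    """
    n = len(a)
    prefix = [0.0] + list(accumulate(a))      # prefix[j] = a[0] + ... + a[j-1]
    best = None
    for j in range(L - 1, n):                 # j = right endpoint (0-based)
        for width in range(L, min(U, j + 1) + 1):
            i = j - width + 1
            density = (prefix[j + 1] - prefix[i]) / width
            if best is None or density > best[2]:
                best = (i, j, density)
    return best

# GC-content style example: 1 marks G/C, 0 marks A/T; widths between 3 and 5.
seq = [0, 1, 1, 1, 0, 0, 1, 1, 0, 1]
print(max_density_segment(seq, L=3, U=5))     # -> (1, 3, 1.0)
```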

Conditionalization of Interval Probabilities: The Good, The Bad and The Ugly
Alex Dekhtyar (University of Kentucky)

Conditionalization, i.e., the computation of a conditional probability distribution from a joint probability distribution of two or more random variables, is an important operation in probability theory. While this computation is straightforward when exact point probabilities are involved, it is often the case that such exact point probability distributions are not known; rather, interval probabilities are provided.
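
For reference, the point-probability case that the talk generalizes: conditionalization simply renormalizes the slice of the joint distribution consistent with the observed value. The sketch below (with an invented joint table) computes P(X | Y = y) from a point joint distribution; the interval version discussed in the talk replaces each entry by an interval, and that is where the subtleties arise.

```python
def conditionalize(joint, y):
    """P(X | Y = y) from a point joint distribution given as {(x, y): prob}."""
    slice_ = {x: p for (x, yy), p in joint.items() if yy == y}
    total = sum(slice_.values())              # = P(Y = y); assumed positive
    return {x: p / total for x, p in slice_.items()}

# Invented joint distribution over X in {0, 1} and Y in {'a', 'b'}.
joint = {(0, 'a'): 0.1, (0, 'b'): 0.3, (1, 'a'): 0.2, (1, 'b'): 0.4}
print(conditionalize(joint, 'a'))             # {0: 0.333..., 1: 0.666...}
```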

The good: In this talk we discuss a possible-worlds semantics for interval probability distributions, define the operation of conditionalization on such distributions, and provide an explicit closed-form solution to the problem of computing the result of conditionalization.

The bad: While our solution is provably correct with respect to the given definition, the conditionalization results exhibit a variety of peculiar properties, such as non-commutativity.

The ugly: As it turns out, it is impossible to define a better notion of a conditionalization operation on intervals. We discuss this interesting result (due to Jaffray) and its implications for the representation of imprecise probabilities.

This is joint work with Judy Goldsmith (University of Kentucky).

 
Cellular and Graph Automata in the Context of Parallelism vs. Sequential Nondeterminism
Predrag Tosic (University of Illinois at Urbana-Champaign)

It is common in the theory of parallel computation to distinguish between "coarse-grained parallelism" (practically all computing machines we build are of this type) and "fine-grained parallelism" (of the kind built by Nature). Many problems central to the former class (physical and logical limits on parallelism, synchrony vs. asynchrony, finite speed of information propagation) are typically abstracted away altogether in the latter. For instance, in the case of classical cellular automata (CA), it is assumed that a single cell 'reads' the values of its neighbors and updates its current value in a single 'atomic' (i.e., indivisible) step, in perfect synchrony with all other cells, which do the same. But should these "local node updates" always be considered atomic?

We consider some examples of classes of cellular and graph automata where 'parallel' node updates yield global behavior that provably cannot be reproduced by any sequential ordering of the node updates. The main implication is that, in the context of "massively parallel computers" such as CA, either we have to abandon the notion of behavioral equivalence between parallel actions and nondeterministic interleavings of appropriate sequential actions, or else refine the granularity of atomic actions even further in order to preserve the nondeterministic-interleaving semantics of concurrency.
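
A toy instance of the phenomenon (our illustration, not one of the specific automata classes from the talk): two cells that each copy their neighbor's value. The synchronous update swaps the two values, a configuration that no sequential ordering of the two single-cell updates can reach.

```python
def synchronous_step(state):
    """Both cells update at once, each copying the other's value."""
    a, b = state
    return (b, a)

def sequential_step(state, order):
    """Update the cells one at a time in the given order, using the same local rule."""
    cells = list(state)
    for i in order:
        cells[i] = cells[1 - i]               # copy the neighbor's *current* value
    return tuple(cells)

start = (0, 1)
print("synchronous:", synchronous_step(start))                        # (1, 0) -- a swap
print("sequential :", {sequential_step(start, o) for o in [(0, 1), (1, 0)]})
# {(1, 1), (0, 0)} -- the swap is unreachable by any interleaving of the two updates
```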