Abstracts
How to model the Web? |
Gopal Pandurangan (Purdue University) |
There has been considerable work in developing
increasingly sophisticated models of the structure of the Web; the
primary applications of such modeling include understanding the
evolution of the Web, better tools for optimizing Web-scale algorithms,
mining communities and other structures on the Web, and studying the
behavior of content creators on the Web. Prior modeling has dwelt on
fitting models to the observed degree distribution of the Web. (One of
the remarkable properties about the Web graph is that the degree
distribution follows a power law with a fixed exponent, regardless of
the size of the graph.) However, modeling just the degree distribution
relies not only on a single set of parameters, but also on a
"local" property of the Web.
We develop a new approach by studying the PageRank (a
"global" parameter, used in the Google search engine to rank
pages) distribution of the Web; we show that it follows a power law
remarkably similar to the in-degree distribution. We then develop
detailed models that explain this observation while remaining
faithful to the observed degree distributions. We analyze these models,
and compare the analysis to both snapshots from the Web and to graphs
generated by simulations on the new models. To our knowledge, this is
the first model that goes beyond fitting degree distributions; further,
the study of the correlations between the PageRank and in-degree
values of both the Web snapshots and our models can provide new insight
into the evolution of the Web. We hope this will open up avenues for
further research in understanding the Web structure. |
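As a rough illustration of the two quantities compared above (PageRank and in-degree), the following sketch computes PageRank by power iteration on a toy directed graph and prints it alongside in-degree. The graph, damping factor, and tolerance are arbitrary illustrative choices, not the models or data from the talk.

    # Hypothetical illustration only: power-iteration PageRank on a toy graph,
    # printed next to in-degree.  None of the parameters come from the talk.

    def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=1000):
        """out_links: dict mapping each node to the list of nodes it links to."""
        nodes = list(out_links)
        n = len(nodes)
        rank = {v: 1.0 / n for v in nodes}
        for _ in range(max_iter):
            new_rank = {v: (1.0 - damping) / n for v in nodes}
            for u in nodes:
                targets = out_links[u] or nodes        # dangling node: spread evenly
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            if sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol:
                return new_rank
            rank = new_rank
        return rank

    if __name__ == "__main__":
        web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
        pr = pagerank(web)
        in_deg = {v: 0 for v in web}
        for u in web:
            for v in web[u]:
                in_deg[v] += 1
        for v in sorted(web, key=pr.get, reverse=True):
            print(v, "in-degree:", in_deg[v], "PageRank:", round(pr[v], 4))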
Counting Lattice Vectors |
Denis Charles (University of Wisconsin, Madison) |
We consider the problem of counting the number of
lattice vectors of a given length. We show that the problem is
#P-complete. We also give a deterministic algorithm for counting lattice
vectors of length d in time 2^{O(ns + log d)},
where n is the dimension of the lattice and s is the total
number of bits needed to encode the basis of the lattice. The algorithm is
based on the theory of modular forms. |
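For background on the modular-forms connection mentioned above, the counting function of an integral lattice L is encoded by its theta series (a standard identity, not a result of the talk):

    \Theta_L(q) = \sum_{v \in L} q^{\langle v, v \rangle} = \sum_{m \ge 0} N_L(m)\, q^m

where N_L(m) is the number of lattice vectors of squared length m; for suitable lattices (e.g., even unimodular ones) the theta series is a modular form, which is presumably the structure the counting algorithm draws on.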
Efficient Data Mappings for Parity-Declustered Data Layouts |
Eric Schwabe (DePaul University) |
The joint demands of high performance and fault
tolerance in large disk arrays can be satisfied through the use of
parity-declustered data layouts -- arrangements of data and redundant
information that allow the rapid reconstruction of lost data while the
arrays continue to operate.
The data mapping problem for a data layout is the
problem of translating a data address in a linear address space (the
file system's view) into a disk identifier and an offset on the disk
where the corresponding data is stored. This problem is usually solved
using lookups in tables that explicitly store a layout's structure, but
recent work has yielded mappings that compute disk identifiers and offsets directly
from data addresses without using such tables.
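To make the data mapping problem concrete, here is a minimal table-free mapping for a simple rotated-parity (RAID-5-style) layout; it only illustrates the address-to-(disk, offset) arithmetic and is not the ring-based construction discussed in the talk. Function and parameter names are hypothetical.

    # Hypothetical illustration of a table-free data mapping: translate a linear
    # data address into (disk, offset) for a rotated-parity layout with one
    # parity unit per stripe.  Not the ring-based layouts from the talk.

    def map_address(addr, num_disks):
        """Return (disk, offset) for linear address `addr` on `num_disks` disks."""
        data_per_stripe = num_disks - 1          # one unit per stripe holds parity
        stripe = addr // data_per_stripe         # stripe containing the address
        pos = addr % data_per_stripe             # position among the stripe's data units
        parity_disk = stripe % num_disks         # parity rotates across the disks
        disk = pos if pos < parity_disk else pos + 1   # skip over the parity disk
        return disk, stripe                      # offset on the disk = stripe index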
In this talk, we will discuss how data layouts
based on commutative rings yield mappings with better computational
efficiency than previous layouts that do not use table lookup. These
ring-based layouts are also more suitable for use in large disk arrays.
(Joint work with Ian Sutherland, Oracle Mobile.)
|
Listing the Minimal Vertex Separators of a Graph Efficiently |
Ken Takata (University of Illinois at Chicago) |
A vertex separator is a set of vertices whose
removal from a connected graph will disconnect it. (An a-b
separator specifically disconnects two vertices a and b.)
Since a graph may have an exponential number of minimal a-b
separators, we cannot hope to list all such objects in polynomial time.
However, we can ask whether it is possible to list all minimal a-b
separators using at most a polynomial amount of time per object (that is,
polynomial in the number of vertices). We can also ask for a stricter
standard in which we have at most a polynomial delay between outputting
one object and another.
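As a small illustration of the objects being listed, the sketch below checks whether a given vertex set S is a minimal a-b separator; this is only the verification step, not the listing algorithm from the talk, and the names and graph representation are our own choices.

    # Hypothetical helper illustrating the definitions: S is an a-b separator if
    # removing S leaves a and b disconnected; it is minimal if putting any single
    # vertex of S back reconnects a and b.

    from collections import deque

    def reachable(graph, removed, a, b):
        """BFS in graph (dict: vertex -> set of neighbours), avoiding `removed`."""
        seen, queue = {a}, deque([a])
        while queue:
            u = queue.popleft()
            if u == b:
                return True
            for v in graph[u] - removed - seen:
                seen.add(v)
                queue.append(v)
        return False

    def is_minimal_ab_separator(graph, S, a, b):
        """S: set of vertices not containing a or b."""
        if a in S or b in S or reachable(graph, S, a, b):
            return False                      # not even an a-b separator
        return all(reachable(graph, S - {v}, a, b) for v in S)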
In addition to time complexity, we can ask about space complexity:
can we list the objects using polynomial space even when the number
of objects may be exponential?
These are the basic sorts of questions that arise for listing
problems. In the talk, we will discuss algorithms to list minimal a-b
separators efficiently.
Time permitting, we will mention some other listing problems as well. |
Critical Transmission Range for Connectivity
with Bernoulli Nodes |
Chih-Wei Yi (Illinois Institute of Technology) |
Select n Bernoulli nodes with transmission radius r_n
randomly on the unit-area disc. A Bernoulli node is either active or inactive, independently
with probability p and 1-p,
respectively. There exists an edge between two nodes if
and only if both nodes are active and their distance is less than r_n. H(n, r_n, p)
denotes the graph induced by the active nodes. We show that the probability that H(n, r_n, p)
is connected is asymptotically exp(-p e^{-a}), where pi r_n^2 = (ln n + a)/(np).
We also show that the graphs H(n, r_n, p) that do not contain isolated vertices
are a.s. connected as n approaches infinity.
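To make the model concrete, the following sketch simulates it directly: n points uniform on a disc of unit area, each active with probability p, and edges between active nodes within distance r_n, with r_n set by pi r_n^2 = (ln n + a)/(np). The particular values of n, p, and a are arbitrary illustrations, not from the talk.

    # Hypothetical simulation of H(n, r_n, p); parameter values are illustrative.

    import math, random

    def simulate(n, p, a):
        r = math.sqrt((math.log(n) + a) / (n * p * math.pi))   # pi r^2 = (ln n + a)/(np)
        R = 1.0 / math.sqrt(math.pi)               # a disc of unit area has radius 1/sqrt(pi)
        pts = []
        while len(pts) < n:                        # rejection-sample points uniform on the disc
            x, y = random.uniform(-R, R), random.uniform(-R, R)
            if x * x + y * y <= R * R:
                pts.append((x, y))
        active = [q for q in pts if random.random() < p]
        parent = list(range(len(active)))          # union-find over the active nodes
        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i
        for i in range(len(active)):
            for j in range(i + 1, len(active)):
                if math.dist(active[i], active[j]) < r:
                    parent[find(i)] = find(j)
        return len({find(i) for i in range(len(active))}) <= 1

    if __name__ == "__main__":
        N, P, A, TRIALS = 1000, 0.7, 2.0, 100      # illustrative values only
        hits = sum(simulate(N, P, A) for _ in range(TRIALS))
        print("empirical P(connected):", hits / TRIALS)
        print("asymptotic exp(-p e^-a):", math.exp(-P * math.exp(-A)))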
This is joint work with Peng-Jun Wan. |
Reducing girth requirements for sampling graph colorings |
Thomas Hayes (University of Chicago) |
Fix a graph G and a parameter k.
We would like to sample legal k-colorings of
G almost uniformly at random. We will focus
on the case when the number of colors, k, is
between one and two times the maximum degree D.
The following simple Markov-chain Monte Carlo algorithm,
known as "Glauber Dynamics", has been conjectured to
perform efficiently in this range.
Glauber Dynamics:  
Given the coloring at time t, select a
random vertex, v, and a random color not assigned to
any neighbor of v. Define the coloring at time
t+1 to be the coloring at time t, except
using the new color for v.
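For concreteness, one step of this dynamics can be sketched as follows; the graph and coloring representations are our own choices, not from the talk.

    # Hypothetical sketch of a single Glauber dynamics step.  `graph` maps each
    # vertex to its set of neighbours, `coloring` maps each vertex to its color,
    # and `k` is the number of colors; assumes k exceeds the maximum degree, so
    # at least one allowed color always exists.

    import random

    def glauber_step(graph, coloring, k):
        v = random.choice(list(graph))                  # random vertex
        forbidden = {coloring[u] for u in graph[v]}     # colors on v's neighbours
        allowed = [c for c in range(k) if c not in forbidden]
        coloring[v] = random.choice(allowed)            # random allowed color for v
        return coloring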
It is easy to check that, when k > D+1, this
defines an ergodic Markov chain, whose stationary
distribution
is the uniform distribution on all legal k-colorings.
The main problem is to estimate the "mixing time"--how many
steps do we need to run the chain to sample nearly
uniformly?
It has been conjectured that, for every k > D+1,
O(n log n) steps suffice. However, we are very far
from a proof. M. Jerrum
[1]
showed that, for
k > 2D, rapid mixing occurs.
E. Vigoda [2]
improved this to k > 11D/6.
More recently, M. Dyer and A. Frieze
[3]
proved that,
under the additional assumptions D = Omega(log n) and
girth(G) = Omega(log n), rapid mixing occurs
down to k > 1.763 D. M. Molloy [4]
proved
rapid mixing for k > 1.489 D under the same
assumptions.
This was the state of the art this spring.
We relax the girth requirement for the Dyer and Frieze
approach,
proving:
Theorem: Assume D = Omega(log n). If either girth(G) > 4 and
k > 1.763 D, or girth(G) > 5 and k > 1.489 D, then
the mixing time for Glauber dynamics is O(n log n).
The proof hinges on answering the following question:
Suppose we run the dynamics with the vertices
in a subset U of V covered up.
Although we can't see the colors on U,
we might be able to deduce a lot about them
by watching the rest of the graph. What can
we say about the distribution, conditioned
on all this extra information?
A preliminary draft is available from the
author. |
Dimension and the
Inapproximability of NP Optimization Problems. |
John Hitchcock (Iowa State University) |
The inapproximability of the MAX3SAT and MAXCLIQUE
optimization problems has been well-studied: MAX3SAT cannot be
approximated within a ratio of (8/7)-epsilon if P != NP (Hastad, 1997),
and MAXCLIQUE cannot be approximated in polynomial-time within a ratio
of n^{1-epsilon} if NP != ZPP (Hastad, 1999).
We use plausible hypotheses involving resource-bounded dimension
(Lutz, 2000) and pseudorandom generators to investigate the frequency
with which approximation algorithms cannot achieve these ratios. Our
results include the following for any epsilon > 0.
(1) If NP has positive p-dimension, then for some delta > 0, any 2^{n^delta}-time
approximation algorithm for MAX3SAT fails to approximate within
(8/7)-epsilon on an exponentially dense set of instances.
(2) If NP has positive p-dimension and strong pseudorandom generators
exist, then for some delta > 0, any 2^{n^delta}-time
approximation algorithm for MAXCLIQUE fails to approximate within
n^{1-epsilon} on an exponentially dense set of instances.
As a corollary, (1) solves one of Lutz and Mayordomo's "Twelve
Problems in Resource-Bounded Measure" (1999). |
Time-Space Tradeoff In Derandomizing Probabilistic Logspace |
Venkat Chakaravarthy (University of Wisconsin,
Madison) |
Nisan showed that any randomized logarithmic space algorithm (running
in polynomial time and with two-sided error) can be simulated by a
deterministic algorithm that runs simultaneously in polynomial time and
O(log^2 n) space. Subsequently, Saks and Zhou improved the
space complexity and showed that the deterministic simulation can be
carried out in space O(log^{1.5} n). However, their simulation
runs in time n^{O(log^{0.5} n)}. We prove a
time-space tradeoff that interpolates between these two simulations.
Specifically, we prove that, for any 0 <= c <= 0.5, any randomized
logarithmic space algorithm (running in polynomial time and with
two-sided error) can be simulated deterministically in time n^{O(log^{0.5-c} n)}
and space O(log^{1.5+c} n). That is, we prove
that BPL is contained in DTISP[n^{O(log^{0.5-c} n)}, O(log^{1.5+c} n)].
Joint work with Jin-Yi Cai and Dieter van Melkebeek. |
SPT optimally approximates uniprocessor flow |
David Bunde (University of Illinois at
Urbana-Champaign) |
We show that the Shortest Processing Time (SPT) algorithm
is (Δ+1)/2-competitive for nonpreemptive uniprocessor
total flow time, where Δ is the ratio between the longest and
shortest job lengths.
This is best possible for a deterministic algorithm and improves on
the Δ+1 approximation ratio shown by Epstein
and van Stee using different methods.
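For concreteness, a minimal simulation of the SPT rule and the total flow time it incurs might look as follows; the job format and tie-breaking are our own choices, not details from the paper.

    # Hypothetical sketch of nonpreemptive SPT: whenever the machine is free,
    # start the shortest available job; total flow time sums (completion - release).

    import heapq

    def spt_total_flow(jobs):
        """jobs: list of (release_time, processing_time) pairs."""
        jobs = sorted(jobs)                       # by release time
        available, flow, t, i = [], 0, 0, 0
        while i < len(jobs) or available:
            if not available and t < jobs[i][0]:
                t = jobs[i][0]                    # idle until the next release
            while i < len(jobs) and jobs[i][0] <= t:
                r, p = jobs[i]
                heapq.heappush(available, (p, r)) # shortest processing time first
                i += 1
            p, r = heapq.heappop(available)
            t += p                                # run the chosen job to completion
            flow += t - r                         # its flow time
        return flow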
We also show a structural similarity between the Shortest
Remaining Processing Time (SRPT) and SPT schedules. |
Linear-Time Algorithms for Computing
Maximum-Density Sequence Segments with Bioinformatics Applications |
Michael Goldwasser (Loyola University) |
We study an abstract optimization problem arising
from biomolecular sequence analysis. One motivation is
the desire to identify GC-rich segments of DNA sequences.
For a sequence A =
{a_1, a_2, ..., a_n}
of real numbers, a segment A(i,j) is a consecutive
subsequence of A starting with index i and ending with
index j.
The maximum-density segment problem takes A and two
values L and U as input and asks for a segment of
A with the largest possible density (the sum of its entries divided by
its width) among those of width at least L and at most U.
We provide a relatively simple, O(n)-time algorithm for this
problem. This improves upon the O(n log L)-time
algorithm by Lin, Jiang and Chao, for the case where U is
unbounded, and upon a trivial O(n(U-L+1))-time
algorithm when both L and U are specified.
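For reference, the trivial O(n(U-L+1))-time algorithm mentioned above can be sketched as follows (0-indexed); it only pins down the problem statement and is not the O(n)-time algorithm of the talk.

    # Hypothetical brute-force version of the maximum-density segment problem,
    # running in the trivial O(n(U-L+1)) time, included only for concreteness.

    def max_density_segment(a, L, U):
        """Return (i, j) maximizing sum(a[i:j+1]) / (j - i + 1) with L <= width <= U."""
        n = len(a)
        prefix = [0.0]
        for x in a:
            prefix.append(prefix[-1] + x)
        best, best_seg = float("-inf"), None
        for i in range(n):
            for w in range(L, min(U, n - i) + 1):   # widths between L and U
                j = i + w - 1
                density = (prefix[j + 1] - prefix[i]) / w
                if density > best:
                    best, best_seg = density, (i, j)
        return best_seg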
This is joint work with
Ming-Yang Kao
and
Hsueh-I Lu.
|
Conditionalization of Interval
Probabilities: The Good, The Bad and The Ugly |
Alex Dekhtyar (University of Kentucky) |
Conditionalization, i.e., the computation of a
conditional probability distribution given a joint probability
distribution of two or more random variables, is an important operation
in probability theory. While the computation of the conditional
probability distribution is straightforward when exact point
probabilities are involved, it is often the case that such exact point
probability distributions of the random variables are not known; rather,
only interval probabilities are provided.
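As a reminder of the point-probability case being contrasted here, conditioning a joint distribution given as a table is just renormalization; the sketch below is illustrative background only and is unrelated to the interval-valued operation defined in the talk.

    # Hypothetical reminder of point-probability conditionalization: given a joint
    # distribution P(X, Y) as a table, conditioning on Y = y is renormalization.

    def condition_on(joint, y):
        """joint: dict {(x, y): probability}.  Return the distribution of X given Y = y."""
        row = {x: p for (x, yy), p in joint.items() if yy == y}
        total = sum(row.values())               # P(Y = y)
        if total == 0:
            raise ValueError("conditioning event has probability zero")
        return {x: p / total for x, p in row.items()}

    # e.g. condition_on({("a", 0): 0.1, ("a", 1): 0.2, ("b", 0): 0.3, ("b", 1): 0.4}, 1)
    #      returns {"a": 1/3, "b": 2/3}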
The good: In this talk we discuss a possible worlds semantics for
interval probability distributions, define the operation of
conditionalization on such distributions, and provide a closed-form
solution to the problem of computing the result of conditionalization
explicitly.
The bad: While our solution is provably correct with respect to the given
definition, the conditionalization results exhibit a variety of peculiar
properties, such as non-commutativity.
The ugly: As it turns out, it is impossible to define a better notion
of conditionalization on intervals. We discuss this
interesting result (due to Jaffray) and its implications for the
representation of imprecise probabilities.
This is joint work with Judy Goldsmith (University of Kentucky).
|
Cellular and Graph Automata in the Context of
Parallelism vs. Sequential Nondeterminism |
Predrag Tosic (University of Illinois at
Urbana-Champaign) |
It
is common in the theory of parallel computation to distinguish between
"coarse-grained parallelism" (practically all computing
machines we build are of this type) and "fine-grained
parallelism" (the kind built by Nature). Many problems central to
the former class (physical and logical limits on parallelism, synchrony
vs. asynchrony, finite speed of information propagation) are typically
abstracted away altogether in the latter. For instance, in the case of
classical cellular automata (CA), it is assumed that a single cell
'reads' the values of its neighbors and updates its current value in a
single 'atomic' (i.e., indivisible) step, in perfect synchrony
with all other cells doing the same. But should these "local
node updates" always be considered atomic?
We consider some examples of classes of cellular and graph automata
where 'parallel' node updates yield global behavior that provably cannot
be reproduced by any sequential ordering of the node updates. The main
implication is that, in the context of "massively parallel
computers" such as CA, either we have to abandon the notion of
behavioral equivalence of parallel actions and nondeterministic
interleavings of appropriate sequential actions, or else refine the
granularity of atomic actions even further in order to preserve the
nondeterministic interleaving semantics of concurrency. |
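A minimal example of the phenomenon described above (our illustration, not necessarily one of the classes studied in the talk): two cells whose update rule copies the neighbor's value. The synchronous update swaps the pair, a configuration that no sequential ordering of the two individual updates can produce from the same start.

    # Hypothetical two-cell example: each cell's new value is its neighbour's
    # current value.  Parallel update swaps; neither sequential order does.

    def synchronous(a, b):
        return b, a                     # both cells read old values, then write

    def sequential(a, b, order):
        for cell in order:              # cells update one at a time
            if cell == 0:
                a = b
            else:
                b = a
        return a, b

    start = (0, 1)
    print("parallel:  ", synchronous(*start))              # (1, 0)
    print("order 0,1: ", sequential(*start, order=[0, 1])) # (1, 1)
    print("order 1,0: ", sequential(*start, order=[1, 0])) # (0, 0)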