===========================================================================
          CSC 363H    Lecture Summary for Week 13    Summer 2006
===========================================================================

Log space (continued):

- Example 2: PATH = { <G,s,t> : G is a graph that contains a path from s
  to t }
  No known deterministic log-space algorithm, but there is an easy
  nondeterministic log-space algorithm: store the index of the current
  vertex, start at s, and nondeterministically select the next vertex,
  accepting when t is reached; keep a counter of how many vertices we've
  visited, rejecting if we visit more than n = |V| vertices. This only
  requires room to store one vertex index, O(log n), and one counter from
  1 to n, O(log n).
  Correctness: there is some computation path that accepts iff there is
  some path from s to t in G.

- L subset of NL, but whether NL subset of L is unknown: Savitch's
  Theorem shows NL subset of SPACE((log n)^2), but that's all.

- What about NL and P? Let's study NL-completeness.

  Defn: Language A is "logspace reducible" to language B (written
  A <=L B) if there is a function f : Sigma* -> Sigma* computable using
  a 3-tape TM with a read-only input tape, a write-only output tape and
  a read/write work tape, such that for all w in Sigma*, w in A iff
  f(w) in B, and the TM uses only O(log |w|) cells on the work tape.

- PATH is NL-complete (w.r.t. <=L):
  . PATH in NL
  . for all A in NL, A <=L PATH, using a log-space reduction.
    Idea: The question "does w belong to A" is equivalent to "is there
    a path from the initial configuration to an accepting configuration
    in the configuration graph of the nondeterministic log-space TM
    for A".

- Note: If A in L and B <=L A, then B in L. However, we must be careful:
  the output of a log-space reduction could take up more than log space.
  To get a log-space algorithm for B, we must run the log-space
  algorithm for A and recompute the log-space reduction each time an
  output symbol is needed, keeping only one output symbol at a time on
  the work tape.
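The nondeterministic log-space algorithm for PATH above can be sketched as
follows. This is a simulation only: exploring every nondeterministic guess
takes far more than log space, but the state kept per branch (one current
vertex plus one counter) is exactly what the NL machine stores. The
adjacency-dict representation `adj` is an assumption for illustration.

```python
def nl_path(adj, s, t):
    """Simulate the nondeterministic log-space algorithm for PATH.

    adj maps each vertex to the list of its out-neighbours. Per branch,
    the algorithm keeps only the current vertex and a step counter
    (each O(log n) bits on the NL machine); the simulation explores
    every nondeterministic choice, so it uses more than log space.
    """
    n = len(adj)

    def accepts(current, steps):
        if current == t:      # accept as soon as t is reached
            return True
        if steps >= n:        # more than n vertices visited: reject
            return False
        # nondeterministically guess the next vertex among neighbours
        return any(accepts(nxt, steps + 1) for nxt in adj[current])

    return accepts(s, 0)


# Example graph: 2 -> 0 -> 1 -> 3, so t = 3 is reachable from 0 and 2
g = {0: [1], 1: [3], 2: [0], 3: []}
```

Correctness mirrors the argument in the notes: some branch accepts iff some
path from s to t exists, and the counter bounds every branch's length by n.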
- Since PATH in P, and A <=L B implies A <=p B (a TM running in space
  O(log n) has at most n * 2^O(log n) = O(n^k) possible configurations,
  for some constant k), NL subset of P.

- L = NL?  Unknown!  P = NL?  Unknown!
  However: NL = coNL!  NL =/= PSPACE!

Back to P vs. NP:

- If P =/= NP, then there are problems in NP that are neither in P nor
  NPc, and there are infinitely many intermediate classes between P and
  NPc, with complexity that gets larger and larger. For example, the
  following language:
    GRAPH-ISOMORPHISM = { <G,H> | G and H are two graphs that are
    isomorphic, i.e., there is a one-to-one and onto function f that
    maps vertices of G to vertices of H such that all corresponding
    edges are the same -- (u,v) is an edge of G iff (f(u),f(v)) is an
    edge of H }
  This is clearly in NP (a certificate is the function f, described as
  a list of pairs), but it is not known (or believed) to be in P, and
  it is not known (or believed) to be NP-complete.

-----------------------------
Provably intractable problems
-----------------------------

We've concentrated on P and NP because both are defined in terms of
polynomial time and our interest was in efficient computation. More
importantly, a vast majority of "real-life" problems that arise
naturally from various application domains belong to NP. We have seen
how to prove that problems are NP-complete and why this is evidence
that they have no efficient solution. But are there problems that can
be proved to have no efficient solution unconditionally?

Definitions:
  EXP      = U_{k>=1} TIME(2^{n^k})
           = { languages decided by TMs in time O(2^{n^k}) }
  NEXP     = U_{k>=1} NTIME(2^{n^k})
           = { languages decided by TMs in nondeterministic time
               O(2^{n^k}) }
  EXPSPACE = U_{k>=1} SPACE(2^{n^k})
           = { languages decided by TMs in space O(2^{n^k}) }
By Savitch's Theorem, NEXPSPACE = EXPSPACE.
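The claim above that GRAPH-ISOMORPHISM is in NP rests on the certificate f
being checkable in polynomial time. A minimal sketch of such a verifier,
assuming for illustration that vertices are numbered 0..n-1, edges are
given as pairs, and f is given as a dict (the "list of pairs" certificate):

```python
def verify_isomorphism(n, edges_G, edges_H, f):
    """Polytime check of a GRAPH-ISOMORPHISM certificate.

    Vertices of both graphs are 0..n-1; edges_G and edges_H are lists
    of undirected edges (pairs); f is the claimed bijection as a dict.
    Runs in time polynomial in the input size.
    """
    # f must be a one-to-one and onto map on the vertex set
    if sorted(f) != list(range(n)) or sorted(f.values()) != list(range(n)):
        return False
    EH = {frozenset(e) for e in edges_H}
    # Since f is a bijection, "(u,v) in G iff (f(u),f(v)) in H" is
    # equivalent to: f maps the edge set of G exactly onto that of H.
    return {frozenset({f[u], f[v]}) for (u, v) in edges_G} == EH


# Example: a triangle is isomorphic to any relabelling of itself
triangle = [(0, 1), (1, 2), (0, 2)]
relabel = {0: 1, 1: 2, 2: 0}
```

The check is clearly polytime (a few passes over the edge lists), which is
all that membership in NP requires.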
Known:   L <= NL <= P <= NP <= PSPACE <= EXP <= NEXP <= EXPSPACE
         (using "<=" to represent "subset of")
Unknown: L ?= NL, NL ?= P, P ?= NP, P ?= PSPACE, EXP ?= NEXP,
         EXP ?= EXPSPACE
In other words, we don't know how to prove that nondeterminism makes a
difference, and we don't know how to prove that space is more powerful
than time.

Known:   NL != PSPACE, P != EXP, NP != NEXP, PSPACE != EXPSPACE
In other words, we can prove that exponential gaps make a difference.

Problems that are complete for EXP are known to be not in P, e.g.,
"generalized chess", "generalized checkers". Similarly, problems that
are complete for EXPSPACE require more than polynomial space and are
highly intractable, e.g., "inequivalence of regular expressions with
squaring", "equivalence of regular expressions with exponentiation".

How do we know these results? Because of so-called hierarchy theorems:
for all real constants 1 <= c1 < c2, TIME(n^c1) is a proper subset of
TIME(n^c2), and similarly for SPACE (where 0 <= c1 < c2 suffices).
(See section 9.1 in the textbook.)

But it doesn't stop there! We can define k-EXP, k-NEXP, k-EXPSPACE
(deterministic time, nondeterministic time, space)
2^(2^(...(2^(n^t))...)), where there are k exponentiations (so
EXP = 1-EXP). Then,
  ELEMENTARY = 1-EXP U 2-EXP U ... = 1-EXPSPACE U 2-EXPSPACE U ...
(since k-EXPSPACE is a subset of (k+1)-EXP). Problems complete for
ELEMENTARY are decidable, but with such astronomical time or space
bounds that they are completely intractable. Yet, "inequivalence of
regular expressions with union, concatenation, and negation" requires
running time 2^(2^(...2^(2^n)...)) where there are at least log n many
exponentiations, so it's outside even ELEMENTARY!

----------------------------
Dealing with NP-completeness
----------------------------

NP is important because it contains a huge number of real-life problems
that arise in various application domains. The vast majority of these
problems either belong to P or are NP-complete.
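Aside: to get a feel for how fast the k-EXP bounds above grow, here is a
tiny sketch (taking the exponent t of the innermost n^t to be 1 for
simplicity):

```python
def tower(k, n):
    """2^(2^(...(2^n)...)) with k exponentiations: the k-EXP time
    bound with t = 1, so tower(1, n) = 2^n matches EXP = 1-EXP."""
    for _ in range(k):
        n = 2 ** n
    return n
```

Already tower(3, 2) = 65536, and tower(4, 2) = 2^65536 has nearly twenty
thousand decimal digits; bounds outside ELEMENTARY grow faster still,
since the height of the tower itself grows with n.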
We know NP-complete problems do not have efficient solutions (unless
P = NP), but this doesn't make real-life applications go away. Example:
the VLSI circuit layout problem is NP-hard, but that doesn't mean we
can forget about it; it must still be solved somehow! NP-completeness
means there is no algorithm that is both exact and efficient (unless
P = NP), so we compromise on one or the other:

- Heuristics: compromise on efficiency -- some problems have algorithms
  that run in exponential time in the worst case, but where the worst
  case does not seem to happen often in practice.

- Approximation: compromise on exactness -- find an efficient algorithm
  that may not return an exact answer but something "close", e.g.,
  instead of finding a k-clique, it may find a (k/2)-clique or k
  vertices that are "almost" a clique.

- NP-completeness is based on worst-case analysis: in practical
  applications, the worst case may not come up often (if at all).
  Average-case performance may be more indicative (but much harder to
  analyze properly).

- Alternatively, it is sometimes possible to work with restricted
  classes of inputs. For example, 2SAT is in P, UNARY-SUBSET-SUM is in
  P (and so is the problem where all input integers have values bounded
  by some polynomial function of the input size), etc.
  In some cases, it is possible to prove performance results, e.g., the
  "greedy by degree" graph colouring algorithm can be shown to produce
  an optimal colouring for all "co-graphs" -- graphs that have a
  certain property. For graphs that don't have this property, there is
  a gradual loss of optimality (i.e., graphs that are "close" to having
  the property will be coloured using "close" to the smallest number of
  colours), instead of a sharp rise.

Heuristics are useful not just for problems in NPc, but also for
problems in P whose running time is a high-degree polynomial (meaning 4
or above). Many practical applications deal with huge inputs (sizes
10^6 and above), where the difference between n^2 and n^4 algorithms is
significant.
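The point above about UNARY-SUBSET-SUM can be made concrete with the
standard dynamic program for SUBSET-SUM (a sketch): it runs in
O(len(values) * target) time, i.e., polynomial in the *value* of the
target rather than the length of its binary encoding, hence genuinely
polytime when the numbers are written in unary or are polynomially
bounded.

```python
def subset_sum(values, target):
    """Decide SUBSET-SUM by dynamic programming.

    Time O(len(values) * target): pseudo-polynomial in general, but
    truly polynomial when the input is unary or all values are bounded
    by a polynomial in the input size -- which is why UNARY-SUBSET-SUM
    is in P even though SUBSET-SUM is NP-complete.
    """
    reachable = {0}  # sums achievable using the values seen so far
    for v in values:
        reachable |= {s + v for s in reachable if s + v <= target}
    return target in reachable
```

For inputs written in binary, target can be exponential in the input
length, so this gives no polytime algorithm for the NP-complete version.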
For example, the "linear programming" problem asks us to minimize a
linear function of some variables subject to a set of linear
constraints on those variables. There is a polytime algorithm to solve
linear programming, but its running time is a high-degree polynomial
(something like n^6). Because linear programming is often used to model
very large systems, this is not usable in practice; instead, the
"simplex method" is often used. This algorithm has a worst-case running
time that is exponential, but for most inputs encountered in practice,
it does much better (including much better than the complicated
polytime algorithm).

------
REVIEW
------

Main topics:

- Computability
  . models of computation; robustness
  . diagonalization; countability/uncountability
  . decidability/recognizability; dovetailing
  . undecidability/unrecognizability; A_TM
  . many-one reductions (<=m); examples

- Complexity
  . models of computation; P, NP, coNP
  . polytime reductions (<=p); Cook's theorem; NP-completeness
  . polytime self-reducibility
  . space complexity; PSPACE, L, NL; intractable problems

- Reducibility (A <= B) is the central tool used. Understand it well!

Final exam: you may bring one handwritten (not photocopied) US
letter-sized "cheat" sheet (both sides) to the exam.