===========================================================================
  CSC 373H            Lecture Summary for Week 12             Winter 2006
===========================================================================

Linear programming:

- Network flows: Given network N=(V,E) with capacities c(e) for e in E,
  construct a linear program with one variable f_e for each e in E:

      maximize:   SUM_{e in E+(s)} f_e
      subject to: 0 <= f_e <= c(e)                    for all e in E
                  SUM_{e in E-(v)} f_e = SUM_{e in E+(v)} f_e
                                                      for all v in V - {s,t}

  where E-(v) = in-edges of vertex v and E+(v) = out-edges of vertex v.
  This is a direct restatement of the network flow problem: the objective is
  the flow out of the source, the first constraints are the capacity
  constraints, and the second are flow conservation at every internal vertex.

- Vertex Cover:
  . Problem definition:
      Input: Undirected graph G=(V,E).
      Output: Subset of vertices C that "covers" every edge (i.e., each edge
        has at least one endpoint in C), with minimum size.
  . Representing as an integer program: use variable x_i for each vertex v_i
    in V.

      minimize:   x_1 + x_2 + ... + x_n
      subject to: x_i + x_j >= 1    for all (v_i,v_j) in E
                  x_i in {0,1}      for all v_i in V

    The integer program is completely equivalent to the original problem,
    including its NP-hardness (so no known polytime algorithm).
  . Linear relaxation: remove the restriction of x_i to integer values.

      minimize:   x_1 + x_2 + ... + x_n
      subject to: x_i + x_j >= 1    for all (v_i,v_j) in E
                  0 <= x_i <= 1     for all v_i in V

    A solution can be found in polytime, but it may include fractional values
    of the x_i's.
    Example: G=(V,E) where V={1,2,3}, E={(1,2),(2,3),(1,3)} becomes

      minimize:   x_1 + x_2 + x_3
      subject to: x_1 + x_2 >= 1
                  x_2 + x_3 >= 1
                  x_1 + x_3 >= 1
                  0 <= x_1, x_2, x_3 <= 1

    with optimal solution x_1 = x_2 = x_3 = 1/2.
  . Rounding: Compute an optimal solution to the linear program:
    x'_1, x'_2, ..., x'_n.  Create a cover as follows: for each v_i in V,
    put v_i in C iff x'_i >= 1/2.  C is a cover because the constraint
    x_i + x_j >= 1 guarantees that at least one of x'_i, x'_j is >= 1/2 for
    each edge (v_i,v_j).  (A code sketch of this method follows this section.)
  . 2-approximation bound: Consider a minimum vertex cover C*.
    For i = 1,...,n, let x*_i = 1 if v_i in C*; x*_i = 0 otherwise.
    Then x*_1, ..., x*_n satisfies all the constraints of the linear program
    using only 0-1 values, so

      SUM x'_i <= SUM x*_i = |C*|

    (since x' is an optimal solution to the linear program, which also allows
    fractional values).
    For i = 1,...,n, let x~_i = 1 if x'_i >= 1/2; x~_i = 0 otherwise.
    Then, for each i, x~_i <= 2 x'_i, so

      |C| = SUM x~_i <= 2 SUM x'_i <= 2 |C*|    (by the inequality above).

    Hence, |C| is no more than twice the size of a minimum vertex cover.

- Weighted vertex cover: section 11.6 (simple generalization).
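
The relaxation-and-rounding method above can be tried out directly with an LP
solver.  The following is a minimal sketch, assuming SciPy's linprog is
available; the function name lp_vertex_cover and the 0-indexed edge list are
illustrative choices, not part of the notes.

      # Vertex cover via LP relaxation + rounding (assumes scipy is installed).
      # Solves: minimize SUM x_i  s.t.  x_i + x_j >= 1 per edge, 0 <= x_i <= 1,
      # then puts v_i in the cover whenever x'_i >= 1/2.
      from scipy.optimize import linprog

      def lp_vertex_cover(n, edges):
          """n = number of vertices (0-indexed); edges = list of pairs (i, j)."""
          c = [1.0] * n                # objective: minimize x_1 + ... + x_n
          # linprog wants A_ub @ x <= b_ub, so x_i + x_j >= 1 becomes -x_i - x_j <= -1.
          A_ub, b_ub = [], []
          for (i, j) in edges:
              row = [0.0] * n
              row[i] = row[j] = -1.0
              A_ub.append(row)
              b_ub.append(-1.0)
          res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, 1.0)] * n)
          x = res.x                    # optimal fractional solution x'_1, ..., x'_n
          # Rounding step (small tolerance guards against floating-point error).
          cover = [i for i in range(n) if x[i] >= 0.5 - 1e-9]
          return x, cover

      # Triangle example from above: the fractional optimum is (1/2, 1/2, 1/2),
      # so rounding should put all three vertices in C (|C| = 3 <= 2*|C*| = 4).
      print(lp_vertex_cover(3, [(0, 1), (1, 2), (0, 2)]))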

------------------------
Approximation algorithms
------------------------

Some problems are hard to solve, e.g., NP-complete problems: no one knows how
to solve them in worst-case polynomial time.

- Perhaps a "good" (near-optimal) answer found efficiently is good enough.
- "approximation ratio": an algorithm has approximation ratio r(n) if, for
  any input of size n,
      C_alg / C_opt <= r(n)
  where C_alg is the "cost" of the solution found by the algorithm and C_opt
  is the "cost" of an optimal solution.
  - in some cases, the ratio does not depend on n (constant approximation
    ratio)
- "r(n)-approximation algorithm": an algorithm (usually polynomial-time) that
  achieves an r(n) approximation ratio
- (polynomial-time) approximation scheme: given an instance X and a fixed
  real number epsilon > 0, we can create a (1+epsilon)-approximation
  algorithm
  - if we can do this in time polynomial in the input size n, it is called a
    PTAS; e.g., perhaps O(n^{2/epsilon}) running time
- fully polynomial-time approximation scheme (FPTAS): the best we can hope
  for without solving the optimization problem exactly; we can get within an
  arbitrary factor of the optimum, (1+epsilon)-approx., in time polynomial in
  both n and 1/epsilon; e.g., perhaps O((1/epsilon)^4 n^2) running time

Vertex Cover:

- Repeatedly pick an edge and put both endpoints in C, then remove all edges
  incident on those two endpoints, until no edge remains.
  |C| <= 2 * OPT (size of a smallest cover): the picked edges share no
  endpoints, and every cover must contain at least one endpoint of each
  picked edge, so OPT >= (number of picked edges) = |C|/2.
- Repeatedly pick a vertex of largest degree and put it in C, removing all
  edges incident on that vertex, until no edge remains.
  There are inputs where |C| > 2 * OPT, unlike the previous algorithm.

Travelling Salesperson Problem (TSP):

- Input: complete graph G=(V,E) with non-negative integer edge costs c(e)
- Output: a tour of G with minimum cost
  - a tour is a Hamiltonian cycle, that is, a simple cycle that visits every
    vertex of G
  - the cost of a tour is the sum of the edge costs in the tour
- TSP is NP-complete in general
- TSP is also hard to approximate
- "triangle inequality": one side of a triangle is no more than the sum of
  the other two sides, i.e.,
      c(u,w) <= c(u,v) + c(v,w)
  - common when considering Euclidean or metric spaces
- TSP with the triangle inequality: also NP-complete, but often interesting
  - can approximate the solution within a factor of 2

TSP with triangle inequality:

- example: vertices are locations on a 5x5 grid
    vertices: a at (1,4), b at (2,1), c at (0,1), d at (3,4), e at (4,3),
              f at (3,2), g at (5,2), h at (2,0)
    edge costs: distances in the Euclidean plane
- lower bound: the cost of a minimum spanning tree
  - an optimal tour minus one edge is a spanning tree
- upper bound: 2 * the cost of a minimum spanning tree
  - we can traverse each edge of the MST twice to visit all the vertices
  - need to make this more precise to prove it
- example: an MST of the example above is
    {(a,b), (b,c), (b,h), (a,d), (d,e), (e,f), (e,g)}
- algorithm (a code sketch follows the proof below):
    pick a root vertex r
    compute an MST T of G from the root r
    L <- order of vertices in a preorder walk of T
    return the Hamiltonian cycle H produced by visiting the vertices in
      order L
- running time: clearly polynomial
- Theorem: This is a 2-approximation algorithm for TSP with the triangle
  inequality.
  Proof: Let Opt be an optimal tour and let T be an MST.
  Deleting an edge from Opt yields a spanning tree, so c(T) <= c(Opt).
  A full walk W of T lists each vertex every time it is encountered in a
  preorder traversal of T, e.g., a b c b h b a d e f e g e d a for the MST
  above rooted at a.
  The walk traverses every edge of T exactly twice, so c(W) = 2*c(T).
  Combining this with the inequality above, c(W) <= 2*c(Opt).
  In general, W is not a tour, since vertices may appear multiple times.
  However, by the triangle inequality we can delete a repeated vertex from W
  (taking a shortcut past it) without increasing the cost.  Repeatedly
  deleting repeated vertices leaves a valid tour H.
  Then c(H) <= c(W) <= 2*c(Opt), which gives the approximation ratio.
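
The preorder-walk algorithm can be made concrete with a short self-contained
sketch on the grid example above.  Prim's algorithm is used here to compute
the MST; the names mst_approx_tour and cost, and the use of math.dist for
Euclidean distances, are implementation choices for illustration only.

      # MST preorder-walk 2-approximation for TSP with the triangle inequality.
      # Uses the grid example from the notes; Euclidean distances as edge costs.
      from math import dist

      points = {'a': (1, 4), 'b': (2, 1), 'c': (0, 1), 'd': (3, 4),
                'e': (4, 3), 'f': (3, 2), 'g': (5, 2), 'h': (2, 0)}

      def cost(u, v):
          # Euclidean edge costs automatically satisfy the triangle inequality.
          return dist(points[u], points[v])

      def mst_approx_tour(root='a'):
          # Prim's algorithm: grow an MST from the root, recording tree parents.
          vertices = list(points)
          in_tree, parent = {root}, {}
          while len(in_tree) < len(vertices):
              u, v = min(((u, v) for u in in_tree
                          for v in vertices if v not in in_tree),
                         key=lambda e: cost(*e))
              parent[v] = u
              in_tree.add(v)
          # Preorder walk of the MST gives the visiting order L.
          children = {v: [] for v in vertices}
          for v, u in parent.items():
              children[u].append(v)
          order = []
          def preorder(v):
              order.append(v)
              for w in children[v]:
                  preorder(w)
          preorder(root)
          return order + [root]        # Hamiltonian cycle: return to the root at the end

      tour = mst_approx_tour()
      print(tour, sum(cost(u, v) for u, v in zip(tour, tour[1:])))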