===========================================================================
  CSC 373H            Lecture Summary for Week 12             Winter 2006
===========================================================================

Linear programming:

- Network flows: Given network N=(V,E) with capacities c(e) for e in E,
  construct a linear program with one variable f_e for each e in E:

      maximize:   SUM_{e in E+(s)} f_e
      subject to: 0 <= f_e <= c(e)                    for all e in E
                  SUM_{e in E-(v)} f_e = SUM_{e in E+(v)} f_e
                                                      for all v in V - {s,t}

  where E-(v) = in-edges of vertex v and E+(v) = out-edges of vertex v.
  This is a direct restatement of the network flow problem: the objective is
  the flow out of the source, the first constraints are the capacity
  constraints, and the second are flow conservation at every internal vertex.

- Vertex Cover:
  . Problem definition:
      Input: Undirected graph G=(V,E).
      Output: Subset of vertices C that "covers" every edge (i.e., each edge
        has at least one endpoint in C), with minimum size.
  . Representing as an integer program: use variable x_i for each vertex v_i
    in V.

      minimize:   x_1 + x_2 + ... + x_n
      subject to: x_i + x_j >= 1    for all (v_i,v_j) in E
                  x_i in {0,1}      for all v_i in V

    The integer program is completely equivalent to the original problem,
    including its NP-hardness (so no known polytime algorithm).
  . Linear relaxation: remove the restriction of x_i to integer values.

      minimize:   x_1 + x_2 + ... + x_n
      subject to: x_i + x_j >= 1    for all (v_i,v_j) in E
                  0 <= x_i <= 1     for all v_i in V

    A solution can be found in polytime, but it may include fractional values
    of the x_i's.
    Example: G=(V,E) where V={1,2,3}, E={(1,2),(2,3),(1,3)} becomes

      minimize:   x_1 + x_2 + x_3
      subject to: x_1 + x_2 >= 1
                  x_2 + x_3 >= 1
                  x_1 + x_3 >= 1
                  0 <= x_1, x_2, x_3 <= 1

    with optimal solution x_1 = x_2 = x_3 = 1/2.
  . Rounding: Compute an optimal solution to the linear program:
    x'_1, x'_2, ..., x'_n.  Create a cover as follows: for each v_i in V,
    put v_i in C iff x'_i >= 1/2.  C is a cover because the constraint
    x_i + x_j >= 1 guarantees that at least one of x'_i, x'_j is >= 1/2 for
    each edge (v_i,v_j).  (A code sketch of this method follows this section.)
  . 2-approximation bound: Consider a minimum vertex cover C*.
    For i = 1,...,n, let x*_i = 1 if v_i in C*; x*_i = 0 otherwise.
    Then x*_1, ..., x*_n satisfies all the constraints of the linear program
    using only 0-1 values, so

      SUM x'_i <= SUM x*_i = |C*|

    (since x' is an optimal solution to the linear program, which also allows
    fractional values).
    For i = 1,...,n, let x~_i = 1 if x'_i >= 1/2; x~_i = 0 otherwise.
    Then, for each i, x~_i <= 2 x'_i, so

      |C| = SUM x~_i <= 2 SUM x'_i <= 2 |C*|    (by the inequality above).

    Hence, |C| is no more than twice the size of a minimum vertex cover.

- Weighted vertex cover: section 11.6 (simple generalization).
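
The relaxation-and-rounding method above can be tried out directly with an LP
solver.  The following is a minimal sketch, assuming SciPy's linprog is
available; the function name lp_vertex_cover and the 0-indexed edge list are
illustrative choices, not part of the notes.

      # Vertex cover via LP relaxation + rounding (assumes scipy is installed).
      # Solves: minimize SUM x_i  s.t.  x_i + x_j >= 1 per edge, 0 <= x_i <= 1,
      # then puts v_i in the cover whenever x'_i >= 1/2.
      from scipy.optimize import linprog

      def lp_vertex_cover(n, edges):
          """n = number of vertices (0-indexed); edges = list of pairs (i, j)."""
          c = [1.0] * n                # objective: minimize x_1 + ... + x_n
          # linprog wants A_ub @ x <= b_ub, so x_i + x_j >= 1 becomes -x_i - x_j <= -1.
          A_ub, b_ub = [], []
          for (i, j) in edges:
              row = [0.0] * n
              row[i] = row[j] = -1.0
              A_ub.append(row)
              b_ub.append(-1.0)
          res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, 1.0)] * n)
          x = res.x                    # optimal fractional solution x'_1, ..., x'_n
          # Rounding step (small tolerance guards against floating-point error).
          cover = [i for i in range(n) if x[i] >= 0.5 - 1e-9]
          return x, cover

      # Triangle example from above: the fractional optimum is (1/2, 1/2, 1/2),
      # so rounding should put all three vertices in C (|C| = 3 <= 2*|C*| = 4).
      print(lp_vertex_cover(3, [(0, 1), (1, 2), (0, 2)]))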

------------------------
Approximation algorithms
------------------------

Some problems are hard to solve, e.g., NP-complete problems: no one knows how
to solve them in worst-case polynomial time.

- Perhaps a "good" (near-optimal) answer found efficiently is good enough.
- "approximation ratio": an algorithm has approximation ratio r(n) if, for
  any input of size n,
      C_alg / C_opt <= r(n)
  where C_alg is the "cost" of the solution found by the algorithm and C_opt
  is the "cost" of an optimal solution.
  - in some cases, the ratio does not depend on n (constant approximation
    ratio)
- "r(n)-approximation algorithm": an algorithm (usually polynomial-time) that
  achieves an r(n) approximation ratio
- (polynomial-time) approximation scheme: given an instance X and a fixed
  real number epsilon > 0, we can create a (1+epsilon)-approximation
  algorithm
  - if we can do this in time polynomial in the input size n, it is called a
    PTAS; e.g., perhaps O(n^{2/epsilon}) running time
- fully polynomial-time approximation scheme (FPTAS): the best we can hope
  for without solving the optimization problem exactly; we can get within an
  arbitrary factor of the optimum, (1+epsilon)-approx., in time polynomial in
  both n and 1/epsilon; e.g., perhaps O((1/epsilon)^4 n^2) running time

Vertex Cover:

- Repeatedly pick an edge and put both endpoints in C, then remove all edges
  incident on those two endpoints, until no edge remains.
  |C| <= 2 * OPT (size of a smallest cover): the picked edges share no
  endpoints, and every cover must contain at least one endpoint of each
  picked edge, so OPT >= (number of picked edges) = |C|/2.
- Repeatedly pick a vertex of largest degree and put it in C, removing all
  edges incident on that vertex, until no edge remains.
  There are inputs where |C| > 2 * OPT, unlike the previous algorithm.

Travelling Salesperson Problem (TSP):

- Input: complete graph G=(V,E) with non-negative integer edge costs c(e)
- Output: a tour of G with minimum cost
  - a tour is a Hamiltonian cycle, that is, a simple cycle that visits every
    vertex of G
  - the cost of a tour is the sum of the edge costs in the tour
- TSP is NP-complete in general
- TSP is also hard to approximate
- "triangle inequality": one side of a triangle is no more than the sum of
  the other two sides, i.e.,
      c(u,w) <= c(u,v) + c(v,w)
  - common when considering Euclidean or metric spaces
- TSP with the triangle inequality: also NP-complete, but often interesting
  - can approximate the solution within a factor of 2

TSP with triangle inequality:

- example: vertices are locations on a 5x5 grid
    vertices: a at (1,4), b at (2,1), c at (0,1), d at (3,4), e at (4,3),
              f at (3,2), g at (5,2), h at (2,0)
    edge costs: distances in the Euclidean plane
- lower bound: the cost of a minimum spanning tree
  - an optimal tour minus one edge is a spanning tree
- upper bound: 2 * the cost of a minimum spanning tree
  - we can traverse each edge of the MST twice to visit all the vertices
  - need to make this more precise to prove it
- example: an MST of the example above is
    {(a,b), (b,c), (b,h), (a,d), (d,e), (e,f), (e,g)}
- algorithm (a code sketch follows the proof below):
    pick a root vertex r
    compute an MST T of G from the root r
    L <- order of vertices in a preorder walk of T
    return the Hamiltonian cycle H produced by visiting the vertices in
      order L
- running time: clearly polynomial
- Theorem: This is a 2-approximation algorithm for TSP with the triangle
  inequality.
  Proof: Let Opt be an optimal tour and let T be an MST.
  Deleting an edge from Opt yields a spanning tree, so c(T) <= c(Opt).
  A full walk W of T lists each vertex every time it is encountered in a
  preorder traversal of T, e.g., a b c b h b a d e f e g e d a for the MST
  above rooted at a.
  The walk traverses every edge of T exactly twice, so c(W) = 2*c(T).
  Combining this with the inequality above, c(W) <= 2*c(Opt).
  In general, W is not a tour, since vertices may appear multiple times.
  However, by the triangle inequality we can delete a repeated vertex from W
  (taking a shortcut past it) without increasing the cost.  Repeatedly
  deleting repeated vertices leaves a valid tour H.
  Then c(H) <= c(W) <= 2*c(Opt), which gives the approximation ratio.
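
The preorder-walk algorithm can be made concrete with a short self-contained
sketch on the grid example above.  Prim's algorithm is used here to compute
the MST; the names mst_approx_tour and cost, and the use of math.dist for
Euclidean distances, are implementation choices for illustration only.

      # MST preorder-walk 2-approximation for TSP with the triangle inequality.
      # Uses the grid example from the notes; Euclidean distances as edge costs.
      from math import dist

      points = {'a': (1, 4), 'b': (2, 1), 'c': (0, 1), 'd': (3, 4),
                'e': (4, 3), 'f': (3, 2), 'g': (5, 2), 'h': (2, 0)}

      def cost(u, v):
          # Euclidean edge costs automatically satisfy the triangle inequality.
          return dist(points[u], points[v])

      def mst_approx_tour(root='a'):
          # Prim's algorithm: grow an MST from the root, recording tree parents.
          vertices = list(points)
          in_tree, parent = {root}, {}
          while len(in_tree) < len(vertices):
              u, v = min(((u, v) for u in in_tree
                          for v in vertices if v not in in_tree),
                         key=lambda e: cost(*e))
              parent[v] = u
              in_tree.add(v)
          # Preorder walk of the MST gives the visiting order L.
          children = {v: [] for v in vertices}
          for v, u in parent.items():
              children[u].append(v)
          order = []
          def preorder(v):
              order.append(v)
              for w in children[v]:
                  preorder(w)
          preorder(root)
          return order + [root]        # Hamiltonian cycle: return to the root at the end

      tour = mst_approx_tour()
      print(tour, sum(cost(u, v) for u, v in zip(tour, tour[1:])))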