=========================================================================== CSC 363H Lecture Summary for Week 10 Summer 2007 =========================================================================== ------------------------- Hardness and Completeness ------------------------- Suppose we have two classes of languages, C and D, and we know that C is a subset of D. Furthermore, suppose that we do not know whether or not C = D. Idea: we want to identify the "hardest" problems in class D. We say that a language L is D-hard if it has the follwing property: IF L is in C THEN C = D (i.e. D subset of C). We say that a language L is D-complete if L is D-hard, and L is in D. Note: it is not clear that any language should have this property. However, if we manage to show that a language L is in fact D-complete, we obtain the following: L is in C IFF C = D Thus, in a sense, L is the "hardest" problem in D. This has both theoretical and practical implications. theoretical: In order to show that C =/= D, i.e. that C is a proper subset of D, it is enough to concentrate the efforts in showing that L does not belong to C. practical: If very few people believe that C = D, showing that L is D-complete can be taken as very strong evidence that L is probably not in C. ------------------------- Decidable vs Recognizable ------------------------- Consider C = decidable languages, D = recognizable languages. Define L to be Recognizable-hard if for all recognizable L', L' <=m L. This definition agrees with the abstract definition given earlier because, if L is decidable and L' <=m L, then L' is decidable. Since this holds for every recognizable L', we obtain: If L is decidable, then Recognizable = Decidable. Theorem: A_TM is Recognizable-complete. Proof: easy exercise Thus, according to the property we talked about earlier, we get A_TM is decidable IFF decidable = recognizable What might seem strange at first sight is that in the computability part of the course we managed to show that both sides of this IFF statement are actually false. Things will not be as nice in the complexity part. ------- P vs NP ------- Now consider C = P and D = NP. Define L to be NP-hard if for all L' in NP, L' <=p L. Again, one can check that with this definition we get: If L is NP-hard and L is in P, then P = NP. Define L to be NP-complete if L in NP, and L is NP-hard. Theorem: If A is NPc, then A in P iff P = NP. Proof: - If P = NP, then A NPc -> A in NP -> A in P. - If A in P, then A NPc -> for all B in NP, B <=p A -> B in P (because A in P), i.e., NP subset of P so P = NP. Corollary: If P != NP and A is NPc, then A not in P. So proving NP-completeness "as good as" proving not in P. --------------- NP-completeness --------------- - SAT = { F : F is a propositional formula that is satisfiable, i.e., there is some assignment of truth-values to the variables of F that makes F true } - Cook-Levin Theorem: SAT is NPc SAT in NP: Given , check that c encodes an assignment of truth values to the variables of F that makes F evaluate to TRUE. This can be done in polytime, and F is satisfiable iff there is some value of c that makes the verifier accept. SAT is NP-hard: (high-level idea only) For an arbitrary language A in NP, by definition, there is some NTM M_A that decides A in time <= C n^k for some constants C, k. Given an input x, we can construct a formula F_x that describes possible computation paths (of length at most C |x|^k) of M_A on x, such that F_x is satisfiable iff there is some computation path of M_A that accepts x. Intuitively, we are simulating the TM model of computation using propositional formulas, which are similar to digital circuits. Details are in the textbook and are needed to ensure that this can be done in polytime. In general, to prove A is NP-hard, it's sufficient to show B <=p A for some B known to be NP-hard: if B <=p A then for all L in NP, L <=p B (by definition of NP-hardness for B) so L <=p A (since <=p is transitive, something you can prove as an easy exercise). Template for proofs of NP-completeness: To show A is NPc, prove that A in NP: Describe a polytime verifier for A. "Given , check c has right format and properties..." Argue that verifier runs in polytime and that x in A iff verifier accepts for some c. A is NP-hard: Show B <=p A for some NP-hard B. "Given y, construct x as follows: ..." Argue that construction can be carried out in polytime and that y in B iff x in A (often by showing y in B -> x in A and x in A -> y in B). Examples: - 3SAT is NPc: 3SAT in NP because it's a special case of SAT. In proof of Cook-Levin, possible to construct formula F_x in CNF, so CNF-SAT is NP-complete; CNF-SAT <=p 3SAT is proven separately. Note: Careful with directions! Trivially, 3SAT <=p CNF-SAT (3SAT is special case of CNF-SAT). But we need other direction, transforming instances of general problem into instances of restricted problem. - VERTEX-COVER is NPc: (see textbook) VERTEX-COVER (VC for short) = { : G is a graph that contains a vertex cover of size k, i.e., a set C of k vertices such that each edge of G has at least one endpoint in C } . VC in NP: certificate = vertex cover of size k. . VC is NP-hard: 3SAT <=p VC. Given F = (a1 \/ b1 \/ c1) /\ ... /\ (ar \/ br \/ cr), where ai,bi,ci in {x1,~x1,x2,~x2,...,xs,~xs}, construct G=(V,E) and k such that F satisfiable iff G contains vertex cover of size k, as follows: k = s + 2r V = { a1,b1,c1, ..., ar,br,cr, x1,~x1, ..., xs,~xs } E = { (xi,~xi) : 1 <= i <= s } U { (ai,bi),(bi,ci),(ci,ai) : 1 <= i <= r } U { (l,x) : l = ai or bi or ci, and x = xj or ~xj corresponding to l } For example, if F = (x1 \/ ~x2 \/ ~x4) /\ (x2 \/ ~x3 \/ x1) /\ (~x3 \/ x4 \/ ~x2), then a1=x1, b1=~x2, c1=~x4, a2=x2, b2=~x3, c2=x1, a3=~x3, b3=x4, c3=~x2 so k = 4 + 2*3 = 10 V = {a1,b1,c1, a2,b2,c2, a3,b3,c3, x1,~x1, x2,~x2, x3,~x3, x4,~x4} E = { (x1,~x1), (x2,~x2), (x3,~x3), (x4,~x4), (a1,b1), (b1,c1), (c1,a1), (a1,x1), (b1,~x2), (c1,~x4), (a2,b2), (b2,c2), (c2,a2), (a2,x2), (b2,~x3), (c2,x1), (a3,b3), (b3,c3), (c3,a3), (a3,~x3), (b3,x4), (c3,~x2) } Clearly, construction can be done in polytime (with one scan of F). Also, if F is satisfiable, then there is an assignment of truth values that make at least one literal in each clause true. Pick a cover C as follows: for each variable, C contains xi or ~xi, whichever is true under the truth assignment; for each clause, C contains every literal except one that's true (pick arbitrarily if more than one true literal). C contains exactly s+2r vertices and is a cover: all edges (xi,~xi) are covered; all edges in clause triangles are covered (because we picked two vertices from each triangle); all edges between "clauses" and "variables" are covered (two from inside triangle, one from true literal for that clause). Finally if G contains a cover C of size k=s+2r, C must contain at least one of xi or ~xi for each i (because of edges (xi,~xi)) and at least two of ai,bi,ci for each i (because of triangle), so only way for C to have size s+2r is to contain exactly one of xi or ~xi and exactly two of ai,bi,ci, for each i. Since C covers all edges with only two vertices per triangle, the third vertex in each triangle must have its "outside" edge covered because of xi or ~xi. If we set literals according to choices of xi or ~xi in C, this will make formula F true: at least one literal will be true in each clause (because at least one edge from "variables" to "clauses" is covered by the variable in C). - SUBSET-SUM is NPc: Already known in NP. For NP-hardness, show 3SAT <=p SS: Given formula F = (a1 \/ b1 \/ c1) /\ ... /\ (ar \/ br \/ cr) where ai,bi,ci in {x1,~x1,...,xs,~xs}, construct numbers as follows: . For j = 1,...,s, number xj = 1 followed by s-j 0s followed by r digits where k-th next digit equals 1 if xj appears in clause C_k, 0 otherwise; number ~xj = 1 followed by s-j 0s followed by r digits where k-th next digit equals 1 if ~xj appears in clause C_k, 0 otherwise. . For j = 1,...,r, number Cj = 1 followed by r-j 0s and number C'j = 2 followed by r-j 0s. . Target t = s 1s followed by r 4s. Clearly, this can be constructed in polytime. Example of reduction for F = (x1 \/ ~x2 \/ ~x4) /\ (x2 \/ ~x3 \/ x1) /\ (~x3 \/ x4 \/ ~x2): S = { x1 = 1000110, ~x1 = 1000000, x2 = 0100010, ~x2 = 0100101, x3 = 0010000, ~x3 = 0010011, x4 = 0001001, ~x4 = 0001100, C1 = 0000100, C'1 = 0000200, C2 = 0000010, C'2 = 0000020, C3 = 0000001, C'3 = 0000002 } t = 1111444 Note: slightly different from book to ensure S contains all distinct numbers (book's reduction constructs S with repeated numbers). If F is satisfiable, then there is a setting of variables such that each clause of F contains at least one true literal. Consider the subset S' = {numbers that correspond to true literals}. By construction, SUM_{x in S'} x = s 1s followed by r digits, each one of which is either 1, 2, or 3 (because each clause contains at least one true literal). This means it is possible to add suitable numbers from {C1,C'1,...,Cr,C'r} so that the last r digits of the sum are equal to 4, i.e., there is a subset S' such that SUM_{x in S'} x = t. If there is a subset S' of S such that SUM_{x in S'} x = t, then S' must contain exactly one of {xj,~xj} for j = 1,...,n, because that is the only way for the numbers in S' to add to the target (with a 1 in the first s digits). Then, F is satisfied by setting each variable according to the numbers in S': for each clause j, the corresponding digit in the target is equal to 4 but the numbers Cj and C'j together only add up to 3 in that digit; this means that the selection of numbers in S' must include some literal with a 1 in that digit, i.e., clause j contains at least one true literal.