=========================================================================== CSC 363H Lecture Summary for Week 4 Summer 2007 =========================================================================== --------------- Diagonalization -- section 4.2, pp.174-178 --------------- Which set is "larger": N or Z? Easy: Z contains N as proper subset. Which set has larger "size"? Not so easy: for finite sets, adding or removing one element reduces size, but not so for infinite sets. So what does "infinite size" mean? Are all infinite sets the same "size"? Idea: compare size of sets without counting (can't count "to infinity"), by pairing up elements from each set. If both sets can be paired completely, then sets have same size. Otherwise, one set has "larger size". This concept is formalized through notion of "correspondence" (see textbook). Note: this is equivalent to usual notion of "size" for finite sets. Defn: A set is "countable" if it is finite or if there is a correspondence between the natural numbers and the set, i.e., if there is a systematic way to list every element in the set exactly once. - {0,1}^* is countable: list strings in lexicographic order, i.e., use correspondence e, 0, 1, 00, 01, 10, 11, 000, 001, ... (where "e" represents the empty string) - Z is countable: use correspondence 0 -1 1 -2 2 -3 3 -4 4 ... - Q is countable: "zig-zag" argument (see textbook for details -- covered in lecture) - R is uncountable: proof by contradiction through diagonalization (see textbook for details -- covered in lecture) What does this have to do with languages? - Number of TMs countable because we can list all strings over appropriate alphabet and keep only the ones that describe valid TMs. - Number of languages uncountable because same as number of infinite binary strings (same as number of real numbers). - Hence, "only" countably many recognizable languages, i.e., _most_ languages are unrecognizable! But are there any natural unrecognizable languages? --------------------- The "Halting" problem --------------------- A_TM = { | M is a TM that accepts input w } is recognizable. - As described last week, there is a TM U (the "universal TM") that takes a reasonable encoding of and its input w, and that carries out M's computation on w. U accepts if M accepts w, U rejects if M rejects w (U loops if M loops on w). Theorem: A_TM is undecidable. Proof: - For contradiction, assume A_TM decidable, i.e., there is a TM H that accepts input if M accepts w, and rejects if M does not accept w (either because M rejects or loops). - Construct TM D that includes H as a "subroutine". D: on input run H's instructions on > if H accepts, D rejects if H rejects, D accepts In other words, on input , D rejects if M accepts input and D accepts if M does not accept input . - What happens if we give as input to D? D should reject if D accepts input and D should accept if D does not accept input , i.e., D accepts input iff D does not accept input . - Contradiction! Hence, D cannot exist, which means H cannot exist, i.e., no TM can decide A_TM -- by the Church-Turing thesis, this means there is no general algorithm for solving this problem! Diagonalization and the Halting problem: - Claimed behaviour of H: ... M1 accept reject reject M2 reject reject accept M3 reject accept accept ... ... - D simply diagonalizes over this table.