===========================================================================
CSC 363H                Lecture Summary for Week  4             Summer 2007
===========================================================================

---------------
Diagonalization -- section 4.2, pp.174-178
---------------

Which set is "larger": N or Z?  Easy: Z contains N as proper subset.  Which
set has larger "size"?  Not so easy: for finite sets, adding or removing
one element reduces size, but not so for infinite sets.  So what does
"infinite size" mean?  Are all infinite sets the same "size"?

Idea: compare size of sets without counting (can't count "to infinity"), by
pairing up elements from each set.  If both sets can be paired completely,
then sets have same size.  Otherwise, one set has "larger size".  This
concept is formalized through notion of "correspondence" (see textbook).
Note: this is equivalent to usual notion of "size" for finite sets.

Defn: A set is "countable" if it is finite or if there is a correspondence
between the natural numbers and the set, i.e., if there is a systematic way
to list every element in the set exactly once.

 -  {0,1}^* is countable: list strings in lexicographic order, i.e.,
    use correspondence e, 0, 1, 00, 01, 10, 11, 000, 001, ... (where "e"
    represents the empty string)

 -  Z is countable: use correspondence 0 -1 1 -2 2 -3 3 -4 4 ...

 -  Q is countable: "zig-zag" argument (see textbook for details -- covered
    in lecture)

 -  R is uncountable: proof by contradiction through diagonalization
    (see textbook for details -- covered in lecture)

What does this have to do with languages?

 -  Number of TMs countable because we can list all strings over
    appropriate alphabet and keep only the ones that describe valid TMs.

 -  Number of languages uncountable because same as number of infinite
    binary strings (same as number of real numbers).

 -  Hence, "only" countably many recognizable languages, i.e., _most_
    languages are unrecognizable!

But are there any natural unrecognizable languages?

---------------------
The "Halting" problem
---------------------

A_TM = { <M,w> | M is a TM that accepts input w } is recognizable.

 -  As described last week, there is a TM U (the "universal TM") that takes
    a reasonable encoding of <M> and its input w, and that carries out M's
    computation on w.  U accepts if M accepts w, U rejects if M rejects w
    (U loops if M loops on w).

Theorem:  A_TM is undecidable.

Proof:
 -  For contradiction, assume A_TM decidable, i.e., there is a TM H that
    accepts input <M,w> if M accepts w, and rejects if M does not accept w
    (either because M rejects or loops).
 -  Construct TM D that includes H as a "subroutine".

    D: on input <M>
       run H's instructions on <M,<M>>
       if H accepts, D rejects
       if H rejects, D accepts

    In other words, on input <M>, D rejects if M accepts input
    <M> and D accepts if M does not accept input <M>.
 -  What happens if we give <D> as input to D?  D should reject if D
    accepts input <D> and D should accept if D does not accept input <D>,
    i.e., D accepts input <D> iff D does not accept input <D>.
 -  Contradiction!  Hence, D cannot exist, which means H cannot exist,
    i.e., no TM can decide A_TM -- by the Church-Turing thesis, this means
    there is no general algorithm for solving this problem!

Diagonalization and the Halting problem:
 -  Claimed behaviour of H:
             <M1>   <M2>   <M3> ...
        M1  accept reject reject
        M2  reject reject accept
        M3  reject accept accept
        ...                     ...
 -  D simply diagonalizes over this table.