===========================================================================
CSC 310H                     Lecture 5 Summary                    Fall 2007
===========================================================================

------
Topics
------

 -  Expected codeword length
 -  Connection to entropy
 -  Entropy
 -  Shannon-Fano codes
 -  Reference: sections 4.1, 5.1, 5.3, 5.4.

------------------------
Expected codeword length
------------------------

Definition:

    L = L(c,X) = E[length(c(X))], where X is a single source symbol.

    L = Sum_{i=1}^I {p_i*l_i}, where p_i is the probability of a_i and
                               l_i is the length of c(a_i)
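
As a concrete illustration (not from the book), here is a minimal Python
sketch that computes L for a hypothetical code; the probabilities and
codeword lengths below are made up for the example.

    # Expected codeword length L = Sum_i p_i*l_i for a hypothetical code.
    # The probabilities and codeword lengths are made up for illustration.
    p = [0.5, 0.25, 0.125, 0.125]   # p_i = P(X = a_i)
    l = [1, 2, 3, 3]                # l_i = length of c(a_i)
    L = sum(pi * li for pi, li in zip(p, l))
    print(L)                        # 1.75 bits per symbol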

Idea:

    The best code has the minimum possible L.

Observation:

    If X = X_1 X_2 ... X_N is a string of N source symbols, then

    E[length(c(X))] = E[length(c(X_1))] + ... + E[length(c(X_N))] = N*L

    by linearity of expectation.
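
A quick way to see this numerically is by simulation. A minimal sketch,
reusing the made-up distribution from above; the string length N and the
number of trials are arbitrary.

    # Sanity check that E[length of an N-symbol string] = N*L, by simulation.
    # Same made-up probabilities and codeword lengths as above.
    import random

    p = [0.5, 0.25, 0.125, 0.125]
    l = [1, 2, 3, 3]
    L = sum(pi * li for pi, li in zip(p, l))

    N, trials = 10, 100_000
    total = 0
    for _ in range(trials):
        symbols = random.choices(range(len(p)), weights=p, k=N)
        total += sum(l[s] for s in symbols)
    print(total / trials, N * L)    # both close to 17.5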

Reference: section 5.1

---------------------
Connection to entropy
---------------------

Result:

    L = Sum_{i=1}^I {p_i*l_i} >= Sum_{i=1}^I {p_i*log(1/p_i)} = H

Idea:

 -  Let K = Sum_{i=1}^I {2^{-l_i}}. The Kraft-McMillan (KM) inequality says
    that K <= 1, so log(K) <= 0.

 -  Write L = Sum_{i=1}^I {p_i*l_i} >= Sum_{i=1}^I {p_i*(l_i+log(K))}, since
    log(K) <= 0 and the p_i sum to 1.

 -  Manipulate terms to obtain

    H-L <= Sum_{i=1}^I {p_i*log(2^{-l_i}/(K*p_i))}

 -  Use ln(x) <= x-1 for all x > 0, i.e. log(x) <= (x-1)*log(e). Applying it
    with x = 2^{-l_i}/(K*p_i), the RHS above is at most
    log(e)*(Sum_{i=1}^I {2^{-l_i}}/K - 1) = log(e)*(K/K - 1) = 0.
    (A numeric check of L >= H follows below.)
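
A minimal numeric check of the result, assuming base-2 logs; the distribution
is made up, and the codeword lengths are chosen by hand so that they satisfy
the KM inequality.

    # Numeric check that L >= H for lengths satisfying the KM inequality.
    # Base-2 logs; distribution and lengths are made up for illustration.
    from math import log2

    p = [0.4, 0.3, 0.2, 0.1]
    l = [1, 2, 3, 3]                 # Kraft sum: 1/2 + 1/4 + 1/8 + 1/8 = 1
    H = sum(pi * log2(1 / pi) for pi in p)
    L = sum(pi * li for pi, li in zip(p, l))
    print(H, L)                      # H ~ 1.846, L = 1.9, so L >= H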

Reference: section 5.3. The book's proof uses Gibbs' inequality; the sketch
           above does not.

-------
Entropy
-------

The quantity H = Sum_{i=1}^I {p_i*log(1/p_i)} is called the "entropy" of
the random experiment that has p_i as the probability of outcome a_i.

Define h(a_i) = log(1/p_i). We call this quantity the "information content"
of outcome X=a_i. Then, the entropy can be seen as the expected value of
the information content of various outcomes.
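
To make this concrete, here is a minimal Python sketch (base-2 logs, made-up
probabilities) that computes h(a_i) for each outcome and H as the expected
information content.

    # Information content h(a_i) = log(1/p_i) and entropy H = E[h(X)].
    # Base-2 logs; the distribution is made up for illustration.
    from math import log2

    p = [0.5, 0.25, 0.25]
    h = [log2(1 / pi) for pi in p]           # [1.0, 2.0, 2.0] bits
    H = sum(pi * hi for pi, hi in zip(p, h))
    print(h, H)                              # H = 1.5 bits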

References:

 -  Section 2.4 contains the definition of entropy.

 -  Section 4.1 contains a discussion why the entropy is a sensible measure
    of the information content of a random variable.

IMPORTANT:

 -  Entropy is a very important concept in this course. It is worth
    spending some time to become familiar with it. Section 4.1 is a bit
    long, but it is quite informative.

------------------
Shannon-Fano codes
------------------

Ideas:

 -  A good code has L as small as possible.

 -  Ideally, choose lengths l_i = log(1/p_i), so that L=H.

 -  Unfortunately, these are not necessarily integers.

 -  Instead, choose l_i = ceiling(log(1/p_i)).

 -  Check these lengths satisfy the KM inequality:
    2^{-l_i} <= 2^{-log(1/p_i)} = p_i, so Sum_{i=1}^I {2^{-l_i}} <= 1.

 -  Check: H <= L < H + 1, which follows from log(1/p_i) <= l_i < log(1/p_i)+1.
    (A sketch of both checks follows this list.)
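
A minimal sketch of both checks in Python, assuming base-2 logs and a made-up
distribution; it only computes the lengths l_i, not the actual codewords.

    # Shannon-Fano style lengths l_i = ceiling(log(1/p_i)), plus both checks.
    # Base-2 logs; the distribution is made up for illustration.
    from math import ceil, log2

    p = [0.4, 0.3, 0.2, 0.1]
    l = [ceil(log2(1 / pi)) for pi in p]     # [2, 2, 3, 4]
    K = sum(2 ** -li for li in l)            # 0.6875 <= 1, so KM holds
    H = sum(pi * log2(1 / pi) for pi in p)
    L = sum(pi * li for pi, li in zip(p, l))
    print(K <= 1, H <= L < H + 1)            # True True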

Reference: read section 5.4.

Note: the book doesn't call them Shannon-Fano codes, but we do.