===========================================================================
CSC 236 Lecture Summary for Week 9 Winter 2012
===========================================================================
Now, a look at how to use loop invariants to *design* correct algorithms.
--------------------------
Binary search, iteratively
--------------------------
# Pre: A sorted and x comparable with A[0..n-1] (n = len(A))
# Post: 0 <= p <= n and A[0..p-1] < x <= A[p..n-1]
What we know at start:
A: |_____________________?_____________________|
    0                                         n-1
What we want at end:
A: |_______ < x ___________|______ >= x _______|
    0                       p                n-1
In between? Maintain range [b..e] to search. Since A[b..e]'s relationship
to x is unknown, we must have A[0..b-1] < x <= A[e+1..n-1].
This will be our loop invariant!
A: |___ < x ___|_______?_______|____ >= x _____|
    0           b               e            n-1
Now, we can start writing loop.
Initialization? Make "in between" picture same as start:
b = 0
e = n - 1
Loop condition? Continue as long as range [b..e] not empty, i.e., b <= e.
When b > e, "in between" picture looks like end, which is what we want.
while b <= e:
This is binary search, so compare middle element with x.
Let's draw pictures to get this right, one for each comparison outcome:
A: |_ < x _|_________|<x |_________|_ >= x __|
    0       b          m          e        n-1
(If A[m] < x: since A is sorted, A[0..m] < x, so we can set b = m + 1.)
A: |_ < x _|_________|>=x|_________|_ >= x __|
    0       b          m          e        n-1
(If A[m] >= x: since A is sorted, x <= A[m..n-1], so we can set e = m - 1.)
    m = (b + e) // 2  # integer division
    if A[m] < x: b = m + 1
    else: e = m - 1
Does this work? Can prove for all b <= e, b <= m <= e. This guarantees
values of b, e at end of iteration satisfy b <= e+1. We add this to loop
invariant. By design, we know LI holds after each iteration -- because
algorithm was written to make sure it does.
Value of p at end? According to picture, p = b = e+1. All together:
# Pre: A sorted and x comparable with A[0..n-1] (n = len(A))
b = 0
e = n - 1
# LI: 0 <= b <= e+1 <= n and A[0..b-1] < x <= A[e+1..n-1]
while b <= e:
    m = (b + e) // 2  # integer division
    if A[m] < x: b = m + 1
    else: e = m - 1
p = b
# Post: 0 <= p <= n and A[0..p-1] < x <= A[p..n-1]
Wait! What about termination? Hopefully, e and b get closer to each other
at each iteration. Actually, E = e-b+1 works:
- From LI, b <= e+1 guarantees E (- N.
- For any iteration, since b <= m <= e, e-(m+1)+1 < e-m+1 <= e-b+1 and
(m-1)-b+1 < m-b+1 <= e-b+1 so either way, E becomes strictly smaller.
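To check the design, here is a runnable Python version of the loop above.
The function name bsearch and the asserts are our additions, not part of
the lecture code; the asserts verify the loop invariant and the
termination measure E = e-b+1 at run time.

```python
def bsearch(A, x):
    """Return p with 0 <= p <= len(A) and A[0..p-1] < x <= A[p..n-1].

    A must be sorted; x must be comparable with its elements.
    """
    n = len(A)
    b, e = 0, n - 1
    while b <= e:
        # LI: 0 <= b <= e+1 <= n and A[0..b-1] < x <= A[e+1..n-1]
        assert 0 <= b <= e + 1 <= n
        assert all(A[i] < x for i in range(b))
        assert all(x <= A[i] for i in range(e + 1, n))
        E = e - b + 1                 # termination measure
        m = (b + e) // 2              # integer division, so b <= m <= e
        if A[m] < x:
            b = m + 1
        else:
            e = m - 1
        assert e - b + 1 < E          # measure strictly decreases
    return b                          # p = b = e + 1

print(bsearch([1, 3, 3, 5, 8], 4))    # 3: first index with A[i] >= 4
```

In production code the asserts would be dropped; they are here only to
make the correctness argument executable.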
----------------------
Formal Language Theory
----------------------
Basic definitions and conventions:
- "Alphabet": any finite, non-empty set of atomic symbols ("compound"
symbols like "ab" not allowed, to prevent ambiguity), e.g., {a,b,c},
{0,1}, {+}, ...
Convention: \Sigma used to denote alphabet.
- "String": any *finite* sequence of symbols; empty sequence is allowed
and denoted \epsilon or \Lambda (called "empty" or "null" string), e.g.,
a, ab, cccc are strings over {a,b,c}; \epsilon, ++++ are strings over
{+}, a+00 is _not_ a string over {a,b,c} but it _is_ over {0,1,+,a,b,c}.
Convention: neither \epsilon nor \Lambda allowed as symbol in alphabet
(to avoid confusion with empty string).
- "Language": any set of strings (empty or not, finite, or not), e.g.,
{bab, bbabb, bbbabbb, ...} is a language over {a,b,c}, {+,++} is a
language over {+}, {\epsilon} is a language over any alphabet, {} is a
language over any alphabet.
NOTE: {} is different from {\epsilon}: {} contains NO string,
{\epsilon} contains ONE string (the empty string \epsilon).
Motivation:
- Languages are a powerful abstraction: everything from logical formulas
to compilation of programs can be studied using languages. The study
of properties of languages is an important aspect of theoretical
computer science and some of its applications, particularly the
abstract problem of language "recognition":
Given language L and string s, does s belong to L?
This comes up in applications such as compiler design, where source
code must go through "lexical analysis" (to break up source code into
"tokens" that represent identifiers, function names, operators, etc.)
and "parsing" (to determine whether source code has correct syntax),
something you can study in CSC488.
It is also central to the study of computational complexity and
computability, which you can study in CSC463.
- We will look at two ways to express languages: descriptive ways
(regular expressions, context-free grammars), and procedural ways
(finite state automata, pushdown automata).
- But first, some more notation to work with strings and languages.
Operations on strings:
- "Length" of string s, denoted |s|, is number of symbols in s, e.g.,
|bba| = 3, |+| = 1, |\epsilon| = 0.
- Strings s, t are "equal" iff |s| = |t| and i-th symbol of s = i-th
symbol of t for 1 <= i <= |s| -- use s_i to denote i-th symbol of s.
- "Reversal" of string s (denoted s^R) is string obtained by reversing
symbols of s, e.g., 1011^R = 1101, +++^R = +++, \epsilon^R = \epsilon.
- "Concatenation" of strings s and t (denoted st or s.t) consists of
every symbol of s followed by every symbol of t, e.g., bba.bb = bbabb,
\epsilon.+++ = +++.
- Notation: For string s, natural number k, s^k denotes k copies of s
concatenated together, e.g., (aba)^2 = abaaba, (++)^0 = \epsilon,
\epsilon^3 = \epsilon.
- Notation: For alphabet \Sigma, \Sigma^n denotes set of all strings of
length n over \Sigma, and \Sigma* denotes set of all strings over
\Sigma. E.g., {a,b,c}^0 = {\epsilon},
{0,1}^3 = {000,001,010,011,100,101,110,111},
{+}* = {\epsilon,+,++,+++,++++,...} = { +^k : k (- N }.
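Python string operations line up with these definitions, so a small
sketch may help (itertools.product is used here to enumerate \Sigma^n;
the variable names are ours):

```python
from itertools import product

s = "bba"
print(len(s))        # |bba| = 3
print(s[::-1])       # reversal s^R = abb
print(s + "bb")      # concatenation bba.bb = bbabb
print("aba" * 2)     # (aba)^2 = abaaba
print("" * 3)        # epsilon^3 = epsilon (prints an empty line)

# Sigma^3 for Sigma = {0,1}: all strings of length 3, in order
sigma3 = ["".join(t) for t in product("01", repeat=3)]
print(sigma3)
```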
Operations on languages: for all languages L, L' over alphabet \Sigma
(can be written symbolically: \-/ L, L' (_ \Sigma*),
- complement: ~L = \Sigma* - L
E.g., if L = { 0x : x (- {0,1}* } = {0,00,01,000,001,010,011,0000,...},
then ~L = {\epsilon} u { 1x : x (- {0,1}* }
- union (L u L'), intersection (L n L'), set difference (L - L'), e.g.,
{a,ab,abb} - {a,ba} = {ab,abb}
- more later...
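These operations can be tried out with Python sets. Since ~L is infinite
in the example above, a sketch can only work with a finite slice of
\Sigma* (strings up to some length); the helper name sigma_star_upto is
our own, not standard:

```python
from itertools import product

def sigma_star_upto(sigma, k):
    """All strings over alphabet sigma of length <= k (a finite slice of Sigma*)."""
    return {"".join(t) for n in range(k + 1) for t in product(sigma, repeat=n)}

universe = sigma_star_upto("01", 3)             # Sigma*, truncated at length 3
L = {s for s in universe if s.startswith("0")}  # { 0x : x in {0,1}* }, truncated
comp = universe - L                             # ~L within this slice
print(sorted(comp))   # epsilon ('') plus the strings starting with 1

# union, intersection, set difference are plain set operations:
print(sorted({"a", "ab", "abb"} - {"a", "ba"}))   # ['ab', 'abb']
```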
---------------------
Finite state automata
---------------------
- Simple models of computing devices used to analyze strings. An FSA has
a fixed, finite set of "states", one of which is the "initial state"
and some of which are "accepting" (or "final") states, as well as
"transitions" from one state to another for each possible symbol of a
string. The FSA starts in its initial state and processes a string one
symbol at a time from left to right: for each symbol processed, the FSA
switches states based on the latest input symbol and its current state,
as specified by the transitions. Once the entire string has been
processed, the FSA will either be in an "accepting" state (in which
case the string is "accepted") or not (in which case the string is
"rejected").
- Example:
Simplified control mechanism for a vending machine that accepts only
nickels (5c), dimes (10c) and quarters (25c), where everything costs
exactly 30c and no change is ever given.
Alphabet \Sigma = {n,d,q} (for "nickel", "dime", "quarter"), set of
states = {'0','5','10','15','20','25','30+'} (for amount of money put
in so far; no need to keep track of excess since no change will be
provided), and transitions are defined by following table (state across
the top, input symbol down the side), with the initial state being "0"
and the only accepting state being "30+" -- representing amounts >= 30:
    |    0    5   10   15   20   25  30+
 ---+-----------------------------------
  n |    5   10   15   20   25  30+  30+
  d |   10   15   20   25  30+  30+  30+
  q |   25  30+  30+  30+  30+  30+  30+
(How to read this table: current state specifies column, current input
symbol specifies row, entry at that row and column is next state; e.g.,
if current state is 15 and input symbol is d, next state is entry at
row "d" and column "15", which is 25.)
Computation of FSA on input such as "dndd" proceeds as follows: state 0
-> process 'd' -> state 10 -> process 'n' -> state 15 -> process 'd' ->
state 25 -> process 'd' -> state 30+. Since last state is accepting,
string "dndd" is accepted.