===========================================================================
          CSC 236        Lecture Summary for Week 9          Winter 2012
===========================================================================

Now, a look at how to use loop invariants to *design* correct algorithms.

--------------------------
Binary search, iteratively
--------------------------

  # Pre:  A sorted and x comparable with A[0..n-1]  (n = len(A))
  # Post: 0 <= p <= n and A[0..p-1] < x <= A[p..n-1]

  What we know at start:
    A: |_____________________?_____________________|
        0                                       n-1

  What we want at end:
    A: |_______ < x ___________|______ >= x _______|
        0                       p               n-1

  In between?  Maintain a range [b..e] still to search.  Since A[b..e]'s
  relationship to x is unknown, we must have A[0..b-1] < x <= A[e+1..n-1].
  This will be our loop invariant!
    A: |___ < x ___|_______?_______|____ >= x _____|
        0           b             e             n-1

  Now we can start writing the loop.

  Initialization?  Make the "in between" picture the same as the start:
      b = 0
      e = n - 1

  Loop condition?  Continue as long as the range [b..e] is not empty,
  i.e., b <= e.  When b > e, the "in between" picture looks like the end,
  which is what we want.
      while b <= e:

  This is binary search, so compare the middle element with x.  Let's
  draw a picture to get this right (one for each outcome of the
  comparison):
    A: |___ < x ___|_____?_____|<x |_____?_____|___ >= x ___|
        0           b           m              e         n-1
    A: |___ < x ___|_____?_____|>=x|_____?_____|___ >= x ___|
        0           b           m              e         n-1
  Since A is sorted, A[m] < x means all of A[0..m] < x, and A[m] >= x
  means all of A[m..n-1] >= x, so:
          m = (b + e) // 2   # integer division
          if A[m] < x:
              b = m + 1
          else:
              e = m - 1

  Does this work?  We can prove that for all b <= e, b <= m <= e.  This
  guarantees that the values of b, e at the end of an iteration satisfy
  b <= e+1.  We add this to the loop invariant.

  By design, we know the LI holds after each iteration -- because the
  algorithm was written to make sure it does.

  Value of p at the end?  According to the picture, p = b = e+1.

  All together:
      # Pre:  A sorted and x comparable with A[0..n-1]  (n = len(A))
      b = 0
      e = n - 1
      # LI: 0 <= b <= e+1 <= n and A[0..b-1] < x <= A[e+1..n-1]
      while b <= e:
          m = (b + e) // 2   # integer division
          if A[m] < x:
              b = m + 1
          else:
              e = m - 1
      p = b
      # Post: 0 <= p <= n and A[0..p-1] < x <= A[p..n-1]

  Wait!
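(Before addressing that worry, note that the finished loop transcribes
almost directly into runnable Python.  A minimal sketch -- the wrapper
name `lower_bound` is our choice, not from the lecture:)

```python
def lower_bound(A, x):
    """Return p with 0 <= p <= len(A) such that A[0..p-1] < x <= A[p..].

    Pre: A is sorted and x is comparable with every element of A.
    """
    b, e = 0, len(A) - 1
    # LI: 0 <= b <= e+1 <= len(A) and A[0..b-1] < x <= A[e+1..len(A)-1]
    while b <= e:
        m = (b + e) // 2     # integer division, so b <= m <= e
        if A[m] < x:
            b = m + 1        # A sorted and A[m] < x, so A[0..m] < x
        else:
            e = m - 1        # A sorted and A[m] >= x, so A[m..] >= x
    return b                 # at loop exit, p = b = e+1

# Examples:
# lower_bound([1, 3, 3, 5], 3) -> 1   (first index with element >= 3)
# lower_bound([1, 3, 3, 5], 9) -> 4   (p = n when every element is < x)
```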
  What about termination?  Hopefully, e and b get closer to each other at
  each iteration.  Actually, E = e-b+1 works:
  - From the LI, b <= e+1 guarantees E (- N.
  - For any iteration, since b <= m <= e,
        e-(m+1)+1 < e-m+1 <= e-b+1
    and
        (m-1)-b+1 < m-b+1 <= e-b+1
    so either way, E becomes strictly smaller.

----------------------
Formal Language Theory
----------------------

Basic definitions and conventions:

- "Alphabet": any finite, non-empty set of atomic symbols ("compound"
  symbols like "ab" not allowed, to prevent ambiguity),
  e.g., {a,b,c}, {0,1}, {+}, ...
  Convention: \Sigma is used to denote an alphabet.

- "String": any *finite* sequence of symbols; the empty sequence is
  allowed and denoted \epsilon or \Lambda (called the "empty" or "null"
  string),
  e.g., a, ab, cccc are strings over {a,b,c}; \epsilon, ++++ are strings
  over {+}; a+00 is _not_ a string over {a,b,c} but it _is_ one over
  {0,1,+,a,b,c}.
  Convention: neither \epsilon nor \Lambda is allowed as a symbol in an
  alphabet (to avoid confusion with the empty string).

- "Language": any set of strings (empty or not, finite or not),
  e.g., {bab, bbabb, bbbabbb, ...} is a language over {a,b,c}, {+,++} is
  a language over {+}, {\epsilon} is a language over any alphabet, {} is
  a language over any alphabet.
  NOTE: {} is different from {\epsilon}: {} contains NO string, while
  {\epsilon} contains ONE string (the empty string \epsilon).

Motivation:

- Languages are a powerful abstraction: everything from logical formulas
  to the compilation of programs can be studied using languages.  The
  study of the properties of languages is an important aspect of
  theoretical computer science and some of its applications, particularly
  the abstract problem of language "recognition": given a language L and
  a string s, does s belong to L?  This comes up in applications such as
  compiler design, where source code must go through "lexical analysis"
  (to break up source code into "tokens" that represent identifiers,
  function names, operators, etc.)
  and "parsing" (to determine whether source code has correct syntax),
  something you can study in CSC488.  It is also central to the study of
  computational complexity and computability, which you can study in
  CSC463.

- We will look at two ways to express languages: descriptive ways
  (regular expressions, context-free grammars) and procedural ways
  (finite state automata, pushdown automata).

- But first, some more notation to work with strings and languages.

Operations on strings:

- "Length" of a string s, denoted |s|, is the number of symbols in s,
  e.g., |bba| = 3, |+| = 1, |\epsilon| = 0.

- Strings s, t are "equal" iff |s| = |t| and the i-th symbol of s = the
  i-th symbol of t for 1 <= i <= |s| -- use s_i to denote the i-th
  symbol of s.

- "Reversal" of a string s (denoted s^R) is the string obtained by
  reversing the symbols of s,
  e.g., 1011^R = 1101, +++^R = +++, \epsilon^R = \epsilon.

- "Concatenation" of strings s and t (denoted st or s.t) consists of
  every symbol of s followed by every symbol of t,
  e.g., bba.bb = bbabb, \epsilon.+++ = +++.

- Notation: For a string s and natural number k, s^k denotes s
  concatenated with itself k times,
  e.g., aba^2 = abaaba, ++^0 = \epsilon, \epsilon^3 = \epsilon.

- Notation: For an alphabet \Sigma, \Sigma^n denotes the set of all
  strings of length n over \Sigma, and \Sigma* denotes the set of all
  strings over \Sigma.
  E.g., {a,b,c}^0 = {\epsilon},
  {0,1}^3 = {000,001,010,011,100,101,110,111},
  {+}* = {\epsilon,+,++,+++,++++,...} = { +^k : k (- N }.

Operations on languages: for all languages L, L' over an alphabet \Sigma
(written symbolically: \-/ L, L' (_ \Sigma*),

- complement: ~L = \Sigma* - L
  E.g., if L = { 0x : x (- {0,1}* } = {0,00,01,000,001,010,011,0000,...},
  then ~L = {\epsilon} u { 1x : x (- {0,1}* }

- union (L u L'), intersection (L n L'), set difference (L - L'),
  e.g., {a,ab,abb} - {a,ba} = {ab,abb}

- more later...

---------------------
Finite state automata
---------------------

- Simple models of computing devices used to analyze strings.
  An FSA has a fixed, finite set of "states", one of which is the
  "initial state" and some of which are "accepting" (or "final") states,
  as well as "transitions" from one state to another for each possible
  symbol of a string.  The FSA starts in its initial state and processes
  a string one symbol at a time, from left to right: for each symbol
  processed, the FSA switches states based on the latest input symbol
  and its current state, as specified by the transitions.  Once the
  entire string has been processed, the FSA will either be in an
  accepting state (in which case the string is "accepted") or not (in
  which case the string is "rejected").

- Example: Simplified control mechanism for a vending machine that
  accepts only nickels (5c), dimes (10c) and quarters (25c), where
  everything costs exactly 30c and no change is ever given.
  Alphabet \Sigma = {n,d,q} (for "nickel", "dime", "quarter"),
  set of states = {'0','5','10','15','20','25','30+'} (for the amount of
  money put in so far; no need to keep track of the excess since no
  change will be provided), and transitions defined by the following
  table (state across the top, input symbol down the side), with the
  initial state being "0" and the only accepting state being "30+" --
  representing amounts >= 30:

        |  0    5   10   15   20   25  30+
     ---+------------------------------------
      n |  5   10   15   20   25  30+  30+
      d | 10   15   20   25  30+  30+  30+
      q | 25  30+  30+  30+  30+  30+  30+

  (How to read this table: the current state specifies the column, the
  current input symbol specifies the row, and the entry at that row and
  column is the next state; e.g., if the current state is 15 and the
  input symbol is d, the next state is the entry at row "d" and column
  "15", which is 25.)

  The computation of the FSA on an input such as "dndd" proceeds as
  follows:
      state 0 -> process 'd' -> state 10
              -> process 'n' -> state 15
              -> process 'd' -> state 25
              -> process 'd' -> state 30+
  Since the last state is accepting, the string "dndd" is accepted.
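  This trace can be reproduced mechanically.  A minimal sketch of a
  table-driven simulator in Python -- the dictionary encoding and the
  function name `accepts` are our choices, not part of the lecture:

```python
# Transition table for the vending-machine FSA:
# delta[symbol][state] gives the next state (symbol picks the row of the
# table above, the current state picks the column).
delta = {
    'n': {'0': '5',  '5': '10',  '10': '15',  '15': '20',
          '20': '25', '25': '30+', '30+': '30+'},
    'd': {'0': '10', '5': '15',  '10': '20',  '15': '25',
          '20': '30+', '25': '30+', '30+': '30+'},
    'q': {'0': '25', '5': '30+', '10': '30+', '15': '30+',
          '20': '30+', '25': '30+', '30+': '30+'},
}

def accepts(s, delta=delta, start='0', accepting=('30+',)):
    """Run the FSA on string s, one symbol at a time, left to right."""
    state = start
    for symbol in s:
        state = delta[symbol][state]
    return state in accepting

# 'dndd' visits states 0 -> 10 -> 15 -> 25 -> 30+, so it is accepted;
# 'nn' only reaches state 10, so it is rejected.
```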