===========================================================================
        CSC 236    Lecture Summary for Week 5           Winter 2012
===========================================================================

MergeSort(A,b,e):
    if b == e: return
    m = (b + e) / 2    # integer division
    MergeSort(A,b,m)
    MergeSort(A,m+1,e)
    # merge sorted A[b..m] and A[m+1..e] back into A[b..e]
    for i = b,...,e:
        B[i] = A[i]
    c = b
    d = m+1
    for i = b,...,e:
        if d > e or (c <= m and B[c] < B[d]):
            A[i] = B[c]
            c = c + 1
        else:  # d <= e and (c > m or B[c] >= B[d])
            A[i] = B[d]
            d = d + 1

- Recall:
      for i = a,...,b: BODY
  is equivalent to:
      i = a
      while i <= b:
          BODY
          i += 1
  with runtime 1 + (b - a + 1) * B, where B = runtime of BODY.

- Hence, worst-case runtime T(n) satisfies:
      T(1) = 1 (from first line)
      T(n) = 1 (all constant-time work outside recursive calls)
           + n (first loop, to copy A[b..e] into B[b..e])
           + n (second loop, to merge)
           + T(ceil(n/2)) (first recursive call)
           + T(floor(n/2)) (second recursive call)

----------------------------------------------------------------------------
NOTE (added after lectures were first posted): Keep in mind that this
expression is not actually exact. Depending on which steps are counted and
which ones are ignored, we could say that:
      T(n) = T(ceil(n/2)) + T(floor(n/2)) + 5n + 7
or:
      T(n) = T(ceil(n/2)) + T(floor(n/2)) + 3n + 1
or any other linear function. Really, all we can say with certainty is that:
      T(n) = T(ceil(n/2)) + T(floor(n/2)) + Theta(n)
When doing repeated substitution (next), we want to simplify the exact
recurrence. We do this by ignoring floors and ceilings and, in general, by
replacing Theta expressions with a simple function having the correct rate
of growth. In this case, this means we would generally use:
      T(n) = 2 T(n/2) + n
for repeated substitution.
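As a sanity check (not part of the lecture), the pseudocode above translates almost line-for-line into Python. The name merge_sort and the use of a sliced copy for B are my choices; the control flow mirrors the pseudocode, including the sentinel-free merge condition:

```python
def merge_sort(A, b, e):
    """Sort A[b..e] in place (inclusive indices), mirroring the pseudocode."""
    if b == e:
        return
    m = (b + e) // 2
    merge_sort(A, b, m)
    merge_sort(A, m + 1, e)
    # merge sorted A[b..m] and A[m+1..e] back into A[b..e]
    B = A[b:e + 1]             # copy of both sorted halves (B[0..e-b])
    c = 0                      # cursor into first half:  B[0..m-b]
    d = m + 1 - b              # cursor into second half: B[m-b+1..e-b]
    for i in range(b, e + 1):
        # take from the first half if the second is exhausted, or if the
        # first is non-empty and its front element is smaller
        if d > e - b or (c <= m - b and B[c] < B[d]):
            A[i] = B[c]
            c += 1
        else:
            A[i] = B[d]
            d += 1
```

The only translation wrinkle is that B is indexed from 0 here, so the cursors c and d are offsets into B rather than into A.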
However, in order to illustrate some of the algebra and simplifications
you're likely to encounter, I am going to use the following arbitrary
choice instead:
      T(n) = 2 T(n/2) + 2n + 1
----------------------------------------------------------------------------

- Repeated substitution:
      T(n) = 2 T(n/2) + 2 n + 1
           = 2 (2 T(n/4) + 2(n/2) + 1) + 2 n + 1
           = 4 T(n/4) + 2 * 2 n + 3
           = 4 (2 T(n/8) + 2(n/4) + 1) + 2 * 2 n + 3
           = 8 T(n/8) + 3 * 2 n + 7
  after i steps
           = 2^i T(n/2^i) + 2 n i + (2^i - 1)
  base reached when n/2^i = 1, i.e., n = 2^i, i.e., i = lg n
           = 2^{lg n} T(n/n) + 2 n lg n + (2^{lg n} - 1)
           = T(1) n + 2 n lg n + n - 1
           = 2 n lg n + 2 n - 1

- T(n) in Omega(n log n):
  Try T(n) >= 2 n lg n for all n >= 1.
  Base: T(1) = 1 >= 0 = 2 (1) lg 1.
  I.H.: Let n > 1 and suppose T(j) >= 2 j lg j for 1 <= j < n.
  Step: Since n > 1, 1 <= floor(n/2) <= ceil(n/2) < n, so
      T(n) = T(ceil(n/2)) + T(floor(n/2)) + 2 n + 1
          >= 2 ceil(n/2) lg ceil(n/2) + 2 floor(n/2) lg floor(n/2) + 2 n + 1
          >= 2 (ceil(n/2) + floor(n/2)) lg floor(n/2) + 2 n + 1
          >= 2 n (lg floor(n/2) + lg 2) + 1
          >= 2 n lg(2 floor(n/2)) + 1
          >= 2 n lg(2 (n-1)/2) + 1    (since floor(n/2) >= (n-1)/2)
          >= 2 n lg(n-1) + 1
  Problem! We want >= 2 n lg n.

- Trick: ceil(n/2) + 1 >= floor(n/2) + 1 >= (n-1)/2 + 1 = (n+1)/2, so change
  the statement to T(n) >= 2 n lg(n+1). The inductive step then becomes:
      T(n) = T(ceil(n/2)) + T(floor(n/2)) + 2 n + 1
          >= 2 ceil(n/2) lg(ceil(n/2)+1) + 2 floor(n/2) lg(floor(n/2)+1) + 2 n + 1
          >= 2 (ceil(n/2) + floor(n/2)) lg((n+1)/2) + 2 n + 1
          >= 2 n (lg((n+1)/2) + lg 2) + 1
          >= 2 n lg(2 (n+1)/2) + 1
          >= 2 n lg(n+1) + 1
          >= 2 n lg(n+1)
  Exactly what we want! But new problem: the base case doesn't work anymore:
      T(1) = 1 NOT >= 2 = 2 (1) lg(1+1)...

- Trick: the inductive step has an extra "+1", so try changing the statement
  to T(n) >= 2 n lg(n+1) - 1. In the inductive step, T(ceil(n/2)) contributes
  -1 and T(floor(n/2)) contributes -1: one of these cancels out the +1 and
  the other remains, as required. Base case?
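The closed form obtained by repeated substitution can be verified exactly against the recurrence at powers of 2, where lg n is an integer and all arithmetic stays in integers. A quick sketch (not from the lecture; the function name T is mine):

```python
def T(n):
    """Exact recurrence T(1) = 1, T(n) = 2 T(n/2) + 2n + 1, n a power of 2."""
    if n == 1:
        return 1
    return 2 * T(n // 2) + 2 * n + 1

# Closed form derived above: T(n) = 2 n lg n + 2 n - 1.
# For n = 2^k we have lg n = k, so the check is exact integer arithmetic.
for k in range(11):
    n = 2 ** k
    assert T(n) == 2 * n * k + 2 * n - 1
```

For example T(4) = 2*T(2) + 9 = 2*7 + 9 = 23, matching 2*4*2 + 2*4 - 1 = 23.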
      2 (1) lg(1+1) - 1 = 2 - 1 = 1 <= 1 = T(1)

- Complete proof: T(n) >= 2 n lg(n+1) - 1 for all n >= 1.
  Base: T(1) = 1 >= 1 = 2 (1) lg(1+1) - 1.
  I.H.: Let n > 1 and suppose T(j) >= 2 j lg(j+1) - 1 for 1 <= j < n.
  Step: Since n > 1, 1 <= floor(n/2) <= ceil(n/2) < n, and
  ceil(n/2)+1 >= floor(n/2)+1 >= (n-1)/2+1 = (n+1)/2, so
      T(n) = T(ceil(n/2)) + T(floor(n/2)) + 2 n + 1
          >= 2 ceil(n/2) lg(ceil(n/2)+1) - 1 + 2 floor(n/2) lg(floor(n/2)+1) - 1 + 2 n + 1
          >= 2 (ceil(n/2) + floor(n/2)) lg((n+1)/2) + 2 n - 1
          >= 2 n (lg(n+1) - lg 2 + 1) - 1
          >= 2 n lg(n+1) - 1
  Hence, T(n) in Omega(n log n).

- Another possibility: instead of adding the low-order term "-1", change the
  constant and prove T(n) >= n lg(n+1) -- the base case goes through and the
  extra low-order terms don't matter.

- T(n) in O(n log n):
  A similar process of trial and error results in the statement
      T(n) <= 2 n lg(n-1) + 4 n - 1 for all n >= 2.
  Base: T(2) = T(1) + T(1) + 2*2 + 1 =  7 <=  7 = 2*2*lg(2-1) + 4*2 - 1
        T(3) = T(2) + T(1) + 2*3 + 1 = 15 <= 17 = 2*3*lg(3-1) + 4*3 - 1
  (Why do we need n > 3 in the I.H.? So that floor(n/2) >= 2 in the
  inductive step, to allow us to apply the I.H. Hence, n = 2 and n = 3 must
  be proved as base cases.)
  I.H.: Let n > 3 and suppose T(j) <= 2 j lg(j-1) + 4 j - 1 for 2 <= j < n.
  Step: Since n > 3, 2 <= floor(n/2) <= ceil(n/2) < n, and
  floor(n/2)-1 <= ceil(n/2)-1 <= (n+1)/2-1 = (n-1)/2, so
      T(n) = T(ceil(n/2)) + T(floor(n/2)) + 2 n + 1
          <= 2 ceil(n/2) lg(ceil(n/2)-1) + 4 ceil(n/2) - 1
             + 2 floor(n/2) lg(floor(n/2)-1) + 4 floor(n/2) - 1 + 2 n + 1
          <= 2 (ceil(n/2) + floor(n/2)) lg((n-1)/2) + 2 n + 4 (ceil(n/2) + floor(n/2)) - 1
          <= 2 n (lg(n-1) - lg 2 + 1) + 4 n - 1
          <= 2 n lg(n-1) + 4 n - 1
  Hence, T(n) in O(n log n).

- Challenge: Prove that for all n >= 1,
      2 n floor(lg n) + 2 n - 1 <= T(n) <= 2 n ceil(lg n) + 2 n - 1.
  Hint: In the inductive step, consider the cases "n even" and "n odd", and
  use the fact that for all odd k >= 3, floor(lg k) = floor(lg(k-1)) and
  ceil(lg k) = ceil(lg(k+1)).
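Both bounds can be spot-checked numerically against the exact floor/ceiling recurrence. A small sketch (my own check, not from the lecture; the memoized function T implements the exact recurrence with T(1) = 1):

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """Exact recurrence T(n) = T(ceil(n/2)) + T(floor(n/2)) + 2n + 1."""
    if n == 1:
        return 1
    return T(-(-n // 2)) + T(n // 2) + 2 * n + 1   # -(-n//2) is ceil(n/2)

for n in range(1, 2000):
    # Lower bound proved above: T(n) >= 2 n lg(n+1) - 1 for n >= 1
    assert T(n) >= 2 * n * math.log2(n + 1) - 1
    # Upper bound proved above: T(n) <= 2 n lg(n-1) + 4 n - 1 for n >= 2
    if n >= 2:
        assert T(n) <= 2 * n * math.log2(n - 1) + 4 * n - 1
```

Of course a finite check is no substitute for the induction, but it is a useful way to catch an off-by-one in a conjectured bound before attempting the proof.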
General divide-and-conquer recurrences:

- Many algorithms are written using the "divide-and-conquer" technique:
  split up the problem, solve the subproblems recursively, combine the
  solutions. Worst-case running times of such algorithms satisfy
  recurrences of the form:
             { K                                            if n <= B
      T(n) = {
             { a_1 T(ceil(n/b)) + a_2 T(floor(n/b)) + f(n)  if n > B
  (for constants K > 0, a_1 >= 0, a_2 >= 0, b > 1, B > 0 from the
  algorithm -- note that a_1 + a_2 > 0, since a_1 is the number of
  recursive calls on inputs of size ceil(n/b) and a_2 similarly for size
  floor(n/b)).

- Master Theorem: If f(n) = Theta(n^d) for constant d >= 0, and
  a = a_1 + a_2, then the recurrence T(n) above has closed-form solution:
             { Theta(n^d)          if a < b^d,
      T(n) = { Theta(n^d log n)    if a = b^d,
             { Theta(n^{log_b a})  if a > b^d.
  A formal proof by induction (using the exact recurrence for T(n))
  requires proving the statement for all powers of b, then extending it to
  other values using the fact that T(n) is non-decreasing (i.e.,
  T(n) <= T(m) for all n <= m). See the textbook for details.

---------------------------------------------------------------------------
[ What follows was not covered in lecture and is included here for your
  reference. It is not a formal proof of the Master Theorem, but rather an
  informal derivation to show how to obtain the statement of the Theorem. ]

Repeated substitution, where a = a_1 + a_2:
      T(n) = a T(n/b) + f(n)
           = a (a T(n/b^2) + f(n/b)) + f(n)
           = a^2 T(n/b^2) + a f(n/b) + f(n)
           = a^2 (a T(n/b^3) + f(n/b^2)) + a f(n/b) + f(n)
           = a^3 T(n/b^3) + a^2 f(n/b^2) + a f(n/b) + f(n)
After k substitutions:
      T(n) = a^k T(n/b^k) + \sum_{i=0}^{k-1} a^i f(n/b^i)
Base case reached roughly when k = log_b n:
      T(n) = a^{log_b n} K + \sum_{i=0}^{log_b n - 1} a^i f(n/b^i)
           = K n^{log_b a} + \sum_{i=0}^{log_b n - 1} a^i f(n/b^i)
This cannot be simplified further without more information about f(n).
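The three-way case split of the Master Theorem is mechanical enough to encode directly. A small sketch (my own illustration; the function name and return convention are not from the lecture):

```python
import math

def master_theorem(a, b, d):
    """Classify T(n) = a T(n/b) + Theta(n^d) by the Master Theorem.

    a = a_1 + a_2 >= 1 is the total number of recursive calls, b > 1 the
    factor by which the input shrinks, d >= 0 the exponent of f(n).
    Returns (exponent, has_log_factor), meaning T(n) = Theta(n^exponent)
    or, if has_log_factor, Theta(n^exponent log n).
    """
    if a < b ** d:
        return (d, False)             # work at the root dominates
    if a == b ** d:
        return (d, True)              # every level contributes equally
    return (math.log(a, b), False)    # work at the leaves dominates

# MergeSort: 2 recursive calls, halving, Theta(n) merge -> Theta(n log n)
assert master_theorem(2, 2, 1) == (1, True)
# Binary search: 1 recursive call, halving, Theta(1) work -> Theta(log n)
assert master_theorem(1, 2, 0) == (0, True)
```

The comments on the three branches reflect the intuition from the derivation below: the geometric sum over the levels of the recursion is dominated by its first term, is flat, or is dominated by its last term.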
If f(n) in Theta(n^d) (for constant d >= 0), then f(n) = c n^d
(approximately) and
      T(n) = K n^{log_b a} + \sum_{i=0}^{log_b n - 1} a^i c (n/b^i)^d
           = K n^{log_b a} + c n^d \sum_{i=0}^{log_b n - 1} (a/b^d)^i

Case 1: a = b^d, i.e., log_b a = d
      T(n) = K n^{log_b a} + c n^d \sum_{i=0}^{log_b n - 1} (a/b^d)^i
           = K n^d + c n^d log_b n
           = Theta(n^d log n)

Case 2: a < b^d, i.e., log_b a < d
      \sum_{i=0}^{log_b n - 1} (a/b^d)^i
           = (1 - (a/b^d)^{log_b n}) / (1 - a/b^d)
           = (1 - a^{log_b n}/b^{d log_b n}) / (1 - a/b^d)
           = ((n^d - n^{log_b a}) / n^d) / ((b^d - a) / b^d)
           = (b^d / (b^d - a)) * (n^d - n^{log_b a}) / n^d
      T(n) = K n^{log_b a} + c n^d \sum_{i=0}^{log_b n - 1} (a/b^d)^i
           = K n^{log_b a} + c n^d (b^d / (b^d - a)) * (n^d - n^{log_b a}) / n^d
           = (K - c b^d / (b^d - a)) n^{log_b a} + (c b^d / (b^d - a)) n^d
           = Theta(n^d)

Case 3: a > b^d, i.e., log_b a > d
      \sum_{i=0}^{log_b n - 1} (a/b^d)^i
           = ((a/b^d)^{log_b n} - 1) / (a/b^d - 1)
           = (a^{log_b n}/b^{d log_b n} - 1) / (a/b^d - 1)
           = ((n^{log_b a} - n^d) / n^d) / ((a - b^d) / b^d)
           = (b^d / (a - b^d)) * (n^{log_b a} - n^d) / n^d
      T(n) = K n^{log_b a} + c n^d \sum_{i=0}^{log_b n - 1} (a/b^d)^i
           = K n^{log_b a} + c n^d (b^d / (a - b^d)) * (n^{log_b a} - n^d) / n^d
           = (K + c b^d / (a - b^d)) n^{log_b a} - (c b^d / (a - b^d)) n^d
           = Theta(n^{log_b a})
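The geometric-sum manipulation in Case 2 (and, symmetrically, Case 3) is easy to get wrong, so it is worth checking numerically. A quick sketch, with an arbitrary choice a = 2, b = 2, d = 2 (so a < b^d) and n ranging over powers of b:

```python
import math

a, b, d = 2, 2, 2        # example parameters with a < b^d (Case 2)
r = a / b ** d           # common ratio of the geometric sum, here 1/2

for k in range(1, 20):
    n = b ** k           # so log_b n = k exactly
    # direct evaluation of  sum_{i=0}^{log_b n - 1} (a/b^d)^i
    direct = sum(r ** i for i in range(k))
    # closed form derived in Case 2:
    #   (b^d / (b^d - a)) * (n^d - n^{log_b a}) / n^d
    closed = (b ** d / (b ** d - a)) * (n ** d - n ** math.log(a, b)) / n ** d
    assert math.isclose(direct, closed)
```

For instance, with k = 3 (n = 8) both sides equal 1 + 1/2 + 1/4 = 1.75.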