===========================================================================
  CSC 373H             Lecture Summary for Week 3              Winter 2006
===========================================================================

MSTs (cont'd).

- Recall the red-blue rules algorithm for MST. To finish the proof, we
  still need to show:
  - All edges in the graph are coloured.
    Suppose the method "stops early" (i.e., there is an uncoloured edge e
    but no rule can be applied). By the colour invariant (CI), the blue
    edges form a forest of blue trees (some trees might just be isolated
    vertices).
    If both ends of e are in the same blue tree, the red rule applies to
    the cycle that would be formed by adding e, contradiction.
    If the ends of e are in different blue trees, say T_1 and T_2, the
    blue rule applies to the cut (T_1, V-T_1), contradiction.
    Thus if any uncoloured edge remains, some rule must be applicable.

Remarks about greedy algorithms.

- General form of problem:
  . Input: set of "candidates" C_1, C_2, ..., C_n, each one with a
    weight (or cost) w(C_i).
  . Output: subset of candidates S subset of {C_1, C_2, ..., C_n} that
    satisfies certain constraints and such that the total weight of S is
    maximal (or minimal).

- General form of algorithm:

      sort candidates by weight
      S := {}
      for i := 1 to n:
          if S U {C_i} satisfies constraints:
              S := S U {C_i}
      return S

- General form of correctness proof:
  . Partial solution S_i is "promising" if it can be extended to an
    optimal solution, i.e., if there exists S^opt_i such that
    S_i subset of S^opt_i and S^opt_i subset of S_i U {C_{i+1}, ..., C_n}.
  . Prove by induction that S_i is promising for 0 <= i <= n.
  . In the proof, use an "exchange lemma": if S_i is promising (with
    optimal solution S^opt_i) and S_{i+1} = S_i U {C_{i+1}}, then there
    exists an optimal solution S^opt_{i+1} that extends S_{i+1}.

- Not all greedy algorithms fit this pattern exactly, but most are close.

Knapsack problems.

- General problem: given a set of items, each with a weight w_i and a
  value v_i, together with a fixed maximum capacity C (all numbers are
  positive integers), find a subset of items of maximal value whose
  total weight does not exceed the capacity.

- Fractional knapsack problem:
  . Output: fractions a_1, a_2, ..., a_n in [0,1] (the amount of each
    item to take) such that SUM a_i w_i <= C and SUM a_i v_i is maximal.
  . Greedy algorithms:
    . Largest weight first doesn't work. Counter-example (items listed
      as (value, weight) pairs): C = 100, items = (100,100), (50,10),
      (50,10), ..., (50,10). Greedy takes only the weight-100 item, for
      value 100, whereas the (50,10) items yield value 50 for every 10
      units of capacity used.
    . Largest value/weight first works: take as much of each item as
      possible until the capacity is filled (a code sketch follows this
      section).
      Proof: exercise (think of the algorithm as choosing the fraction
      of each item in turn, in order of value/weight).

- 0-1 knapsack problem:
  . Cannot break up items, so the output becomes a subset S of
    {1,2,...,n} s.t. SUM_{i in S} w_i <= C and SUM_{i in S} v_i is
    maximal.
  . If the value/weight ratio is constant, greedy by value is not
    guaranteed to produce an optimal answer but gives approximation
    ratio 2 (the knapsack is guaranteed to end up at least 1/2 full).
  . If the value/weight ratio is not constant, no greedy strategy works.
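To make the largest value/weight rule concrete, here is a minimal Python
sketch of the fractional-knapsack greedy described above. The function
name fractional_knapsack, the (value, weight) pair representation, and
the choice of ten (50,10) items in the test (the notes elide the count)
are illustrative assumptions, not part of the lecture.

    def fractional_knapsack(items, capacity):
        """Greedy by value/weight ratio for the fractional knapsack.
        items is a list of (value, weight) pairs; returns the maximal
        total value when items may be taken fractionally."""
        # Consider items in decreasing order of value per unit weight.
        items = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
        total = 0.0
        for value, weight in items:
            if capacity <= 0:
                break
            # Take as much of this item as still fits (maybe a fraction).
            fraction = min(1.0, capacity / weight)
            total += fraction * value
            capacity -= fraction * weight
        return total

    # Counter-example data from above, assuming ten (50,10) items:
    # greedy by ratio fills the knapsack with them for value 500.
    print(fractional_knapsack([(100, 100)] + [(50, 10)] * 10, 100))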
-------------------
Dynamic Programming  [Chapter 6]
-------------------

Matrix Chain Multiplication.

- Reminder: matrix multiplication, associativity, complexity.

- Given a matrix chain product A_0 A_1 ... A_{n-1}, there are many ways
  to parenthesize it (e.g., A(BC) or (AB)C). All yield the same answer
  but not the same running time.
  Example:  A is 1x10, B is 10x10, C is 10x100.
      (AB)C = 1*10*10 + 1*10*100 = 100 + 1000 = 1100 ops
      A(BC) = 10*10*100 + 1*10*100 = 10000 + 1000 = 11000 ops

- Matrix Chain Multiplication problem:
      Input:  A_0, A_1, ..., A_{n-1} with dimensions
              [d_0 x d_1], [d_1 x d_2], ..., [d_{n-1} x d_n]
      Output: fully parenthesized product with smallest total cost.

- Brute force algorithm: how many possible ways are there to put in
  parentheses? The answer is a "Catalan number", which is
  Omega(4^n / n^{3/2}).

- Greedy algorithms:
  . Product with smallest cost first.
    Counter-example (dimensions 10 1 10 100):
        greedy: 10*1*10 + 10*10*100 = 10,100
        other:  1*10*100 + 10*1*100 = 2,000
  . Product with smallest dimension last, or with largest dimension
    eliminated first.
    Counter-example (dimensions 1 10 100 1000):
        greedy: 10*100*1000 + 1*10*1000 = 1,010,000
        other:  1*10*100 + 1*100*1000 = 101,000
  . Nothing works!

- Structure of optimal subproblems:
  . Idea: instead of trying to find where to put the first product, try
    to find where to put the last product.
        A_0 (A_1 ... A_{n-1})        -- last product costs d_0 d_1 d_n
        (A_0 A_1) (A_2 ... A_{n-1})  -- last product costs d_0 d_2 d_n
        ...
        (A_0 ... A_{n-2}) A_{n-1}    -- last product costs d_0 d_{n-1} d_n
  . Greedy: take smallest last product? Counter-example: 1 10 100 1000.
  . Only n-1 possibilities. What information would help us find the best
    answer? Knowing the best cost of doing each subproduct.
  . Note that the best overall product must include optimal subproducts.

- Definition of array of subproblem values:
  . N[i,j] = smallest cost of multiplying A_i ... A_j.
  . From the structure of optimal solutions, the best way of doing
    A_i ... A_j (including all parentheses) must have the form
    (A_i ... A_{k-1}) (A_k ... A_j) for some i < k <= j, where each
    subproduct A_i ... A_{k-1} and A_k ... A_j is done in the best way
    possible (otherwise it wouldn't be best overall).

- Array recurrence: from the reasoning above, N[i,i] = 0 and, for i < j,
      N[i,j] = min{ d_i d_k d_{j+1} + N[i,k-1] + N[k,j] : i < k <= j }

- Expressing this as an algorithm? Next time....
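As a preview of how the recurrence might become code, here is one
possible bottom-up Python sketch. The name matrix_chain_cost and the
loop order are assumptions for illustration, not the algorithm the
lecture will present next time.

    def matrix_chain_cost(d):
        """d[0..n] holds the dimensions: matrix A_i is d[i] x d[i+1].
        Returns N[0][n-1], the smallest cost of A_0 ... A_{n-1}."""
        n = len(d) - 1
        # N[i][j] = smallest cost of multiplying A_i ... A_j; N[i][i] = 0.
        N = [[0] * n for _ in range(n)]
        # Fill by increasing chain length j - i, so both subproblems on
        # the right-hand side of the recurrence are already computed.
        for length in range(1, n):
            for i in range(n - length):
                j = i + length
                N[i][j] = min(d[i] * d[k] * d[j + 1] + N[i][k - 1] + N[k][j]
                              for k in range(i + 1, j + 1))
        return N[0][n - 1]

    # Dimensions 1 10 100 1000 from the counter-examples above: the best
    # cost is 101,000, matching the "other" parenthesization.
    print(matrix_chain_cost([1, 10, 100, 1000]))  # 101000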