===========================================================================
          CSC 373H        Lecture Summary for Week 2         Winter 2006
===========================================================================

Activity Scheduling (cont'd).

  Recall greedy algorithm for Activity Scheduling: sort by finish time.
  Alternate proof using idea of "promising" solution and an exchange
  argument -- exchange activities from an optimal solution with our
  partial solution to show our partial solution is still "promising".

  - Let S_0, S_1, ..., S_n = partial solutions constructed by algo. at
    the end of each iteration.
  - Prove by induction on i (# iterations) that S_i is "promising",
    i.e., there is some optimal solution Opt_i that extends S_i using
    only activities from {A_(i+1),...,A_n} (S_i subset of Opt_i and
    Opt_i subset of S_i U {A_(i+1),...,A_n}). Note: Opt_i may not be
    unique (there may be more than one way to achieve optimal).
    . Base case: S_0 = {} so any optimal solution Opt_0 extends S_0
      using only activities from {A_1,...,A_n}.
    . Ind. Hyp.: For some i >= 0, assume there is an optimal Opt_i that
      extends S_i using only activities from {A_(i+1),...,A_n}.
    . Ind. Step: To prove: S_(i+1) is promising w.r.t. {A_(i+2),...,A_n}.
      From S_i to S_(i+1), algo. either rejects or includes A_(i+1).
      Case 1: S_(i+1) = S_i.
        This means A_(i+1) is not compatible with S_i, so
        Opt_(i+1) = Opt_i extends S_(i+1) using only activities from
        {A_(i+2),...,A_n}.
      Case 2: S_(i+1) = S_i U {A_(i+1)}.
        Opt_i may or may not include A_(i+1), so consider both
        possibilities.
        Subcase a: A_(i+1) in Opt_i.
          Then Opt_(i+1) = Opt_i extends S_(i+1) using only activities
          from {A_(i+2),...,A_n}.
        Subcase b: A_(i+1) not in Opt_i.
          We need to argue this can only happen if Opt_i contains some
          activity that can be "exchanged" with A_(i+1) to create a new
          optimal solution Opt_(i+1). There must be some A_j in Opt_i
          that overlaps A_(i+1) (otherwise, Opt_i U {A_(i+1)} would be
          better than the optimal Opt_i).
          Also, j > i+1 because A_(i+1) is compatible with S_i.
          "Exchanging" these activities yields a new optimal solution
          that extends our partial schedule:
          Opt_(i+1) = Opt_i U {A_(i+1)} - {A_j} extends S_(i+1) using
          {A_(i+2),...,A_n} -- it contains the same number of
          activities as Opt_i, and no overlap is introduced because
          f(i+1) <= f(j) (by sorting order).
      In all cases, there is an optimal Opt_(i+1) that extends S_(i+1)
      using {A_(i+2),...,A_n}.
  - So each S_i is promising. In particular, S_n is promising w.r.t.
    {}, i.e., there is an optimal Opt_n that "extends" S_n using
    activities from {}. In other words, S_n must be optimal itself.

Minimum Spanning Tree.

  Input: Connected undirected graph G=(V,E) with positive cost
         c(e) > 0 for each edge e in E.
  Output: A spanning tree T subset of E such that cost(T) (the sum of
          the costs of the edges in T) is minimal.

  - Terminology:
    . "Spanning tree": acyclic connected subset of edges.
    . "Acyclic": does not contain any cycle.
    . "Connected": contains a path between any two vertices.

  A. Brute force: consider each possible subset of edges.
     Runtime? Exponential, even if we limit the search to spanning
     trees of G.

  D. Boruvka's algorithm (1926):
     Idea: do steps like Prim's algorithm in parallel.

       initially n trees (the individual vertices)
       repeat
           for every tree T, select a minimum-cost edge incident to T
           add all selected edges to the MST (causing trees to merge)
       until only one tree
       return this tree T

     Runtime? Analysis similar to merge sort. Each pass at least
     halves the number of trees, so there are O(log n) passes. Each
     pass takes O(m) time, so the total is O(m log n).
     Correctness? To come...

  B. Kruskal's algorithm (1956):

       // let m = |E| (# edges) and n = |V| (# vertices)
       sort edges by cost, i.e., c(e_1) <= c(e_2) <= ... <= c(e_m)
       T := {}                          // partial spanning tree
       for each v in V:  MakeSet(v)     // initialize disjoint sets
       for i := 1 to m:
           let (u,v) := e_i
           if FindSet(u) != FindSet(v): // u,v not already connected
               T := T U {e_i}
               Union(u,v)
       return T

     Runtime?
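The Kruskal pseudocode above translates almost directly into runnable Python. This is only a sketch: the union-by-size/path-compression disjoint set and the names kruskal, find, union are our own choices standing in for MakeSet/FindSet/Union.

```python
def kruskal(n, edges):
    """n = number of vertices, labelled 0..n-1;
    edges = list of (cost, u, v) tuples.
    Returns the list of MST edges, cheapest first."""
    parent = list(range(n))    # MakeSet(v) for each vertex v
    size = [1] * n

    def find(x):               # FindSet with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):           # merge the two sets, by size
        rx, ry = find(x), find(y)
        if size[rx] < size[ry]:
            rx, ry = ry, rx
        parent[ry] = rx
        size[rx] += size[ry]

    T = []
    for cost, u, v in sorted(edges):   # sort edges by cost
        if find(u) != find(v):         # u,v not already connected
            T.append((cost, u, v))
            union(u, v)
    return T
```

For example, on a 4-cycle 0-1-2-3-0 with costs 1, 2, 3, 4 plus a diagonal 0-2 of cost 5, `kruskal(4, [(1,0,1), (2,1,2), (3,2,3), (4,0,3), (5,0,2)])` keeps the three cheapest edges (total cost 6) and rejects the two that would close a cycle.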
     Theta(m log m) for sorting; the main loop involves a sequence of
     m Union and FindSet operations on n elements, which is
     Theta(m log n). Total is Theta(m log n) since log m is
     Theta(log n).
     Correctness? To come...

  C. Prim's algorithm (Jarnik 1930, Prim 1957, Dijkstra 1959):
     Idea: start with some vertex s in V (pick arbitrarily) and at
     each step, add the lowest-cost edge that connects a new vertex.
     Proof: might as well do it at the same time as for Kruskal's.

  E. Generalized MST algorithm:
     General greedy approach: build a spanning tree edge by edge,
     including appropriate "small" edges and excluding appropriate
     "large" edges. We can think of these algorithms as an
     edge-colouring process.
     - initially, all edges of the graph are uncoloured
     - one at a time, colour edges either blue (accepted) or red
       (rejected) to maintain a "colour invariant"

     Colour Invariant: there is a MST containing all the blue edges
     and none of the red edges.

     If we maintain this colour invariant and colour all the edges of
     the graph, the blue edges will form a MST!

     Terminology:
     . "cut": a vertex partition (X, V-X)
     . edge e "crosses" a cut if one end is in each side

     Rules for colouring edges:
     . Blue Rule: Select a cut that no blue edges cross. Among the
       uncoloured edges crossing the cut, select one of minimum cost
       and colour it blue.
     . Red Rule: Select a simple cycle containing no red edges. Among
       the uncoloured edges in the cycle, select one of maximum cost
       and colour it red.

     Note the nondeterminism here: we can apply the rules at any time
     and in any order.

     Correctness? What do we have to prove?
     Theorem: All the edges of a connected graph are coloured and the
       colour invariant is maintained in any application of a rule.

     To prove: The colour invariant is maintained.
     By induction on the number of edges coloured. Initially, no edges
     are coloured, so any MST satisfies the CI.
     Suppose the CI holds before the blue rule is applied, colouring
     edge e blue. Let T be a MST that satisfies the CI before e is
     coloured.
     If e in T, then T still satisfies the CI, done.
     If e not in T, consider the cut (X, V-X) used in the blue rule.
     There is a path in T joining the ends of e, and at least one edge
     e' on this path crosses the cut. By the CI, no edge of T is red,
     and by the blue rule, e' is uncoloured and c(e') >= c(e). Thus
     T - {e'} + {e} is a MST and it satisfies the CI after e is
     coloured.

     Now suppose the CI holds before the red rule is applied,
     colouring edge e red. Let T be a MST that satisfies the CI before
     e is coloured.
     If e not in T, then T still satisfies the CI, done.
     If e in T, deleting e from T divides T into 2 trees T_1 and T_2
     partitioning G (thus (T_1,T_2) is a cut). Consider the cycle
     including e used in the red rule. This cycle must have another
     edge e' crossing the cut (T_1,T_2). Since e' not in T, by the CI
     and the red rule, e' is uncoloured and c(e') <= c(e). Thus
     T - {e} + {e'} is a MST and it satisfies the CI after e is
     coloured.

     To prove: All edges in the graph are coloured? Next time...
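For comparison with the Kruskal pseudocode, Prim's idea (section C: start at an arbitrary vertex and repeatedly add the lowest-cost edge that connects a new vertex) can be sketched in Python using a heap of candidate edges. The adjacency-list representation and the name prim are our own assumptions, not from the notes.

```python
import heapq

def prim(adj, s=0):
    """adj: adjacency list {u: [(cost, v), ...]} of a connected,
    undirected graph; s: arbitrary start vertex.
    Returns the list of MST edges as (cost, u, v) tuples."""
    visited = {s}
    heap = [(c, s, v) for c, v in adj[s]]  # edges leaving the tree
    heapq.heapify(heap)
    T = []
    while heap and len(visited) < len(adj):
        c, u, v = heapq.heappop(heap)      # lowest-cost candidate
        if v in visited:                   # doesn't reach a new vertex
            continue
        visited.add(v)
        T.append((c, u, v))
        for c2, w in adj[v]:               # new candidate edges
            if w not in visited:
                heapq.heappush(heap, (c2, v, w))
    return T
```

On the same 4-cycle-plus-diagonal graph used to illustrate Kruskal's, this returns a spanning tree of the same total cost 6, as the correctness argument above predicts.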