===========================================================================
CSC B63                 Lecture Summary for Week  9             Winter 2007
===========================================================================

[[Q:  denotes a question that you should think about and
      that will be answered during lecture.  ]]

--------------------
Breadth-First Search [ Section 22.2 ]
--------------------

 * Starting from a specified "source" vertex s in V, BFS visits every
   vertex v in G that can be reached from s, and in the process constructs,
   for each such v, a path from s to v with the smallest number of edges
   (together, these paths form a "BFS-tree" of the graph).  BFS works on
   directed or undirected graphs: we describe it for directed graphs.

 * To keep track of progress, each vertex is given a "colour", which is
   initially white.  The first time that a vertex is encountered, its
   colour is changed to gray, and once a vertex has been examined (we'll
   see what the difference is between "encountered" and "examined" in a
   second), its colour is changed to black.  At the same time, for each
   vertex v, we also keep track of the predecessor (the parent) of v in the
   BFS tree, p[v], and we keep track of the "distance" of v to s (the
   number of edges from s to v), d[v].

 * Intuitively, white vertices are "unknown" to BFS, black vertices are
   "known" and have been fully "explored" (i.e., BFS has encountered all
   their neighbours), and gray vertices are known but not fully explored:
   they represent the "frontier".  The distinction between black and gray
   vertices is important: it's how BFS keeps track of which vertices to
   explore next so that it really is working in a "breadth-first" manner.
   In order to manage the gray vertices, BFS stores them in a queue so that
   they are dealt with in a first-in, first-out manner.

         BFS(G=(V,E),s)
            for all vertices v in V
               colour[v] := white
               d[v] := infinity
               p[v] := NIL
            end for
            initialize an empty queue Q
            colour[s] := gray
            d[s] := 0
            p[s] := NIL
            ENQUEUE(Q,s)
            while Q is not empty do
               u := DEQUEUE(Q)
               for each edge (u,v) in E do
                  if colour[v] == white then
                     colour[v] := gray
                     d[v] := d[u] + 1
                     p[v] := u
                     ENQUEUE(Q,v)
                  end if
               end for
               colour[u] := black
            end while
         END BFS
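
 * A minimal Python sketch that follows the pseudocode line by line, for
   readers who want to run it.  It assumes the graph is a dict `adj` mapping
   every vertex (including vertices with no outgoing edges) to a list of its
   out-neighbours; the names `bfs` and `adj` are illustrative only.  Python's
   `None` stands in for NIL and `math.inf` for infinity.

```python
from collections import deque
import math

WHITE, GRAY, BLACK = "white", "gray", "black"

def bfs(adj, s):
    """BFS from source s over adj (vertex -> list of out-neighbours).
    Returns dicts d and p, mirroring d[] and p[] in the pseudocode."""
    colour = {v: WHITE for v in adj}
    d = {v: math.inf for v in adj}
    p = {v: None for v in adj}        # None plays the role of NIL
    colour[s] = GRAY
    d[s] = 0
    q = deque([s])                    # ENQUEUE(Q, s)
    while q:
        u = q.popleft()               # DEQUEUE: first in, first out
        for v in adj[u]:              # each edge (u, v)
            if colour[v] == WHITE:    # first time v is encountered
                colour[v] = GRAY
                d[v] = d[u] + 1
                p[v] = u
                q.append(v)           # ENQUEUE(Q, v)
        colour[u] = BLACK             # u's adjacency list fully examined
    return d, p

# Edges s->a, s->b, a->c, b->c, x->s; x is not reachable from s.
adj = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": [], "x": ["s"]}
d, p = bfs(adj, "s")
print(d)  # {'s': 0, 'a': 1, 'b': 1, 'c': 2, 'x': inf}
print(p)  # {'s': None, 'a': 's', 'b': 's', 'c': 'a', 'x': None}
```

   Using `collections.deque` keeps each ENQUEUE/DEQUEUE at O(1); popping
   from the front of a plain Python list would cost O(n) per DEQUEUE and
   break the linear-time bound.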

 * Look at the example in the textbook.

 * Each node is enqueued at most once, since a node is enqueued only when
   it is white, and it is coloured gray at the moment it is enqueued.  In
   particular, each node is dequeued at most once, so the adjacency list of
   each node is scanned at most once, for a total of O(m) work on edges.
   Adding the O(n) initialization, the total running time of BFS is O(n+m),
   linear in the size of the adjacency-list representation.

 * We can show that at the end of BFS, d[v] is equal to the number of edges
   on a shortest path from s to v (i.e., a path with the smallest number of
   edges).
     -> proof is outlined in "BFS computes shortest-path proof" handout
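
 * The BFS-tree stores each shortest path implicitly in p[]: following
   predecessors from v leads back to s, so reversing that chain gives a
   path from s to v with exactly d[v] edges.  A small Python sketch of this,
   assuming `p` is a dict as BFS would leave it, with `None` for NIL (the
   helper name `path_to` is hypothetical):

```python
def path_to(p, s, v):
    """Follow predecessor pointers from v back to s, then reverse.
    Returns None if v is not reachable from s."""
    path = [v]
    while path[-1] != s:
        u = p[path[-1]]
        if u is None:        # reached NIL before s: v was never discovered
            return None
        path.append(u)
    return path[::-1]        # now ordered from s to v

# p as BFS leaves it for edges s->a, s->b, a->c, b->c, x->s
p = {"s": None, "a": "s", "b": "s", "c": "a", "x": None}
print(path_to(p, "s", "c"))  # ['s', 'a', 'c']  (2 edges, and d[c] = 2)
print(path_to(p, "s", "x"))  # None
```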

 * Applications:

    - Computing single-source shortest paths / distances in an unweighted
      graph (e.g., finding a shortest path through a maze).

    - Discovering connected components in a graph.

    - Identifying bipartite graphs, finding a 2-colouring of a graph.

    - Used for traversing decision trees in artificial intelligence.
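
 * To illustrate the 2-colouring application: in an undirected graph, give
   each vertex the parity of its BFS distance as its colour.  The graph is
   bipartite exactly when no edge joins two vertices of the same parity (such
   an edge closes an odd cycle).  A minimal Python sketch, assuming the
   undirected graph is a dict with symmetric adjacency lists; it restarts BFS
   from every uncoloured vertex so that every connected component is covered.
   The name `two_colouring` is illustrative.

```python
from collections import deque

def two_colouring(adj):
    """Return a dict vertex -> 0/1 giving a 2-colouring of the undirected
    graph adj, or None if the graph is not bipartite."""
    side = {}
    for s in adj:                        # restart BFS in every component
        if s in side:
            continue
        side[s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in side:        # v still white: next BFS layer
                    side[v] = 1 - side[u]
                    q.append(v)
                elif side[v] == side[u]: # both ends on the same side:
                    return None          # odd cycle, not bipartite
    return side

square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
print(two_colouring(square))    # {1: 0, 2: 1, 4: 1, 3: 0}
print(two_colouring(triangle))  # None
```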


------------------
Depth-First Search [ Section 22.3 ]
------------------

 * Just like for BFS, each vertex will be coloured white (when it hasn't
   been "discovered" yet), gray (when it's been encountered but its
   adjacency list hasn't been completely visited yet), or black (when its
   adjacency list has been completely visited).  The philosophy of DFS is
   "go as far as possible before backtracking", so we will also keep track
   of two "timestamps" for each vertex: d[v] will indicate the discovery
   time (when the vertex was first encountered) and f[v] will indicate the
   finish time (when it's been completely visited).

 * In order to implement the "depth-first" strategy, it is very natural to
   write DFS recursively.  (We could also use a stack instead of a queue to
   keep track of the vertices that remain to be examined, and write the
   algorithm iteratively.)

   Because DFS is commonly used to find information about the connected
   components of a graph, there is one more "twist" to the implementation:
   instead of being given a start vertex s as in BFS, the main DFS
   subroutine is called repeatedly on each unvisited vertex until all
   vertices have been visited.  (Note that the same trick could be used
   with BFS to entirely visit each connected component of a graph.)
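
 * As noted above, DFS can also be written iteratively with an explicit
   stack.  One way to do this (a sketch, not the textbook's formulation) is
   to push each vertex together with an iterator over its adjacency list, so
   that when the search backtracks to u it resumes scanning u's list where it
   stopped.  Given the same vertex and adjacency-list order, this produces
   the same d[] and f[] timestamps as the recursive version.  The outer loop
   restarts the search at every vertex that is still white.  Colours are
   implicit: white = not yet in `d`, gray = in `d` but not in `f`, black =
   in `f`.  The name `dfs_iterative` is illustrative.

```python
def dfs_iterative(adj):
    """DFS over all vertices of adj (vertex -> list of out-neighbours) with
    an explicit stack. Returns discovery times d, finish times f, parents p."""
    d, f, p = {}, {}, {}
    time = 0
    for root in adj:                     # outer loop over all vertices
        if root in d:                    # already discovered: not white
            continue
        time += 1
        d[root] = time
        p[root] = None
        stack = [(root, iter(adj[root]))]
        while stack:
            u, neighbours = stack[-1]
            for v in neighbours:         # resume u's list where we left off
                if v not in d:           # v is white: discover it, go deeper
                    time += 1
                    d[v] = time
                    p[v] = u
                    stack.append((v, iter(adj[v])))
                    break
            else:                        # list exhausted: u becomes black
                stack.pop()
                time += 1
                f[u] = time
    return d, f, p

adj = {"u": ["v", "x"], "v": ["y"], "w": ["y", "z"],
       "x": ["v"], "y": ["x"], "z": ["z"]}
d, f, p = dfs_iterative(adj)
print({v: (d[v], f[v]) for v in adj})
# {'u': (1, 8), 'v': (2, 7), 'w': (9, 12), 'x': (4, 5), 'y': (3, 6), 'z': (10, 11)}
```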

 * Depth-First Search algorithm:

      DFS(G=(V,E))                      DFS-VISIT(G=(V,E),u)
         for each vertex v in V            colour[u] := gray
            colour[v] := white             time := time + 1
            d[v] := infinity               d[u] := time
            f[v] := infinity               for each edge (u,v) in E
            p[v] := NIL                       if colour[v] == white then
         end for                                 p[v] := u
         time := 0  (* global *)                 DFS-VISIT(G,v)
         for each vertex v in V               end if
            if colour[v] == white then     end for
               DFS-VISIT(G,v)              colour[u] := black
            end if                         time := time + 1
         end for                           f[u] := time
      END DFS                           END DFS-VISIT
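
 * A direct Python translation of DFS and DFS-VISIT, as a sketch under the
   same assumption as before (a dict `adj` mapping every vertex to a list of
   out-neighbours).  The global counter `time` becomes a variable shared
   through `nonlocal`; the names `dfs` and `visit` are illustrative.

```python
import math

def dfs(adj):
    """Recursive DFS over every vertex of adj. Returns dicts d, f, p of
    discovery times, finish times and DFS-forest parents."""
    colour = {v: "white" for v in adj}
    d = {v: math.inf for v in adj}
    f = {v: math.inf for v in adj}
    p = {v: None for v in adj}           # None plays the role of NIL
    time = 0

    def visit(u):                        # DFS-VISIT(G, u)
        nonlocal time                    # the shared "global" clock
        colour[u] = "gray"
        time += 1
        d[u] = time
        for v in adj[u]:
            if colour[v] == "white":
                p[v] = u
                visit(v)
        colour[u] = "black"
        time += 1
        f[u] = time

    for v in adj:
        if colour[v] == "white":
            visit(v)
    return d, f, p

adj = {1: [2, 3], 2: [3], 3: [], 4: [3]}
d, f, p = dfs(adj)
print({v: (d[v], f[v]) for v in adj})  # {1: (1, 6), 2: (2, 5), 3: (3, 4), 4: (7, 8)}
```

   One caveat of the recursive form in Python: the recursion depth equals
   the length of the longest DFS-tree path, and Python's default recursion
   limit (about 1000 frames) can be exceeded on long paths.  An explicit
   stack avoids this.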

 * Look at the example in the textbook.

 * As for BFS, DFS-VISIT is only called on white vertices, and each such
   vertex immediately becomes gray, so DFS-VISIT is called at most once for
   each vertex (exactly once, in fact, because of the main loop in DFS).
   Also, each adjacency list is scanned at most once, so the total running
   time is Theta(n+m): linear in the size of the adjacency-list
   representation, as for BFS.

 * Note that DFS constructs a "DFS-tree" for the graph (or a "DFS-forest"
   if it is called on each connected component), by keeping track of a
   predecessor p[v] for each node v.  For certain applications, we need to
   distinguish between different types of edges:

    - Tree Edges are the edges in the DFS tree.

    - Back Edges are edges from a vertex u to an ancestor of u in the DFS
      tree.

    - Forward Edges are non-tree edges from a vertex u to a descendant of
      u in the DFS tree.

    - Cross Edges are all the other edges that are not part of the DFS tree
      (from a vertex u to another vertex v that is neither an ancestor nor
      a descendant of u in the DFS tree).

    [[Q: All these types of edges can appear in a DFS-forest of a
      directed graph... but what about for an undirected graph?
      Can all these types of edges appear in a DFS-forest of an undirected
      graph?]]
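
 * For a directed graph, the type of edge (u,v) can be read off at the
   moment DFS first explores it, from the colour of v: white means a tree
   edge, gray means a back edge (v is an ancestor still being explored), and
   black means a forward edge if d[u] < d[v], or a cross edge otherwise.  A
   minimal Python sketch of that rule, assuming the same adjacency-dict
   representation (the name `classify_edges` is illustrative):

```python
def classify_edges(adj):
    """DFS on the directed graph adj; label each edge (u, v) as 'tree',
    'back', 'forward' or 'cross' when it is first explored."""
    colour = {v: "white" for v in adj}
    d = {}
    kind = {}
    time = 0

    def visit(u):
        nonlocal time
        colour[u] = "gray"
        time += 1
        d[u] = time
        for v in adj[u]:
            if colour[v] == "white":
                kind[(u, v)] = "tree"
                visit(v)
            elif colour[v] == "gray":
                kind[(u, v)] = "back"     # v is an ancestor of u
            elif d[u] < d[v]:
                kind[(u, v)] = "forward"  # v: finished descendant of u
            else:
                kind[(u, v)] = "cross"    # v: finished, not a descendant
        colour[u] = "black"

    for v in adj:
        if colour[v] == "white":
            visit(v)
    return kind

adj = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(classify_edges(adj))
# {('a', 'b'): 'tree', ('b', 'c'): 'tree', ('c', 'a'): 'back', ('a', 'c'): 'forward', ('d', 'c'): 'cross'}
```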

 * It's possible to prove many interesting properties about the timestamps
   d[v] and f[v] maintained for each vertex.  For example, they have
   "parenthesis structure": for all distinct vertices u and v, exactly one
   of the following holds:
    - v is a descendant of u in the DFS-tree and [d[v],f[v]] is
      entirely contained within [d[u],f[u]], or
    - u is a descendant of v in the DFS-tree and [d[u],f[u]] is
      entirely contained within [d[v],f[v]], or
    - neither vertex is a descendant of the other and the intervals
      [d[u],f[u]] and [d[v],f[v]] are disjoint (i.e., they do not overlap).
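
 * The parenthesis structure can be checked mechanically on any DFS
   result.  The sketch below tests all three cases for every pair of
   vertices, finding ancestors by following p[].  The hard-coded d, f and p
   are the values DFS produces on the graph with edges 1->2, 1->3, 2->3,
   4->3 when vertices and adjacency lists are scanned in increasing order;
   the function names are illustrative.

```python
from itertools import combinations

def is_descendant(p, v, u):
    """True if v is a proper descendant of u in the forest given by p."""
    w = p[v]
    while w is not None:
        if w == u:
            return True
        w = p[w]
    return False

def check_parenthesis(d, f, p):
    """Check the parenthesis structure for every pair of distinct vertices."""
    for u, v in combinations(d, 2):
        v_in_u = d[u] < d[v] and f[v] < f[u]    # [d[v],f[v]] inside [d[u],f[u]]
        u_in_v = d[v] < d[u] and f[u] < f[v]    # [d[u],f[u]] inside [d[v],f[v]]
        disjoint = f[u] < d[v] or f[v] < d[u]   # intervals do not overlap
        if is_descendant(p, v, u):
            ok = v_in_u
        elif is_descendant(p, u, v):
            ok = u_in_v
        else:
            ok = disjoint
        if not ok:
            return False
    return True

d = {1: 1, 2: 2, 3: 3, 4: 7}
f = {1: 6, 2: 5, 3: 4, 4: 8}
p = {1: None, 2: 1, 3: 2, 4: None}
print(check_parenthesis(d, f, p))  # True
```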

 * White-path Theorem (Theorem 22.9):
   In a depth-first forest of a (directed or undirected) graph G = (V, E), 
   vertex v is a descendant of vertex u if and only if
   at the time d[u] that the search discovers u,
   vertex v can be reached from u along a path consisting
   entirely of white vertices.

 Note: You can use any of the theorems given in class for your assignments,
 on the exams, etc. 

 * Applications:

    - Discovering cycles in a graph.

    - Discovering connected components, and strongly connected components
      in a graph.

    - Topologically sorting a directed acyclic graph (i.e., ordering the
      vertices so that for every directed edge (u,v), u comes before v).
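
 * The first and last applications can share one DFS: a directed graph has
   a cycle exactly when DFS finds a back edge (an edge into a gray vertex),
   and for an acyclic graph, listing the vertices in decreasing order of
   finish time f[v] gives a topological order.  A minimal Python sketch under
   the same adjacency-dict assumption; `topological_sort` is an illustrative
   name, and raising `ValueError` on a cycle is just one possible design
   choice.

```python
def topological_sort(adj):
    """Return the vertices of the directed graph adj so that every edge
    (u, v) has u before v, i.e. in decreasing order of DFS finish time.
    Raises ValueError if DFS finds a back edge (the graph has a cycle)."""
    colour = {v: "white" for v in adj}
    order = []                            # vertices in order of finishing

    def visit(u):
        colour[u] = "gray"
        for v in adj[u]:
            if colour[v] == "gray":       # back edge: cycle found
                raise ValueError(f"cycle through edge ({u}, {v})")
            if colour[v] == "white":
                visit(v)
        colour[u] = "black"
        order.append(u)                   # u finishes now

    for v in adj:
        if colour[v] == "white":
            visit(v)
    return order[::-1]                    # decreasing finish time

clothes = {"shirt": ["tie", "belt"], "tie": ["jacket"], "belt": ["jacket"],
           "jacket": [], "socks": ["shoes"], "shoes": []}
print(topological_sort(clothes))
# ['socks', 'shoes', 'shirt', 'belt', 'tie', 'jacket']
```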