TUTORIAL 7: Disjoint sets ========================= Recall the disjoint sets data structure --------------------------------------- Maintains a collection S = {S_1, S_2,...,S_k} of disjoint sets -> Each set is identified by a representative (some member of the set) Operations: MAKE-SET(x) UNION(x, y) FIND-SET(x) Note: No duplicate elements are allowed! Disjoint set application ------------------------ Example: Determine the connected components of an undirected graph. Graph G: a - b d f - g \ / | | / | c e h i Connected components: {a, b, c}, {d, e}, {f, g, h, i} (Make sure they understand what a graph and component is.) Goal: Given vertices v_1 and v_2 determine whether they belong to the same component Question: How do you use disjoint sets to solve this problem? (Let them guess the answer.) Connected_Components(G): for each vertex v \in V[G] do MAKE_SET(v) for each edge (u,v) \in E[G] do if FIND_SET(u) \neq FIND_SET(v) then UNION(u,v) Same_Component(u,v): if FIND_SET(u) = FIND_SET(v) then return true else return false Run this algorithm on the graph above. Edge processed Collection of disjoint sets {a}, {b}, {c}, {d},... (a,b) {a,b}, {c}, {d},.... (a,c) {a,b,c},{d},{e},.... (b,c) {a,b,c},{d},{e},.... (d,e) {a,b,c}, {d,e}, {f}, {g},.... etc. (Finish this example if they are interested.) Recall the linked list implementation -------------------------------------- -------------- | ----- | v / | | head -> c -> h -> t nil <- tail the representative of the set = the first element in the list other elements may appear in any order in the list the node contains pointers to: -the next element -the representative Weighted-union heuristic: append the smaller list onto the longer, and break ties arbitrarily. Running time? ------------- We will analyze the running times of disjoint-set data structures in terms of two parameters: n = the number of MAKE-SET operations, and m = the total number of MAKE-SET, UNION, and FIND-SET operations Theorem 21.1: Using the linked-list representation of a disjoint set and the weighted-union heuristic, a sequence of m MAKE_SET, UNION, and FIND_SET, n of which are MAKE_SET, takes O(m + nlog n) time. MAKE_SET operations: -each operation takes \Theta(1) each -so n MAKE_SET operations take time \Theta(n) -since n >= m, n MAKE_SET operations take time O(m) UNION operations: -using the weighted-union heuristic, we always append the smaller list onto the larger one What is the upper bound on the number of times that an element's representative must be updated? Consider a fixed element x: -the first time x's representative is updated: x must have started in the smaller set, so the new set has at least two members -the second time x's representative is updated: x must have started in the smaller set, so the new set has at least four members -the kth time x's representative is updated, where k \leq n: x must have started in the smaller set, so the new set has at least 2^k members (Aside: They could prove this using induction.) Therefore, after x's representative pointer has been updated \lceil log k \rceil times, the new set has at least k members. Since there were n MAKE_SET operations, the largest set can at most n members Therefore, x's representative is updated at most \lceil log n \rceil times So the total updated time for all n elements is O(n log n) [Notice that updating the head and tail pointers takes \Theta(1) per operation, so the total time to update the pointers over at most m UNION operations is \Theta(m). BUT we note that there can be at most n UNION operations, so the total time to update pointers is actually O(n). FIND_SET operations ------------------- -each FIND_SET operations takes O(1) time -so at most m FIND_SET operations takes O(m) time Therefore the total time of m MAKE_SET, UNION and FIND_SET operations is: O(m + n log n)