Rishabh's Tech Blog: Union-Find Disjoint Sets

Union-Find Disjoint Sets

The Union-Find Disjoint Set (UFDS) is a DS to model a collection of disjoint sets. In mathematics, two sets are said to be disjoint sets if they have no element in common.

That is two disjoints sets are sets whose intersection is the empty set.

The UFDS gives the ability to efficiently determine which set an item belongs to (in O(1)), and also to unite two disjoint sets into one larger set.

Such DS can be used to solve the problem of finding connected components in an undirected graph. Initialize each vertex to a separate disjoint set, enumerate the graph edges and join every two vertices(that is disjoint sets) connected by an edge.

Now we can test if two vertices belong to the same component/set easily.

In this DS the main idea is in choosing a representative 'parent' item to represent a set.

We need to ensure that each set is represented by only one unique item, then determining if items belong to the same set becomes far simpler: The 'parent' item is used as an identifier for the set. To achieve the same, the UFDS uses a tree structure where the disjoint sets form a forest of trees(a group of trees).

Each tree corresponds to a disjoint sets form a forest of trees.

The root of the tree is determined to be the representative item for a set. Since a tree can only have one root, this representative item can be used as a unique identifier for the set.

To do this efficiently, we store the index of the parent item and the height of the tree of each set(vector<int> p and vector<int> rank). We will use vi as vector<int>.

p[i] stores the immediate parent of item i. If item i is the representative item of a certain disjoint set, then p[i] = i.

rank[i] yields the height of the tree rooted at item i.

Suppose we have 5 disjoint sets {0, 1, 2, 3, 4}. We initialize the DS such that each item is a disjoint set by itself with rank 0 and the parent of each item is initially set to itself.

To unite two disjoint sets we set representative item(root) of one disjoint set to be the new parent of the representative item of the other disjoint set ----unionSet(i,j).

This effectively merges two trees in UFDS representation and results in both i and j in having the same root.

Union by rank

We can use the information contained in vi rank to set the representative item of the set with higher rank to be the new parent of the set with lower rank, this minimizes the rank of the resulting tree. If both have the same rank, the choice is arbitrary.

This is called as a union by the rank way of arranging data.

Consider the following sets:

[0] [1] [2] [3] [4]

unoinSet(0,1) sets p[0] = 1 (parent of zero is one) rank(1) as 1 (rank of 1 is one).

[1]

[0] [2][3][4]

Now, let us define a recursive function findset(i) that returns findset(p[i]) if p[i]!=i and i otherwise.

What effectively it is doing is that given i, findset(i) traverses all the way to its parent.

When we call union(4,3), we have rank[findSet(4)]=rank[4]=0 which is smaller than rank[findset(3)]=rank(3) = 1, so we set p[4]=3.

With the heuristic, the path taken from any node to the representative item is effectively minimized.

The function isSameSet(i,j) checks if the two elements i and j belong to the same set. IT calls findSet(i) and findSet(j) and if they are equal that means they belong to same set else not.

Path compression

Whenever we find the representative site of a disjoint set by following the chain of parent links from a given item, we set the parents of all items traversed to point directly to the root, p[i] = root.

This changes the structure of the tree(for good) and yet preserve the actual constitution of the disjoint set.

Here is the code that does the same:

Code

#include <bits/stdc++.h>

#define vector<int> vi

class UnionFind{

private: vi p, rank;

public:

UnionFind(int N){//N is the number of elements

rank.assign(N, 0);//////create n nodes assign 0 to each

p.assign(N,0);

for(int i=0;i<N;i++)

p[i]=i;

}

int findSet(int i){

return (p[i]==i)?i: (p[i]=findSet(p[i])); //also doing the path compression simultaneously

}

void isSameSet(int i, int j){

return findSet(i)==findSet(j);

}

void unionSet(int i, int j){

if(!isSameSet(i,j)){

int x=findSet(i), y=findSet(j);

if(rank[x]>rank[y])

p[y]=x;

else{

p[x]=y;

if(rank[x]==rank[y])

rank[y]++;

}

};

Some additional and important function that this DS uses is int numDisjointSets() which results the number of disjoint sets and int sizeOfSet(int i) which returns the size of the set which contains the element i.

Try implementing the code on your own, if you can't here is my code:

	int numDisjointSets(){
	set<int> set;
	for(int i=0;i<p.size();i++){
	int a=p[i];
	set.insert(findSet(a));
	}
	return set.size();
	}

	int sizeOfSet(int i){
	int t = findSet(i);
	int count=0;
	for(int i=0;i<p.size();i++){
	int a = p[i];
	if(t==findSet(a)){
	count++;
	}
	}
	return count;
	}

Try picking up a pen and understand what the code does.

Rishabh's Tech Blog

Union-Find Disjoint Sets

Union-Find Disjoint Sets

Union by rank

Path compression

Code

No comments:

Post a Comment

Installing albert on ubuntu 19.04

Report Abuse