Segment Trees

Segment Trees

Segment trees is a data structure that can efficiently answer dynamic range queries. In dynamic problems, we need to frequently update and query the data. We need here, little pre-processing techniques.

One such range query is the problem of finding the index of the minimum element in an array within the range [i..j].
This is commonly known as the Range Minimum Query(RMQ).
Given an array A of size n=7 below:
                                                  values: 18 17 13 19 15 11 20
                                                 Indices:  0   1   2   3  4    5   6

The RMQ(1,3) = 2, as index 2 contains the minimum element in the range 1 to three(among 17,13,19).

Further RMQ(3,4) = 4, RMQ(0,0) = 0, RMQ(0,1) =1 and RMQ(0,6) = 5.
We'll use this array to explain the concepts that follow:

There are several ways to implement the RMQ, including the trivial algorithm: iterate the array from index i to j and report the index with the minimum value, this will take O(n) time per query.
There is another method and that is to use the segment trees.

Segment trees are another way to arrange data in a binary tree. There are several ways to implement Segment Trees. We'll here use the 1-based (starting from one and not from zero) compact array to do so. We use vi(shortcut for vector<int>) st to represent the binary tree.

Index 1 (not zero) is the root of the tree and left and right children of node at index p are 2*p and (2*p)+1 respectively.

The value of st[p] will be the RMQ value of the segment associated with index p.
Note: storing [L, R] at index p in st means storing RMQ(L, R) at index p.

The root (which is at index 1) of the segment tree represents segment [0, n-1] ( that is RMQ(0, n-1) ).  For each segment [L, R] stored in index p (where L != R) the segment will be split into [L, (L+R)/2] and [1+(L+R)/2, R] in the left and right vertices.
Thus it is something like we fill index p with RMQ(L, R) which can be found by comparing RMQ(L, (L+R)/2] and [1+(L+R)/2, R] which are stored in the left and right indices of p.
That is, the left and right sub-segment will be stored in index 2*p and (2*p)+1 respectively. If for a p, L=R, in that case, it is just the lead node and thus RMQ(L, R)=RMQ(L, L) = L, and st[p]=RMQ(L, R).
Otherwise, (if L!=R) we will recursively build the segment tree, comparing the minimum value of the left and right sub-segments and updating the st[p] of the segment.

Suppose the queries are to be answered up to range n. Then in our tree (which is stored in our vi st, above), we need n leaf nodes, then n/2 nodes for layer one, so on and on till the root node.
The space complexity for n ranged queries is $O(n + n/2 + n/4 + .....+1) = O(2*n)$  and the process to build the tree above recursively runs in O(n) because each time we recurse we are working on two nodes.

As we use simple 1-based(starting from one) compact array indexing, we need the st to be at least of size $2*(2^(floor(log2(n))+1))$ Where the floor is the floor function or the greatest integer function.
In general, we simply use a loose upper bound of space complexity O(4*n).

Here is the segment tree for the array above:

                                                                           [0,6]=5
                                                                     /                   \
                                                                 [0,3]=2               [4,6]=5
                                                                 /        \                  /            \
                                                          [0,1]=1     [2,3]=2    [4,5]=5    [6,6]=6
                                                          /          \        /         \      /        \     
                                                         0            1     2          3   4         5      
Here is the array                            18           17    13       19  15       11       20

Verify the structure of the segment tree.

With the segment tree, answering RMQ is only a task of O(log n).

The answer for RMQ(i,i) is trivial and simply return i itself. However, for the general case          RMQ( i,  j), is further checks are needed. Let p1 = RMQ(i, (i+j)/2) and p2 = RMQ((i+j)/2+1,j). Then RMQ(i,j) is p1 if A[p1] <= A[p2] or p2 otherwise. This is implemented in rmq function in the code below.

In the above tree, let us try RMQ(1,3).
At root index [1,3] is in the range [0,3] so we go left. This node now has a range of [0,3] so we go to left and right both as both contain some elements of range [1,3]. The left contains [0,1] we split and go to only to the right node as only 1 is in [1,3]. Thus, we return 1 from [0,1].

The node with [2,3] range is completely inside [1,3] so we compare arr[2] and arr[3]. As arr[2] is smaller we return 2 from the node [2,3].
Thus we have 1 from [0,1] and [2] from [2,3], now we compare arr[i] for each of these i. This gives us our answer which is, 2.


Notice that this data structure allows us to avoid traversing the unnecessary parts of the tree.

In the worst case, we have to traverse two roots to leaf paths(example RMQ(0,5)) in this case we have to traverse 2*log(number of nodes in the tree) nodes i.e. $2*log(2n)$ which is nothing but $log(n).

Segment Trees are useful if the underlying array is frequently updated (that is, it is dynamic in nature).

If you change, for example, arr[5] in the above tree, you only need to update the root to leaf path which takes O(log n) time. You need to update [5,5], [4,5], [4,6],[0,6] and lo done.


Here is the code that implements the segment trees


#include <bits/stdc++.h>
using namespace std;
typedef vector<int> vi;
class SegmentTrees{
private:
vi st, A;
int n;
int left(int p){/////////returns the left child
return p<<1;
}
int right(int p){////////returns the right child
return p<<1 +1;
}
void build(int p, int L, int R){ /////////////This is our build function
/*
p is the node that we are going to fill
L is the start of the range
R is the end of the range.
*/
if(L==R){////////In the range for the node p is L,L in that case st[p] = L. That is the index for minimum element for leaf is L itself
st[p]=L;
}else{
build(left(p), L, (L+R)/2);///if not leaf first go and build left child of p and right child of p
build(right(p), (L+R)/2 +1, R);///
int p1 = st[left(p)], p2=st[right(p)];/////////////get the left and right childs
st[p] = (A[p1]<=A[p2])?p1:p2;//compare which of them is smaller.
}
}
int rmq(int p, int L, int R, int i, int j){
/*
Find the index of the minimum element in the range i, j;
*/
if(i>R||j<L){////////////the range of query is out of the range of the node
return -1; /////////////This is not the node we are looking for
}
if(L>=i&&R<=j)
return st[p];
int p1 = rmq(left(p), L, (L+R)/2,i,j);
int p2 = rmq(right(p), (L+R)/2 +1, R, i, j);
if(p1==-1)return p2;
if(p2==-1) return p1;
return (A[p1]<=A[p2])?p1:p2;
}
public://////////////The user interface
SegmentTrees(const vi &original){
A = original;
n = (int)A.size();
st.assign(4*n,0);//////////4n to be safe side
build(1,0,n-1);
}
int rmq(int i, int j){
return rmq(1,0,n-1, i, j);
}
};



You just need to create an instance(or object, whatever) in the main, and pass in the base array. Then use the RMQ function.




Extras:

Segment Trees can also be used to find the range sum, one can change the code above to return rsq(i,j) which returns $arr[i]+arr[......] + arr[j]$.
Try doing it on your own.

No comments:

Post a Comment

Installing albert on ubuntu 19.04

Installing Albert on Ubuntu 19.04... Albert is not still released for ubuntu 19.04. But still, you can install it using the following ...