Western Michigan University
Department of Computer Science
CS 6310 - Advanced Data Structure
Mehdi Mohammadi
March 2015
1
Orthogonal Range Trees
Higher-Dimensional Segment Trees
Other Systems of Building Blocks
Range-Counting and the Semigroup Model
KD-Trees and Related Structures
2
Orthogonal Range-Searching problem
◦ Input: a (d-dimensional) box, a set of points
◦ Output: all the points in the set that lie in that box
Applications
◦ Geometric applications
◦ Database Queries
Select Emp from T where 50K < Salary < 75K
AND age > 50 AND salesAmount > 500K
AND 2011 < salesYear < 2015
A 4-d orthogonal range query (one range each for Salary, age, salesAmount, salesYear)
◦ Preprocessing for queries
3
General situation
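◦ Set of data points p_1, …, p_n
p_i = (p_i1, …, p_id)
◦ d-dimensional query interval [a_1, b_1[ × … × [a_d, b_d[
◦ Return all points p_i contained in that interval:
a_1 ≤ p_i1 < b_1, …, a_d ≤ p_id < b_d
◦ Goal: output-sensitive query time O(f_d(n) + k), where k is the number of points returned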
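As a baseline for the structures that follow, the query condition can be checked by a plain linear scan. This is only a sketch: the names `in_box` and `range_query_naive` and the sample points are illustrative assumptions, and the scan takes O(d·n) time per query with no preprocessing.

```python
def in_box(p, box):
    """True if a_j <= p_j < b_j holds in every coordinate j (half-open box)."""
    return all(a <= x < b for x, (a, b) in zip(p, box))

def range_query_naive(points, box):
    """Linear scan over all points; box is a list of (a_j, b_j) pairs."""
    return [p for p in points if in_box(p, box)]

points = [(0, 1), (1, 5), (2, 8), (3, 3), (5, 0), (6, 4), (7, 6), (8, 7), (9, 9)]
# Query box [1, 7[ x [3, 9[
print(range_query_naive(points, [(1, 7), (3, 9)]))
# → [(1, 5), (2, 8), (3, 3), (6, 4)]
```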
Structure
◦ Build a balanced search tree on the first coordinates of
the data points
Each node has its associated interval: the points whose first
coordinate falls into that interval
At each node, recursively build a range search tree on the remaining d-1
coordinates of the points in its associated interval
4
Query:
◦ Find the O(log n) nodes that correspond to [a_1, b_1[
◦ In each of those nodes, perform a (d-1)-dimensional
range search for [a_2, b_2[ × … × [a_d, b_d[
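The query procedure can be sketched for d = 2 as follows. This is a minimal illustration under stated assumptions, not the slides' implementation: the names `Node`, `build` and `range_query` are hypothetical, the structure attached to each node is a y-sorted array searched with `bisect` instead of a second tree, and each node is re-sorted during construction, so building costs more than an optimal bottom-up merge would.

```python
from bisect import bisect_left

class Node:
    """Node of the first-coordinate tree; covers a run of x-sorted points."""
    def __init__(self, pts):
        self.xmin, self.xmax = pts[0][0], pts[-1][0]
        # Associated structure: the node's points sorted by second coordinate.
        self.by_y = sorted(pts, key=lambda p: p[1])
        self.ys = [p[1] for p in self.by_y]
        self.left = self.right = None
        if len(pts) > 1:
            mid = len(pts) // 2
            self.left, self.right = Node(pts[:mid]), Node(pts[mid:])

def build(points):
    return Node(sorted(points))  # assumes a non-empty point set

def _collect(node, a1, b1, a2, b2, out):
    if node.xmax < a1 or node.xmin >= b1:
        return  # node interval is disjoint from [a1, b1[
    if a1 <= node.xmin and node.xmax < b1:
        # Canonical node: answer the 1-d query on the second coordinate.
        out.extend(node.by_y[bisect_left(node.ys, a2):bisect_left(node.ys, b2)])
        return
    _collect(node.left, a1, b1, a2, b2, out)
    _collect(node.right, a1, b1, a2, b2, out)

def range_query(root, a1, b1, a2, b2):
    """All points with a1 <= x < b1 and a2 <= y < b2."""
    out = []
    _collect(root, a1, b1, a2, b2, out)
    return out

points = [(0, 1), (1, 5), (2, 8), (3, 3), (5, 0), (6, 4), (7, 6), (8, 7), (9, 9)]
root = build(points)
print(sorted(range_query(root, 1, 7, 3, 9)))
# → [(1, 5), (2, 8), (3, 3), (6, 4)]
```

A leaf covers a single x value, so it is always either disjoint from the query or fully contained; the recursion therefore never descends below a leaf.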
5
Example 2-d
◦ (0,1), (1,5), (2,8), (3,3), (5,0), (6,4), (7,6), (8,7), (9,9)
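[Figure: the 1-d range tree on the second coordinate attached to the node with points {(0,1), (1,5), (2,8)}; its nodes carry the keys 1, 5, 8 and the point sets {(0,1)}, {(1,5)}, {(2,8)}, {(1,5), (2,8)}]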
6
Theorem: Orthogonal range trees are a static data
structure supporting d-dimensional range
queries in a set of n d-dimensional points
◦ Query time (output sensitive)
O((log n)^d + k) if the output consists of k points
◦ Build time
O(n (log n)^d)
◦ Space requirement
O(n (log n)^(d-1))
7
Fractional Cascading
◦ When we make a sequence of searches in different
but related sets, we can reuse the result of the
search in one set to speed up the search in the next.
Algorithm
◦ For each node, sort the points of its associated interval by
second coordinate
◦ Link each point on this list to
The same point on the list of the left or right lower neighbor
The point with the next smaller second coordinate if
the point is missing on that side
Or the first point on that list if there is no point with
a smaller second coordinate
8
Fractional Cascading
9
Fractional Cascading Search
◦ We have a search tree for the first coordinate
We select the nodes that correspond to the
canonical interval decomposition of the first query
interval
◦ Attached to each node is a structure for the search
in the second coordinate
These structures are linked together for fractional
cascading
◦ So we need to search only in the set associated
with the first node
Then reuse that information in all later searches
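The search with linked lists can be sketched for d = 2 as below. Several things here are assumptions made for illustration: the names `link`, `Node` and `range_query` are hypothetical, and each list position is linked to the first entry of the child list that is not smaller, a common convention that differs slightly from the "next smaller" rule on the slide. The point is that only two binary searches are made, both at the root; every later position is obtained by following a stored pointer.

```python
from bisect import bisect_left
from heapq import merge

def link(parent_ys, child_ys):
    """ptr[i] = index of the first child entry >= parent_ys[i];
    the extra last entry maps 'past the end' to 'past the end'."""
    ptr, j = [], 0
    for y in parent_ys:
        while j < len(child_ys) and child_ys[j] < y:
            j += 1
        ptr.append(j)
    ptr.append(len(child_ys))
    return ptr

class Node:
    def __init__(self, pts):  # pts: non-empty list sorted by first coordinate
        self.xmin, self.xmax = pts[0][0], pts[-1][0]
        self.left = self.right = None
        if len(pts) == 1:
            self.by_y = list(pts)
        else:
            mid = len(pts) // 2
            self.left, self.right = Node(pts[:mid]), Node(pts[mid:])
            # Merge the children's y-sorted lists in linear time.
            self.by_y = list(merge(self.left.by_y, self.right.by_y,
                                   key=lambda p: p[1]))
        self.ys = [p[1] for p in self.by_y]
        if self.left is not None:
            self.lptr = link(self.ys, self.left.ys)
            self.rptr = link(self.ys, self.right.ys)

def range_query(root, a1, b1, a2, b2):
    out = []
    # The only binary searches: positions of a2 and b2 in the root list.
    stack = [(root, bisect_left(root.ys, a2), bisect_left(root.ys, b2))]
    while stack:
        node, lo, hi = stack.pop()
        if node.xmax < a1 or node.xmin >= b1 or lo >= hi:
            continue
        if a1 <= node.xmin and node.xmax < b1:
            out.extend(node.by_y[lo:hi])  # canonical node: report directly
        else:
            # Follow the links instead of searching again.
            stack.append((node.left, node.lptr[lo], node.lptr[hi]))
            stack.append((node.right, node.rptr[lo], node.rptr[hi]))
    return out

points = [(0, 1), (1, 5), (2, 8), (3, 3), (5, 0), (6, 4), (7, 6), (8, 7), (9, 9)]
root = Node(sorted(points))
print(sorted(range_query(root, 1, 7, 3, 9)))
# → [(1, 5), (2, 8), (3, 3), (6, 4)]
```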
10
Theorem: Orthogonal range trees with
fractional cascading are a static data
structure that supports d-dimensional
orthogonal range queries in a set of d-dimensional points (d > 1)
◦ Query time
O((log n)^(d-1) + k) if the output consists of k points
◦ Build time
O(n (log n)^(d-1))
◦ Space requirement
O(n (log n)^(d-1))
11
The inverse of the orthogonal range
searching problem
Input:
◦ A set of n ranges (d-dimensional intervals)
◦ A query point
Output:
◦ All ranges that contain that point
Solvable by a generalization of the segment tree
◦ It is defined recursively
12
Main structure:
◦ A balanced search tree whose keys are the first
coordinates of d-dimensional intervals
◦ Each node of that tree contains a d-1 dimensional
segment tree.
◦ In this d-1 dimensional segment tree associated
with node p, all intervals are stored for which p is
part of the canonical interval decomposition of the
first dimension.
13
Query
◦ Follow the search path of the first coordinate of the
query point
◦ In each node on that path, perform a (d-1)-dimensional query with the
remaining coordinates in the structure associated with the node.
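For the base case d = 1, the query amounts to walking one leaf-to-root path and reporting everything stored along it. The sketch below makes several assumptions: the class name `SegmentTree`, the method `stab`, and the array-based tree layout are illustrative choices, not taken from the slides. In the d-dimensional version each node would hold a (d-1)-dimensional segment tree in place of the plain list used here.

```python
from bisect import bisect_right

class SegmentTree:
    """Static 1-d segment tree for half-open intervals [a, b[."""
    def __init__(self, intervals):
        # Sorted endpoints; consecutive pairs form the elementary intervals.
        self.xs = sorted({x for a, b in intervals for x in (a, b)})
        self.n = max(1, len(self.xs) - 1)  # number of leaves
        self.store = [[] for _ in range(2 * self.n)]
        index = {x: i for i, x in enumerate(self.xs)}
        for iv in intervals:
            lo, hi = index[iv[0]] + self.n, index[iv[1]] + self.n
            # Store the interval at the nodes of its canonical decomposition.
            while lo < hi:
                if lo & 1:
                    self.store[lo].append(iv)
                    lo += 1
                if hi & 1:
                    hi -= 1
                    self.store[hi].append(iv)
                lo >>= 1
                hi >>= 1

    def stab(self, q):
        """All stored intervals [a, b[ with a <= q < b."""
        i = bisect_right(self.xs, q) - 1  # elementary interval holding q
        if i < 0 or i >= len(self.xs) - 1:
            return []
        out, node = [], i + self.n
        while node >= 1:  # follow the path from the leaf up to the root
            out.extend(self.store[node])
            node >>= 1
        return out

tree = SegmentTree([(1, 5), (3, 8), (6, 9)])
print(sorted(tree.stab(4)))  # → [(1, 5), (3, 8)]
print(tree.stab(5))          # → [(3, 8)]
```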
Theorem: The d-dimensional segment tree is a static
data structure that lists all d-dimensional
intervals containing a given query point,
◦ Build time: O(n (log n)^d)
◦ Space: O(n (log n)^d)
◦ Query time: O((log n)^d + k) if there are k such intervals
14
Improvement: S-tree using fractional cascading
Algorithm
◦ Input: rectangles [a_i, b_i[ × [c_i, d_i[ for i = 1, …, n
1. Create a balanced search tree T1 for
{a_1, b_1, a_2, b_2, …, a_n, b_n}
2. Attach an empty secondary balanced tree T2(v) to
each node v of the first tree
3. For i = 1 to n
◦ 3.1 Start from the root of T1 and put it on a stack.
◦ 3.2 Repeat as long as the stack is not empty:
Take the current node v from the stack
Insert c_i and d_i as keys into the tree T2(v)
15
If intervalOf(v) is not contained in [a_i, b_i[, check v's left and right
subtrees.
If their intervals intersect [a_i, b_i[, then
put them on the stack.
4. For each i = 1, …, n
◦ 4.1 For all nodes v that belong to the canonical
interval decomposition of [a_i, b_i[ in T1
Insert rectangle [a_i, b_i[ × [c_i, d_i[ into the segment tree
T2(v)
16
5. for each node v of T1
◦ Create pointers from each leaf of T2(v) to the
corresponding leaves of T2(v->left) and T2(v->right)
6. for each node v of T1
◦ For each node w of T2(v) create a pointer to the next
node above w in T2(v) that has some rectangle
associated with it.
Theorem: The S-tree is a static data structure that
keeps track of a set of n rectangles and, for a
given point, lists all rectangles containing that
point
◦ Space: O(n (log n)^2)
◦ Query time: O(log n + k) if there are k output rectangles
17
Canonical interval decomposition
◦ Decompose an interval into a union of a small number
of building blocks
To answer a query for an interval
◦ Decompose the query interval into a union of
building blocks
◦ Execute the query on those building blocks.
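As a small illustration of the decomposition step, the sketch below splits an index interval into the node intervals of an implicit balanced tree. The function name `canonical` and the choice of integer index ranges are assumptions made for the example.

```python
def canonical(lo, hi, node_lo, node_hi):
    """Decompose [lo, hi[ into node intervals of the balanced tree whose
    root covers [node_lo, node_hi[ (all bounds are integers)."""
    if hi <= node_lo or node_hi <= lo:
        return []                      # disjoint: contributes nothing
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]    # fully covered: one building block
    mid = (node_lo + node_hi) // 2     # partial overlap: split at the node
    return canonical(lo, hi, node_lo, mid) + canonical(lo, hi, mid, node_hi)

print(canonical(3, 14, 0, 16))
# → [(3, 4), (4, 8), (8, 12), (12, 14)]
```

The 11 elements of [3, 14[ are covered by four blocks; in general a balanced tree over n leaves needs O(log n) blocks per interval.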
18
A building-block query requires
◦ Decomposing the query
◦ Reconstructing the answer from the answers of the building
blocks
◦ Also, some structure that answers the query for a fixed
block
◦ Representing each interval as a union of a small number of
blocks
Tradeoff in the choice of building blocks
◦ Reducing each interval query to a small number of blocks requires
many building blocks
For each block we have to build a structure to answer
queries
19
Bentley and Maurer (1980) proposal
◦ Use an r-level structure for the system of blocks
Interpreted as writing numbers in base n^(1/r).
Intervals of blocks for the top level:
[a·n^(1-1/r), b·n^(1-1/r)] with 0 ≤ a < b ≤ n^(1/r)
O(n^(2/r)) blocks on the top level
O(n^((j+1)/r)) blocks on level j
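One way to read the r-level scheme is as a digit-by-digit decomposition of the interval endpoints in base m = n^(1/r). The sketch below follows that reading; the function name `blocks`, the half-open integer intervals, and the concrete numbers (n = 1000, r = 3, m = 10) are assumptions chosen for illustration and are not taken from the slides.

```python
def blocks(lo, hi, unit, m):
    """Split the integer interval [lo, hi[ into multi-level blocks.
    `unit` is the grid spacing of the current level (m**(r-1) on top),
    and each finer level divides the spacing by m."""
    if lo >= hi:
        return []
    a = -(-lo // unit)  # ceil(lo / unit): first grid line at or after lo
    b = hi // unit      # floor(hi / unit): last grid line at or before hi
    if a > b:
        # No grid line of this level inside the interval: refine.
        return blocks(lo, hi, unit // m, m)
    middle = [(a * unit, b * unit)] if a < b else []
    # The leftovers on both sides each lie inside a single cell of this level.
    return (blocks(lo, a * unit, unit // m, m) + middle
            + blocks(b * unit, hi, unit // m, m))

# n = 1000, r = 3 levels, base m = 10, so the top-level spacing is 100.
print(blocks(123, 789, 100, 10))
# → [(123, 130), (130, 200), (200, 700), (700, 780), (780, 789)]
```

Each level contributes at most one block on the left and one on the right, so an interval is covered by at most 2r - 1 blocks in this sketch.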
20
Using r-level blocking we obtain a structure
to perform d-dimensional orthogonal range
searching
◦ Query time: O(r^d log n + k)
◦ Preprocessing time: O(r^d n^(1+(2d-2)/r) log n)
◦ Query time is output sensitive for large r and n.
21
The range counting problem asks just for the number of points in
a range
◦ We do not need output-sensitive time complexity
Use orthogonal range tree
◦ Instead of concatenating lists, just add up numbers
◦ Generalization by giving weight to points
In the 1-dimensional version, just ask for the number of keys in
an interval
[Figure: a search tree whose nodes are labeled with key counts: 9, 4, 5, 3, 2, 2, 2, 2]
22
All operations in O(n (log n)^d) for a set of n points
Difference from range searching
◦ Range counting allows a dynamic structure
Insertion, deletion, and rebalancing
Range searching has large associated trees at the nodes
◦ Lower bounds for operations are possible: O((log n)^d)
In the semigroup version
◦ a commutative semigroup (S,+) is specified,
◦ each point is assigned a weight from S,
◦ Return semigroup sum of the weights of the keys in an
interval
Directly from canonical interval decomposition
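A 1-dimensional sketch of the semigroup version is given below. The class name `SemigroupTree`, the array-based tree layout and the sample keys are assumptions made for illustration. Since a semigroup need not have an identity element, an empty range returns `None`. Range counting is the special case where every weight is 1 and the operation is addition.

```python
from bisect import bisect_left
from operator import add

class SemigroupTree:
    """Static tree over sorted keys; every node stores the semigroup sum
    of the weights below it. `op` must be associative and commutative."""
    def __init__(self, weighted_keys, op):
        items = sorted(weighted_keys, key=lambda kw: kw[0])
        self.keys = [k for k, _ in items]
        self.op, self.n = op, len(items)
        self.sum = [None] * self.n + [w for _, w in items]
        for i in range(self.n - 1, 0, -1):
            self.sum[i] = op(self.sum[2 * i], self.sum[2 * i + 1])

    def query(self, a, b):
        """Semigroup sum of the weights of all keys in [a, b[."""
        lo = bisect_left(self.keys, a) + self.n
        hi = bisect_left(self.keys, b) + self.n
        total = None
        while lo < hi:  # combine the canonical nodes of the key range
            if lo & 1:
                total = self._combine(total, self.sum[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                total = self._combine(total, self.sum[hi])
            lo >>= 1
            hi >>= 1
        return total

    def _combine(self, x, y):
        return y if x is None else self.op(x, y)

keys = [2, 3, 5, 7, 11, 13, 17, 19, 23]
count = SemigroupTree([(k, 1) for k in keys], add)
largest = SemigroupTree([(k, k) for k in keys], max)
print(count.query(5, 18))    # → 5  (keys 5, 7, 11, 13, 17)
print(largest.query(5, 18))  # → 17
```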
23
KD-trees: another structure to support orthogonal
range searching
◦ Easy to understand and implement
◦ Unsatisfactory performance
2-dimensional query time
KD-Tree: O(n^(1/2) + k)
Orthogonal range tree: O((log n)^2 + k)
d-dimensional query time
KD-Tree: O(n^(1-1/d) + k)
Orthogonal range tree: O((log n)^d + k)
24
In each node make a comparison to enter the
left or right sub-trees
◦ In different levels compare against different
coordinates
In the root compare against x
In the second level compare against y, and so on.
25
Building KD-Tree
26
Building KD-Tree
27
KD-Tree range query
◦ Starting in the root, descend into each node whose
node interval has an intersection with the query
region
◦ Stop branches when an intersection is empty
Query time can be as large as Ω(√n) in two dimensions
◦ Even in completely balanced tree with distinct keys
◦ This bound cannot be improved
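A KD-tree with its range query can be sketched as follows. The names `KDNode` and `range_query` are illustrative assumptions. Two simplifications are made: the tree stores one point per node and prunes by comparing the splitting coordinate with the query range, instead of keeping explicit node intervals, and each level re-sorts its points, so construction is slower than the O(n log n) bound.

```python
class KDNode:
    def __init__(self, points, depth=0):
        # points: non-empty list of equal-length tuples
        self.axis = depth % len(points[0])  # cycle through the coordinates
        points = sorted(points, key=lambda p: p[self.axis])
        mid = len(points) // 2
        self.point = points[mid]            # median on the current axis
        left, right = points[:mid], points[mid + 1:]
        self.left = KDNode(left, depth + 1) if left else None
        self.right = KDNode(right, depth + 1) if right else None

def range_query(node, box, out):
    """Append to `out` all points p with a_j <= p_j < b_j for each (a_j, b_j) in box."""
    if node is None:
        return
    if all(a <= c < b for c, (a, b) in zip(node.point, box)):
        out.append(node.point)
    a, b = box[node.axis]
    split = node.point[node.axis]
    if a <= split:  # left subtree holds coordinates <= split
        range_query(node.left, box, out)
    if split < b:   # right subtree holds coordinates >= split
        range_query(node.right, box, out)

points = [(0, 1), (1, 5), (2, 8), (3, 3), (5, 0), (6, 4), (7, 6), (8, 7), (9, 9)]
root = KDNode(points)
found = []
range_query(root, [(1, 7), (3, 9)], found)
print(sorted(found))
# → [(1, 5), (2, 8), (3, 3), (6, 4)]
```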
28
Theorem: KD-Trees are a static data structure
that supports d-dimensional orthogonal
range queries in a set of n d-dimensional
points
◦ Output-sensitive query time O(n^(1-1/d) + k) if the output consists
of k points
◦ Can be built in O(n log n) time
◦ Needs O(n) space
29
Thank you
for your attention
30