AmbientDB: Relational Query Processing in a P2P Network
Peter Boncz and Caspar Treijtel
LEE BYUNGIL
PL Lab., Hongik University
2004.11.14

Outline
  1. Introduction
    1.1 Goal
    1.2 Assumptions
    1.3 Example: Collaborative Filtering in a P2P Database
    1.4 Overview
  2. AmbientDB Architecture
    2.1 Data Model
    2.2 Query Execution in AmbientDB
    2.3 Dataflow Execution
    2.4 Executing the Collaborative Filtering Query
  3. DHTs in AmbientDB
    3.1 Example: Approximated Collaborative Filtering
  4. Conclusion

1. Introduction (1)
  AmbientDB
    A new peer-to-peer (P2P) DBMS prototype
    Developed at CWI (Centrum voor Wiskunde en Informatica)
    Distributed over an ad-hoc P2P network
    Global query algebra
    Multi-wave stream processing plans
  Ambient Intelligence (AmI)
    Digital environments in which multimedia services are sensitive to people's needs

Music Playlist Scenario
  amP2P player
    Log: meta-information
      Homogeneous
    Content: AmbientDB instance, or external sources
      Heterogeneous
  AmbientDB
    Its collection
    Only meta-information

1.1 Goal
  Full relational database functionality
  Cooperate in an ad-hoc way with other AmbientDB devices
  Propose
    A general architecture for AmbientDB
    Complex query processing in an ad-hoc P2P network

1.2 Assumptions (1)
  Upscaling (flexibility)
    The number of cooperating devices is potentially large
    Home environment and ad-hoc P2P network
  Downscaling
    Devices often have few resources (CPU, memory, network, battery)
  Schema integration
    All devices operate under a common global schema
  Data placement
    Data placement is determined by the user
  Network failure
    Resilience of Chord
    While a query runs, the routing tree stays intact

Chord

1.2 Assumptions (2)
  Distributed database
    A priori
    Not in AmbientDB
  Federated database
    Statically
    Heterogeneous schema integration
  Mobile database
    Centralized database server and client (mobile node)
  P2P file sharing system
    Non-centralized and ad-hoc topologies
    Simple keyword text search

Example Music Schema
  The global schema "AMP2P" in AmbientDB
  Distributed table
    On the global level: the union of all horizontal fragments of these tables

1.3 Example: Collaborative Filtering in a P2P Database (1)
  amP2P player
    Access to a local content repository (digital music collection)
  AmbientDB instance
    Shares all music content in the "home zone"
    Shares only the meta-information in the huge P2P network

1.3 Example: Collaborative Filtering in a P2P Database (2)
  Memory-based implicit voting scheme
    Predicted vote for the active user a on item j (standard memory-based form):
      $p_{a,j} = \bar{v}_a + k \sum_i w(a,i)\,(v_{i,j} - \bar{v}_i)$
    $v_{i,j}$ = the vote of user i on item j
    $w(a,i)$ = weight function defined on the active user and user i
    $\bar{v}_i$ = average vote for user i
    $k$ = normalizing factor
  weight(user_a, user_i)
    Number of times the example song has been fully played by user i
  Refined form
    Negative information: skipped

Collaborative Filtering Query in SQL

1.4 Overview
  General architecture, including
    Data model
    Query execution
      Three-level query execution process
  DHT (Distributed Hash Table)
    Global table indices
    Optimize the query
  Related work & future work
  Conclusion

AmbientDB Architecture

2. AmbientDB Architecture
  Distributed query processor
    Executes queries on all ad-hoc connected devices
  P2P protocol
    Chord: scalable lookup and routing scheme
    P2P IP overlay networks made out of unreliable connections
    Query node = root
    A small number of connections per node
    Simultaneous bi-directional communication and query processing
    DHTs: global table indices
  Local DB component
    Local table
    Embedded database
    External data source: wrapper component (distributed database system)
  Schema integration engine
    Meta-data translation
    Using view-based schema mappings

AmbientDB Routing Tree Using IP Overlay

2.1 Data Model (1)
  Standard relational data model & algebra as query language
  Queries are formulated against global tables
    Answered from the local node, a limited set of nodes, or all reachable nodes
  Converging answer
    Query locally
    Re-issue iteratively over more nodes

2.1 Data Model (2)
  Abstract Table
  LT (Local Table)
    Each node has a private schema
  Global schema: global table T
    All participating nodes Ni carry a table instance Ti
    In the query node, Ti may be accessed as an LT
  DT (Distributed Table)
    Q: the set of nodes that participate in some global query
    The union of the local table instances

2.1 Data Model (3)
  PT (Partitioned Table)
    Specialization of the DT
    The participating tuples in each Ti are disjoint between all nodes
  Advantage over DT
    Exact query answers can often be computed in an efficient distributed fashion
    By broadcasting a query and letting each node compute a local result without need for communication
  Attaching a bitmap index Ti.Q to each local table Ti
  "Virtual" column #NODEID
    Records on which node each tuple stored in a DT/PT is located
    Allows location-specific query restrictions

LT, DT and PT

2.2 Query Execution in AmbientDB (1)
  Three-level translation
    Abstract level
      User query
      Selection, join, aggregation, sort
      Lists (List<Type>)
      List instances <a,b,c>
    Concrete level
      Table parameters, return value
      Partition, union
    Execution level
      Wave-plans

The Abstract Global Algebra

The Concrete Global Algebra

2.2 Query Execution in AmbientDB (2)
  Starting at the leaves
    Abstract query plan -> concrete
    Concrete operators have a concrete result type
    The process continues up to the root of the query graph
  Local result table, hence LT
  Local concrete variant of all abstract operators
    All tables -> LT
  Concrete union (T1) -> LT
  More efficient alternative query plans

2.2 Query Execution in AmbientDB (3)
  select, aggr, order support distributed execution (dist)
    Executed on all nodes, each on its local partition (LT) of a PT or a DT
    Produce again a distributed result (PT or DT)
    The query is broadcast through the routing tree
    The result is again dispersed over all nodes as a PT or DT
  Aggrmerge = aggrlocal(unionmerge(DT)):LT
    Reduces the fragments to be collected in the query node
    Saves considerable bandwidth

2.2 Query Execution in AmbientDB (4)
  join variants
    Broadcast join (LT, T1) -> T1
    Foreign-key join (T1, DT) -> T1
      Referential integrity to minimize communication
    Split join (LT1, T1) -> T1
      Reduces bandwidth consumption: O(T*N) -> O(T*log(N))
  partition
    A special operator that performs double elimination
    Creates a PT from a DT by creating a tuple participation bitmap at all nodes
    To use the dist operators, a DT should first be converted to a PT

Mappings

2.3 Dataflow Execution (1)
  Query processing paradigm
    The routing tree, built on TCP connections, passes bidirectional tuple streams
    Multiple such waves run simultaneously (upward and downward)
  Third translation phase
    Concrete query plan -> wave-plans
    Concrete operator
      One or more waves (local dataflow algebra operators)

2.3 Dataflow Execution (2)
  dist plans for select, aggr, order and foreign-key join
    Buffer-to-buffer local operator in each node, without further communication
  broadcast join
    Propagates a tuple wave through the network
  split
    Split(<true,true>,<c1,c1>)
    Ordered -> effectively forming a DT/PT
  scan-select, quick-sort, merge-join, heap-based top-N, ordered aggregation
    All stream-based
    Require little memory

The Dataflow Algebra
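
Note on stream-based operators: the local dataflow operators listed in 2.3 (2) are described as stream-based and light on memory. The sketch below illustrates that property for heap-based top-N only. It is a minimal Python illustration under stated assumptions, not AmbientDB code: the function name `top_n`, its `key` parameter, and the sample `(song, plays)` tuples are hypothetical.

```python
import heapq


def top_n(stream, n, key=lambda t: t):
    """Return the n largest tuples of a stream while holding at most n in memory."""
    heap = []  # min-heap of (key value, arrival number, tuple)
    for seq, item in enumerate(stream):
        entry = (key(item), seq, item)  # seq breaks ties, so tuples are never compared
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # The new tuple beats the smallest one kept so far: swap it in.
            heapq.heapreplace(heap, entry)
    # Largest key first.
    return [item for _key, _seq, item in sorted(heap, reverse=True)]


# Hypothetical (song, times fully played) tuples arriving as a stream.
plays = [("s1", 5), ("s2", 9), ("s3", 1), ("s4", 7)]
best = top_n(iter(plays), 2, key=lambda t: t[1])
print(best)  # [('s2', 9), ('s4', 7)]
```

The heap never grows beyond n entries, so memory use depends on n and not on the length of the tuple stream.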

2.4 Executing the Collaborative Filtering Query (1)

2.4 Executing the Collaborative Filtering Query (2)

2.4 Executing the Collaborative Filtering Query (3)
  Problems
    Query 1
      Large list of all users that have ever listened to the example song
      Hogs resources from all nodes in the network
    Query 2
      Basically sends all log records to the query node for aggregation
  Can be done more efficiently in an AmbientDB enriched with DHTs

3. DHTs in AmbientDB (1)
  Useful lookup structures for large-scale P2P applications
  Reduce the number of nodes involved in answering a query
  Involving many nodes
    Decreases query performance
    Creates an overload in the average query frequency
  Gnutella (uses neither a DHT nor global indices)
    Easy to locate popular music
    Difficult to locate less well-known songs

3. DHTs in AmbientDB (2)
  Goal: enable the query optimizer to automatically accelerate selection queries using such DHTs
  DHT indices can be exploited by a query optimizer to accelerate lookup queries
  Special form of a PT, as the partitions are disjoint
  selectchord(DHT):LT
    Dataflow level
    Routes a message to the Chord finger to which the selection key value hashes
    Retrieves all corresponding tuples as an LT via a direct TCP/IP transfer
  Non-complete index

DT and DHT in AmbientDB

3.1 Example: Approximated Collaborative Filtering (1)
  HISTO
    Static histogram of fully-listened-to songs per user
    Reduces the histogram computation cost of the query

Optimized collaborative filtering query in SQL

3.1 Example: Approximated Collaborative Filtering (2)

3.1 Example: Approximated Collaborative Filtering (3)

Network Bandwidth Compared

4. Conclusion
  Full query processing architecture
    Executing queries in a declarative, optimizable language, over an ad-hoc P2P network
  DHT
    Efficient global indices
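
Note on DHT lookups: the idea behind a DHT-based selection (hash the selection key, contact only the node responsible for it, and fetch the matching tuples as a local table) can be sketched as follows. This is a simplified Python illustration, not AmbientDB or Chord code. The `Ring` class, `chord_id`, the 32-bit identifier space, and the node and song names are all assumptions made for the example, and the ring uses a globally known sorted list of node identifiers where real Chord routes through finger tables.

```python
import bisect
import hashlib

M = 32  # identifier bits; an arbitrary choice for this sketch


def chord_id(value):
    """Hash a string onto the identifier ring [0, 2**M)."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


class Ring:
    """Hypothetical ring of peers, each storing the tuples whose key it owns."""

    def __init__(self, node_names):
        self.by_id = {chord_id(name): name for name in node_names}
        self.ids = sorted(self.by_id)                  # node positions on the ring
        self.store = {name: {} for name in node_names}

    def successor(self, key):
        """First node whose identifier equals or follows the key's identifier."""
        i = bisect.bisect_left(self.ids, chord_id(key))
        return self.by_id[self.ids[i % len(self.ids)]]  # wrap around the ring

    def insert(self, key, tup):
        self.store[self.successor(key)].setdefault(key, []).append(tup)

    def select(self, key):
        """Contact only the responsible node and return its matching tuples."""
        return list(self.store[self.successor(key)].get(key, []))


ring = Ring(["n1", "n2", "n3", "n4"])
ring.insert("song42", ("u1", "song42"))
ring.insert("song42", ("u2", "song42"))
ring.insert("song7", ("u1", "song7"))
print(ring.select("song42"))
```

All tuples with the same key land on one node, so a selection on that key touches a single peer instead of being broadcast to the whole network.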