Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Graph Databases (GDB) Adrian Silvescu Doina Caragea Anna Atramentov Problems And Motivations • The necessity to represent, store and manipulate complex data make RDBMS somewhat obsolete • [P1] Problem 1: Violations of the 1NF – – – • Multi-valued attributes Complex attributes Complex combination of the previous two [P2] Problem 2 : Accommodate Changes – – Appears when acquiring data from autonomous dynamic sources or Web (eg: genexp & restaurants ). RDBMS may require schema renormalization Problems and Motivations (contd) • [P3] Problem 3: Unified representation for: – – – – Data Knowledge (Schemas are a subset of this) Queries (More generally: Goals) [results+def] Models (Concepts are a particular example) • In order to facilitate the application of learning and reasoning methods on these structures The Big Picture Discovery & Modelli ng Information Exploitation USER Query = Goal Specification Information AVAILABLE INFORMA TION IN THE WORLD INFORMA TION INTEGRATION Information Integration ... KNOWLEDGE SOURCES ... LINK SOURCES ... DATA SOURCES Existing Approaches • RDBMS – may need schema renormalisation • Approaches that try to fix the above mentioned problems: – OO Databases [P1], [P2] - graphs [but procedural] – XML Databases [P1] (somewhat [P3]) – trees – OORDBMS [P1] – graphs with foreign keys • Others – Datalog – More Efficient Crippled Prolog – Network Models - graphs – Hierarchical Models – trees • Therefore the motivation for Graph Databases Outline • Graph Databases – – – – – – Examples DDL DML – Queries DML – UPDATES Informal Semantics DB => GDB • GDB vs. OO, XML, OR, … • Conclusions and Further Work Graph Databases • We propose a new kind of Database: Graph Databases (GDB) as a solution to Problems [P1],[P2] and [P3]. • In order to define the GDB we will specify: – The Data Definition Language (DDL) – The Query Language (more generally DML) – Informal Semantics of the above languages • We will also show how to convert existing DBs (RDBMS) into the GDB DDL to facilitate the transition to GDBs Goals and Design Choice • Goals – Declarativity – Change • Design Choice : Have unique instance identifiers vs. having foreign keys – Close in Spirit to OO – Will allow us to cope easier with Change – Declarativity is an issue in OO, but not for GDB as we will show Database Representation • Sailors(sid:integer, sname:char(10), rating: integer, age:real) • Boats(bid:integer, bname:char(10), color:char(10)) • Reserve(sid:integer, bid:integer, day:date) Sailors Reserves sid sname rating age 22 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0 Boats sid bid day 22 101 10/10/96 58 103 bid bname color 101 Interlake red 102 Clipper green 103 Marine red 11/12/96 sid 22 name dustin rating 7 Graph Representation IOF Sailor s IO F TB ID1 age 45.0 L IOF 31 name lubbe r rating 8 IOF IOF Boat s Reserves IOF ID2 IOF age 55.5 ID8 IOF ID6 ID5 ID4 : sid : : ID3 ID7 : … Foreign Keys ID1 sid 22 name dustin rating 7 age 45.0 Sailor sid 22 day 10/10/96 ID4 bid 101 Boat bid ID6 bname color 101 Interlake red Data Representation in the GDB DDL Name1 Val 1 Name2 ID Val 2 …… Name N Val N • ID:(Name1=Val1,…,NameN=ValN) Examples: ID1:(sid=22, name=“Dustin”, rating=7, age=45.0) ID4:(sailor=ID1, day=“10/10/96”, boat=ID6) ID6:(bid=101, bname=“Interlake”, color=“red”) Defining New Concepts in GDB DDL– Grandson _ID1 GrSon _ID2 :- Person Person Person IOF IOF IOF _ID1 Son _ID1:(GrSon=_ID2) :_ID1:(IOF=“Person”,Son=_ID3), _ID3:(IOF=“Person”, Son=_ID2), _ID2:(IOF=“Person”). _ID3 Son GrSon _ID2 [DML-QL] Writing simple queries: •The names of all sailors who have reserved a red boat _X Sailors _X Boats Name IOF Name IOF _ID Boat _ID1 _ID :- Color Red _ID:(Name = _X) :_ID:(IOF = Sailors, Boat = _ID1, Name = _X), _ID1:(IOF = Boats, Color = Red). Informal Semantics • Three kinds of Definitions – Facts: • G1. [Extensional definition] – Definitions: • G1 :- G2. [Intensional definition] • G1 :- PROC f(x1,…,xn). [Procedural definition] • Queries = Graphs to be Matched = QG – The same as a definition: Query :- QG. Informal Semantics - Picture Query Query match Facts Extended Graph RDBMS => GDB • Sailors(sid:integer, sname:char(10), rating: integer, age:real) • Boats(bid:integer, bname:char(10), color:char(10)) • Reserve(sid:integer, bid:integer, day:date) Sailors Reserves sid sname rating age 22 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0 Boats sid bid day 22 101 10/10/96 58 103 bid bname color 101 Interlake red 102 Clipper green 103 Marine red 11/12/96 DML - Updates • Inline Query – _X : [ _ID: (IOF = Sailor, sname = lubber, rating = _X)] • Updates: MODIFY (QryGraph, UpdList) – Add to all GrandSons the money of the Grandparent as a potential inheritance. – MODIFY ( _ID : (GrSon = _ID1), (=> NEWID:(IOF = POT_INHER, BENFICIARY = _ID1, AMOUNT = _AMNT: [ _ID:(Money = _AMNT )]) )) – . Change Book IOF Title Databases Author Ramakrishan _ID Author Gehrke Change Gene_ex IOF _ID value 0.7 IOF x / Exp IOF _ID1 _ID2 value value x1 x2 : IOF Aggregate Operation _IDn value xN High Order Queries Find all the fields from Tables that contain the name John. _ID:(Name=_X) :_ID1:(IOF=Tables), _ID2:(IOF=_ID1, _X=“John”). GDB vs. OO, XML, OR, CG, … • GDB are close in spirit to OO but not the same (GDB : no encapsulation + more IDs). • Close To Datalog but with IDs(links) vs foreign keys • The same for ORDBMs and somewhat XML • Close to Conceptual Graphs But CG do not have IDs • We can also use foreign keys: _ID:[ _ID(IOF = Sailors, sname = lubber)]. OO vs. GDB OO GDB ID1 Class: Person age: 42 name: john car Person IOF ID1 ID2 Class: Car color: red age 42 Car name John IOF car ID2 color red Conclusions and Further work • Advantages & Disadvantages of GDB • Conclusions – Proposed a new database model DBG that copes with problems existing in previous approaches – Showed how to link existing DB with GDB – Showed advantages of GDB over existing database approaches • Further Work – Use GDB for learning