DBxplorer: A System for Keyword-Based Search over Relational Databases
Paper by: Sanjay Agrawal, Surajit Chaudhuri, Gautam Das
Presented by: Bhushan Chaudhari, University of Texas at Arlington

Contents
• Typical keyword search
• Overview of DBxplorer
• Processing component for producing the symbol table
• Storing the symbol table in databases
• Search component
• Compaction algorithms
• Finding matches using graphs
• Generalized matches
• Experiments and statistics

Keyword Search
• General scenario
  • Find the specified keywords in the same table, in different tables, or in completely different schemas
  • The physical schema needs to be explored
  • Existing indexes need to be leveraged by using proper data structures
  • Data structure => the structure followed by the symbol tables

Levels of Granularity
• Column-level granularity
• Row-level granularity
[Figure: example table with rows TOYOTA COROLLA, TOYOTA CAMRY, HONDA ACCORD, HONDA CRV]

Overview of DBxplorer
• Supports conjunctive queries
• Implemented using Microsoft SQL Server 2000 and the IIS web server
• ODBC interface for the database connection
• Makes good use of the functionality of relational engines

Overview of DBxplorer (Contd.)
• Publish
  • Determine the tables to be published
    • all_tab, all_tab_user
    • Table relations in the form of graphs
  • Determine the columns to be published
    • all_tab_columns, user_tab_columns
    • select table_name from user_tab_columns where column_name = 'desired_name'
  • Build the symbol tables

Overview of DBxplorer (Contd.)
• Search
  • The symbol table is looked up to identify the tables, columns and rows containing the keywords
  • Join trees (sets of related tables) are identified
  • A query is constructed for each join tree, and the tuples containing all keywords are found

Design Alternatives for Symbol Table
• Location granularity
  • Column level and cell level
• "Why is there no row-level granularity?"
  • Hard to implement
  • SQL queries work with respect to columns

Factors Affected by Granularity
• Space and time requirements
  • Searching time
  • Time required to build
  • With built-in operators such as DISTINCT, accumulating all the values in a column becomes easy
  • Most typical database systems use hash-value indexes, which are good for equality searches

Factors Affected by Granularity (Contd.)
• Search performance
  • Typically depends on the presence of indices
• Ease of maintenance
  • One-time creation

               Insert   Update   Delete
    Pub-Col    Y/N      Y/N      N
    Pub-Cell   Y        Y        Y

• "Which type of symbol table should we use?"

Storing Symbol Tables
• Pub-Col representation
  • Keyword -> Column Id
  • Hash Value -> Column Id
• Types of compression algorithms used
  • Foreign Key Compression (FK-Comp)
  • General Compression Technique (CP-Comp)

Building Compression Table
[Slide content not recovered]

Compression Algorithm
[Slide content not recovered]

Storing Symbol Tables (Contd.)
• Pub-Cell representation
  • Hash Value -> Cell ID
  • Hash Value -> Cell ID List
  • "Retrieval of all locations for a keyword is achieved by looking up a single row from the Pub-Cell symbol table"
• No compression
• Pre-computation is complex
• Inverted lists can be implemented using this table

Finding Matches for Keyword Search
• Each join tree is mapped to a SQL query that selects the rows containing all keywords.
• Ranking is based on the number of joins (quite similar to ranking by proximity of words in documents).

Search Algorithm
[Slide content not recovered]

Supporting Generalized Matches
• WHERE T.C LIKE '%STRING%'
  • Traditional databases use B+ tree indices
  • Microsoft SQL Server (most recent version): WHERE CONTAINS(C, 'String')
• Pub-Prefix representation
  • Enables token searches of the form WHERE T.C LIKE 'P%K%'
  • Symbol table entry: (Hash(K), T.C, P)
  • Efficiency depends on the length of the prefix
  • The length of the prefix also affects symbol table size and build time
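The Pub-Prefix idea can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not the system's implementation: the names `build_pub_prefix`, `like_predicates`, `CARS` and the column id `Cars.Model` are hypothetical, a plain dictionary keyed by the keyword stands in for the Hash(K) lookup, and the prefix P is shortened whenever it would overlap the keyword itself (a choice made here so that the 'P%K%' pattern still matches cells that begin with the keyword; the slides do not say how that case is handled).

```python
from collections import defaultdict

# Hypothetical sample data: one published column and its cell values.
CARS = {
    "Cars.Model": ["TOYOTA COROLLA", "TOYOTA CAMRY", "HONDA ACCORD", "HONDA CRV"],
}


def build_pub_prefix(columns, prefix_len=2):
    """Map each keyword K to a set of (column id, prefix P) entries.

    columns is an in-memory stand-in: {"Table.Column": [cell strings]}.
    """
    table = defaultdict(set)
    for column_id, values in columns.items():
        for value in values:
            for token in value.split():
                # Assumption of this sketch: cut P short so it never overlaps
                # the first occurrence of K, otherwise 'P%K%' would not match
                # a cell that starts with K.
                start = value.find(token)
                table[token].add((column_id, value[:min(prefix_len, start)]))
    return table


def like_predicates(table, keyword):
    """Build one "column LIKE 'P%K%'" predicate per stored entry."""
    # String interpolation is for illustration only; real code would need to
    # escape quotes and LIKE wildcards in the values.
    return [
        f"{column_id} LIKE '{prefix}%{keyword}%'"
        for column_id, prefix in sorted(table.get(keyword, ()))
    ]
```

Calling `like_predicates(build_pub_prefix(CARS), "CAMRY")` yields a predicate anchored on the prefix `TO`, which is the form a B+ tree index can serve. A longer `prefix_len` narrows the index range scanned but stores more distinct entries, which is the size and build-time trade-off noted above.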
Supporting Generalized Matches (Contd.)
• Stemming

Experiments: Symbol Table Granularity
• Symbol table size
• Publishing time
• Search performance

Experiments: Scalability of Pub-Col
• Data size and distribution
• Number of keywords in search
• Effectiveness of compression techniques
• Effectiveness of the Pub-Prefix method

References
• Sanjay Agrawal, Surajit Chaudhuri, Gautam Das. DBxplorer: A System for Keyword Based Search over Relational Databases. ICDE 2002.
• Sanjay Agrawal, Surajit Chaudhuri, Gautam Das. DBXplorer: Enabling Keyword Search over Relational Databases (Demo). SIGMOD Conference 2002: 627.
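As a recap of the publish and search steps from the overview slides, the following Python sketch builds a column-level (Pub-Col style) symbol table and ranks candidate join trees by their number of joins. Everything named here is hypothetical: `DB`, `SCHEMA_GRAPH`, `build_pub_col`, `shortest_path` and `join_trees` are illustrative stand-ins, the database is an in-memory dictionary, and the join tree for each combination of tables is approximated by the union of breadth-first shortest paths from the first table, which is a simplification rather than the enumeration used by the system. Mapping each tree to a SQL query is left out because it needs the join columns, which the slides do not list.

```python
from collections import defaultdict, deque
from itertools import product

# Hypothetical sample data: {table: {column: [cell values]}}.
DB = {
    "Cars": {"Make": ["TOYOTA", "HONDA"], "Model": ["COROLLA", "ACCORD"]},
    "Dealers": {"Name": ["Metro Toyota"], "City": ["Arlington"]},
    "Stock": {"Note": []},
}
# Hypothetical schema graph: an edge means the two tables can be joined.
SCHEMA_GRAPH = {"Cars": {"Stock"}, "Stock": {"Cars", "Dealers"}, "Dealers": {"Stock"}}


def build_pub_col(db):
    """Map each keyword to the (table, column) pairs whose cells contain it."""
    symbols = defaultdict(set)
    for table, columns in db.items():
        for column, values in columns.items():
            for value in values:
                for token in str(value).lower().split():
                    symbols[token].add((table, column))
    return symbols


def shortest_path(graph, start, goal):
    """Breadth-first search over the schema graph; returns a table list or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        table = queue.popleft()
        if table == goal:
            path = []
            while table is not None:
                path.append(table)
                table = previous[table]
            return path[::-1]
        for neighbour in sorted(graph.get(table, ())):
            if neighbour not in previous:
                previous[neighbour] = table
                queue.append(neighbour)
    return None


def join_trees(symbols, graph, keywords):
    """Candidate join trees as (num_joins, tables, edges), fewest joins first."""
    hits = [sorted({t for t, _ in symbols.get(k.lower(), ())}) for k in keywords]
    if not all(hits):
        return []  # conjunctive query: every keyword must occur somewhere
    trees = set()
    for combo in product(*hits):
        tables, edges = set(combo), set()
        for other in combo[1:]:
            path = shortest_path(graph, combo[0], other)
            if path is None:
                break  # these tables are not connected in the schema graph
            tables.update(path)
            edges.update(tuple(sorted(pair)) for pair in zip(path, path[1:]))
        else:
            trees.add((len(edges), tuple(sorted(tables)), tuple(sorted(edges))))
    return sorted(trees)
```

For the keywords `toyota` and `arlington`, the sketch ranks the single-table tree over `Dealers` (no joins) ahead of the tree that joins `Cars`, `Stock` and `Dealers`, mirroring the rule that fewer joins rank higher.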