Database Research: The Past, The Present, and The Future
Yi-Shin Chen
Department of Computer Science, National Tsing Hua University
yishin@cs.nthu.edu.tw
http://www.cs.nthu.edu.tw/~yishin/

Outline
- Motivation
- The Past
  - Evolution of Data Management [Gray 1996]
  - The Lowell Database Research Self-Assessment Report
    - Where did it come from? What does it say?
- The Present
- The Future

Motivation
- Database research is driven by new applications, technology trends, new synergies with related fields, and innovation within the field itself.
- [Slide diagram: "new stuff" flowing into "the database community"]

Evolution of Data Management
- Manual Record Managers (to 1900)
- Punched-Card Record Managers (from 1900)
  - 1950: Univac had developed magnetic tape
  - 1951: Univac I delivered to the US Census Bureau
- Programmed Record Managers (from 1955)
  - Birth of high-level programming languages
  - Batch processing
  - Cons: transaction errors cannot be detected in time; the business does not know its current state
- On-line Network Databases (1965-1980)
  - Indexed sequential records
  - Data independence
  - Concurrent access
  - Con: navigational programming interfaces are too low-level and require very primitive, procedural database operations

Evolution of Data Management (Contd.)
- Relational Databases & Client-Server Computing (1970-1980)
  - 1970: E.F. Codd outlined the relational model, giving database users high-level, set-oriented data access operations
  - Uniform representation
  - 1985: SQL first standardized
  - Products: Oracle, Informix, Ingres
  - Unexpected benefits:
    - Client-server computing: because of SQL, ODBC
    - Parallel processing: relational operators naturally support pipeline and partition parallelism
    - Graphical user interfaces: it is easy to render a relation
- Multimedia Databases (1995-2000)
  - Richer data types
  - OO databases: unifying procedures and data (the "Universal Server")
  - Projects that push the limits, e.g., the NASA EOS/DIS projects
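To make the contrast on the slide above concrete (navigational, record-at-a-time access versus high-level set-oriented operations), here is a minimal sketch using Python's built-in sqlite3 module. The table, data, and threshold are hypothetical and purely illustrative; they are not taken from the talk.

```python
# Contrast between record-at-a-time processing and set-oriented SQL access.
# Uses only the Python standard library; the "orders" table is made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 120.0), ("bob", 35.5), ("alice", 80.0), ("carol", 10.0)],
)

# Navigational / record-at-a-time style: the program itself spells out the
# scan, the grouping, and the filtering.
totals = {}
for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
    totals[customer] = totals.get(customer, 0.0) + amount
big_spenders_procedural = sorted(c for c, t in totals.items() if t > 100)

# Set-oriented / relational style: one declarative statement; the engine
# decides how to evaluate it.
big_spenders_sql = [
    row[0]
    for row in conn.execute(
        "SELECT customer FROM orders "
        "GROUP BY customer HAVING SUM(amount) > 100 ORDER BY customer"
    )
]

assert big_spenders_procedural == big_spenders_sql == ["alice"]
```

The second form is what enables the "unexpected benefits" on the slide: because the program never commits to an evaluation order, the engine is free to pipeline or partition the relational operators.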
Research Self Assessment
- A group of senior database researchers gathers every few years to assess the state of database research and point out potential research problems:
  - Laguna Beach, Calif. in 1989
  - Palo Alto, Calif. in 1990 and 1995
  - Cambridge, Mass. in 1996
  - Asilomar, Calif. in 1998
  - Lowell, Mass. in 2003
- The sixth ad-hoc meeting (Lowell):
  - Lasted two days
  - 25 senior database researchers
  - Output: the Lowell Database Research Self-Assessment Report
  - More information: http://research.microsoft.com/~gray/lowell/

Attendees
- Serge Abiteboul, Martin Kersten, Rakesh Agrawal, Michael Pazzani, Phil Bernstein, Mike Lesk, Mike Carey, David Maier, Stefano Ceri, Jeff Naughton, Bruce Croft, Hans Schek, David DeWitt, Timos Sellis, Mike Franklin, Avi Silberschatz, Hector Garcia-Molina, Rick Snodgrass, Dieter Gawlick, Mike Stonebraker, Jim Gray, Jeff Ullman, Laura Haas, Gerhard Weikum, Alon Halevy, Jennifer Widom, Joe Hellerstein, Stan Zdonik, Yannis Ioannidis
- Photos captured from http://www.research.microsoft.com/~gray/lowell/Photos.htm

The Main Driving Forces
- The focus of database research: information storage, organization, management, and access
- The main driving forces:
  - The Internet
    - Particularly by enabling "cross-enterprise" applications
    - Requires stronger facilities for security and information integration
  - The sciences
    - Generate large and complex data sets
    - Need support for information integration, for managing the pipeline of data products produced by data analysis, for storing and querying "ordered" data, and for integrating with the world-wide data grid

The Main Driving Forces (Contd.)
- Traditional DBMS topics
  - Technology keeps changing the rules → reassessment
    - E.g., the capacity/bandwidth ratios change → reassess storage management and query-processing algorithms
    - E.g., data-mining technology → DB component, NLP querying
- Maturation of related technologies, for example:
  - Data-mining technology → DB component
  - Information retrieval → integrate with DB search techniques
  - Reasoning with uncertainty → fuzzy data

Next Generation Infrastructure
- Discusses the various infrastructure components that require new solutions or are novel in some other way:
  1. Integration of Text, Data, Code, and Streams
  2. Information Fusion
  3. Sensor Data and Sensor Networks
  4. Multimedia Queries
  5. Reasoning about Uncertain Data
  6. Personalization
  7. Data Mining
  8. Self Adaptation
  9. Privacy
  10. Trustworthy Systems
  11. New User Interfaces
  12. One-Hundred-Year Storage
  13. Query Optimization

Integration of Text, Data, Code and Streams
- Rethink the basic DBMS architecture to support:
  - Structured data → traditional DBMS
  - Text → information retrieval
  - Space and time → spatial and temporal DB
  - Image and multimedia data → image retrieval / multimedia DB
  - Procedural data → user-defined functions
  - Triggers → make these facilities scalable
  - Data streams and queues → data stream management
- Start with a clean sheet of paper
  - SQL, XML Schema, XQuery → too complex
  - Vendors will pursue extend-XML/SQL strategies
  - The research community should explore a reconceptualization

Information Fusion
- The typical approach: a data warehouse populated by extract-transform-load (ETL) tools
- Because of the Internet:
  - Millions of information sources
  - Some data can only be accessed at query time
  - → Perform information integration on the fly
- Need a semantic-heterogeneity solution → work with the "Semantic Web" people
- Other challenges:
  - Security policy: information in each database is not free
  - A probabilistic world of evidence accumulation
  - Web scale

Sensor Data and Sensor Networks
- Characteristics:
  - Sensors draw more power when communicating than when computing
  - Rapidly changing configurations
  - Might not be completely calibrated

Multimedia Queries
- Challenge: create easy ways to analyze, summarize, search, and view multimedia
- Requires better facilities for managing multimedia information

Reasoning about Uncertain Data
- Traditional DBMSs have no facilities for either approximate data or imprecise queries
- (Almost) all data are uncertain or imprecise
- DBMSs need built-in support for data imprecision:
  - The "lineage" of the data must be tracked
  - Query processing must move to a stochastic model
  - Query answers will then get better
  - The system should characterize the accuracy offered
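A toy sketch of the ideas on the slide above: every tuple carries a confidence and a lineage (the sources it was derived from), operators propagate both, and answers are reported together with their estimated accuracy. The data model, the independence assumption, and all names here are invented for illustration; they are not the Lowell report's proposal.

```python
# Tuples carry confidence and lineage; selection preserves them, and a join
# multiplies confidences (assuming independent sources) and unions lineage.
from dataclasses import dataclass

@dataclass(frozen=True)
class UTuple:
    value: dict          # attribute name -> value
    confidence: float    # estimated probability that the tuple is correct
    lineage: frozenset   # identifiers of the sources it came from

def uselect(tuples, predicate):
    """Selection: a surviving tuple keeps its confidence and lineage."""
    return [t for t in tuples if predicate(t.value)]

def ujoin(left, right, key):
    """Join: confidences multiply, lineage is the union of both inputs."""
    out = []
    for l in left:
        for r in right:
            if l.value[key] == r.value[key]:
                out.append(UTuple(
                    value={**l.value, **r.value},
                    confidence=l.confidence * r.confidence,
                    lineage=l.lineage | r.lineage,
                ))
    return out

# Hypothetical noisy sensor readings plus a hand-curated station registry.
readings = [
    UTuple({"station": "S1", "temp": 21.4}, 0.9, frozenset({"sensor_feed"})),
    UTuple({"station": "S2", "temp": 35.0}, 0.6, frozenset({"sensor_feed"})),
]
stations = [
    UTuple({"station": "S2", "city": "Hsinchu"}, 0.95, frozenset({"registry"})),
]

hot = uselect(readings, lambda v: v["temp"] > 30)
for t in ujoin(hot, stations, key="station"):
    print(t.value, "confidence=%.2f" % t.confidence, "lineage=", sorted(t.lineage))
# -> {'station': 'S2', 'temp': 35.0, 'city': 'Hsinchu'} confidence=0.57 lineage= ['registry', 'sensor_feed']
```

A real stochastic query processor would need a far richer model (correlated sources, per-attribute uncertainty, query-time accuracy guarantees), which is exactly the open problem the slide points at.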
Personalization
- Query answers should depend on the user
- Relevance feedback should also depend on the person and the context
- A framework for including and exploiting appropriate metadata for personalization is needed
- Need to verify that the information system is producing a "correct" answer

Data Mining
- Focus on efficient ways to discover models of existing data sets
- Developed algorithms include classification, clustering, association-rule discovery, summarization, etc.
- Challenges:
  - Develop data-mining algorithms for finding unexpected "pearls of wisdom"
  - Integrate data mining with querying, optimization, and other database facilities such as triggers

Self Adaptation
- Modern DBMSs are more complex
  - Administrators must understand disk partitioning, parallel query execution, thread pools, and user-defined data types
  - There is a shortage of competent database administrators
- Goals:
  - Perform tuning using a combination of a rule-based system, a database of knob settings, and configuration data
  - "No knobs": all tuning decisions are made automatically
    - Requires knowledge of user behaviors and workloads
  - Recognize internal malfunctions, identify data corruption, detect application failures, and do something about them

Privacy
- Security systems: revitalize data-oriented security research
- Specify the purpose of each data request
- Access decisions should be based on:
  - Who is requesting the data
  - To what use it will be put

Trustworthy Systems
- A trustworthy system must:
  - Safely store data
  - Protect data from unauthorized disclosure
  - Protect data from loss
  - Make data always available to authorized users
  - Ensure the correctness of query results and data-intensive computations
- Digital rights management:
  - Protect intellectual property rights
  - Allow private conversation

New User Interfaces
- How best to render data visually?
  - During the 1980s we got QBE and VisiCalc; since then, nothing
  - Need new and better ideas in this area
- Query languages:
  - SQL and XQuery are not for end users
  - Possible choices:
    - Keyword-based queries → the Information Retrieval community
    - Browsing → increasingly popular
    - Ontology + speech or natural language → Semantic Web + NLP

One-Hundred-Year Storage
- Archived information is disappearing:
  - Captured on a deteriorating medium
  - Captured on a medium that requires obsolete devices
  - The applications that can interpret the information no longer work
- A DBMS can:
  - Keep content accessible in a useful form
  - Automate the process of migrating content between formats
  - Maintain the hardware and software that each document needs
  - Manage the metadata along with the stored document

Query Optimization
- Optimization of information integrators:
  - For semi-structured query languages, e.g., XQuery
  - For stream processors
  - For sensor networks
- Inter-query optimization involving large numbers of queries

Next Steps
- A test bed for information-integration research
- Revisit the "solved" problems in light of sea changes
- Avoid drawing too narrow a box around what we do
- Explore opportunities for combining database and related technologies

Thank You. Any Questions?
Department of Computer Science, National Tsing Hua University

References
- Jim Gray. "Evolution of Data Management." IEEE Computer 29(10), October 1996, pp. 38-46.
- The Lowell Database Research Self-Assessment Report: http://www.research.microsoft.com/~gray/lowell/