* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Chapter 1: Introduction - CS-People by full name
		                    
		                    
								Survey							
                            
		                
		                
                            
                            
								Document related concepts							
                        
                        
                    
						
						
							Transcript						
					
					CAS CS 460 Introduction to Database Systems Thanks to Prof. George Kollios, Boston University and Prof. Mitch Cherniack Brandeis University for lecture materials About the course – Administrivia  Instructor:  Ravi Kothuri, rkothuri@cs.bu.edu Office, Hours: MCS 147, Mon/Wed 5-6PM and 7:30-8PM  Teaching Fellow:  Panagiotis Papapetrou, cs460tf@cs.bu.edu MCS 147, Tue/Thu 11 - 12:30 AM  Home Page:  http://www.cs.bu.edu/rkothuri Check frequently! Syllabus, schedule, assignments, announcements… 1.2 Grading  Homeworks: 20%  4-5 assignments  Midterm 20%  Final 30%  Projects 30%  5-6 parts 1.3 My Background  Oracle Corporation  PhD from University of California, Santa Barbara  Research:  Multi-dimensional indexing  Mobile Databases  Spatial, GIS systems and CAD/CAM databases  Google Maps type of technologies for Enterprise  Geometric algorithms for terrain management, city modeling,…  Data Mining (spatial, financial, …)  RFID technologies  Semantic –web (RDF) technologies  Book: “Pro Oracle Spatial”, Nov 2004  Teaching on invitation from Prof. George Kollios 1.4 Who uses Databases?  Universities (records for students, faculty, courses,…  Airlines (passengers, flights, luggage, …)  Banking (customers, loans, …)  Utilities (customers, usage history, bills); e.g. telecom, electric,..  Any Company: human resources  Employees, depts, facilities,… “Data is the primary and integral part of information industry. Proper management of the data using database technology is essential for any large-scale company, organization.’’ 1.5 What is a Database System?  Database: A very large collection of related data  Models a real world enterprise:  Entities (e.g., teams, games / students, courses)  Relationships (e.g., The Patriots are playing in the Superbowl)  Even active components (e.g. “business logic”)  DBMS: A software package/system that can be used to store, manage and retrieve data form databases  Database System: DBMS+data (+ applications) 1.6 Why Study Databases??  Shift from computation to information  Always true for corporate computing  More and more true in the scientific world  and of course, Web  DBMS encompasses much of CS in a practical discipline  OS, languages, theory, AI, logic 1.7 Managing data: A naïve approach  Why not store everything on flat files: use the file system of the OS, cheap/simple… Name, Course, Grade John Smith, CS112, B Mike Stonebraker, CS234, A Jim Gray, CS560, A John Smith, CS560, B+ …………………  Yes, but not scalable…  Filesize limitations, access/update performance is slow,.. 1.8 Problem 1  Data redundancy and inconsistency  Multiple file formats, duplication of information in different files (say, in different departments) John Smith, js@cs.bu.edu, CS112, B John Smith, Arts560, js@cs.bu.edu, B+ Smith J, js@cs.bu.edu, Math212, A Why is this a problem?  Wasted space  Potential inconsistencies (multiple formats, John Smith vs Smith J.) 1.9 Problem 2  Data retrieval:  Find the students who took CS560  Find the students with GPA > 3.5 For every query we need to write a program!  Need a Query/Retrieval engine that can support different ways to access data  Easy to write  Execute efficiently 1.10 Problem 3  Data Integrity  No support for sharing:  Prevent simultaneous modifications  No coping mechanisms for system crashes  No means of Preventing Data Entry Errors (checks must be hard-coded in the programs)  Security problems 1.11 Database Systems  Database systems offer solutions to all the mentioned problems  Database systems:  Support Modeling of the data  Provide Levels of Abstraction of the data  Provide programs to allow you to Retrieve/modify the data  SQL • For easy, standard specification of queries  Query Optimizer • To process your queries efficiently  Ensure Integrity Maintenance  Transaction Manager/Recovery Manager • to ensure atomicity/integrity in concurrent transactions • to ensure integrity after system crashes)  ……………. 1.12 Database Systems  Data Modeling  Levels of Abstraction  Data Retrieval  Data Modification/Integrity Maintenance 1.13 Data Model  A framework for describing  data  data relationships  data semantics  data constraints  Entity-Relationship model (Ch. 6)  A set of entities to model real-world objects  Relationships among entities  Relational model  Data as a set (or sets) of “records” or “tuples”  Each tuple in the set has the same set of attributes  Other models:  object-oriented model: inheritance, abstraction,…  semi-structured data models, XML: tuples in a set can have different attributes 1.14 Entity-Relationship Model Example of schema in the entity-relationship model 1.15 Entity Relationship Model (Cont.)  E-R model of real world  Entities (objects)  E.g. customers, accounts, bank branch  Relationships between entities  E.g. Account A-101 is held by customer Johnson  Relationship set depositor associates customers with accounts  Widely used for database design  Database design in E-R model usually converted to design in the relational model (coming up next) which is used for storage and processing 1.16 Relational Model Attributes  Example of tabular data in the relational model Customer-id customername 192-83-7465 Johnson 019-28-3746 Smith 192-83-7465 Johnson 321-12-3123 Jones 019-28-3746 Smith customerstreet customercity accountnumber Alma Palo Alto A-101 North Rye A-215 Alma Palo Alto A-201 Main Harrison A-217 North Rye A-201 1.17 Database Systems  Data Modeling  Levels of Abstraction  Data Retrieval  Data Modification/Integrity Maintenance 1.18 Levels of Abstraction  Data storage Involves Complex data structures  Hide complexity from users  Abstract views of the data (e.g., for storing a customer record)  Physical level: how a customer record is stored as bytes/words on disk • Mostly hidden from database users/programmers  Logical level: describes “types” inside the database type customer = record name : string; street : string; city : integer; ssn; integer; end;  View level: application programs hide details of data types. Views can also hide information (e.g., ssn) for security purposes. 1.19 View of Data A logical architecture for a database system 1.20 Physical Level: Data Organization  Data Storage (Ch 11) Where can data be stored?  Main memory  Secondary memory (hard disks)  Optical store  Tertiary store (tapes)  Move data? Determined by buffer manager  Mapping data to files? Determined by file manager 1.21 Database Architecture (physical level data organization) DBA DDL Commands DDL Interpreter File Manager Buffer Manager Storage Manager Data Secondary Storage Metadata Schema 1.22 Database Systems  Data Modeling  Levels of Abstraction  Data Retrieval  Data Modification/Integrity Maintenance 1.23 Data retrieval  Queries (Ch 3, 4) Query = Declarative data retrieval describes what data, not how to retrieve it Ex. Give me the students with GPA > 3.5 vs Scan the student file and retrieve the records with gpa>3.5  Why? 1. Easier to write 2. Efficient to execute 1.24 Data retrieval Query Query Processor Plan Query Optimizer Query Evaluator Data  Query Optimizer “compiler” for queries (aka “DML Compiler”) Plan ~ Assembly Language Program Optimizer Does Better With Declarative Queries: 1. Algorithmic Query (e.g., in C) 1 Plan to choose from 2. Declarative Query (e.g., in SQL) n Plans to choose from 1.25 Specifying the Query using SQL  SQL: widely used (declarative) non-procedural language  E.g. find the name of the customer with customer-id 192-83-7465 select customer.customer-name from customer where customer.customer-id = ‘192-83-7465’  E.g. find the balances of all accounts held by the customer with customer-id 192-83-7465 select account.balance from depositor, account where depositor.customer-id = ‘192-83-7465’ and depositor.account-number = account.account-number  Procedural languages: C++, Java, relational algebra 1.26 Data retrieval: Indexing (Ch 12)  How to answer fast the query: “Find the student with SID = 101”?  One approach is to scan the student table, check every student, retrurn the one with id=101… very slow for large databases  Any better idea? 1st keep student record over the SID. Do a binary search…. Updates… 2nd Use a dynamic search tree!! Allow insertions, deletions, updates and at the same time keep the records sorted! In databases we use the B+-tree (multiway search tree) 3rd Use a hash table. Much faster for exact match queries… but cannot support Range queries. (Also, special hashing schemes are needed for dynamic data) 1.27 1.28 180 200 150 156 179 120 130 100 101 110 30 35 3 5 11 180 150 100 30 120 B+Tree Example B=4 Root Database Architecture (data retrieval) DB Programmer User Code w/ embedded queries DBA Query DDL Commands Query Optimizer DML Precompiler Query Evaluator Query Processor File Manager Storage Manager Buffer Manager Secondary Storage Indices Data Statistics Metadata Schema 1.29 DDL Interpreter Database Systems  Data Modeling  Levels of Abstraction  Data Retrieval  Data Modification/Integrity Maintenance 1.30 Data Integrity Transaction processing (Ch 15, 16)  Why Concurrent Access to Data must be Managed? John and Jane withdraw $50 and $100 from a common account… John: 1. get balance 2. if balance > $50 3. balance = balance - $50 4. update balance Jane: 1. get balance 2. if balance > $100 3. balance = balance - $100 4. update balance Initial balance $300. Final balance=? It depends… 1.31 Data Integrity Recovery (Ch 17) Transfer $50 from account A ($100) to account B ($200) 1. get balance for A 2. If balanceA > $50 3. balanceA = balanceA – 50 4.Update balanceA in database 5. Get balance for B System crashes…. 6. balanceB = balanceB + 50 7. Update balanceB in database Recovery management 1.32 Database Architecture DB Programmer DBA User Code w/ embedded queries DDL Commands Query Query Optimizer DML Precompiler Query Evaluator Query Processor File Manager Transaction Manager Recovery Manager Buffer Manager Storage Manager Secondary Storage DDL Interpreter Indices Data Metadata Integrity Constraints Statistics Schema 1.33 Outline  1st half of the course: application-oriented  How to develop database applications: User + DBA  2nd part of the course: system-oriented  Learn the internals of a relational DBMS (Oracle..)  Last few lectures on Oracle-specific features such as XDB…. 1.34
 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                            