Lecture 13: Intro to Database Design
SIMS 202: Information Organization and Retrieval
Prof. Ray Larson & Prof. Marc Davis
UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm, Fall 2002
http://www.sims.berkeley.edu/academics/courses/is202/f02/
IS 202 – FALL 2002 2002.10.14 - SLIDE 1

Lecture Overview
• Photo Project Feedback and Assignment 6 Discussion
• Review
  – Metadata And Markup
  – XML DTD Construction
  – XML For Protocols And Metadata Languages
• Databases and Database Design
• Database Life Cycle
• ER Diagrams
• Database Design

Photo Metadata Matters
"Unlike people's recollections, photographs don't change. They don't lie." (Bill Simon)

Photo Project Feedback
[Bar chart: IS202 Photo Project Student Feedback; horizontal axis from 0 to 30]
• Learned
  – Came to understand difficulty of classifying
  – Group experience was useful
  – Learned a lot
• Assignments
  – Need better overview of whole project at the beginning
  – Assignments need to be clearer
  – Assignments were too time-pressed
  – Need more clarity about classification practices and principles
  – Need more discussion of assignments in class
  – Show examples of other classification schemes
  – Offer help about group process
• Groups
  – Liked reorg into facet groups
  – Need smaller facet groups
  – Frustrated with consolidation process
  – Need shared folders for facet groups

Where We Are Headed
• 450+ photos annotated in our consolidated metadata classification
• Searchable from SIMS web site in the Flamenco Browser
• Hopefully, some project teams will implement their applications as well
  – If not in 202, then in future SIMS projects

Consolidated Photo Browser
http://fusion.sims.berkeley.edu/photo_project/photodatabase.cfm

Flamenco Image Search
[Two screenshot slides]

Assignment 6 Discussion
• Procedure for requesting additions to the consolidated classification (Monday through Wednesday only)
• Procedure for facet groups to recommend additions to the consolidated classification (Thursday through Friday)
• Procedure for the facet oversight group to decide additions to the consolidated classification (Friday through Monday)

Photo Project Name Choices
• SIMS Snapshot
• Digital Shoebox
• Photo Pigeonhole
• Pigeonhole
• ImageKey
• Picture Yourself
• Pictures on the Wall
• Distant Camera
• Memory to Spare
• Memories to Spare

SGML/XML Structure
• An SGML document consists of three parts:
  – The SGML Declaration
  – The Document Type Definition (DTD)
  – The Document Instance
• An XML document REQUIRES only the document instance, but for effective processing a DTD is very important
• XML Schema provides an alternative to DTDs for XML applications

DTD Components
• The major components of a DTD are:
  – Entity Declarations
  – Element Declarations
  – Attribute Declarations

What is a Database?

Files and Databases
• File: A collection of records or documents dealing with one organization, person, area or subject (Rowley)
  – Manual (paper) files
  – Computer files
• Database: A collection of similar records with relationships between the records (Rowley)
  – Bibliographic, statistical, business data, images, etc.

Database
• A Database is a collection of stored operational data used by the application systems of some particular enterprise (C.J. Date)
  – Paper "Databases"
    • Still contain a large portion of the world's knowledge
  – File-Based Data Processing Systems
    • Early batch processing of (primarily) business data
  – Database Management Systems (DBMS)

Why DBMS?
• History
  – 50's and 60's: all applications were custom built for particular needs
  – File based
  – Many similar/duplicative applications dealing with collections of business data
  – Early DBMS were extensions of programming languages
  – 1970 - E.F. Codd and the Relational Model
  – 1979 - Ashton-Tate and first Microcomputer DBMS

File Based Systems
[Diagram: applications (Delivery List, Coal Estimation, Just what asked for), each tied to its own files (Toys, Addresses, Naughty, Nice, Toys)]

From File Systems to DBMS
• Problems with file processing systems
  – Inconsistent data
  – Inflexibility
  – Limited data sharing
  – Poor enforcement of standards
  – Excessive program maintenance

DBMS Benefits
• Minimal data redundancy
• Consistency of data
• Integration of data
• Sharing of data
• Ease of application development
• Uniform security, privacy, and integrity controls
• Data accessibility and responsiveness
• Data independence
• Reduced program maintenance

Terms and Concepts
• Data independence
  – Physical representation and location of data and the use of that data are separated
    • The application doesn't need to know how or where the database has stored the data, but just how to ask for it
    • Moving a database from one DBMS to another should not have a material effect on application programs
    • Recoding, adding fields, etc. in the database should not affect applications

Database Environment
[Diagram: CASE Tools, Repository, User Interface, DBMS, Application Programs, Database]

Database Components
• Database contains:
  – User's Data
  – Metadata
  – Indexes
  – Application Metadata
• DBMS
  – Design tools
    • Table Creation
    • Form Creation
    • Query Creation
    • Report Creation
    • Procedural language compiler (4GL)
  – Run time
    • Form processor
    • Query processor
    • Report Writer
    • Language Run time
• Application Programs
• User Interface Applications

Types of Database Systems
• PC databases
• Centralized database
• Client/server databases
• Distributed databases
• Database models

PC Databases
E.g.: Access, FoxPro, dBase, etc.

Centralized Databases
[Diagram: Central Computer]

Client Server Databases
[Diagram: several Clients connected over a Network to a Database Server]

Distributed Databases
[Diagram: Homogeneous Databases; computers at Location A, Location B, and Location C]

Distributed Databases
[Diagram: Heterogeneous Or Federated Databases; Clients and a Database Server on a Local Network, with a Comm Server linking to Remote Computers]

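The data independence point from the Terms and Concepts slide (adding fields or changing storage details should not affect applications) can be illustrated with a minimal sketch. It assumes Python's built-in `sqlite3` module as a stand-in DBMS; the `employee` table, its columns, and the `employee_names` function are hypothetical names chosen for illustration, not part of the lecture.

```python
import sqlite3

# Hypothetical in-memory database standing in for "the DBMS".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, last_name TEXT)")
conn.execute("INSERT INTO employee VALUES ('111-11-1111', 'Smith')")


def employee_names(db):
    """The 'application': it asks for data by name and knows nothing
    about how or where the rows are stored."""
    query = "SELECT last_name FROM employee ORDER BY last_name"
    return [row[0] for row in db.execute(query)]


before = employee_names(conn)

# Database-side changes: add a field and an index.
conn.execute("ALTER TABLE employee ADD COLUMN birthdate TEXT")
conn.execute("CREATE INDEX idx_employee_last ON employee (last_name)")

after = employee_names(conn)
print(before == after)  # the application code did not change
```

The application keeps working because it names the columns it needs instead of depending on the table's physical layout; a query written as `SELECT *` with positional access would be more fragile.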
Terms and Concepts
• Database application
  – An application program (or set of related programs) that is used to perform a series of database activities on behalf of database users:
    • Create
    • Read
    • Update
    • Delete

Terms and Concepts
• Database activities:
  – Create
    • Add new data to the database
  – Read
    • Read current data from the database
  – Update
    • Update or modify current database data
  – Delete
    • Remove current data from the database

Terms and Concepts
• Enterprise
  – Organization
• Entity
  – Person, Place, Thing, Event, Concept...
• Attributes
  – Data elements (facts) about some entity
  – Also sometimes called fields or items or domains
• Data values
  – Instances of a particular attribute for a particular entity

Terms and Concepts
• Records
  – The set of values for all attributes of a particular entity
  – AKA "tuples" or "rows" in relational DBMS
• File
  – Collection of records
  – AKA "Relation" or "Table" in relational DBMS

Terms and Concepts
• Key
  – An attribute or set of attributes used to identify or locate records in a file
• Primary Key
  – An attribute or set of attributes that uniquely identifies each record in a file

Terms and Concepts
• Models
  – (1) Levels or views of the Database
    • Conceptual, logical, physical
  – (2) DBMS types
    • Relational, Hierarchic, Network, Object-Oriented, Object-Relational

Models (1)
[Diagram: Applications 1 to 4 each have an External Model and a set of Conceptual requirements; these feed the Conceptual Model, followed by the Logical Model and the Internal Model]

Data Models (2): History
• Hierarchical Model (1960's and 1970's)
  – Similar to data structures in programming languages
[Diagram: Books (id, title), Authors (first, last), Publisher, Subjects]

Data Models (2): History
• Network Model (1970's)
  – Provides for single entries of data and navigational "links" through chains of data.
[Diagram: Authors, Subjects, Books, Publishers]

Data Models (2): History
• Relational Model (1980's)
  – Provides a conceptually simple model for data as relations (typically considered "tables") with all data visible

  Book ID  Title          pubid  Author id
  1        Introductio    2      1
  2        The history    4      2
  3        New stuff ab   3      3
  4        Another title  2      4
  5        And yet more   1      5

  pubid  pubname
  1      Harper
  2      Addison
  3      Oxford
  4      Que

  Authorid  Author name
  1         Smith
  2         Wynar
  3         Jones
  4         Duncan
  5         Applegate

  Book ID  Subid
  1        2
  2        1
  3        3
  4        2
  4        3

  Subid  Subject
  1      cataloging
  2      history
  3      stuff

Data Models (2): History
• Object Oriented Data Model (1990's)
  – Encapsulates data and operations as "Objects"
[Diagram: Books (id, title), Authors (first, last), Publisher, Subjects]

Data Models (2): History
• Object-Relational Model (1990's)
  – Combines the well-known properties of the Relational Model with such OO features as:
    • User-defined datatypes
    • User-defined functions
    • Inheritance and sub-classing

Database System Life Cycle
1. Design
2. Physical Creation
3. Conversion
4. Integration
5. Operations
6. Growth, Change, & Maintenance

Design
• Determination of the needs of the organization
• Development of the Conceptual Model of the database
  – Typically using Entity-Relationship diagramming techniques
• Construction of a Data Dictionary
• Development of the Logical Model

Physical Creation
• Development of the Physical Model of the Database
  – Data formats and types
  – Determination of indexes, etc.
• Load a prototype database and test
• Determine and implement security, privacy and access controls
• Determine and implement integrity constraints

Conversion
• Convert existing data sets and applications to use the new database
  – May need programs, conversion utilities to convert old data to new formats

Integration
• Overlaps with Phase 3
• Integration of converted applications and new applications into the new database

Operations
• All applications run full-scale
• Privacy, security, access control must be in place
• Recovery and Backup procedures must be established and used

Growth, Change, and Maintenance
• Change is a way of life
  – Applications, data requirements, reports, etc. will all change as new needs and requirements are found
  – The database and applications will need to be modified to meet the needs of changes

Another View of the Life Cycle
[Diagram: the same six phases in a different layout: 1 Design, 2 Physical Creation, 3 Conversion, 4 Integration, 5 Operations, 6 Growth, Change]

Database Design Process
[Diagram: same layered diagram as Models (1): External Models and Conceptual requirements for Applications 1 to 4, then the Conceptual Model, Logical Model, and Internal Model]

Entity
• An Entity is an object in the real world (or even imaginary worlds) about which we want or need to maintain information
  – Persons (e.g.: customers in a business, employees, authors)
  – Things (e.g.: purchase orders, meetings, parts, companies)
[Diagram: Employee entity]

Attributes
• Attributes are the significant properties or characteristics of an entity that help identify it and provide the information needed to interact with it or use it (this is the Metadata for the entities)
[Diagram: Employee entity with attributes Birthdate, Age, Name (First, Middle, Last), SSN, Projects]

Relationships
• Relationships are the associations between entities
• They can involve one or more entities and belong to particular relationship types

Relationships
[Diagram: Student Attends Class; "Supplies project parts" relationship linking Supplier, Project, and Part]

Types of Relationships
• Concerned only with cardinality of relationship
[Diagram, Chen ER notation: Employee 1 : 1 Truck (Assigned); Employee n : 1 Project (Assigned); Employee m : n Project (Assigned)]

Other Notations
[Diagram, "Crow's Foot" notation: Employee Assigned Truck; Employee Assigned Project; Employee Assigned Project]

Other Notations
[Diagram, IDEF1X notation: Employee Assigned Truck; Employee Assigned Project; Employee Assigned Project]

More Complex Relationships
[Diagram: Evaluation relationship among Manager (1/1/1), Employee (1/n/n), and Project (n/n/1); Employee Assigned to Project with attributes SSN, Date, Project and cardinality 4(2-10); Employee Manages / Is Managed By Employee (1 to n); Manages Project (1)]

Weak Entities
• Owe existence entirely to another entity
[Diagram: Order Contains Order-line; attributes shown: Invoice#, Rep#, Part#, Quantity]

Supertype and Subtype Entities
[Diagram: Employee "Is one of" Sales-rep, Clerk, Other; relationships Manages and Sold; entity Invoice]

Many to Many Relationships
[Diagram: Employee (SSN) and Project (Proj#) linked by Is Assigned / Assigned through Project Assignment (SSN, Proj#, Hours)]

Requirements Analysis
• Conceptual Requirements
  – Systems Analysis Process
    • Examine all of the information sources used in existing applications
    • Identify the characteristics of each data element
      – Numeric
      – Text
      – Date/time
      – Etc.
    • Examine the tasks carried out using the information
    • Examine results or reports created using the information

Conceptual Design
• Conceptual Model
  – Merge the collective needs of all applications
  – Determine what Entities are being used
    • Some object about which information is to be maintained
  – What are the Attributes of those entities?
    • Properties or characteristics of the entity
    • What attributes uniquely identify the entity?
  – What are the Relationships between entities?
    • How do the entities interact with each other?

Developing a Conceptual Model
• Overall view of the database that integrates all the needed information discovered during the requirements analysis
• Elements of the Conceptual Model are represented by diagrams, Entity-Relationship or ER Diagrams, that show the meanings and relationships of those elements independent of any particular database systems or implementation details
• Can also be represented using other modeling tools (such as UML)

Logical Design
• Logical Model
  – How is each entity and relationship represented in the Data Model of the DBMS?
    • Hierarchic?
    • Network?
    • Relational?
    • Object-Oriented?

Physical Design
• Internal Model
  – Choices of index file structure
  – Choices of data storage formats
  – Choices of disk layout

Database Application Design
• External Model
  – User views of the integrated database
  – Making the old (or updated) applications work with the new database design
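
As a rough illustration of the step from conceptual to logical design, the many-to-many Employee/Project relationship from the ER slides can be mapped onto relational tables, with the Project Assignment entity becoming a table keyed on (SSN, Proj#) and carrying Hours. The sketch below assumes Python's built-in `sqlite3` module as the DBMS; the table names, column names, and sample rows are hypothetical, and the slides do not prescribe any particular implementation. It also walks through the four database activities (create, read, update, delete).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

# Logical model: one table per entity, plus a table for the
# many-to-many relationship, keyed on both parents' primary keys.
conn.executescript("""
CREATE TABLE employee (
    ssn  TEXT PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE project (
    proj_no INTEGER PRIMARY KEY,
    title   TEXT NOT NULL
);
CREATE TABLE project_assignment (
    ssn     TEXT    NOT NULL REFERENCES employee (ssn),
    proj_no INTEGER NOT NULL REFERENCES project (proj_no),
    hours   REAL    NOT NULL,
    PRIMARY KEY (ssn, proj_no)
);
""")

# Create: add new data (sample values are made up).
conn.execute("INSERT INTO employee VALUES (?, ?)", ("111-11-1111", "Smith"))
conn.executemany("INSERT INTO project VALUES (?, ?)",
                 [(1, "Photo browser"), (2, "Metadata schema")])
conn.executemany("INSERT INTO project_assignment VALUES (?, ?, ?)",
                 [("111-11-1111", 1, 10.0), ("111-11-1111", 2, 5.0)])

# Read: join through the assignment table to follow the relationship.
rows = conn.execute("""
    SELECT e.name, p.title, a.hours
    FROM project_assignment AS a
    JOIN employee AS e ON e.ssn = a.ssn
    JOIN project  AS p ON p.proj_no = a.proj_no
    ORDER BY p.proj_no
""").fetchall()

# Update: modify current data.
conn.execute("UPDATE project_assignment SET hours = 12.0 "
             "WHERE ssn = ? AND proj_no = ?", ("111-11-1111", 1))

# Delete: remove current data.
conn.execute("DELETE FROM project_assignment "
             "WHERE ssn = ? AND proj_no = ?", ("111-11-1111", 2))
remaining = conn.execute(
    "SELECT ssn, proj_no, hours FROM project_assignment").fetchall()

# Integrity constraint: the composite primary key rejects a second
# assignment row for the same employee and project.
try:
    conn.execute("INSERT INTO project_assignment VALUES (?, ?, ?)",
                 ("111-11-1111", 1, 3.0))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Using the pair (ssn, proj_no) as the primary key of the assignment table is one common way to represent a many-to-many relationship in a relational DBMS; a separate surrogate key would be an equally valid design choice.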