Lecture 13: Intro to Database Design
SIMS 202: Information Organization and Retrieval
Prof. Ray Larson & Prof. Marc Davis
UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm
Fall 2002
http://www.sims.berkeley.edu/academics/courses/is202/f02/
IS 202 – FALL 2002
2002.10.14 - SLIDE 1
Lecture Overview
• Photo Project Feedback and Assignment 6 Discussion
• Review
– Metadata And Markup
– XML DTD Construction
– XML For Protocols And Metadata Languages
• Databases and Database Design
• Database Life Cycle
• ER Diagrams
• Database Design
Photo Metadata Matters
"Unlike people's recollections, photographs don't change. They don't lie." — Bill Simon
Photo Project Feedback
IS202 Photo Project Student Feedback
• Learned
– Came to understand difficulty of classifying
– Group experience was useful
– Learned a lot
• Assignments
– Need better overview of whole project at the beginning
– Assignments need to be clearer
– Assignments were too time-pressed
– Need more clarity about classification practices and principles
– Need more discussion of assignments in class
– Show examples of other classification schemes
– Offer help about group process
• Groups
– Liked reorg into facet groups
– Need smaller facet groups
– Frustrated with consolidation process
– Need shared folders for facet groups
[Bar chart of the feedback items above; axis scale 0 to 30]
Where We Are Headed
• 450+ photos annotated in our consolidated
metadata classification
• Searchable from SIMS web site in the
Flamenco Browser
• Hopefully, some project teams will
implement their applications as well
– If not in 202, then in future SIMS projects
Consolidated Photo Browser
http://fusion.sims.berkeley.edu/photo_project/photodatabase.cfm
Flamenco Image Search
Assignment 6 Discussion
• Procedure for requesting additions to the
consolidated classification (Monday
through Wednesday only)
• Procedure for facet groups to recommend
additions to the consolidated classification
(Thursday through Friday)
• Procedure for the facet oversight group to
decide additions to the consolidated
classification (Friday through Monday)
Photo Project Name Choices
• SIMS Snapshot
• Digital Shoebox
• Photo Pigeonhole
• Pigeonhole
• ImageKey
• Picture Yourself
• Pictures on the Wall
• Distant Camera
• Memory to Spare
• Memories to Spare
SGML/XML Structure
• An SGML document consists of three
parts:
– The SGML Declaration
– The Document Type Definition (DTD)
– The Document Instance
• An XML document REQUIRES only the
document instance, but for effective
processing a DTD is very important
• XML Schema provides an alternative to
DTDs for XML applications
DTD Components
• The major components of a DTD are:
– Entity Declarations
– Element Declarations
– Attribute Declarations
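The three kinds of declaration can be seen together in a small internal DTD. The following is a minimal sketch, not course material: the photo, title, and owner names and the sims entity are invented for illustration. Python's standard ElementTree parser expands the entity but does not validate the document against the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical photo record; all element, attribute, and entity names
# below are assumptions made up for this illustration.
DOC = """<!DOCTYPE photo [
  <!ENTITY sims "School of Information Management and Systems">
  <!ELEMENT photo (title, owner)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT owner (#PCDATA)>
  <!ATTLIST photo id CDATA #REQUIRED>
]>
<photo id="p1">
  <title>South Hall</title>
  <owner>&sims;</owner>
</photo>
"""

# ElementTree is a non-validating parser: it reads the internal DTD subset
# and expands &sims;, but it does not check the element declarations.
root = ET.fromstring(DOC)
photo_id = root.get("id")
owner = root.findtext("owner")
print(photo_id, owner)
```

The ENTITY line is an entity declaration, the ELEMENT lines are element declarations, and the ATTLIST line is an attribute declaration.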
What is a Database?
Files and Databases
• File: A collection of records or documents
dealing with one organization, person,
area or subject (Rowley)
– Manual (paper) files
– Computer files
• Database: A collection of similar records
with relationships between the records
(Rowley)
– Bibliographic, statistical, business data,
images, etc.
Database
• A Database is a collection of stored
operational data used by the application
systems of some particular enterprise
(C.J. Date)
– Paper “Databases”
• Still contain a large portion of the world’s
knowledge
– File-Based Data Processing Systems
• Early batch processing of (primarily) business data
– Database Management Systems (DBMS)
Why DBMS?
• History
– In the 1950s and 1960s, all applications were custom built for particular needs
– File based
– Many similar/duplicative applications dealing
with collections of business data
– Early DBMS were extensions of programming
languages
– 1970 - E.F. Codd and the Relational Model
– 1979 - Ashton-Tate and first Microcomputer
DBMS
File Based Systems
[Diagram: applications and the files they use]
• Applications: Delivery List, Coal Estimation, Just what asked for
• Files: Toys, Addresses, Naughty, Nice, Toys
From File Systems to DBMS
• Problems with file processing systems
– Inconsistent data
– Inflexibility
– Limited data sharing
– Poor enforcement of standards
– Excessive program maintenance
DBMS Benefits
• Minimal data redundancy
• Consistency of data
• Integration of data
• Sharing of data
• Ease of application development
• Uniform security, privacy, and integrity controls
• Data accessibility and responsiveness
• Data independence
• Reduced program maintenance
Terms and Concepts
• Data independence
– Physical representation and location of data
and the use of that data are separated
• The application doesn’t need to know how or
where the database has stored the data, but just
how to ask for it
• Moving a database from one DBMS to another should not have a material effect on application programs
• Recoding, adding fields, etc. in the database
should not affect applications
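A small sketch of data independence, using Python's built-in sqlite3 module as a stand-in DBMS. The employee table, its columns, and the last_names function are hypothetical. The application asks for data by name only, so adding a field and an index on the database side leaves its result unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, last_name TEXT)")
conn.execute("INSERT INTO employee VALUES ('123-45-6789', 'Smith')")


def last_names(db):
    """Application code: says what data it wants, not where or how it is stored."""
    query = "SELECT last_name FROM employee ORDER BY last_name"
    return [row[0] for row in db.execute(query)]


before = last_names(conn)

# Database-side changes: a new field and a new index.
conn.execute("ALTER TABLE employee ADD COLUMN birthdate TEXT")
conn.execute("CREATE INDEX idx_employee_last ON employee (last_name)")

after = last_names(conn)
print(before == after)
```

This only illustrates the idea within one DBMS; moving between different DBMS products usually also involves differences in SQL dialect.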
Database Environment
[Diagram components: CASE Tools, Repository, User Interface, DBMS, Application Programs, Database]
Database Components
• DBMS
– Design tools: Table Creation, Form Creation, Query Creation, Report Creation, Procedural language compiler (4GL)
– Run time: Form processor, Query processor, Report Writer, Language Run time
• Database contains:
– User’s Data
– Metadata
– Indexes
– Application Metadata
• Application Programs
• User Interface Applications
Types of Database Systems
• PC databases
• Centralized database
• Client/server databases
• Distributed databases
• Database models
PC Databases
E.g.: Access, FoxPro, dBase, etc.
Centralized Databases
[Diagram: a single central computer]
Client Server Databases
[Diagram: several clients connected over a network to a database server]
Distributed Databases
[Diagram: homogeneous databases on computers at Location A, Location B, and Location C]
Distributed Databases
[Diagram: heterogeneous or federated databases; clients, a database server, and a comm server on a local network, plus remote computers]
Terms and Concepts
• Database application
– An application program (or set of related programs) that is used to perform a series of database activities on behalf of database users:
• Create
• Read
• Update
• Delete
Terms and Concepts
• Database activities:
– Create
• Add new data to the database
– Read
• Read current data from the database
– Update
• Update or modify current database data
– Delete
• Remove current data from the database
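The four activities map directly onto SQL statements. Here is a minimal sketch using Python's built-in sqlite3 module; the photo table and its captions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photo (id INTEGER PRIMARY KEY, caption TEXT)")

# Create: add new data to the database.
conn.execute("INSERT INTO photo (id, caption) VALUES (1, 'South Hall')")

# Read: read current data from the database.
created = conn.execute("SELECT caption FROM photo WHERE id = 1").fetchone()[0]

# Update: modify current database data.
conn.execute("UPDATE photo SET caption = 'South Hall at dusk' WHERE id = 1")
updated = conn.execute("SELECT caption FROM photo WHERE id = 1").fetchone()[0]

# Delete: remove current data from the database.
conn.execute("DELETE FROM photo WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM photo").fetchone()[0]

print(created, updated, remaining)
```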
Terms and Concepts
• Enterprise
– Organization
• Entity
– Person, Place, Thing, Event, Concept...
• Attributes
– Data elements (facts) about some entity
– Also sometimes called fields or items or domains
• Data values
– Instances of a particular attribute for a particular
entity
Terms and Concepts
• Records
– The set of values for all attributes of a
particular entity
– AKA “tuples” or “rows” in relational DBMS
• File
– Collection of records
– AKA “Relation” or “Table” in relational DBMS
Terms and Concepts
• Key
– An attribute or set of attributes used to identify
or locate records in a file
• Primary Key
– An attribute or set of attributes that uniquely
identifies each record in a file
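The difference between a key and a primary key can be sketched with sqlite3. The employee table and its values are hypothetical. last_name can be used to locate records but does not identify them uniquely, while the primary key ssn must be unique, so the DBMS rejects a duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, last_name TEXT)")
conn.execute("INSERT INTO employee VALUES ('111-11-1111', 'Smith')")
# Two records may share a last name: it locates records but is not unique.
conn.execute("INSERT INTO employee VALUES ('222-22-2222', 'Smith')")

# A second record with an existing primary key value is refused.
try:
    conn.execute("INSERT INTO employee VALUES ('111-11-1111', 'Jones')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

smiths = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE last_name = 'Smith'"
).fetchone()[0]
print(duplicate_rejected, smiths)
```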
Terms and Concepts
• Models
– (1) Levels or views of the Database
• Conceptual, logical, physical
– (2) DBMS types
• Relational, Hierarchic, Network, Object-Oriented,
Object-Relational
Models (1)
[Diagram: conceptual requirements from Applications 1 to 4 combine into the Conceptual Model, followed by the Logical Model and the Internal Model; each application also has an External Model]
Data Models(2): History
• Hierarchical Model (1960’s and 1970’s)
– Similar to data structures in programming
languages
[Diagram: hierarchy with Books (id, title), Authors (first, last), Publisher, and Subjects]
Data Models(2): History
• Network Model (1970’s)
– Provides for single entries of data and
navigational “links” through chains of data.
[Diagram: Authors, Subjects, Books, and Publishers connected by links]
Data Models(2): History
• Relational Model (1980’s)
– Provides a conceptually simple model for data
as relations (typically considered “tables”) with
all data visible
Books
Book ID  Title          pubid
1        Introductio    2
2        The history    4
3        New stuff ab   3
4        Another title  2
5        And yet more   1

Publishers
pubid  pubname
1      Harper
2      Addison
3      Oxford
4      Que

Authors
Authorid  Author name
1         Smith
2         Wynar
3         Jones
4         Duncan
5         Applegate

Subjects
Subid  Subject
1      cataloging
2      history
3      stuff

Book-subject links
Book ID  Subid
1        2
2        1
3        3
4        2
4        3

[A Book ID / Author id link table also appears on the slide; only the values 1 through 5 of one column are recoverable.]
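As a sketch of how the shared pubid column ties two relations together, the slide's Books and Publishers tables can be loaded into sqlite3 and joined. The values are read off the slide; the pairing of each title with its pubid is an assumption, and the table and column names are chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publishers (pubid INTEGER PRIMARY KEY, pubname TEXT)")
conn.execute(
    "CREATE TABLE books (bookid INTEGER PRIMARY KEY, title TEXT, pubid INTEGER)"
)

# Values as they appear on the slide (titles are truncated there too).
conn.executemany(
    "INSERT INTO publishers VALUES (?, ?)",
    [(1, "Harper"), (2, "Addison"), (3, "Oxford"), (4, "Que")],
)
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        (1, "Introductio", 2),
        (2, "The history", 4),
        (3, "New stuff ab", 3),
        (4, "Another title", 2),
        (5, "And yet more", 1),
    ],
)

# The join matches each book's pubid against the publishers relation.
rows = conn.execute(
    """
    SELECT b.title, p.pubname
    FROM books AS b
    JOIN publishers AS p ON p.pubid = b.pubid
    ORDER BY b.bookid
    """
).fetchall()

for title, pubname in rows:
    print(title, "/", pubname)
```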
Data Models(2): History
• Object Oriented Data Model (1990’s)
– Encapsulates data and operations as
“Objects”
[Diagram: objects Books (id, title), Authors (first, last), Publisher, and Subjects]
Data Models(2): History
• Object-Relational Model (1990’s)
– Combines the well-known properties of the
Relational Model with such OO features as:
• User-defined datatypes
• User-defined functions
• Inheritance and sub-classing
Database System Life Cycle
1. Design
2. Physical Creation
3. Conversion
4. Integration
5. Operations
6. Growth, Change, & Maintenance
Design
• Determination of the needs of the
organization
• Development of the Conceptual Model of
the database
– Typically using Entity-Relationship
diagramming techniques
• Construction of a Data Dictionary
• Development of the Logical Model
Physical Creation
• Development of the Physical Model of the
Database
– Data formats and types
– Determination of indexes, etc.
• Load a prototype database and test
• Determine and implement security, privacy
and access controls
• Determine and implement integrity
constraints
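Integrity constraints can be declared in the table definition so that the DBMS, rather than each application, rejects bad data. This is a minimal sqlite3 sketch; the assignment table, its columns, and the try_insert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE assignment (
        ssn     TEXT    NOT NULL,
        proj_no INTEGER NOT NULL,
        hours   REAL    NOT NULL CHECK (hours >= 0)
    )
    """
)
conn.execute("INSERT INTO assignment VALUES ('111-11-1111', 1, 12.5)")


def try_insert(row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO assignment VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


negative_ok = try_insert(("111-11-1111", 2, -3))  # violates CHECK (hours >= 0)
missing_ok = try_insert((None, 2, 4))             # violates NOT NULL on ssn
row_count = conn.execute("SELECT COUNT(*) FROM assignment").fetchone()[0]
print(negative_ok, missing_ok, row_count)
```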
Conversion
• Convert existing data sets and
applications to use the new database
– May need programs, conversion utilities to
convert old data to new formats
Integration
• Overlaps with Phase 3 (Conversion)
• Integration of converted applications and
new applications into the new database
Operations
• All applications run full-scale
• Privacy, security, access control must be in
place
• Recovery and Backup procedures must be
established and used
Growth, Change, and Maintenance
• Change is a way of life
– Applications, data requirements, reports, etc.
will all change as new needs and
requirements are found
– The database and applications will need to be modified to meet these changing needs
Another View of the Life Cycle
[Diagram: alternative layout of the six phases: Design (1), Physical Creation (2), Conversion (3), Integration (4), Operations (5), Growth, Change (6)]
Database Design Process
[Diagram: conceptual requirements from Applications 1 to 4 combine into the Conceptual Model, followed by the Logical Model and the Internal Model; each application also has an External Model]
Entity
• An Entity is an object in the real world (or
even imaginary worlds) about which we
want or need to maintain information
– Persons (e.g.: customers in a business,
employees, authors)
– Things (e.g.: purchase orders, meetings,
parts, companies)
[Diagram: entity box labeled Employee]
Attributes
• Attributes are the significant properties or
characteristics of an entity that help identify it
and provide the information needed to
interact with it or use it (this is the Metadata
for the entities)
[Diagram: Employee entity with attributes SSN, Name (First, Middle, Last), Birthdate, Age, and Projects]
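The Employee attributes on the slide can be sketched as a Python data class. How each attribute is modelled is an assumption made for illustration: Name is treated as a composite of First, Middle, and Last, Projects as a multi-valued attribute, and Age as a value derived from Birthdate rather than stored. The sample values are invented.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Name:
    # Composite attribute: one attribute made of three parts.
    first: str
    middle: str
    last: str


@dataclass
class Employee:
    ssn: str                     # identifying attribute
    name: Name                   # composite attribute
    birthdate: date
    projects: list[str] = field(default_factory=list)  # multi-valued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from birthdate instead of being stored."""
        years = today.year - self.birthdate.year
        if (today.month, today.day) < (self.birthdate.month, self.birthdate.day):
            years -= 1
        return years


emp = Employee(
    ssn="123-45-6789",
    name=Name("Ada", "M", "Smith"),
    birthdate=date(1970, 6, 15),
    projects=["Photo Browser"],
)
print(emp.name.last, emp.age(date(2002, 10, 14)))
```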
Relationships
• Relationships are the associations
between entities
• They can involve one or more entities and
belong to particular relationship types
Relationships
[Diagram: Student Attends Class; a “Supplies project parts” relationship linking Supplier, Project, and Part]
Types of Relationships
• Concerned only with cardinality of
relationship
[Diagram, Chen ER notation:]
• Employee (1) Assigned (1) Truck
• Employee (n) Assigned (1) Project
• Employee (m) Assigned (n) Project
Other Notations
[Diagram: the Employee Assigned Truck, Employee Assigned Project, and Employee Assigned Project relationships drawn in “Crow’s Foot” notation]
Other Notations
[Diagram: the Employee Assigned Truck, Employee Assigned Project, and Employee Assigned Project relationships drawn in IDEF1X notation]
More Complex Relationships
[Diagram labels: Manager 1/1/1, Employee 1/n/n, Evaluation, Project n/n/1; attributes SSN, Date, Project; Employee Assigned Project 4(2-10); Employee Manages / Is Managed By, 1 to n]
Weak Entities
• Owe existence entirely to another entity
[Diagram: Order (Invoice #, Rep#) Contains Order-line (Invoice#, Part#, Quantity)]
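One common way to express a weak entity in a relational DBMS is a foreign key that is part of the weak entity's primary key, with deletes cascading from the owner. The sqlite3 sketch below follows the Order and Order-line example; the table and column names and the sample values are assumptions. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE orders (invoice_no INTEGER PRIMARY KEY, rep_no INTEGER)")
conn.execute(
    """
    CREATE TABLE order_line (
        invoice_no INTEGER NOT NULL
            REFERENCES orders (invoice_no) ON DELETE CASCADE,
        part_no    INTEGER NOT NULL,
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (invoice_no, part_no)
    )
    """
)
conn.execute("INSERT INTO orders VALUES (1001, 7)")
conn.executemany(
    "INSERT INTO order_line VALUES (?, ?, ?)", [(1001, 1, 2), (1001, 2, 5)]
)
lines_before = conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0]

# An order line cannot exist without its owning order.
try:
    conn.execute("INSERT INTO order_line VALUES (9999, 1, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the order removes its order lines as well.
conn.execute("DELETE FROM orders WHERE invoice_no = 1001")
lines_after = conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0]
print(lines_before, orphan_rejected, lines_after)
```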
Supertype and Subtype Entities
[Diagram: supertype Employee “Is one of” the subtypes Sales-rep, Clerk, and Other; also shown: Manages, Sold, Invoice]
Many to Many Relationships
[Diagram: the many-to-many Employee (SSN) Is Assigned Project (Proj#) relationship, and the same relationship resolved through an Assignment entity with attributes SSN, Proj#, and Hours]
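A many-to-many relationship is usually carried by a third table that holds the keys of both sides plus any attributes of the relationship itself, as the Assignment entity does with Hours. Here is a minimal sqlite3 sketch; the table and column names and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (ssn TEXT PRIMARY KEY, last_name TEXT);
    CREATE TABLE project  (proj_no INTEGER PRIMARY KEY, title TEXT);

    -- One row per (employee, project) pair; hours belongs to the pair.
    CREATE TABLE assignment (
        ssn     TEXT    REFERENCES employee (ssn),
        proj_no INTEGER REFERENCES project (proj_no),
        hours   REAL,
        PRIMARY KEY (ssn, proj_no)
    );

    INSERT INTO employee VALUES ('111-11-1111', 'Smith'), ('222-22-2222', 'Jones');
    INSERT INTO project  VALUES (1, 'Photo Browser'), (2, 'Metadata Schema');
    INSERT INTO assignment VALUES
        ('111-11-1111', 1, 10), ('111-11-1111', 2, 5), ('222-22-2222', 1, 8);
    """
)

# One employee, many projects.
smith_projects = [
    row[0]
    for row in conn.execute(
        """
        SELECT p.title FROM assignment AS a
        JOIN project AS p ON p.proj_no = a.proj_no
        WHERE a.ssn = '111-11-1111' ORDER BY p.proj_no
        """
    )
]

# One project, many employees.
browser_staff = [
    row[0]
    for row in conn.execute(
        """
        SELECT e.last_name FROM assignment AS a
        JOIN employee AS e ON e.ssn = a.ssn
        WHERE a.proj_no = 1 ORDER BY e.last_name
        """
    )
]
print(smith_projects, browser_staff)
```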
Requirements Analysis
• Conceptual Requirements
– Systems Analysis Process
• Examine all of the information sources used in
existing applications
• Identify the characteristics of each data element
– Numeric
– Text
– Date/time
– Etc.
• Examine the tasks carried out using the information
• Examine results or reports created using the information
Conceptual Design
• Conceptual Model
– Merge the collective needs of all applications
– Determine what Entities are being used
• Some object about which information is to be maintained
– What are the Attributes of those entities?
• Properties or characteristics of the entity
• What attributes uniquely identify the entity?
– What are the Relationships between entities?
• How do the entities interact with each other?
Developing a Conceptual Model
• Overall view of the database that integrates all
the needed information discovered during the
requirements analysis
• Elements of the Conceptual Model are
represented by diagrams, Entity-Relationship or
ER Diagrams, that show the meanings and
relationships of those elements independent of
any particular database systems or
implementation details
• Can also be represented using other modeling
tools (such as UML)
Logical Design
• Logical Model
– How is each entity and relationship represented in the Data Model of the DBMS?
• Hierarchic?
• Network?
• Relational?
• Object-Oriented?
Physical Design
• Internal Model
– Choices of index file structure
– Choices of data storage formats
– Choices of disk layout
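An index is a physical design choice: it changes how the DBMS finds rows, not what a query returns. The sqlite3 sketch below compares the query plan before and after adding an index. The employee table, the index name, and the plan helper are hypothetical, and the exact plan wording varies between SQLite versions, so only the plan text is printed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, last_name TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("111-11-1111", "Smith"), ("222-22-2222", "Jones"), ("333-33-3333", "Wynar")],
)

QUERY = "SELECT ssn FROM employee WHERE last_name = 'Smith'"


def plan(db):
    """Return SQLite's description of how it would run QUERY."""
    # The human-readable detail is the fourth column of each plan row.
    return " ".join(row[3] for row in db.execute("EXPLAIN QUERY PLAN " + QUERY))


result_before = conn.execute(QUERY).fetchall()
plan_before = plan(conn)

# Physical design decision: index the column used for lookups.
conn.execute("CREATE INDEX idx_employee_last ON employee (last_name)")

result_after = conn.execute(QUERY).fetchall()
plan_after = plan(conn)

print(plan_before)
print(plan_after)
```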
Database Application Design
• External Model
– User views of the integrated database
– Making the old (or updated) applications work
with the new database design
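In a relational DBMS, one way to give an application its own user view of the integrated database is a SQL view. In this minimal sqlite3 sketch the employee table and the directory view are hypothetical: a directory application sees names only, not SSNs or birthdates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (ssn TEXT PRIMARY KEY, first_name TEXT, "
    "last_name TEXT, birthdate TEXT)"
)
conn.execute(
    "INSERT INTO employee VALUES ('123-45-6789', 'Ada', 'Smith', '1970-06-15')"
)

# External model for a directory application: a restricted view of the table.
conn.execute(
    "CREATE VIEW directory AS SELECT first_name, last_name FROM employee"
)

cursor = conn.execute("SELECT * FROM directory")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```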