Week 3 normalisation

RELATIONAL DATABASES
Normalisation
INTRODUCTION
• Last week we looked at elements of designing a database and generating an ERD
• Designing and generating an ERD is an iterative cycle: generate the ERD, apply normalisation, adjust the ERD, check it against the business rules, and so on
• This week we will look at the process of normalising the data:
  ◦ What normalisation is
  ◦ Why it is needed
  ◦ 3NF and beyond
DATABASE NORMALISATION
• The process of restructuring the logical data model of a database
• Removes redundant (repeated) data from tables
• Improves storage efficiency, data integrity and scalability
• Supported by the relational model
• How far a database has been normalised is expressed as its normal form (NF)
• Achieved by applying a series of algorithms / methods in turn
• Generally involves splitting existing tables into multiple tables, which are then re-connected through joins when a query needs to pull the data together
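The split-and-rejoin idea can be sketched in plain Python, treating each table as a list of dicts. This is a minimal illustration, not how a DBMS works internally; the helper functions and the linking column `pub_id` are hypothetical. Projecting a flat table onto two overlapping column sets removes the repeated publisher rows, and a natural join on the shared column rebuilds the original rows exactly (a lossless decomposition).

```python
# Tables are modelled as lists of dicts; `pub_id` is a hypothetical linking column.
FLAT = [
    {"isbn": "129202447X", "title": "Database Systems: the complete book",
     "pub_id": 1, "publisher": "Pearson", "location": "London"},
    {"isbn": "0321444434", "title": "SQL queries for mere mortals",
     "pub_id": 2, "publisher": "Addison Wesley", "location": "New York"},
    {"isbn": "0321884493", "title": "Database Design for mere mortals",
     "pub_id": 2, "publisher": "Addison Wesley", "location": "New York"},
]


def project(rows, columns):
    """Keep only `columns` and drop duplicate rows (a relation is a set of rows)."""
    result = []
    for row in rows:
        reduced = {c: row[c] for c in columns}
        if reduced not in result:
            result.append(reduced)
    return result


def natural_join(left, right):
    """Combine every pair of rows that agree on all the columns the two tables share."""
    if not left or not right:
        return []
    shared = set(left[0]) & set(right[0])
    return [{**l, **r} for l in left for r in right
            if all(l[c] == r[c] for c in shared)]


def as_set(rows):
    """Order-insensitive view of a table, for comparing results."""
    return {frozenset(row.items()) for row in rows}


book = project(FLAT, ["isbn", "title", "pub_id"])
publisher = project(FLAT, ["pub_id", "publisher", "location"])
rejoined = natural_join(book, publisher)

print(len(book), len(publisher))         # 3 2
print(as_set(rejoined) == as_set(FLAT))  # True
```

The publisher details now appear once per publisher rather than once per book, and nothing is lost because the join on `pub_id` reconstructs every original row.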
THE HISTORY
• Proposed by Edgar F. Codd in the 1970 paper "A relational model of data for large shared data banks"
“there is, in fact, a very simple elimination procedure which we shall call
normalisation. Through decomposition non-simple domains are replaced
by domains whose elements are atomic (non-decomposable) values”
• Normalising data is now standard practice in the relational database world
• It optimises both data input and retrieval and supports the relational model
• It is generally NOT applied to data warehouses, or to databases whose implementation deviates from the traditional relational model
• Codd established the first 3 normal forms; others followed, but 3NF is considered sufficient for most applications
  ◦ e.g. Boyce-Codd normal form (BCNF)
WHY NORMALISE?
• Non-normalised databases can suffer from data anomalies
• They may store the same data in multiple locations; if the data is updated in some but not all of those locations, an UPDATE ANOMALY occurs
  ◦ Normalised data stores each item of data in one location and links to it via a FOREIGN KEY
• They may have inappropriate dependencies: adding data to this type of database can first require adding unrelated data that it depends on
  ◦ Normalisation prevents such INSERTION ANOMALIES by ensuring that each relation/record mirrors the functional dependencies
• It may not be possible to delete data without also deleting data you want to keep, because all the data is clumped together: a DELETION ANOMALY
  ◦ Normalisation identifies each record uniquely through its key and keeps no extraneous information in it
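An update anomaly can be reproduced with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table and column names are hypothetical, and the move of Addison Wesley to "Boston" is invented purely to trigger the anomaly. Because the publisher's location is repeated on every book row, updating only one of those rows leaves the database giving two different answers to the same question.

```python
import sqlite3


def demo_update_anomaly():
    """Return the locations recorded for one publisher after a partial update."""
    con = sqlite3.connect(":memory:")
    # Un-normalised design: publisher details are repeated on every book row.
    con.execute("CREATE TABLE book_flat (isbn TEXT, title TEXT, publisher TEXT, location TEXT)")
    con.executemany(
        "INSERT INTO book_flat VALUES (?, ?, ?, ?)",
        [
            ("129202447X", "Database Systems: the complete book", "Pearson", "London"),
            ("0321444434", "SQL queries for mere mortals", "Addison Wesley", "New York"),
            ("0321884493", "Database Design for mere mortals", "Addison Wesley", "New York"),
        ],
    )
    # Hypothetical change: the publisher moves, but only one of its rows is updated.
    con.execute("UPDATE book_flat SET location = 'Boston' WHERE isbn = '0321444434'")
    rows = con.execute(
        "SELECT DISTINCT location FROM book_flat "
        "WHERE publisher = 'Addison Wesley' ORDER BY location"
    ).fetchall()
    con.close()
    return [location for (location,) in rows]


print(demo_update_anomaly())  # ['Boston', 'New York']: one publisher, two locations
```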
NORMAL FORMS
• Un-normalised data is simply a list of the data elements in one clump
• First normal form (1NF) requires each record to be identified by a primary key, and every attribute to hold a single atomic value
• Second and third normal forms (2NF, 3NF) deal with the relationship of non-key attributes to the primary key
• Third normal form is classed as fully normalised, and can be 'tweaked' to get to BCNF
• Fourth and fifth normal forms deal with multi-valued and join dependencies, which arise when many-to-many and one-to-many relationships are represented in a single table
• Sixth normal form is mainly applied to temporal databases
ILLUSTRATION
Title | Author 1 | Author 2 | ISBN | Subject | Pages | Publisher
Database Systems: the complete book | Hector Garcia-Molina | Jeffrey D Ullman | 129202447X | Databases, Computers | 1152 | Pearson
Database Design for mere mortals | Michael J Hernandez | - | 0321884493 | Computers, Databases | 672 | Addison Wesley
SQL queries for mere mortals | John L Viescas | Michael J Hernandez | 0321444434 | Databases, SQL | 672 | Addison Wesley
• This table is not very efficient with storage (you need a column/attribute for every author, and some books have 4 or 5!)
• The design does not protect data integrity
• The table will not scale well
FIRST NORMAL FORM
• All data values should be atomic
• Every cell in a column should hold a single value, rather than a composite value or a set of objects / values
FIRST NORMAL FORM (1NF)
• The 2nd author attribute has been removed
• The row is duplicated with a different author, so that no data is lost
• The row is also duplicated for each subject classification
• Problems:
  ◦ INSERT ANOMALIES: we cannot add a new author without a book, etc.
  ◦ UPDATE ANOMALIES: to change the publisher of 'Database Design for mere mortals' we have to change 2 rows
  ◦ DELETE ANOMALIES: if we remove 'SQL queries for mere mortals', we also lose the SQL subject
Title | Author | ISBN | Subject | Pages | Publisher
Database Systems: the complete book | Hector Garcia-Molina | 129202447X | Databases | 1152 | Pearson
Database Systems: the complete book | Jeffrey D Ullman | 129202447X | Computers | 1152 | Pearson
SQL queries for mere mortals | John L Viescas | 0321444434 | SQL | 672 | Addison Wesley
SQL queries for mere mortals | Michael J Hernandez | 0321444434 | Databases | 672 | Addison Wesley
Database Design for mere mortals | Michael J Hernandez | 0321884493 | Databases | 672 | Addison Wesley
Database Design for mere mortals | Michael J Hernandez | 0321884493 | Computers | 672 | Addison Wesley
(Annotations: 2 records to split the Author; 2 records to split the Subject.)
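The flattening step can be sketched in Python. This is a minimal sketch under assumptions: the un-normalised rows are typed in by hand from the illustration table, with authors held as a list and subjects as a comma-separated string. Unlike the table above, which pairs each author with one subject, this version pairs every author with every subject (4 rows for a book with 2 authors and 2 subjects), because pairing an author with one particular subject would imply a link that the source data does not contain. The second helper shows the deletion anomaly: once every row for 'SQL queries for mere mortals' is removed, SQL no longer appears as a subject anywhere.

```python
from itertools import product

# Un-normalised rows: authors as a repeating group, subjects as a comma-separated list.
UNF = [
    {"title": "Database Systems: the complete book",
     "authors": ["Hector Garcia-Molina", "Jeffrey D Ullman"], "isbn": "129202447X",
     "subject": "Databases, Computers", "pages": 1152, "publisher": "Pearson"},
    {"title": "Database Design for mere mortals",
     "authors": ["Michael J Hernandez"], "isbn": "0321884493",
     "subject": "Computers, Databases", "pages": 672, "publisher": "Addison Wesley"},
    {"title": "SQL queries for mere mortals",
     "authors": ["John L Viescas", "Michael J Hernandez"], "isbn": "0321444434",
     "subject": "Databases, SQL", "pages": 672, "publisher": "Addison Wesley"},
]


def to_1nf(rows):
    """Emit one row per (author, subject) combination so every cell holds one value."""
    flat = []
    for row in rows:
        subjects = [s.strip() for s in row["subject"].split(",")]
        for author, subject in product(row["authors"], subjects):
            flat.append({"title": row["title"], "author": author, "isbn": row["isbn"],
                         "subject": subject, "pages": row["pages"],
                         "publisher": row["publisher"]})
    return flat


def subjects_after_deleting(rows, isbn):
    """Subjects still recorded once every row for one book has been deleted."""
    return sorted({r["subject"] for r in rows if r["isbn"] != isbn})


FIRST_NF = to_1nf(UNF)
print(len(FIRST_NF))                                    # 10
print(subjects_after_deleting(FIRST_NF, "0321444434"))  # ['Computers', 'Databases']
```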
SPLITTING THE TABLE - PROBLEMS
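• The table above may be in 1NF, but it violates 2NF: its only sensible key is the combination of ISBN, Author and Subject, yet Title, Pages and Publisher depend on the ISBN alone
• A better solution is to split the data into separate tables:
  ◦ Author
  ◦ Subject
  ◦ Book
• Functional dependencies need to be considered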
FUNCTIONAL DEPENDENCIES
• Redundancy is caused by a functional dependency (FD) whose left-hand side is not a key
• A functional dependency is a link between 2 sets of attributes: X → Y means that each value of X determines exactly one value of Y
  ◦ A set of attributes determining another
• Normalising to 2NF removes undesirable FDs
• E.g. if we have the student ID, we can find out all of the student's details: the student ID determines every value in the STUDENT table, whichever table holds the student ID attribute (for example as a foreign key)
• Split the tables and then add the dependencies back as links between them
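Whether a candidate FD is contradicted by some data can be checked mechanically. The sketch below is illustrative only (`fd_holds` and the sample rows are hypothetical, taken from the 1NF table). Note that sample data can only refute an FD, never prove it: FDs come from the business rules.

```python
def fd_holds(rows, lhs, rhs):
    """Check X -> Y against sample rows: rows that agree on `lhs` must agree on `rhs`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X value with a different Y value: the FD is violated
    return True


ROWS = [
    {"isbn": "129202447X", "author": "Hector Garcia-Molina",
     "title": "Database Systems: the complete book", "pages": 1152},
    {"isbn": "129202447X", "author": "Jeffrey D Ullman",
     "title": "Database Systems: the complete book", "pages": 1152},
    {"isbn": "0321884493", "author": "Michael J Hernandez",
     "title": "Database Design for mere mortals", "pages": 672},
]

print(fd_holds(ROWS, ["isbn"], ["title", "pages"]))  # True:  ISBN -> Title, Pages
print(fd_holds(ROWS, ["isbn"], ["author"]))          # False: a book can have several authors
```

On these three rows `fd_holds(ROWS, ["pages"], ["title"])` also returns True, purely by coincidence; adding the 672-page 'SQL queries for mere mortals' row would refute it, which is why FDs must be taken from the business rules rather than inferred from whatever data happens to be present.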
1 TABLE INTO 3
• The data is split into 3 tables
• We have added an identifier to the SUBJECT and AUTHOR tables
• There needs to be a PRIMARY KEY in each table
  ◦ It uniquely identifies each record in the table
• We don't need to add a PK to the BOOK table, as it has the ISBN, which is unique
SUBJECT
Subject ID | Subject
1 | SQL
2 | Databases
3 | Computers

AUTHOR
Author ID | Lastname | Forename
1 | Garcia-Molina | Hector
2 | Ullman | Jeffrey
3 | Viescas | John
4 | Hernandez | Michael
BOOK
Title | ISBN | Pages | Publisher
Database Systems: the complete book | 129202447X | 1152 | Pearson
SQL queries for mere mortals | 0321444434 | 672 | Addison Wesley
Database Design for mere mortals | 0321884493 | 672 | Addison Wesley
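The three tables and their primary keys can be sketched as SQLite DDL via Python's `sqlite3`. The snake_case table and column names and the `insert_book` helper are assumptions for illustration. The point shown is that a primary key makes the DBMS reject a second record with the same identifier, here a repeated ISBN.

```python
import sqlite3


def build_core_tables():
    """Create SUBJECT, AUTHOR and BOOK as on the slide, each with a primary key."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE subject (subject_id INTEGER PRIMARY KEY, subject TEXT NOT NULL);
        CREATE TABLE author  (author_id  INTEGER PRIMARY KEY,
                              lastname TEXT NOT NULL, forename TEXT NOT NULL);
        -- ISBN is already unique, so it serves as the natural primary key of BOOK.
        CREATE TABLE book    (isbn TEXT PRIMARY KEY, title TEXT NOT NULL,
                              pages INTEGER, publisher TEXT);
    """)
    with con:  # commit all inserts together
        con.executemany("INSERT INTO subject VALUES (?, ?)",
                        [(1, "SQL"), (2, "Databases"), (3, "Computers")])
        con.executemany("INSERT INTO author VALUES (?, ?, ?)",
                        [(1, "Garcia-Molina", "Hector"), (2, "Ullman", "Jeffrey"),
                         (3, "Viescas", "John"), (4, "Hernandez", "Michael")])
        con.executemany("INSERT INTO book VALUES (?, ?, ?, ?)",
                        [("129202447X", "Database Systems: the complete book", 1152, "Pearson"),
                         ("0321444434", "SQL queries for mere mortals", 672, "Addison Wesley"),
                         ("0321884493", "Database Design for mere mortals", 672, "Addison Wesley")])
    return con


def insert_book(con, isbn, title, pages, publisher):
    """Return True if the row is stored, False if the primary key rejects it."""
    try:
        with con:
            con.execute("INSERT INTO book VALUES (?, ?, ?, ?)", (isbn, title, pages, publisher))
        return True
    except sqlite3.IntegrityError:
        return False


con = build_core_tables()
print(insert_book(con, "129202447X", "A different title", 100, "Pearson"))  # False
```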
DEFINING THE RELATIONSHIPS
BookAuthors
ISBN | Author ID
129202447X | 1
129202447X | 2
0321884493 | 4
0321444434 | 3
0321444434 | 4
BookSubject
ISBN | Subject ID
129202447X | 2
129202447X | 3
0321884493 | 2
0321884493 | 3
0321444434 | 1
0321444434 | 2
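(Diagram: the many-to-many relationship 'Book writes Author' is resolved into two one-to-many relationships, 'Book has BookAuthors' and 'Author writes BookAuthors'.)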
• An author may write many books, and a book may have many authors: this is a many-to-many relationship. It cannot be implemented directly between two tables, so it is replaced with a link table (BookAuthors) holding one row per book-author pair
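The link table can be sketched in SQLite: its primary key is the composite (ISBN, author ID) pair, and each half is a foreign key to its parent table. Table names, column names and the helper functions are assumptions for illustration. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, so the sketch turns it on explicitly.

```python
import sqlite3


def build_book_authors():
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
    con.executescript("""
        CREATE TABLE author (author_id INTEGER PRIMARY KEY, lastname TEXT, forename TEXT);
        CREATE TABLE book   (isbn TEXT PRIMARY KEY, title TEXT);
        -- Link table: one row per (book, author) pair; the pair is the composite key.
        CREATE TABLE book_authors (
            isbn      TEXT    NOT NULL REFERENCES book(isbn),
            author_id INTEGER NOT NULL REFERENCES author(author_id),
            PRIMARY KEY (isbn, author_id)
        );
    """)
    with con:
        con.executemany("INSERT INTO author VALUES (?, ?, ?)",
                        [(1, "Garcia-Molina", "Hector"), (2, "Ullman", "Jeffrey"),
                         (3, "Viescas", "John"), (4, "Hernandez", "Michael")])
        con.executemany("INSERT INTO book VALUES (?, ?)",
                        [("129202447X", "Database Systems: the complete book"),
                         ("0321444434", "SQL queries for mere mortals"),
                         ("0321884493", "Database Design for mere mortals")])
        con.executemany("INSERT INTO book_authors VALUES (?, ?)",
                        [("129202447X", 1), ("129202447X", 2), ("0321884493", 4),
                         ("0321444434", 3), ("0321444434", 4)])
    return con


def authors_of(con, isbn):
    """Names of all authors of one book, via the link table."""
    sql = """SELECT a.forename || ' ' || a.lastname
             FROM book_authors AS ba JOIN author AS a ON a.author_id = ba.author_id
             WHERE ba.isbn = ? ORDER BY a.lastname"""
    return [name for (name,) in con.execute(sql, (isbn,))]


def books_by(con, author_id):
    """Titles of all books by one author, via the link table."""
    sql = """SELECT b.title
             FROM book_authors AS ba JOIN book AS b ON b.isbn = ba.isbn
             WHERE ba.author_id = ? ORDER BY b.title"""
    return [title for (title,) in con.execute(sql, (author_id,))]


def add_link(con, isbn, author_id):
    """Return True if the pair is stored; False if the key or a foreign key rejects it."""
    try:
        with con:
            con.execute("INSERT INTO book_authors VALUES (?, ?)", (isbn, author_id))
        return True
    except sqlite3.IntegrityError:
        return False


con = build_book_authors()
print(authors_of(con, "0321444434"))  # ['Michael Hernandez', 'John Viescas']
print(books_by(con, 4))  # ['Database Design for mere mortals', 'SQL queries for mere mortals']
```

Both directions of the many-to-many relationship are answered by the same link table, and `add_link` returns False for an unknown author ID or for a pair that is already recorded.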
SECOND NORMAL FORM (2NF)
• First normal form deals with redundant data across the horizontal row
• Second normal form deals with redundancy of data in vertical columns
• Normal forms are progressive: to get to 2NF, the data should already be in 1NF
BOOK
Title | ISBN | Pages | Publisher
Database Systems: the complete book | 129202447X | 1152 | Pearson
SQL queries for mere mortals | 0321444434 | 672 | Addison Wesley
Database Design for mere mortals | 0321884493 | 672 | Addison Wesley
The duplicated and split author and subject elements have been removed. The publisher is still duplicated, and publisher data should be held separately: remove Publisher and place it in a separate table.
SECOND NORMAL FORM
BOOK
Title | ISBN | Pages | Publisher ID
Database Systems: the complete book | 129202447X | 1152 | 1
SQL queries for mere mortals | 0321444434 | 672 | 2
Database Design for mere mortals | 0321884493 | 672 | 2

PUBLISHER
Publisher ID | Publisher | Location
1 | Pearson | London
2 | Addison Wesley | New York

A separate table allows additional publisher data to be held centrally.
• Data pertaining to the publisher is extracted and held in a different table
• This allows the data to be maintained separately
• If a publisher's name changes, its address moves, etc., you update the PUBLISHER table rather than every affected record in the BOOK table
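The payoff of the separate PUBLISHER table can be sketched in SQLite. Column names are assumptions, and the move to "Boston" is a hypothetical change used only for illustration. A single UPDATE on one PUBLISHER row is seen by every book that references it, because the location is read through the foreign-key join instead of being copied onto each book.

```python
import sqlite3


def build_book_publisher():
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the FK below
    con.executescript("""
        CREATE TABLE publisher (publisher_id INTEGER PRIMARY KEY,
                                publisher TEXT NOT NULL, location TEXT);
        CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL, pages INTEGER,
                           publisher_id INTEGER NOT NULL REFERENCES publisher(publisher_id));
    """)
    with con:
        con.executemany("INSERT INTO publisher VALUES (?, ?, ?)",
                        [(1, "Pearson", "London"), (2, "Addison Wesley", "New York")])
        con.executemany("INSERT INTO book VALUES (?, ?, ?, ?)",
                        [("129202447X", "Database Systems: the complete book", 1152, 1),
                         ("0321444434", "SQL queries for mere mortals", 672, 2),
                         ("0321884493", "Database Design for mere mortals", 672, 2)])
    return con


def book_locations(con):
    """Each book's ISBN with its publisher's location, read through the join."""
    sql = """SELECT b.isbn, p.location FROM book AS b
             JOIN publisher AS p ON p.publisher_id = b.publisher_id ORDER BY b.isbn"""
    return con.execute(sql).fetchall()


con = build_book_publisher()
with con:
    # Hypothetical move: one UPDATE on one PUBLISHER row...
    con.execute("UPDATE publisher SET location = 'Boston' WHERE publisher_id = 2")
# ...and every Addison Wesley book now reports the new location through the join.
print(book_locations(con))
# [('0321444434', 'Boston'), ('0321884493', 'Boston'), ('129202447X', 'London')]
```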
SECOND NORMAL FORM
• The relationship between book and publisher is one-to-many
  ◦ A book has only one publisher
  ◦ A publisher may publish many books, but publishes at least 1
• There needs to be a link between the book and the publisher
  ◦ A foreign key (the Publisher ID held in the BOOK table)
• In 2NF, no non-key attribute in a table with a composite key may depend on only part of that key: every non-key attribute must relate to the whole key
• No stray data: every attribute must describe the table's own subject or be part of a linking key
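The 2NF rule can be checked mechanically with the standard attribute-closure algorithm. This sketch assumes a single candidate key and that the FDs are supplied from the business rules; the function names and attribute names are hypothetical. Applied to the 1NF book table with composite key (ISBN, Author, Subject), it reports that Title, Pages and Publisher depend on the ISBN alone, which is exactly the partial dependency that splitting out BOOK removes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """Map each non-key attribute that depends on part of the key to that part."""
    non_key = set(attributes) - set(key)
    found = {}
    for size in range(1, len(key)):  # proper subsets of the key only
        for part in combinations(sorted(key), size):
            for attr in sorted(closure(part, fds) & non_key):
                found.setdefault(attr, part)  # keep the smallest part found first
    return found


# 1NF book table: composite key (isbn, author, subject); FDs assumed from business rules.
KEY = ["isbn", "author", "subject"]
ATTRS = KEY + ["title", "pages", "publisher"]
FDS = [(frozenset({"isbn"}), frozenset({"title", "pages", "publisher"}))]

print(partial_dependencies(KEY, ATTRS, FDS))
# {'pages': ('isbn',), 'publisher': ('isbn',), 'title': ('isbn',)}
```

After the split, BOOK has the single-attribute key ISBN, so `partial_dependencies(["isbn"], ["isbn", "title", "pages", "publisher"], FDS)` returns an empty dict: a table whose key is one attribute cannot have a partial dependency.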
(Diagram: ERD of the 'publishes' relationship between Book and Publisher.)
This notation indicates that a book has one publisher, but a publisher has many books (and at least 1). The ERD also indicates that every book must be published by exactly one publisher.
THIRD NORMAL FORM
• 3NF requires that there are no functional dependencies between non-key attributes: data that depends on anything other than the primary key belongs in another table, linked via a FK
• A table is in 3NF if it is in 2NF and all of its non-primary-key attributes are mutually independent
• Link to other tables via FKs; do not hold data that could be sectioned off into a table of its own
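A 3NF check can be sketched with the same attribute-closure idea. The example is an assumption built from the publisher data on the 2NF slides: a single wide BOOK table holding the publisher's name and location alongside the book, with the hypothetical FDs ISBN → Title, Pages, Publisher ID and Publisher ID → Publisher, Location. The check flags any FD whose left-hand side is not a superkey and whose right-hand side contains attributes outside the key (a single candidate key is assumed). The transitive chain ISBN → Publisher ID → Location is reported, and the two tables produced by the split come back clean.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_violations(key, attributes, fds):
    """FDs X -> A where X is not a superkey and A is not part of the key."""
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(attributes):
            continue  # X determines the whole row, so it is a (super)key: allowed
        bad = rhs - lhs - set(key)
        if bad:
            violations.append((sorted(lhs), sorted(bad)))
    return violations


FDS = [
    (frozenset({"isbn"}), frozenset({"title", "pages", "publisher_id"})),
    (frozenset({"publisher_id"}), frozenset({"publisher", "location"})),
]
BOOK_WIDE = ["isbn", "title", "pages", "publisher_id", "publisher", "location"]

print(transitive_violations(["isbn"], BOOK_WIDE, FDS))
# [(['publisher_id'], ['location', 'publisher'])]
print(transitive_violations(["isbn"], ["isbn", "title", "pages", "publisher_id"], FDS[:1]))  # []
print(transitive_violations(["publisher_id"], ["publisher_id", "publisher", "location"], FDS[1:]))  # []
```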