RELATIONAL DATABASES
Normalisation

INTRODUCTION
Last week we looked at the elements of designing a database and the generation of an ERD.
Designing and generating an ERD is an iterative cycle: generate the ERD, apply normalisation, adjust the ERD, check against the business rules, and so on.
This week we will look at the process of normalising the data:
  What normalisation is
  Why it is needed
  3NF and beyond

DATABASE NORMALISATION
The process of restructuring the logical data model of a database.
The process of removing redundant data from tables.
  Removes repeating data
  Improves storage efficiency, data integrity and scalability
  Supported by the relational model
The degree to which a database has been normalised is expressed as a normal form (NF).
  Achieved by applying a series of algorithms / methods
  Generally involves splitting existing tables into multiple tables and then re-connecting them through joins when a query needs to pull the data together

THE HISTORY
Proposed by Edgar F. Codd in the paper “A relational model of data for large shared data banks”:
  “there is, in fact, a very simple elimination procedure which we shall call normalisation. Through decomposition non-simple domains are replaced by domains whose elements are atomic (non-decomposable) values”
Normalising data is now standard in the relational database world.
  It optimises both data input and retrieval and supports the relational model
  It is NOT usually applied to data warehouses or to databases that deviate from the traditional relational model in their implementation
Codd established 3 normal forms. Others followed (such as BCNF), but 3NF is considered sufficient for most applications.

WHY NORMALISE?
Non-normalised databases experience data anomalies.
  May store the same data in multiple locations; if the data is updated in some but not all locations, an UPDATE ANOMALY will occur.
    Normalised data is stored in one location and linked via a FOREIGN KEY.
  May have inappropriate dependencies.
    Adding data to this type of database will require first adding unrelated dependency data.
    Normalised data prevents such INSERTION ANOMALIES by ensuring a database relation/record mirrors its functional dependencies.
  May not be able to delete data without also deleting data you want to keep, because all the data is clumped together: DELETION ANOMALIES.
    Normalisation uniquely identifies records through keys and holds no extraneous information.

NORMAL FORMS
Un-normalised data is simply a list of the data elements in one clump.
First normal form requires that data be identified by a primary key and a number of atomic values / attributes.
Second and third normal forms deal with the relationship of non-key attributes to the primary key.
Third normal form is generally classed as fully normalised and can be 'tweaked' to get to BCNF.
Fourth and fifth normal forms deal specifically with the representation of many-to-many and one-to-many relationships (multi-valued and join dependencies).
Sixth normal form only applies to temporal databases.

ILLUSTRATION

| Title | Author 1 | Author 2 | ISBN | Subject | Pages | Publisher |
|---|---|---|---|---|---|---|
| Database Systems: the complete book | Hector Garcia-Molina | Jeffrey D Ullman | 129202447X | Databases, Computers | 1152 | Pearson |
| Database Design for mere mortals | Michael J Hernandez | | 0321884493 | Computers, Databases | 672 | Addison Wesley |
| SQL queries for mere mortals | John L Viescas | Michael J Hernandez | 0321444434 | Databases, SQL | 672 | Addison Wesley |

This table is not very efficient with storage (you need a column/attribute for every author, and some books have 4 or 5!)
The design does not protect data integrity
The table will not scale well

FIRST NORMAL FORM
All data values should be atomic.
All column cells should hold single values rather than composite values or sets of objects / values.

| Title | Author 1 | Author 2 | ISBN | Subject | Pages | Publisher |
|---|---|---|---|---|---|---|
| Database Systems: the complete book | Hector Garcia-Molina | Jeffrey D Ullman | 129202447X | Databases, Computers | 1152 | Pearson |
| Database Design for mere mortals | Michael J Hernandez | | 0321884493 | Computers, Databases | 672 | Addison Wesley |
| SQL queries for mere mortals | John L Viescas | Michael J Hernandez | 0321444434 | Databases, SQL | 672 | Addison Wesley |

FIRST NORMAL FORM (1NF)
The 2nd author attribute has been removed.
Rows are duplicated with a different author to ensure data is not lost.
Rows are duplicated for each subject classification.
Problems:
  INSERT ANOMALIES: cannot add a new Author without a Book, etc.
  UPDATE ANOMALIES: the publisher of 'Database Design for mere mortals' cannot be changed in one place; we have to change 2 rows.
  DELETE ANOMALIES: if we remove 'SQL queries for mere mortals' we have to remove the SQL subject as well.

| Title | Author | ISBN | Subject | Pages | Publisher |
|---|---|---|---|---|---|
| Database Systems: the complete book | Hector Garcia-Molina | 129202447X | Databases | 1152 | Pearson |
| Database Systems: the complete book | Jeffrey D Ullman | 129202447X | Computers | 1152 | Pearson |
| SQL queries for mere mortals | John L Viescas | 0321444434 | SQL | 672 | Addison Wesley |
| SQL queries for mere mortals | Michael J Hernandez | 0321444434 | Databases | 672 | Addison Wesley |
| Database Design for mere mortals | Michael J Hernandez | 0321884493 | Databases | 672 | Addison Wesley |
| Database Design for mere mortals | Michael J Hernandez | 0321884493 | Computers | 672 | Addison Wesley |

  2 records to split the author
  2 records to split the subject

SPLITTING THE TABLE - PROBLEMS
The 1NF table above may be in 1st NF but it violates 2nd NF.
A better solution is to split the data into separate tables:
  Author
  Subject
  Book
Functional dependencies need to be considered.

FUNCTIONAL DEPENDENCIES
Redundancy is caused by a functional dependency (FD).
A functional dependency is a link between 2 sets of attributes (tables/relations): one set of attributes determines another.
Normalising to 2NF removes undesirable FDs.
E.g. if we have the student ID then we can find out all the student details. The attribute 'student ID' gives us all the values in the 'student' table, whichever table holds the 'student ID' attribute.
Split the tables and then add the dependencies.

1 TABLE INTO 3
The data is split into 3 tables.
We have added an identifier to the subject and author tables.
There needs to be a PRIMARY KEY in each table.
  It uniquely identifies each record in the table.
We don't need to add a PK to the book table, as it has the ISBN, which is unique.
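The functional dependency idea and the three-way split can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows are the six 1NF rows from the slides, while the helper names `holds` and `number` and the surrogate IDs they produce are hypothetical. The IDs are assigned in order of first appearance, so the subject IDs need not match the numbering used on the slides.

```python
# The six rows of the 1NF table from the slides.
COLUMNS = ("title", "author", "isbn", "subject", "pages", "publisher")
ROWS = [
    ("Database Systems: the complete book", "Hector Garcia-Molina", "129202447X", "Databases", 1152, "Pearson"),
    ("Database Systems: the complete book", "Jeffrey D Ullman", "129202447X", "Computers", 1152, "Pearson"),
    ("SQL queries for mere mortals", "John L Viescas", "0321444434", "SQL", 672, "Addison Wesley"),
    ("SQL queries for mere mortals", "Michael J Hernandez", "0321444434", "Databases", 672, "Addison Wesley"),
    ("Database Design for mere mortals", "Michael J Hernandez", "0321884493", "Databases", 672, "Addison Wesley"),
    ("Database Design for mere mortals", "Michael J Hernandez", "0321884493", "Computers", 672, "Addison Wesley"),
]
records = [dict(zip(COLUMNS, row)) for row in ROWS]


def holds(records, lhs, rhs):
    """Return True if each combination of lhs values maps to one combination of rhs values."""
    seen = {}
    for rec in records:
        key = tuple(rec[c] for c in lhs)
        value = tuple(rec[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


def number(values):
    """Assign hypothetical surrogate IDs 1, 2, 3, ... in order of first appearance."""
    ids = {}
    for v in values:
        ids.setdefault(v, len(ids) + 1)
    return ids


# ISBN determines the book details, but not the author: a book can have several authors.
isbn_determines_book = holds(records, ["isbn"], ["title", "pages", "publisher"])
isbn_determines_author = holds(records, ["isbn"], ["author"])

# Split: one entry per book (keyed by ISBN), one per author, one per subject.
books = {rec["isbn"]: (rec["title"], rec["pages"], rec["publisher"]) for rec in records}
author_ids = number(rec["author"] for rec in records)
subject_ids = number(rec["subject"] for rec in records)

print(isbn_determines_book, isbn_determines_author)  # True False
print(len(books), len(author_ids), len(subject_ids))  # 3 4 3
```

Because ISBN determines title, pages and publisher, those attributes can live once per book; author and subject are not determined by ISBN, which is why they move to their own tables.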

SUBJECT

| Subject ID | Subject |
|---|---|
| 1 | SQL |
| 2 | Databases |
| 3 | Computers |

AUTHOR

| Author ID | Lastname | Forename |
|---|---|---|
| 1 | Garcia-Molina | Hector |
| 2 | Ullman | Jeffrey |
| 3 | Viescas | John |
| 4 | Hernandez | Michael |

BOOK

| Title | ISBN | Pages | Publisher |
|---|---|---|---|
| Database Systems: the complete book | 129202447X | 1152 | Pearson |
| SQL queries for mere mortals | 0321444434 | 672 | Addison Wesley |
| Database Design for mere mortals | 0321884493 | 672 | Addison Wesley |

DEFINING THE RELATIONSHIPS

BookAuthors

| ISBN | Author ID |
|---|---|
| 129202447X | 1 |
| 129202447X | 2 |
| 0321884493 | 4 |
| 0321444434 | 3 |
| 0321444434 | 4 |

BookSubject

| ISBN | Subject ID |
|---|---|
| 129202447X | 2 |
| 129202447X | 3 |
| 0321884493 | 2 |
| 0321884493 | 3 |
| 0321444434 | 1 |
| 0321444434 | 2 |

[ERD: Book (writes) Author as a many-to-many relationship, resolved as Book (has) BookAuthors (writes) Author]

An author may have written many books and a book may have many authors; this is a many-to-many relationship.
This is not ideal and needs to be replaced with an interlink table.

SECOND NORMAL FORM (2NF)
First normal form deals with redundant data across the horizontal row.
Second normal form deals with redundancy of data in vertical columns.
Normal forms are progressive: to get to second normal form the data should already be in first.

Book

| Title | ISBN | Pages | Publisher |
|---|---|---|---|
| Database Systems: the complete book | 129202447X | 1152 | Pearson |
| SQL queries for mere mortals | 0321444434 | 672 | Addison Wesley |
| Database Design for mere mortals | 0321884493 | 672 | Addison Wesley |

The duplicated and split elements of author and subject have been removed. Publisher is still duplicated, and publisher data should be held separately.
Remove Publisher and place it in a separate table.
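
The interlink table can be tried out with Python's built-in sqlite3 module. The sketch below is a minimal illustration, not the schema from the lecture: the table and column names (`book_authors`, `author_id` and so on) and the helper functions `authors_of` and `books_by` are assumptions, the Book table is trimmed to two columns, and the author surname follows the published spelling Hernandez.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE author (author_id INTEGER PRIMARY KEY, lastname TEXT, forename TEXT);
-- Interlink table: one row per (book, author) pair, keyed on both columns.
CREATE TABLE book_authors (
    isbn TEXT REFERENCES book(isbn),
    author_id INTEGER REFERENCES author(author_id),
    PRIMARY KEY (isbn, author_id)
);
""")
conn.executemany("INSERT INTO book VALUES (?, ?)", [
    ("129202447X", "Database Systems: the complete book"),
    ("0321444434", "SQL queries for mere mortals"),
    ("0321884493", "Database Design for mere mortals"),
])
conn.executemany("INSERT INTO author VALUES (?, ?, ?)", [
    (1, "Garcia-Molina", "Hector"),
    (2, "Ullman", "Jeffrey"),
    (3, "Viescas", "John"),
    (4, "Hernandez", "Michael"),
])
conn.executemany("INSERT INTO book_authors VALUES (?, ?)", [
    ("129202447X", 1), ("129202447X", 2),
    ("0321884493", 4),
    ("0321444434", 3), ("0321444434", 4),
])


def authors_of(isbn):
    """Surnames of every author of one book, found through the interlink table."""
    rows = conn.execute(
        "SELECT a.lastname FROM book_authors ba "
        "JOIN author a ON a.author_id = ba.author_id "
        "WHERE ba.isbn = ? ORDER BY a.author_id", (isbn,)).fetchall()
    return [r[0] for r in rows]


def books_by(lastname):
    """Titles of every book by one author, found through the same table."""
    rows = conn.execute(
        "SELECT b.title FROM author a "
        "JOIN book_authors ba ON ba.author_id = a.author_id "
        "JOIN book b ON b.isbn = ba.isbn "
        "WHERE a.lastname = ? ORDER BY b.title", (lastname,)).fetchall()
    return [r[0] for r in rows]


print(authors_of("0321444434"))  # ['Viescas', 'Hernandez']
print(books_by("Hernandez"))
```

The same interlink table answers the question in both directions (authors of a book, books by an author), and each author's name is stored only once.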

SECOND NORMAL FORM
A separate table allows additional data to be held centrally.

Book (the Publisher column now holds the Publisher ID)

| Title | ISBN | Pages | Publisher |
|---|---|---|---|
| Database Systems: the complete book | 129202447X | 1152 | 1 |
| SQL queries for mere mortals | 0321444434 | 672 | 2 |
| Database Design for mere mortals | 0321884493 | 672 | 2 |

Publisher

| Publisher ID | Publisher | Location |
|---|---|---|
| 1 | Pearson | London |
| 2 | Addison Wesley | New York |

Data pertaining to the publisher is extracted and held in a different table.
This allows the data to be maintained separately.
If the name changes, the address moves etc., you update the PUBLISHER table rather than every affected record in the Book table.

SECOND NORMAL FORM
The relationship between book and publisher is one to many.
  A book only has one publisher
  A publisher may publish many books, but it will publish at least 1
There needs to be a link between the book and the publisher: a foreign key.
In 2NF a table with a composite key cannot hold any data that does not relate to all portions of the composite key.
  No obscure data: all data must relate to that table or be part of the link key.

[ERD: Book (publishes) Publisher]

This notation indicates that a book has one publisher but a publisher has many books (and at least 1).
The ERD also indicates that a book must be published by exactly one publisher.

THIRD NORMAL FORM
3NF requires that non-key attributes have no functional dependencies other than on the primary key; related data is held in other tables and reached via the FK.
A table is in 3NF if all of the non-primary-key attributes are mutually independent.
Link via the FK; do not hold data in a table that can be sectioned off elsewhere.
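
The benefit of holding publisher data once and linking to it by foreign key can be shown with a short sqlite3 sketch. The table and column names are assumptions made for illustration, and the new publisher name and the 'Orphan book' row are invented test values rather than data from the slides. SQLite only enforces foreign keys when `PRAGMA foreign_keys` is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE publisher (
    publisher_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    location TEXT
);
CREATE TABLE book (
    isbn TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    pages INTEGER,
    publisher_id INTEGER NOT NULL REFERENCES publisher(publisher_id)
);
""")
conn.executemany("INSERT INTO publisher VALUES (?, ?, ?)", [
    (1, "Pearson", "London"),
    (2, "Addison Wesley", "New York"),
])
conn.executemany("INSERT INTO book VALUES (?, ?, ?, ?)", [
    ("129202447X", "Database Systems: the complete book", 1152, 1),
    ("0321444434", "SQL queries for mere mortals", 672, 2),
    ("0321884493", "Database Design for mere mortals", 672, 2),
])

# One UPDATE in the publisher table changes the name seen by every linked book.
# The new name is a made-up value used only to demonstrate the update.
conn.execute("UPDATE publisher SET name = ? WHERE publisher_id = ?",
             ("Addison-Wesley Professional", 2))
names = dict(conn.execute(
    "SELECT b.isbn, p.name FROM book b "
    "JOIN publisher p ON p.publisher_id = b.publisher_id").fetchall())

# A book that points at a publisher that does not exist is rejected by the FK.
try:
    conn.execute("INSERT INTO book VALUES (?, ?, ?, ?)",
                 ("0000000000", "Orphan book", 100, 99))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(names["0321444434"], names["0321884493"], rejected)
```

Both Addison Wesley titles pick up the new name from a single row change, which is the update anomaly from the 1NF table being avoided. The `NOT NULL` foreign key mirrors the rule that a book must have exactly one publisher.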