* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download 02 Lecture 1
Survey
Document related concepts
Transcript
Lecture 1 BIS4435 – Database Management for Decision Support Dr. Nawaz Khan School of Computing Science E-mail: n.x.khan@mdx.ac.uk Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk BIS4435 – Data Management for Decision Support 1 Coursework Discussion Topics Each week you will have a discussion topic (DQ) DT will be available on Fridays. Reply to the DQ by Tuesdays Review and comment on your mates’ responses (at least one). Lecture 1 DQ requirements Initial response to DQ should be 250 words. Review and comments on your mates’ initial response should be 150 words. Appropriate references at the end of the text and proper citation of references through out the text (avoid quotes). Harvard style ref. Print your initial post, response to your mates’ initial response and any comment on your post that is made by your mates. Preserve it in order to attach it with your CW portfolio. Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk BIS4435 - Online Database Systems 2 More on Coursework: Self Reflection Make sure you have one/one and half pages self reflection on both the coursework. It should contain: Lecture 1 What you have achieved so far What you find the most difficult thing What you find the easiest thing What challenges you face to do the coursework How have you overcome the challenges What motivated you What are the new concepts you come across and how it helps to achieve the learning outcomes What you could have done better Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 3 Introduction Copy coursework from someone ?? Lecture 1 See University Plagiarism policy in students’ handbook Don’t forget to put references (follow specific guidelines for referencing) Acknowledge properly, since collusions are not accepted Make sure you have cited your references Examination: followed by the University regulations Contents of the module: 3 main parts Theoretical aspects of relational database Large Database: Data Warehouse and Data Mining Decision Support System General diagram of the module: see next slide Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 4 Introduction DB Back-End Lecture 1 Theoretical Aspects of Relational DB Example Relational DB: Oracle Decision support System Uncertainty, probability and Linear model, feed forward network, self-organising map, Expert system and knowledge engineering Large Data Management Data Strategy and Data Warehousing Intelligent Database and Data Mining Fuzzy logic, genetic algorithm and hybrid intelligent Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 5 Reading Materials Lecture 1 Connolly, T.M., and Begg, C.E., Database Systems: A Practical Approach to Design, Implementation and Management, Addison-Wesley, 4th Edition, ISBN: 0321210255 Belavkin, R., Blundell, B., Cairns, P., Huyck, C., Mitchell, I., Stockman, T. (2005). Management Support Systems, Middlesex University Press, ISBN: 1-89825368-4 Module Learning Units at OasisPlus Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 6 Introduction File-based approach Shared file approach Database approach: 3-schema architecture Lecture 1 External schema Conceptual schema Internal schema DBMS components Characteristics of the database approach Brief history of database systems and applications Extending database capabilities for new applications Reading suggestion: Connolly, T.M., and Begg, C.E., Database Systems: A Practical Approach to Design, Implementation and Management, Addison-Wesley, 4th Edition, ISBN: 0321210255(chapters 1) Global campus materials on OASIS: http://oasis.mdx.ac.uk/ (unit 1) Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 7 Introduction File-based approach Lecture 1 Data is stored in one or more separate computer files Data is then processed by computer programs applications Problems: Data redundancy Data inconsistency Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 8 Introduction Files Customer File Applications Stock File Customer Orders Order File Customer File Lecture 1 Stock File Customer Invoicing Files Applications Customer File Customer Orders Stock File Customer Invoicing Order File Order File Purchase Orders Supplier File Stock File Purchase Orders Supplier File Stock Control Stock File Stock Control Order File File-based approach Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk Shared file approach 9 Introduction Shared file approach Lecture 1 Data (files) is shared between different applications Data redundancy problem is alleviated Data inconsistency problem across different versions of the same file is solved Other problems: Rigid data structure: If applications have to share files, the file structure that suits one application might not suit another Physical data dependency: If the structure of the data file needs to be changed in some way, this alteration will need to be reflected in all application programs that use that data file No support of concurrency control: While a data file is being processed by one application, the file will not be available for other applications or for ad hoc queries Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 10 Introduction Database approach Lecture 1 DataBase Management System (DBMS): a generalpurpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications. Database: a collection of related data managed by a DBMS Data: known facts that can be recorded and that have implicit meaning Database system = the database + DBMS software DBMS provides facilities for querying, data security, and integrity and concurrent control Database application - set of programs that use DBMS to perform a particular business function Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 11 Introduction Database approach Lecture 1 Advantage - physical and logical data independence achieved by hierarchy of levels of data specification External Schema 1 External Schema N External Schema 2 Interface between conceptual schema and external schemas Conceptual Schema Interface between conceptual schema and internal schema Internal Schema Database physically stored in files on disks Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 12 Introduction Database approach External schema: Lecture 1 Describes database as it is seen by user and applications reflects a simplified model of the world Allows applications to see as much of data as they require, while excluding unrelated data items Interfaces with conceptual schema May be modified or created without altering physical storage of data, modification reflected in the interface Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 13 Introduction Database approach Conceptual schema: Lecture 1 Describes the universe of interest to the users of the database system - data required Concerned with data rather than storage or access, concentrates on describing entities, data types, relationships, user operations, and constraints Interfaces with external and internal schema Logical data independence is the capability to change the conceptual schema without having to change external schemas or application programs Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 14 Introduction Database approach Internal schema: Lecture 1 Definition of the way in which data is physically stored Interface with conceptual schema identifies how an item in the conceptual schema is stored and accessed Physical data independence is the capability to change the internal schema without having to change the conceptual schema. Hence, the external schemas need not to be changed as well Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 15 Introduction File-based approach Shared file approach Database approach: 3-schema architecture Lecture 1 External schema Conceptual schema Internal schema DBMS components Characteristics of the database approach Brief history of database systems and applications Extending database capabilities for new applications Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 16 Introduction DBMS components Lecture 1 DBMS engine - central component User interface - languages & interfaces Data dictionary - data structure, user & access right, rules Performance management - query optimisation & DBMS reorganisation Data integrity - intra-record, referential integrity & concurrency Backup and recovery - log Application development - CASE tool Security management - protect and control access to database & data dictionary See the reading suggestion for more details Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 17 Introduction Characteristics of the database approach Self-describing nature of a database systems Insulation between programs and data, and data abstraction Lecture 1 Program-data independence + Program-operation independence = Data abstraction A data model is a type of data abstraction Support of multiple views of the data Sharing of data and multi-user transaction processing Other advantages of using the DBMS approach Controlling redundancy Restricting unauthorized access Providing persistent storage for program objects Providing storage structures for efficient query processing Providing backup and recovery Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 18 Introduction Characteristics of the database approach Other advantages of using the DBMS approach (cont.) Lecture 1 Providing multiple user interfaces Representing complex relationships among data Enforcing integrity constraints Permitting inference and actions using rules: deductive database systems Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 19 Introduction Extending database capabilities for new applications Lecture 1 Example applications: storage and retrieval of images, videos, data mining (large amounts of data need to be stored and analyzed), spatial databases, time series applications, … More complex data structures than relational representation New data types except for the basic numeric and character string types New operations and query languages for new data types New storage and retrieval methods Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 20 Distributed Database Lecture 1 A distributed database management systems employ a number of computer workstations at different sites. These workstations are part of a local network system. The workstations contain a set of hardware and software that allow them to be an integral part of this network and the DDBMS must rely on these network components for its data exchange. The workstations are needed to be attached to each other through a communication media that allow the sites to interact and to carry data Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 21 Introduction Summary: Lecture 1 Introduction to the module, BIS4435File-based approach Shared file approach Database approach DBMS components Characteristics of the database approach Brief history of database systems and applications Distributed Database Reading suggestion: do not forget !! Next week: Relational Data Model Reading suggestion: [1]: Chapter 1 (The relational data model and relational database constraints) [3]: Unit 1: Global Campus Unit Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 22 Lecture 1 BIS4435 – Data Management for Decision Support Lecture 2: Relational Data Model Dr. Nawaz Khan School of Computing Science E-mail: n.x.khan@mdx.ac.uk Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk BIS4435 23 Introduction Unit 1: Introduction to the Module: Introduction to the Database Unit 2: Fundamentals of Relational and Object Model: Reading Suggestion: Lecture 1 Unit 1: Connolly, T.M., and Begg, C.E., Database Systems: A Practical Approach to Design, Implementation and Management, Addison-Wesley, 4th Edition, ISBN: 0321210255(chapters 1) Global campus materials on OASIS: http://oasis.mdx.ac.uk/ (unit 1) Unit 2: Connolly, T.M., and Begg, C.E., Database Systems: A Practical Approach to Design, Implementation and Management, Addison-Wesley, 4th Edition, ISBN: 0321210255(chapters 3) Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk BIS4435 24 Relational Data Model Outline Lecture 1 Basic Concepts: relational data model, relation schema, domain, tuple, cardinality & degree, database schema, etc. Relational Data Model Constraints key, primary key & foreign key entity integrity constraint referential integrity Update Operations on Relations insert deletion modification Summary Q&A Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 25 Relational Data Model Basic concepts Lecture 1 Relational data model: represents a database in the form of relations - 2-dimensional table with rows and columns of data. A database may contain one or more such tables. A relation schema is used to describe a relation Relation schema: R(A1, A2,…, An) is made up of a relation name R and a list of attributes A1, A2, . . ., An. Each attribute Ai is the name of a role played by some domain D in the relation schema R. R is called the name of this relation. The degree of a relation is the number of attributes n of its relation schema. Domain D: D is called the domain of Ai and is denoted by dom(Ai). It is a set of atomic values and a set of integrity constraints STUDENT(Name, SSN, HomePhone, Address, OfficePhone, Age, GPA) Degree = ?? dom(GPA) = ?? Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 26 Relational Data Model Basic concepts Lecture 1 Tuple: row in table, record Cardinality: number of tuples (rows) Database schema S = {R1, R2,…, Rm} A relation (or relation state, relation instance) r of the relation schema R(A1, A2, . . ., An), also denoted by r(R), is a set of ntuples r = {t1, t2, . . ., tm}. Each n-tuple t is an ordered list of n values t = <v1, v2, . . ., vn>, where each value vi, i=1..n, is an element of dom(Ai) or is a special null value. The ith value in tuple t, which corresponds to the attribute Ai, is referred to as t[Ai] Relational data model Database schema Relation schema Relation Tuple Attribute Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 27 Relational Data Model Lecture 1 A relation can be conveniently represented by a table, as the above example shows The columns of the tabular relation represent attributes Each attribute has a distinct name, and is always referenced by that name, never by its position Each row of the table represents a tuple. The ordering of the tuples is immaterial and all tuples must be distinct Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 28 Relational Data Model Outline Basic Concepts: relational data model, relation schema, domain, tuple, cardinality & degree, database schema, etc. Relational Data Model Constraints Lecture 1 key, primary key & foreign key entity integrity constraint referential integrity Update Operations on Relations insert deletion modification Summary Q&A Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 29 Relational Data Model Relational Data Model Constraints Key, primary key & foreign key Key: A key K of a relation R is a subset of the attributes of R which has the following time-independent properties: Lecture 1 – Unique identification: The value of K must uniquely identify each tuple in R – Non-redundancy: No attribute in K can be discarded without destroying property 1 Primary Key: There may be more than one set of attributes which satisfy both properties. Each of them is called a candidate key. In this case only one candidate key must be chosen as the key called primary key. Primary key (PK) cannot contain null value Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 30 Relational Data Model Relational Data Model Constraints Key, primary key & foreign key Single primary key: Lecture 1 TASK (TASK_NAME, START_DATE,EXPECTED_COMP_DATE, COMP_DATE, EMPNO) Combined primary key: If we want to store in the task table, the details of tasks for a number of different projects, TASK_NAME is no longer a unique identifier. The solution is to use a further attribute to provide a unique identifier for each task. TASK (PROJECT_NAME, TASK_NAME, START_DATE, EXPECTED_COMP_DATE, COMP_DATE, EMPNO) Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 31 Relational Data Model Example: A relational database schema that we call COMPANY = {EMPLOYEE, DEPARTMENT, DEPT_LOCATIONS, PROJECT, WORKS_ON, DEPENDENT} Lecture 1 Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 32 Relational Data Model Relational Data Model Constraints Key, primary key & foreign key Lecture 1 Foreign Key (FK): A set of attributes is a FK of R1 if and only if it is the primary key of another relation schema R2 FK represents the relationship between R1 and R2 FK can contain null value, but we may avoid this by replacing with a flag Example: TASKS (TASK_NAME, START_DATE, EXP_COMP_DATE, COMP_DATE, EMPNO, PROJECT_NAME*) PROJECT (PROJECT_NAME, START_DATE, EXP_COMP_DATE, COMP_DATE, PROJECT_LEADER) To form the link between the two tables, we place the primary key of the PROJECT table into the TASK table. Through the use of PROJECT_NAME as a FK, we can see for any given task, the project to which it belongs Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 33 Relational Data Model Relational Data Model Constraints Lecture 1 Entity Integrity Constraint: all entities in a database must be identified by the primary key. That is why NO primary key value can be null Referential Integrity Constraint: The value of a foreign key must be meaningful, which means that a tuple in one relation that refers to another relation must refer to an existing tuple in that relation Example: JobList and Company tables (see notes for labs 02, 03) JobList JobID CompanyID JobTitle Salary …. Company CompanyID CompanyName Address ... Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 34 Relational Data Model Outline Basic Concepts: relational data model, relation schema, domain, tuple, cardinality & degree, database schema, etc. Relational Data Model Constraints Lecture 1 key, primary key & foreign key entity integrity constraint referential integrity Update Operations on Relations insert deletion modification Summary Q&A Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 35 Relational Data Model Update Operations on Relations Insertion: to insert a new tuple t into a relation R. When inserting a new tuple, it should make sure that the database constraints are not violated: Lecture 1 The value of an attribute should be of the correct data type (i.e. from the appropriate domain). The value of a prime attribute (i.e. the key attribute) must not be null The key value(s) must not be the same as that of an existing tuple in the same relation The value of a foreign key (if any) must refer to an existing tuple in the corresponding relation (labs !!) Two options if the constraints are violated: Reject the operation Rectify the operation Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 36 Relational Data Model Update Operations on Relations Deletion: to remove an existing tuple t from a relation R. When deleting a tuple, the following constraints must not be violated: Lecture 1 The tuple must already exist in the database The referential integrity constraint is not violated Four options if the constraints are violated: Reject the operation Rectify the operation to find the existing tuple concerned Also remove the tuples that reference the tuple being deleted Modify the referencing attribute values (i.e. values of foreign keys) that cause the violation Modification: to change values of some attributes of an existing tuple t in a relation R Test all above operations and constraints during the labs Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 37 ACTIVITY Lecture 1 Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk BIS4229 – Industrial Data Management Technology 38 Introduction Summary: Relational Data Model Lecture 1 Basic concepts Constraints Update operations on relations Reading suggestion: do not forget !! (other relational operations) Next week: ER Modeling: Reading suggestion: – [1]: Chapter 3, 7 – [3]: Unit 2 Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 39 Relational Data Model Outline Basic Concepts: relational data model, relation schema, domain, tuple, cardinality & degree, database schema, etc. Relational Data Model Constraints Lecture 1 key, primary key & foreign key entity integrity constraint referential integrity Update Operations on Relations insert deletion modification Summary Q&A Dr. Nawaz Khan, School of Computing Science E-mail: n.x.khan@mdx.ac.uk 40