Structural and Semantic Heterogeneity in Database Schema Integration
Sixth Conference of Department of Computing
Wednesday 4 May 2005
David George

Presentation Content
• Why is Integration necessary?
• Evolution of Integration Approaches
• Barriers to Integration - Structural and Semantic Heterogeneity
• New opportunity - Ontology and the Semantic Web

Why Database Integration?

Drivers for Data Integration
• Global organisations with distributed data.
• Organisations having legacy and new databases.
• Organisational change e.g. business re-engineering and acquisitions.
• Autonomous departments with disconnected systems requiring interoperability e.g. Financial Services.
• Business Intelligence requiring:
  • decision-support systems
  • customer analysis and marketing strategies
  • data mining

Schema Integration
[Diagram labels: Local DB schema; Global schema integration; Queries; Global Schema; Schema 1, Schema 2, Schema n; input; Global:Local Schema Mapping; Query; Query output]

Evolution in Integration Approaches

Evolution in Integrations
[Diagram labels: Knowledge; Global Domain Agreements; Digital media; Visual/Spatial/Temporal Data [Kiosk/Geographic/Flights/Forecasting]; Focus – Semantics; Domain-specific; Information; Structured, Semi-structured, Text repositories; Focus - Syntax of data type, format & Schema constructs; Data; Structured DBs, Files; System; Local Task Schemas; Focus – Systems & Communications]
[Timeline labels: Schema Integration, Common Data Models, Federated DBS; Virtual Integration, Single Ontologies, Federated IS (inc Mediators); 1985; Multiple ontologies, Inter-ontological, Information Brokering; 1995]

Federated DBMS approach
[Diagram labels: Common Data Model; External Schema 1.1, 1.2, 2.1; Federated Schema 1, 2; Export Schema 1.1, 2.1, 2.2; Component Schema 1, 2; Local Schema 1, 2; Component DBS 1, Component DBS 2, etc]
Application: Integration of business databases

FDBMS schema architecture
[Diagram labels: Common Data Model; External Schema 1.1, 1.2, 2.1, 2.2; Federated Schema 1, 2; Export Schema 1.1, 2.1, 2.2, 3.1; Component Schema 1, 2, 3; Local Schema 1, 2, 3; Component DBS 1, 2, 3]

Mediator/Wrapper (Virtual integration)
[Diagram labels: Network; Internet; Local Schema, Local Schema; Data Sources: Web Source, O/RDB; Wrapper, Wrapper; Mediated Schema; Query 1, Query 2; Query Translation; Mediator; Integration System; User Query]
Application: Integrated access to Heterogeneous data

Information Brokering
Search Query: “Find detached houses for sale under £300k with 2 bathrooms, 3 bedrooms, a local school rated in the upper quartile of govt. league tables, in a district with below-average crime rate and a socio-economically diverse population?”
[Diagram labels: Multiple Worlds; Information Mediation; Property Sales; Crime Statistics; School Rankings; Demographics]

Barriers to Integration - Structural & Semantic Heterogeneity

Recipe for Heterogeneity and Conflict
Conceptualisations of the real world are influenced by the designer's view of the Concept and Context to be modelled.
[Diagram labels: Real World; Conceptual World; Database World (representation); Conceptualisation by; Denotation of; Representation by; Interpretation of]

Schema Type Conflicts
[Diagram labels: Name; Publisher; Address; Title; Pub-Book; Book; Name; Book-Topic; Title; Code; Topics; Title; Publication; Pub-Keyword; Keyword; Code; Research Area; Publisher]

Taxonomy of Schema Conflicts
Entity Definition Conflicts
• Naming conflicts (Synonyms and Homonyms)
• DB Identifier conflicts e.g. ID# vs. Name
• Schema isomorphism at attribute level (e.g. mapping of telephone vs. HomeTel + WorkTel)
• Missing Attributes
Domain Definition Conflicts
• Naming conflicts
• Data Representation (Integer vs. String)
• Data Dimensions (volume, weight, price, number)
• Dimension Measures (based on above)
• Data Scaling (£K, £M)
• Data Precision (1-100 vs. A-E)
Data Value Conflicts
• Attribute Integrity Constraints (cardinality, uniqueness, nulls)
• Known Inconsistency (has errors, presence/absence)
• Temporal Inconsistency (last update)
• Acceptable Inconsistency (within a range)

Incoherence in Cardinality
[Diagram: three versions of the Inv:Order relationship between Invoice and Order, with cardinalities 1:1, 1:m and n:m]

Abstraction and Schematic Conflicts
Abstraction Level Conflicts
• Generalisation/Specialisation
• Aggregation/Decomposition
Schematic Discrepancies
• Data Value to Attribute
• Attribute to Entity
• Data Value to Entity

Generalisation/Specialisation Conflicts
• Schema 1: Student (ID#, Name, Type, Course)
• Schema 2: Student, specialised by S_Type into U-graduate (ID#, Name, Course) and Graduate (ID#, Name, Course)
i.e. U-graduate in schema 2 is represented at a more general level in schema 1

Specialisation Classification Conflicts
• Criteria inconsistency
• Characterisation inconsistency
• Degrees inconsistency
[Diagram labels: Employee; Gender; Role; Person; <30, 30-60, >60; Adult; Sex; <25, 25-55, >55; Customer; Senior; Service; Person; Customer; Child; Employee; Teen; Parent; G-Parent]

Aggregation Conflicts
Aggregation used in schema 1 is represented by a set-of entities in schema 2.
Also NB: mapping exists in only one direction.
• Schema 1: Convoy (ID#, Av_Weight, Location)
• Schema 2: Ship (ID#, Weight, Location, Captain)

Aggregation Conflicts (contd)
• Component class of collection: Employee(department) vs. Employee(division(department))
• Aggregation Specialisation: CarType(carMake, carDesign) vs. FamilyType(carMake, saloonSize)
• Aggregation Composition: Person(address, tel) vs. Person(street, city, county, tel)

Schematic Discrepancies
Data:Attribute:Entity conflicts
• DB1: Stock (Date, StockCode, ClosePrice), stock item held as a Value
• DB2: Stock (Date, StockItem1, StockItem2, …StockItemn), stock item held as an Attribute (whose value is the ClosePrice)
• DB3: StockItem1 (Date, ClosePrice) … StockItemn (Date, ClosePrice), stock item held as an Entity

So where next?
[Diagram labels: Global; Ontology & knowledge; Domain; Enterprise; Application; Schema; Local; data; database; Information Brokering]

New Solutions - Ontologies and the Semantic Web

Ontologies in Computing
Formal vocabulary of a “universe of discourse”.
Ontologies define:
• concepts and their attributes
• relationships between concepts
• constraints on those relationships
“An Ontology is a formal, explicit specification of a shared conceptualization” (Gruber, 1993 & Borst, 1997)

Bibliographic Data Ontology (extract)
[Diagram labels: Biblio-Thing; Agent; Document; Person; Author; Organization; Book; Miscellaneous-Publication; Publisher; University; Proceedings; Edited-Book; Thesis; Periodical-Publication; Cartographic-Map; Doctoral-Thesis; Journal; Technical-Manual; Computer-Program; Newspaper; Magazine; Master-Thesis]
http://www.ksl.stanford.edu/knowledge-sharing/ontologies/html/

Types of “ontologies”
• Value restrictions: values of properties are restricted (e.g. by a datatype).
• General logic constraints: values may be constrained by using values from other properties.
• First-order logic constraints: very expressive constraints between relationships such as: disjoint classes, inverse relationships, part-whole relationships.
Source: DE BRUIJN, J. (2003) Using Ontologies - Enabling Knowledge Sharing and Reuse on the Semantic Web [online]. Innsbruck, Austria, DERI – Digital Enterprise Research Institute. Available from: http://www.deri.ie/publications/techpapers/documents/DERI-TR-2003-10-29.pdf. [Accessed 15 February 2005].

Semantic Web
Source: DE BRUIJN, J. (2003), as cited above.

Semantic Web Tower
• OWL (OWL Ontology Language): Clients in S1 same-As Customers in S2
• RDFS: person X is a LivingPerson
• RDF: person X is named “Bill”.
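The three Semantic Web Tower statements can be sketched as plain (subject, predicate, object) triples. This is a minimal illustration in Python, not an RDF library: the predicate strings, the individuals "acme" and "bolt", and both helper functions are assumptions made for the example. The slide's "same-As" link is modelled here as a symmetric equivalence between two classes; in OWL proper, class equivalence is usually written with owl:equivalentClass.

```python
# Illustrative predicate labels (assumed), not real RDF/RDFS/OWL IRIs.
NAME = "name"
IS_A = "is-a"
SAME_AS = "same-As"

TRIPLES = {
    ("personX", NAME, "Bill"),                # RDF: person X is named "Bill"
    ("personX", IS_A, "LivingPerson"),        # RDFS: person X is a LivingPerson
    ("S1:Clients", SAME_AS, "S2:Customers"),  # OWL: Clients in S1 same-As Customers in S2
    ("acme", IS_A, "S1:Clients"),             # hypothetical instance held in schema 1
    ("bolt", IS_A, "S2:Customers"),           # hypothetical instance held in schema 2
}


def equivalent_classes(cls, triples):
    """Return cls plus every class reachable through same-As links, in either direction."""
    seen = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != SAME_AS:
                continue
            # Treat the link as symmetric: follow it from subject or from object.
            for a, b in ((s, o), (o, s)):
                if a == current and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return seen


def instances_of(cls, triples):
    """Return, sorted, every subject typed with cls or with an equivalent class."""
    classes = equivalent_classes(cls, triples)
    return sorted(s for s, p, o in triples if p == IS_A and o in classes)


if __name__ == "__main__":
    # A query phrased against schema 2 also finds the client recorded in schema 1.
    print(instances_of("S2:Customers", TRIPLES))
```

The point of the sketch is that the equivalence link lets a query written against one schema's vocabulary retrieve data recorded under the other's, which is the integration role the slide assigns to the OWL layer.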
“The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation” (Tim Berners-Lee et al., 2001)

RDF Example
• Object, Attribute, Value
• Triple: Predicate, subject, object

End of Presentation

Evolution in Interoperability
[Diagram labels: Semantic Data Model; Knowledge; Interoperability; Information; Data]
[Diagram labels, continued: Understanding comprehensive metadata and ontology approaches; Digital media, Visual/Spatio-Temporal Modelling, Scientific/Engineering; Key focus on: Semantics & more domain-specific; Structured, Semi-structured (HTML etc), Text repositories; Global Domain; Multi-modal sys; Understanding use of metadata & schematic heterogeneities; Key focus on: Syntax – data types/format, Structure – schema constructs; O-O sys; Structured DBs, Files; System; Key focus: Systems & Comms.; Local Schema; E-R sys]
[Architectures timeline labels: Common Data Models; Schema translation & Integration; MDBMS / Federated DBS; Schematic & metadata relationships, Wrappers, Single Ontologies; Multiple ontologies, Inter-ontological, Metadata standards; Fed. Inf. Systems / Mediators; Mediator / Information Brokering; 1985; 1995]
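Supplementary sketch for the Schematic Discrepancies slide: the same closing prices can be stored with the stock item as a data value (DB1), as an attribute name (DB2), or as an entity, i.e. one table per item (DB3). The Python below is an illustration under stated assumptions, not a method from the presentation: the stock codes, dates, prices, container shapes and function names are all hypothetical. Each function maps one layout onto a common set of (date, stock item, close price) facts so the three sources become comparable.

```python
def from_db1(rows):
    """DB1 layout: stock item is a value, rows are (date, stock_code, close_price)."""
    return {(date, code, price) for date, code, price in rows}


def from_db2(rows):
    """DB2 layout: stock items are attribute names, one dict per date."""
    return {
        (row["Date"], item, price)
        for row in rows
        for item, price in row.items()
        if item != "Date"
    }


def from_db3(tables):
    """DB3 layout: one table (entity) per stock item, each holding (date, close_price)."""
    return {
        (date, item, price)
        for item, rows in tables.items()
        for date, price in rows
    }


# Hypothetical sample data expressing the same four facts in each layout.
DB1 = [
    ("2005-05-03", "ABC", 101.5),
    ("2005-05-03", "XYZ", 55.0),
    ("2005-05-04", "ABC", 102.0),
    ("2005-05-04", "XYZ", 54.25),
]
DB2 = [
    {"Date": "2005-05-03", "ABC": 101.5, "XYZ": 55.0},
    {"Date": "2005-05-04", "ABC": 102.0, "XYZ": 54.25},
]
DB3 = {
    "ABC": [("2005-05-03", 101.5), ("2005-05-04", 102.0)],
    "XYZ": [("2005-05-03", 55.0), ("2005-05-04", 54.25)],
}

if __name__ == "__main__":
    # All three layouts reduce to the same set of facts.
    print(from_db1(DB1) == from_db2(DB2) == from_db3(DB3))
```

Note that for DB2 and DB3 the stock item names live in the schema rather than in the data, so the mapping functions have to read metadata (dictionary keys, table names) as if it were data. That is what distinguishes a schematic discrepancy from an ordinary naming or scaling conflict.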