GridMiner
A Framework for Knowledge Discovery on the Grid – from a Vision to Design and Implementation
Peter Brezany, Ivan Janciak, Alexander Wöhrer, A Min Tjoa
University of Vienna, Institute for Software Science
email: janciak@par.univie.ac.at
www.gridminer.org … Intelligent Grid Solutions
CGW'04, 13. Dec. 04

GridMiner Overview
* Start: Jan. 2003
* Host: University of Vienna, Vienna University of Technology
* Target: provide tools to discover and access relevant knowledge and information from different distributed and heterogeneous data sources
* Test application area: medical treatment of traumatic brain injury; predicting the outcome of seriously ill patients
* The analytical part focuses on data mining and On-Line Analytical Processing (OLAP)

Project members
* Project leaders: Prof. A Min Tjoa, Vienna University of Technology; Prof. Peter Brezany, University of Vienna
* Visualization: Radoslav Ivanov
* Data streaming: Nguyen Manh Tho
* Data mediation: Alexander Wöhrer
* Knowledge Mgt: Ivan Janciak
* Job Control: Günter Kickinger
* Sequence Rules: Michael Rinner
* Clustering: Markus Mayer
* Decision rules: Christian Kloner, Juergen Hofer
* GUI: Paul Panhofer
* Autonomic aspects: Michael Bergmann
* OLAP: Bernhard Fiser, Umut Onan, Ibrahim Elsayed

Outline
* Motivation / Requirements
* GridMiner Services
* Architecture
* Dynamic Service Composition Engine
* OLAP
* Knowledge base
* Data Integration
* Graphical user interface
* Implementation
* Summary

The process to cover
* Data is distributed over the participating hospitals
* Access from different platforms (handheld, PC, …) for data generation, querying, and analysis
* The process needs to access various data sources

GridMiner Motivation
* Integrate knowledge discovery and knowledge management as an autonomic system
* Manage and control the whole lifecycle of knowledge
* Give strong support to other intelligent entities in their needs for knowledge

Basic Requirements
* Ability to access and analyze a huge amount of information, typically heterogeneous and geographically distributed
* Intelligent behavior: ability to maintain, discover, extend, present, and communicate knowledge
* High-performance (real-time or soft real-time) query processing
* High security guarantee

GridMiner Services
* Dynamic Workflow Control Service
* Data mining services
  * Sequences (SPADE)
  * Clustering (SimpleKMeans)
  * Decision rules (SPRINT)
  * OLAP (sequential/parallel version)
  * Association rules on OLAP
* Grid Data Mediator Service

GridMiner Architecture
[Architecture diagram; labels: User environment, Web, Graphical User Interface, Knowledge Base, Service configuration, DSCE Client, Grid, Dynamic service control engine (DSCE), Data Access and Integration, Data mining services]

Dynamic Service Control Engine
* Processes a workflow described in DSCL
* Based on the Open Grid Services Architecture
* Supports both interactive and batch processing
* User-independent processing of the workflow
* Provision of all intermediate results from the involved services
* Full user control during workflow execution
* Supports the OGSA Notification Model

Dynamic Service Control Engine (cont.)
[Figure]

Knowledge Base
* Rules: SWRL
* Metadata: OWL (Domain Ontology, Data Mining Ontology, Activity Ontology, Data Source Ontology)
* Facts
* Web Ontology Language: OWL + OWL-S
* XML, XML Schema (XSL) (WebRowSet, PMML, …)

OLAP
* Multidimensional data analysis by sequential and distributed/parallel OLAP engines
* Cube construction and querying
* Representation of query results in the OLAP Modeling Markup Language
* Integration with data mining engines (association rules on OLAP)

Grid Data Mediation Service Principles
* Tight federation: global (relational) schema
* Virtual integration: leave the data where it is, so the data is always up to date
* No proprietary solution: inherit well-solved aspects from OGSA-DAI
* Not bound to a special architecture
* Supported data sources: RDBMS (via JDBC), XMLDB (Xindice), CSV files
* Operators: "union all" and "inner join"
* Operators are XQuery based (using SAXON)

Data Integration Scenario
* Heterogeneities:
  * Name in A is "First Last" (the target format)
  * Name in C has to be combined
* Distribution: 3 data sources

Data Integration Scenario (cont.)
* Query: SELECT p_name FROM patient WHERE id=10
* Standard to optimized
[Figure]

Implementation/Technology
* Globus 3.2
* OGSA-DAI
* GUI: workflow construction and results visualization (JGraph, Java Web Start, JavaServer Pages)
* Service configuration (JavaServer Pages/PHP/..)
* Knowledge base (XML, OWL)

Data mining Scenario
[Workflow diagram; labels: Database (100k rows), Select 10k rows, Select 20k rows, Decision Rules (SPRINT), Decision Rules (C4.5), Decision Rules (C4.5)]

Graphical User Interface
[Figure]

Summary
* Integrated data mining infrastructure
* Covers the whole process
* Service-oriented architecture
* Implemented prototype
* Project ongoing
  * New data mining tasks (algorithms)
  * Knowledge management
* More information: http://www.gridminer.org

Thank you
Questions?
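
Note on the data integration scenario: the slides describe three distributed sources, a global relational schema, a "union all" operator, and a name heterogeneity (source A already stores names as "First Last", source C stores the parts separately). The Python sketch below illustrates that idea only; it is not the GridMiner mediator, which the slides say is XQuery based. The sample rows, the format of source B, and every function name are hypothetical. Reading "standard to optimized" as pushing the selection down to each source is also an assumption.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]

# Hypothetical source contents; the slides give no actual data.
SOURCE_A: List[Row] = [
    {"id": 10, "name": "Anna Berger"},   # already in target "First Last" format
    {"id": 11, "name": "Karl Huber"},
]
SOURCE_B: List[Row] = [
    {"id": 20, "name": "Eva Maier"},     # assumed to match the target format
]
SOURCE_C: List[Row] = [
    {"id": 30, "first": "Josef", "last": "Gruber"},  # name has to be combined
]


def map_plain(row: Row) -> Row:
    """Map a source row whose name is already 'First Last' to the global schema."""
    return {"id": row["id"], "p_name": row["name"]}


def map_split(row: Row) -> Row:
    """Map a source row with separate name parts to the global schema."""
    return {"id": row["id"], "p_name": f"{row['first']} {row['last']}"}


# Each source is paired with the mapping into the global 'patient' schema.
SOURCES: List[Tuple[List[Row], Callable[[Row], Row]]] = [
    (SOURCE_A, map_plain),
    (SOURCE_B, map_plain),
    (SOURCE_C, map_split),
]


def union_all(sources: List[Tuple[List[Row], Callable[[Row], Row]]]) -> List[Row]:
    """Virtual 'patient' table: concatenate all mapped rows, keeping duplicates."""
    result: List[Row] = []
    for rows, mapper in sources:
        result.extend(mapper(r) for r in rows)
    return result


def query_standard(patient_id: int) -> List[str]:
    """SELECT p_name FROM patient WHERE id = ?; filter after the full union."""
    return [r["p_name"] for r in union_all(SOURCES) if r["id"] == patient_id]


def query_pushdown(patient_id: int) -> List[str]:
    """Same query, but each source is filtered before mapping and union."""
    names: List[str] = []
    for rows, mapper in SOURCES:
        names.extend(mapper(r)["p_name"] for r in rows if r["id"] == patient_id)
    return names


if __name__ == "__main__":
    print(query_standard(10))
    print(query_pushdown(30))
```

Both functions return the same rows; the pushdown variant simply avoids mapping and transferring rows that the selection would discard anyway, which matters when the sources are remote.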