CHAPTER SIX
FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT

Introduction
- Opening Case Review
- Review traditional file systems

Opening Case
- Toronto Globe and Mail
  - Data in the mainframe was not accessible
  - Parts of the data were copied to Access / FoxBase / Excel
  - This practice created data islands (silos)
- Implemented SAP Business Warehouse
  - A single version of the truth

Opening Case
- Data mining with SAP HANA
  - An in-memory, column-oriented database
- Challenges
  - Getting people to adopt the data warehouse (DW)

File Management Terms
- Field: a single data item
  - Last name, first name
- Record: multiple related fields grouped together
  - Customer, student
- File: multiple records of the same type
  - Customer file

Traditional File Systems
- Fields contain a data item
  - Last name / first name
- Fields are grouped into records
  - Customer / student
- Records are grouped into files
  - Customer file
- I disagree with your book here: a database does not store the files.

1960s Data Management
- These are legacy systems
  - Characterized by traditional file processing
- Data processing was sequential
  - Batch processing
  - It was not possible to directly locate a particular record in a file
- Data were dependent on the programs that used them
  - Program-data dependence

1970s Data Management
- Batch processing gives way to online transaction processing
- Technologies
  - Files stored on disk rather than tape
  - Any record can be located in about the same amount of time
  - Indexed Sequential Access Method (ISAM)
  - Virtual Storage Access Method (VSAM)
  - Direct access files
    - Use a hashing function to compute a record's storage address from its key

1980s Data Management
- Databases are becoming commonplace
- Personal computer databases are evolving
  - dBASE
  - R:BASE

1990s Data Management
- Huge data stores and transaction processing capabilities
- Distributed databases
- Object-oriented databases
- 6 million+ transactions per second

Traditional File Systems (Problems)
- Data redundancy
- Data inconsistency
  - Multiple unsynchronized copies of the same data item in different places
  - Those unsynchronized copies are not the same
  - Which copy is correct (authoritative)?
- Program / data dependence

Information Granularity
- Refers to the level of detail of information
  - Detailed (a single POS transaction)
  - Coarse (global sales totals)

Transactional vs. Analytical Information
- Transactional information comes from a business process
  - A bank deposit
  - A credit card charge
- Analytical information uses transactional data for the purposes of decision making
  - Account balance trends
  - Using credit card history to detect fraud

Information Dimensions
- Information timeliness
  - Obsolete information is useless
  - Today's information needs to be provided in real time or near real time
- Information quality
  - Wrong information is useless
  - Redundant information can be the cause of errors
  - Information must be complete
  - Data inconsistency and data integrity

Database Management
- Characteristics
  - Complex
  - Databases are often spread across multiple servers
  - Databases are often spread across multiple physical disks
  - Fault tolerance is critical
  - Databases may be distributed

Database Vendors (1)
- The industry has consolidated
- IBM
  - DB2 Universal Database
- Oracle
- Microsoft
  - SQL Server
  - Access
- Sun (MySQL)
  - Now part of Oracle

Database Performance
- The Transaction Processing Performance Council (TPC) provides standard benchmarks
  - TPC-C: online transaction processing
  - TPC-E: online brokerage transactions
  - TPC-H: ad hoc decision support
  - TPC-W: Web / e-commerce

Database Performance (TPC-C)
- Multiple transaction types
- Independent of software and hardware
- Scalable
- Based on online transaction processing (OLTP)

Realities of a DBMS
- Data-centric rather than application-centric
- Can be a repository for all of an organization's data
- Databases tend to be centralized
- Queries get data from a DBMS
  - SQL is the standard query language
- Report generators create printed and Web-based reports
- Applications interface with the DBMS

Types of Databases
- Database models include:
  - Hierarchical database model: a tree-based structure
  - Network database model: mathematically, a directed graph
  - Relational database model: stores information in the form of logically related two-dimensional tables
  - Object-oriented databases

Elements of a Database
- Logical view and physical view
  - Users see and work with the logical view
  - The physical view is controlled by the database management system itself

Entities and Attributes
- Relational databases store information in tables (entities)
  - Customer / order / product
- Tables contain fields (attributes)
  - Customer name, address

Keys
- Each table has a primary key that uniquely identifies each record
  - Natural keys have some meaning (a stock symbol)
  - Artificial keys have no intrinsic meaning (your R number)
- Foreign keys are used to link tables in one-to-many relationships

Database Interaction

Advantages of an RDBMS (Scalability)
- Databases can scale to the terabyte or petabyte range
  - The NSA maintains 1.9 trillion telephone call records
- Large databases can span several servers and storage devices

Advantages of an RDBMS (Redundancy)
- Databases can be configured to write duplicate information
  - Citibank
- Journaling and checkpointing are supported

Advantages of an RDBMS (Integrity)
- Relational integrity constraints are rules that apply to the relationships between tables
- Business integrity constraints enforce business rules
  - Not really a part of the DBMS itself

Advantages of an RDBMS (Information Security)
- A DBMS supports advanced access rights
  - By table and field
  - By time of day
  - By location
  - By row information

Relational Database (Illustration)

Non-relational Databases (Introduction)
- Required by scalable applications like those at Facebook, Google, and others
- Built on a different set of principles than relational databases
  - BASE: basically available, soft state, eventually consistent (non-relational)
  - ACID: atomicity, consistency, isolation, durability (relational)
- They are somewhat new and unproven
- Usually in-memory

Non-relational Databases (Examples)
- NoSQL
- MongoDB
- Hadoop

Hadoop (Characteristics)
- Built by Apache, but there are several third-party implementations
  - We break down huge data sets
  - Process them in clusters
  - And put the results back together
- Characteristics
  - A distributed file system
  - MapReduce for dividing a task into small parts

Hadoop (Illustration)

Data-driven Web Sites
- Nearly all transactional Web sites rely on a database
  - Amazon
  - Your bank
  - Any shopping cart application
  - eBay or Craigslist
  - Facebook and YouTube

Database Integration
- Databases often need to be integrated
  - Because of mergers and acquisitions
  - Because of organizational changes
- We are referring to connections to multiple databases

Designing a Database (Normalization)
- Normalization is the process of factoring data into different tables to
  - Eliminate data redundancy
  - Support referential integrity

Data Warehouses (Introduction)
- A central source for clean data
- May contain internal or external data
  - Used to spot hidden patterns in data
  - May be integrated with operational databases
- Parts of a data warehouse are called data marts
- Data warehouses contain an analytical component

Cleansing Data
- Data is often obtained from a myriad of sources
  - External lists
  - Internal databases
  - Other databases
- This data must be cleansed and sanitized to remove
  - Redundancy, errors, etc.

Data Warehouses (Illustration)

Multidimensional Analysis
- Data are often analyzed as three-dimensional cubes
- Cubes are then "sliced and diced" to look at various layers

Multidimensional Analysis (Illustration)

The Cost of Perfect Information

Database Design (Introduction)
- In the systems process, we design before we implement
  - Requirements specification
  - Conceptual design
  - Logical design
  - Physical design

Database Design Tools
- Unified Modeling Language (UML)
  - Visio
  - Rational Rose
- Entity-relationship diagrams describe relationships between data
- Normalization eliminates redundant data

Database Management HR
- Database administrators
- Data managers
- Programmers and systems analysts
- Data security

Business Intelligence (Introduction)
- Simply put, it's internal and external data used to support better decision making
- It's challenging to sift through the mountains of data
- It requires cross-functional collaboration between systems
- More in the next chapter, but we use ERP systems to improve business intelligence

Business Intelligence (Industries)
- BI applies to all industries
- Retail and sales
  - Understanding procurement and distribution (SCM) / customers (CRM)
- Banking
  - Understanding creditworthiness / fraud behavior
- Insurance
  - Forecasting claim risk and understanding at-risk customers

Business Intelligence (Industries)
- Airlines
  - Routing planes / minimizing turnaround time (Southwest)
- Marketing
  - Demographics
  - Selling based on known customer behavior (Harrah's)
  - Amazon

Business Intelligence (Levels)
- Operational
  - Day-to-day operations (building a Dell)
- Tactical
  - Short term (Dell ordering supplies)
- Strategic
  - Long-term organizational goals
- The systems that provide BI typically do so at all levels

BI Levels (Illustration)

BI and Latency
- From the time of acquisition, how long does it take to analyze the data (analysis latency)?
- How long does it take to make a decision based on the analysis (decision latency)?
  - E-transactions significantly reduce latency

Data Mining (Introduction)
- Data gets mined (analyzed) from data contained in a data warehouse or data mart
- Specialized tools are used to analyze data for "interesting nuggets"
- Ways to mine
  - Drill down (general to specific)
  - Drill up (specific to general)

Data Mining (Sequences)
- Events are linked over time
  - I buy a house
  - Home Depot knows that
  - They send me a coupon to buy appliances

Data Mining (Classification)
- We classify items (people, for example) into groups and look at the characteristics of each group
  - Churned customers
  - Customers who have stopped gambling

Data Mining (Clustering)
- Used to define classification groups
- Cluster analysis groups data by one or more traits
- Examples
  - Don't drink the water in Fallon
  - Segment customers by ZIP code

Data Mining (Association)
- Answers the question "What traits are associated with other traits?"
  - When I stay at Harrah's, I gamble
  - I eat at the Sage Room
  - When I stay in Vegas, I gamble more

Data Mining (Statistical Analysis)
- It's basic statistics
  - Analysis of variance
  - Correlation coefficients
  - Etc.

Text Mining
- We need a way to mine unstructured data
  - All of those Facebook posts
  - All of those Twitter posts
- Techniques
  - We mine text
  - We use keywords to mine sentiment

BI Benefits
- We can understand what's happening inside and outside a department
  - Sales knows about product inventory levels and production schedules
  - Production knows about sales and sales forecasts
  - Finance knows about the sales forecasts too
- This information is provided in near real time

Quantifying BI
- Some benefits can be clearly quantified
  - Costs went down
  - Productivity increased
  - Inventory levels were optimized (10%)
- Some are indirectly quantified
- Some benefits are intangible
- Sometimes, we get unexpected results

Challenges
- We need an information policy
- We need to administer all of this data
- We need to ensure data quality
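The 1970s Data Management slide notes that direct access files use a hashing function to find a record's storage location from its key. The Python sketch below illustrates this with division-remainder hashing. The bucket count (`NUM_BUCKETS`), the `DirectAccessFile` class, and the sample keys are all hypothetical. Real access methods such as VSAM are considerably more involved.

```python
from typing import List, Optional, Tuple

NUM_BUCKETS = 7  # hypothetical file size: 7 buckets (fixed-size storage slots)

def bucket_for(key: int) -> int:
    """Division-remainder hashing: compute a bucket address from a numeric record key."""
    return key % NUM_BUCKETS

class DirectAccessFile:
    """Toy direct access file; each bucket holds the (key, record) pairs that hash to it."""

    def __init__(self) -> None:
        self.buckets: List[List[Tuple[int, str]]] = [[] for _ in range(NUM_BUCKETS)]

    def write(self, key: int, record: str) -> None:
        bucket = self.buckets[bucket_for(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:  # same key: overwrite the record in place
                bucket[i] = (key, record)
                return
        bucket.append((key, record))  # new key (possibly a collision): add it to the bucket

    def read(self, key: int) -> Optional[str]:
        # Go straight to one bucket instead of scanning the whole file sequentially.
        for existing_key, record in self.buckets[bucket_for(key)]:
            if existing_key == key:
                return record
        return None

f = DirectAccessFile()
f.write(10452, "Smith, Jane")
f.write(20013, "Lopez, Carlos")
f.write(10459, "Chen, Wei")  # 10459 % 7 == 1, the same bucket as 10452 (a collision)
print(bucket_for(10452), f.read(10452), f.read(10459))  # → 1 Smith, Jane Chen, Wei
```

The address is computed from the key, so a read touches one bucket no matter how large the file is. That is the contrast with 1960s sequential processing. Keys that collide are simply kept together in the same bucket here.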
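To make the Information Granularity slide concrete, here is a minimal Python sketch that rolls detailed POS transactions up to coarser totals. This is also what "drill up" means on the Data Mining (Introduction) slide. The transactions, store and region names, and the helper `totals_by` are hypothetical. Amounts are kept in cents to avoid floating-point rounding.

```python
from collections import defaultdict

# Hypothetical detailed POS transactions: (store, region, amount in cents)
transactions = [
    ("Reno-01", "West", 1999),
    ("Reno-01", "West", 500),
    ("Boise-02", "West", 1250),
    ("Austin-07", "South", 4000),
]

def totals_by(level, rows):
    """Roll detailed rows up to a coarser level: 0 = store, 1 = region."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[level]] += row[2]
    return dict(totals)

store_totals = totals_by(0, transactions)    # still fairly detailed
region_totals = totals_by(1, transactions)   # coarser
grand_total = sum(amount for _, _, amount in transactions)  # coarsest: a single number
print(region_totals, grand_total)  # → {'West': 3749, 'South': 4000} 7749
```

Drilling down is the reverse: starting from `grand_total` and going back to the region or store rows that produced it.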
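The Keys and Advantages of an RDBMS (Integrity) slides can be illustrated with Python's built-in `sqlite3` module. The sketch below shows three things:

- an artificial primary key;
- a foreign key implementing a one-to-many relationship;
- a relational integrity constraint that rejects an order for a customer who does not exist.

The table names `customer` and `customer_order` and the data are hypothetical. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,  -- artificial key: no intrinsic meaning
        name        TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),  -- foreign key
        amount      INTEGER NOT NULL
    )""")

conn.execute("INSERT INTO customer VALUES (1, 'Jane Smith')")
# One customer, many orders: the one-to-many relationship.
conn.executemany("INSERT INTO customer_order VALUES (?, ?, ?)", [(100, 1, 25), (101, 1, 40)])

try:
    # Relational integrity: customer 99 does not exist, so this orphan order is rejected.
    conn.execute("INSERT INTO customer_order VALUES (102, 99, 10)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

rows = conn.execute("""
    SELECT c.name, COUNT(o.order_id)
    FROM customer AS c JOIN customer_order AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id""").fetchall()
print(rows, orphan_rejected)  # → [('Jane Smith', 2)] True
```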
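The A in ACID (atomicity), listed on the Non-relational Databases (Introduction) slide, means a group of changes succeeds or fails as a unit. Here is a minimal sketch using Python's `sqlite3` with a hypothetical `account` table and `transfer` function. The `with conn:` block either commits both updates together or rolls both back if either one fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("checking", 100), ("savings", 50)])
conn.commit()

def transfer(src, dst, amount):
    """Credit dst and debit src as one atomic transaction; return True if it committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised by the CHECK constraint on an overdraft
        return False

def balances():
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

ok = transfer("checking", "savings", 30)       # succeeds: 70 / 80
failed = transfer("checking", "savings", 500)  # debit would go negative; the credit is undone too
print(ok, failed, balances())  # → True False {'checking': 70, 'savings': 80}
```

Without the transaction, the failed transfer would have left 500 credited to savings with nothing debited from checking.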
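The Hadoop (Characteristics) slide describes MapReduce as breaking a huge data set apart, processing the pieces, and putting the results back together. The classic word-count example below is a single-process Python sketch of the map, shuffle, and reduce phases. It does not use Hadoop's actual API, and the input chunks are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of the data set."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group the emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

# Hypothetical input already split into chunks; on a cluster each could live on a different node.
chunks = ["big data big clusters", "data in clusters", "Big results"]
mapped = chain.from_iterable(map_phase(c) for c in chunks)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["big"], counts["data"], counts["clusters"])  # → 3 2 2
```

On a real cluster, the framework runs the map step on many nodes in parallel, close to where each chunk is stored. It then routes every key to the node responsible for reducing it.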
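Here is a small plain-Python sketch of the normalization described on the Designing a Database (Normalization) slide. Customer details that repeat on every order row are factored into a separate customer table, and orders keep only a foreign key. The field names and data are hypothetical. The sketch also assumes the flat rows agree with each other about each customer.

```python
# Hypothetical flat (unnormalized) rows: customer details repeat on every order.
flat_orders = [
    {"order_id": 1, "cust_id": "C1", "cust_name": "Jane Smith", "city": "Reno", "amount": 25},
    {"order_id": 2, "cust_id": "C1", "cust_name": "Jane Smith", "city": "Reno", "amount": 40},
    {"order_id": 3, "cust_id": "C2", "cust_name": "Carlos Lopez", "city": "Sparks", "amount": 15},
]

def normalize(rows):
    """Split flat order rows into a customer table and an order table."""
    customers = {}  # customer table, keyed by its primary key cust_id
    orders = []     # order table
    for row in rows:
        # Each customer's attributes are stored once, however many orders reference them.
        customers[row["cust_id"]] = {"name": row["cust_name"], "city": row["city"]}
        # Orders keep only the foreign key cust_id instead of repeating name and city.
        orders.append({"order_id": row["order_id"], "cust_id": row["cust_id"],
                       "amount": row["amount"]})
    return customers, orders

customers, orders = normalize(flat_orders)
customers["C1"]["city"] = "Sparks"  # a change of address is now one update, not one per order
print(len(customers), len(orders))  # → 2 3
```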
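The Cleansing Data slide says data gathered from many sources must be sanitized to remove redundancy and errors. The Python sketch below applies three cleansing rules, all hypothetical:

- names are trimmed and title-cased;
- phone numbers are reduced to 10 digits;
- records missing a name are rejected.

The sample records are hypothetical as well.

```python
import re

# Hypothetical customer records merged from an external list and an internal database.
raw = [
    {"name": "  jane SMITH ", "phone": "(775) 555-0100"},
    {"name": "Jane Smith", "phone": "775.555.0100"},
    {"name": "Carlos Lopez", "phone": "775-555-0199"},
    {"name": "", "phone": "775-555-0123"},  # error: missing name
]

def clean(record):
    """Standardize formatting so that equivalent records compare equal."""
    name = " ".join(record["name"].split()).title()  # trim, collapse spaces, fix case
    phone = re.sub(r"\D", "", record["phone"])        # keep digits only
    return {"name": name, "phone": phone}

def cleanse(records):
    seen = set()
    result = []
    for rec in map(clean, records):
        if not rec["name"] or len(rec["phone"]) != 10:
            continue  # drop records that fail basic validity checks
        key = (rec["name"], rec["phone"])
        if key in seen:
            continue  # drop redundant copies of the same customer
        seen.add(key)
        result.append(rec)
    return result

clean_customers = cleanse(raw)
print(clean_customers)
```

Real cleansing rules depend heavily on the data. For example, `str.title()` would turn "McDonald" into "Mcdonald".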
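Slicing and dicing, from the Multidimensional Analysis slide, can be sketched in Python with a dictionary keyed by (product, region, quarter). The dimension names, the data, and the helpers `slice_cube` and `dice_cube` are hypothetical; OLAP tools provide these operations directly.

```python
from collections import defaultdict

# Hypothetical sales facts with three dimensions (product, region, quarter) and one measure (units).
facts = [
    ("Widget", "East", "Q1", 100), ("Widget", "West", "Q1", 80),
    ("Widget", "East", "Q2", 120), ("Gadget", "East", "Q1", 50),
    ("Gadget", "West", "Q2", 70),  ("Gadget", "West", "Q1", 30),
]
DIMS = ("product", "region", "quarter")

def build_cube(rows):
    """Index the measure by its full (product, region, quarter) coordinate."""
    cube = defaultdict(int)
    for product, region, quarter, units in rows:
        cube[(product, region, quarter)] += units
    return cube

def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value, leaving a two-dimensional layer."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

def dice_cube(cube, **allowed):
    """Dice: keep a sub-cube by restricting several dimensions to sets of values."""
    return {k: v for k, v in cube.items()
            if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())}

cube = build_cube(facts)
q1 = slice_cube(cube, "quarter", "Q1")
east_widgets = dice_cube(cube, product={"Widget"}, region={"East"})
print(sum(q1.values()), sum(east_widgets.values()))  # → 260 220
```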
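The Data Mining (Association) slide asks which traits are associated with other traits. Two standard measures apply to a rule such as "stays at Harrah's, therefore gambles":

- support: how often the items occur together across all records;
- confidence: how often the second item appears when the first one does.

The guest data and item names below are hypothetical.

```python
# Hypothetical guest records: the set of behaviors observed during each visit.
visits = [
    {"harrahs", "gamble", "sage_room"},
    {"harrahs", "gamble"},
    {"harrahs", "sage_room"},
    {"harrahs", "gamble", "sage_room"},
    {"gamble"},
]

def count(itemset):
    """Number of visits that contain every item in the itemset."""
    return sum(itemset <= visit for visit in visits)

def support(itemset):
    """Fraction of all visits that contain the itemset."""
    return count(itemset) / len(visits)

def confidence(antecedent, consequent):
    """Among visits containing the antecedent, the fraction that also contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)

print(support({"harrahs", "gamble"}), confidence({"harrahs"}, {"gamble"}))  # → 0.6 0.75
```

Association-mining tools search for all rules whose support and confidence exceed chosen thresholds, rather than checking one rule at a time as here.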
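The Data Mining (Statistical Analysis) slide lists correlation coefficients. Here is the Pearson formula (covariance divided by the product of the standard deviations) written out in Python. The advertising and sales figures are hypothetical. Python 3.10 and later also provide `statistics.correlation`.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples (ranges from -1 to 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples of at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: advertising spend (thousands of dollars) vs. units sold.
ad_spend = [1, 2, 3, 4, 5]
units = [12, 15, 21, 24, 30]
r = pearson_r(ad_spend, units)
print(round(r, 3))  # → 0.993
```

A value this close to 1 indicates a strong positive linear relationship, though correlation alone does not show that advertising caused the sales.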
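The slides do not name a clustering algorithm, but k-means is one common choice for the kind of customer segmentation on the Data Mining (Clustering) slide. This minimal sketch makes three assumptions:

- each customer has two hypothetical traits (visits per month, average spend);
- the initialization is naive, using the first k points as starting centroids;
- the `k_means` helper is written here for illustration and is not a library function.

```python
from math import dist

def k_means(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then move centroids to cluster means."""
    centroids = list(points[:k])  # naive initialization: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (visits per month, average spend in dollars)
customers = [(1, 20), (2, 25), (1, 22), (8, 200), (9, 210), (10, 190)]
centroids, clusters = k_means(customers, k=2)
print([tuple(round(x, 1) for x in c) for c in centroids])  # → [(1.3, 22.3), (9.0, 200.0)]
```

The two groups that emerge are occasional low spenders and frequent high spenders. Because the two traits use very different scales, real analyses usually standardize them first, and they try several initializations.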
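The Text Mining slide mentions using keywords to mine sentiment. Here is a minimal keyword-counting sketch in Python. The keyword sets and posts are hypothetical, and production sentiment analysis uses much larger lexicons or trained models.

```python
import re

# Hypothetical keyword lists; real sentiment lexicons are far larger and domain-specific.
POSITIVE = {"love", "great", "fast", "happy"}
NEGATIVE = {"hate", "slow", "broken", "angry"}

def sentiment(post):
    """Score = number of positive keywords minus number of negative keywords in the post."""
    words = re.findall(r"[a-z']+", post.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

posts = [
    "Love the new app, checkout is fast!",
    "Site is slow and my cart is broken. Angry.",
    "Ordered a lamp today.",
]
labels = ["positive" if s > 0 else "negative" if s < 0 else "neutral"
          for s in map(sentiment, posts)]
print(labels)  # → ['positive', 'negative', 'neutral']
```

Simple keyword counting misses negation ("not great") and sarcasm, which is one reason larger tools go beyond it.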