Data mart - KBU ComSci, by Somchai

The Need for Data Analysis
 Managers track daily transactions to evaluate how the
business is performing
 Strategies should be developed to meet
organizational goals using operational databases
 Data analysis provides information about short-term
tactical evaluations and strategies
Business Intelligence
 Comprehensive, cohesive, integrated tools and
processes
 Capture, collect, integrate, store, and analyze data
 Generate information to support business decision
making
 Framework that allows a business to transform:
 Data into information
 Information into knowledge
 Knowledge into wisdom
Decision Support Data
 BI effectiveness depends on quality of data gathered
at operational level
 Operational data seldom well-suited for decision
support tasks
  Data must be reformatted to be useful for business
intelligence
Operational Data vs. Decision Support Data
 Operational data
 Mostly stored in relational database
 Optimized to support transactions representing daily
operations
 Decision support data differs from operational data in
three main areas:
 Time span
 Granularity
 Dimensionality
Decision Support Database Requirements
 Specialized DBMS tailored to provide fast answers to
complex queries
 Four main requirements:
 Database schema
 Data extraction and loading
 End-user analytical interface
 Database size
Decision Support Database Requirements (cont’d.)
 Database schema
 Complex data representations
 Aggregated and summarized data
 Queries extract multidimensional time slices
 Data extraction and filtering
 Supports different data sources
 Flat files
 Hierarchical, network, and relational databases
 Multiple vendors
 Checking for inconsistent data
Decision Support Database Requirements (cont’d.)
 End-user analytical interface
  One of the most critical DSS DBMS components
 Permits user to navigate through data to simplify and
accelerate decision-making process
 Database size
 In 2005, Wal-Mart had 260 terabytes of data in its data
warehouses
 DBMS must support very large databases
(VLDBs)
The Data Warehouse
 Subject-oriented, integrated, time-variant, and
nonvolatile collection of data
 Provides support for decision making
 Usually a read-only database optimized for data
analysis and query processing
 Requires time, money, and considerable managerial
effort to create
Data Warehouse is subject oriented.
Data Warehouse is Integrated.
  All data from multiple sources must be converted into a
standard format before it populates a data warehouse.
 Here are some of the items that would need standardization:
 Naming conventions
 Codes
 Data attributes
 Measurements
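The standardization items above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two source systems, their field names, the gender codes, and the choice of centimetres as the warehouse unit do not come from the slides.

```python
# Hypothetical source records: two systems describe the same kind of fact
# with different field names, gender codes, and units of measurement.
SOURCE_A = [{"cust_name": "Ann Lee", "sex": "F", "height_cm": 170.0}]
SOURCE_B = [{"CustomerName": "Bob Ray", "gender": "1", "height_in": 70.0}]

# Assumed warehouse standard: fields name / gender / height_cm,
# gender coded as "M" or "F", heights measured in centimetres.
GENDER_CODES = {"M": "M", "F": "F", "1": "M", "0": "F"}
CM_PER_INCH = 2.54


def standardize_a(rec):
    """Map a source A record onto the warehouse naming and coding."""
    return {
        "name": rec["cust_name"],
        "gender": GENDER_CODES[rec["sex"]],
        "height_cm": rec["height_cm"],
    }


def standardize_b(rec):
    """Map a source B record, converting inches to centimetres."""
    return {
        "name": rec["CustomerName"],
        "gender": GENDER_CODES[rec["gender"]],
        "height_cm": round(rec["height_in"] * CM_PER_INCH, 1),
    }


def load_warehouse():
    """Return all source records in the single warehouse format."""
    rows = [standardize_a(r) for r in SOURCE_A]
    rows += [standardize_b(r) for r in SOURCE_B]
    return rows
```

After loading, every row has the same attribute names, the same codes, and the same unit, whichever system it came from.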
Data Warehouse is time-variant
 In order to discover trends in business, analysts need large
amounts of data. Historical data is kept in a data warehouse.
  For example, one can retrieve data from 3 months, 6 months,
12 months, or even further back from a data warehouse. This
contrasts with a transaction system, where often only the
most recent data is kept.
  For example, a transaction system may hold only the most recent
address of a customer, whereas a data warehouse can hold all
addresses associated with that customer.
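The address example can be sketched as follows. The class names, the customer ID, and the addresses are hypothetical; the point is only that the transaction-style store overwrites the address while the warehouse-style store keeps each address with the date it took effect.

```python
from datetime import date


class OperationalCustomers:
    """Transaction-system view: only the latest address is kept."""

    def __init__(self):
        self.address = {}

    def move(self, customer_id, new_address, when):
        # The previous address is overwritten and lost.
        self.address[customer_id] = new_address


class WarehouseCustomers:
    """Warehouse view: every address is kept with its effective date."""

    def __init__(self):
        self.history = []  # rows of (customer_id, address, valid_from)

    def move(self, customer_id, new_address, when):
        self.history.append((customer_id, new_address, when))

    def addresses(self, customer_id):
        """All addresses ever recorded for the customer, in load order."""
        return [a for c, a, _ in self.history if c == customer_id]

    def address_as_of(self, customer_id, when):
        """Address in effect on the given date, or None if none yet."""
        rows = [(d, a) for c, a, d in self.history
                if c == customer_id and d <= when]
        return max(rows)[1] if rows else None
```

With two moves recorded for one customer, `OperationalCustomers` can report only the second address, while `WarehouseCustomers.address_as_of` can still report the first for any date before the second move.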
Data Warehouse is a nonvolatile collection
of data
  Data is stored in the data warehouse for read-only use; changes
are small and slow.
  Once data is in the data warehouse, it does not change, so
historical data in a data warehouse should never be altered.
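One possible way to enforce this rule, shown here only as a sketch, is to let the database reject changes to rows that are already loaded. The example uses Python's built-in sqlite3 module; the table name, column names, and trigger names are hypothetical, and a real warehouse DBMS may offer other mechanisms.

```python
import sqlite3


def build_warehouse():
    """Create an in-memory table that accepts inserts but no changes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_history (sale_day TEXT, store TEXT, amount REAL)"
    )
    # Triggers that abort any attempt to change or remove a loaded row.
    for action in ("UPDATE", "DELETE"):
        conn.execute(
            f"CREATE TRIGGER block_{action.lower()} "
            f"BEFORE {action} ON sales_history "
            "BEGIN SELECT RAISE(ABORT, 'warehouse rows are read-only'); END"
        )
    return conn


def try_statement(conn, sql):
    """Return True if the statement ran, False if it was rejected."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.DatabaseError:
        return False
```

New rows can still be loaded with INSERT, but an UPDATE or DELETE that touches an existing row is aborted and the historical data stays as it was.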
Data Granularity in a Data Warehouse
 Data is stored at different detail levels in operational
systems as well as in a data warehouse.
 Data is not normally stored in summarized form in
an operational system.
  Data in a warehouse is required to be stored in
summarized form.
 The more detail there is in the fact table, the
higher its granularity and vice versa.
Data Granularity in a Data Warehouse (cont’d.)
 Example: Say we have a data mart with a single fact
(Sales) and three dimensions (Time, Organization and
Product). The fact table contains three metrics (Unit
Price, Units Sold and Total Sale Amount). The Time
dimension consists of four hierarchical elements (Year,
Quarter, Month and Day). The Organization dimension
consists of three hierarchical elements (Region, District
and Store). The Product dimension consists of two
hierarchical elements (Product Family and SKU).
Data Granularity in a Data Warehouse (cont’d.)
  As always, the metrics in the Sales fact table must be stored
at some intersection of the dimensions (i.e., Time,
Organization and Product). Hence, in this data mart, the
highest granularity at which we can store Sales metrics is
Day/Store/SKU (i.e., the lowest level in each dimensional
hierarchy). Conversely, the lowest granularity to which we can
aggregate Sales metrics in this data mart is
Year/Region/Product Family (i.e., the highest level in each
dimensional hierarchy). We may also (for a variety of
performance reasons) choose to store Sales metrics at some
intermediate level of granularity (e.g.,
Month/District/SKU).
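The levels described above can be tried out with a small roll-up sketch. The store, district, region, SKU and family names and the sales figures are invented for illustration, and the Quarter level is left out to keep the code short. Units Sold and Total Sale Amount are summed; Unit Price is not summed because adding prices across rows has no meaning.

```python
from collections import defaultdict

# Hypothetical dimension hierarchies (all names are illustrative only).
STORE_TO_DISTRICT = {"S1": "D1", "S2": "D1", "S3": "D2"}
DISTRICT_TO_REGION = {"D1": "North", "D2": "North"}
SKU_TO_FAMILY = {"SKU-1": "Snacks", "SKU-2": "Snacks"}

# Fact rows at the finest grain, Day/Store/SKU:
# (day as YYYY-MM-DD, store, sku, unit_price, units_sold)
SALES = [
    ("2005-01-03", "S1", "SKU-1", 2.0, 10),
    ("2005-01-17", "S2", "SKU-1", 2.0, 5),
    ("2005-02-01", "S3", "SKU-2", 3.0, 4),
]


def roll_up(rows, time_level, org_level, product_level):
    """Aggregate fact rows to the requested level of each dimension.

    Returns {(time, org, product): (units_sold, total_sale_amount)}.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for day, store, sku, price, units in rows:
        year, month, _ = day.split("-")
        time_key = {"day": day, "month": f"{year}-{month}",
                    "year": year}[time_level]
        district = STORE_TO_DISTRICT[store]
        org_key = {"store": store, "district": district,
                   "region": DISTRICT_TO_REGION[district]}[org_level]
        product_key = {"sku": sku,
                       "family": SKU_TO_FAMILY[sku]}[product_level]
        cell = totals[(time_key, org_key, product_key)]
        cell[0] += units
        cell[1] += price * units
    return {key: tuple(value) for key, value in totals.items()}
```

Calling `roll_up(SALES, "month", "district", "sku")` gives the intermediate Month/District/SKU level, and `roll_up(SALES, "year", "region", "family")` collapses the same rows to a single Year/Region/Product Family cell.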
Granularity levels in a data warehouse
The Data Warehouse (cont’d.)
 Data mart
 Small, single-subject data warehouse subset
 More manageable data set than data warehouse
 Provides decision support to small group of people
  Typically lower cost and shorter implementation
time than a data warehouse
Data Marts
Components of a Data Warehouse
Data Sources
  Source data coming into the warehouse.
 Can be divided into four categories:
 Production Data: Data coming from operational databases.
 Internal Data: Data held in private files of employees and
departments (not in operational database).
 Archived Data: Data available in backups of operational
databases.
  External Data: Data that is not stored within the organization
but comes from external sources and is still useful to the
organization.
Example of Production Data
  Data related to doctors, patients, and treatments in a hospital
system.
  This system is an operational database or an online
transaction processing system.
  Users enter information into this system on a regular basis.
  Data coming from this information system into the data warehouse
is called production data.
Example of Internal Data
  In a hospital, some data may be stored not in the operational
database but in Excel sheets and Word files.
  Manual registration slips of patients, filled in when the operational
database was not active.
  Standard operating procedure (SOP) documents that
cannot be stored in the operational system.
  Notes taken by a doctor about his patients in a Word
document.
  A list of patients who visited a doctor for a consultation but
were not registered patients of the hospital.
Example of Archived Data
  Backups of databases are made on a regular basis.
  When the amount of data stored in an operational database
grows, older data is moved into backup files.
  Backup files are normally kept on off-line storage such as
magnetic tape.
  For example, a hospital’s database is backed up on a
regular basis.
  This archived data is useful to a data warehouse because it
provides historical information.
Example of External Data
  A car rental company has a system that stores data
about the vehicles it offers for rent.
  The company needs to keep information from
different manufacturers about new car models.
  This information is external to the car rental
company and is not part of its own system.