Asian Journal Of Computer Science And Information Technology 2: 4 (2012) 68– 77.
Contents lists available at www.innovativejournal.in
Asian Journal of Computer Science and Information Technology
Journal homepage: http://www.innovativejournal.in/index.php/ajcsit
THE SURVEY OF DATA MINING APPLICATIONS AND FEATURE SCOPE
Pragnyaban Mishra, Neelamadhab Padhy*, Rasmita Panigrahi
Gandhi Institute of Engineering and Technology, GIET, Gunupur, India
ARTICLE INFO
Corresponding Author: Neelamadhab Padhy, Asst. Professor, Gandhi Institute of Engineering and Technology, GIET, Gunupur
neela.mbamtech@gmail.com
ABSTRACT
Today, multinational companies and large organizations have operations in many places around the world, and each place of operation may generate large volumes of data. Corporate decision makers require access to all such sources in order to take strategic decisions. Information and communication technologies are heavily used in industry, and the data warehouse delivers significant business value by improving the effectiveness of managerial decision-making. In an uncertain and highly competitive business environment, the value of strategic information systems such as these is easily recognized; however, in today's business environment, efficiency or speed is not the only key to competitiveness. Such tremendous amounts of data, on the order of terabytes to petabytes, have fundamentally changed science and engineering, transforming many disciplines from data-poor to increasingly data-rich and calling for new, data-intensive methods to conduct research in science and engineering. Analyzing this vast amount of data and drawing fruitful conclusions and inferences from it requires special tools, called data mining tools. This paper gives an overview of data mining systems and some of their applications in different fields.
Keywords: Data mining applications, data warehouse system, data mining system architecture, data mining life cycle.
INTRODUCTION
The advent of information technology in various fields of human life has led to the storage of large volumes of data in various formats, such as records, documents, images, sound recordings, videos, scientific data, and many new data formats. The data collected from different applications require proper mechanisms for extracting knowledge/information from large repositories for better decision making. Knowledge discovery in databases (KDD), often called data mining, aims at the discovery of useful information from large collections of data. The main reason the field of “Data mining” has attracted a great deal of attention in the information technology industry is the perception that “we are data rich but information poor”. There is a huge volume of data, but we are hardly able to turn it into useful information and knowledge for managerial decision making in business.
Generating information requires a massive collection of data. The data can range from simple numerical figures and text documents to more complex information such as spatial data, multimedia data, and hypertext documents. To take full advantage of the data, retrieval alone is not enough; tools are required for automatic summarization of data, extraction of the essence of the stored information, and discovery of patterns in raw data.
With the enormous amount of data stored in files, databases, and other repositories, it is increasingly important to develop powerful tools for the analysis and interpretation of such data and for the extraction of interesting knowledge that could help in decision-making. The only answer to all of the above is ‘Data Mining’.
©2012, AJCSIT, All Right Reserved.
Data mining is the extraction of hidden predictive information from large databases; it is a powerful technology with great potential to help organizations focus on the most important information in their data warehouses [1,2,3,4]. Data mining tools predict future trends and behaviors, helping organizations make proactive, knowledge-driven decisions [2]. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by the retrospective tools typical of decision support systems. Data mining tools can answer questions that traditionally were too time consuming to resolve. They search databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.
Data mining, popularly known as Knowledge Discovery in Databases (KDD), is the nontrivial extraction of implicit, previously unknown and potentially useful information from data in databases [3,5]. Although data mining and knowledge discovery in databases (KDD) are frequently treated as synonyms, data mining is actually part of the knowledge discovery process [1,3,5].
The Data Mining Tasks:
Padhy et. al/ The Survey of Data Mining Applications And Feature Scope
Depending on how the data mining results are used, data mining tasks are classified into the following types [1,2]:
1. Exploratory Data Analysis: This simply explores the data without any clear idea of what we are looking for. These techniques are interactive and visual.
2. Descriptive Modeling: This describes all of the data. It includes models for the overall probability distribution of the data, partitioning of the p-dimensional space into groups, and models describing the relationships between the variables.
3. Predictive Modeling: This model permits the value of one variable to be predicted from the known values of other variables.
4. Discovering Patterns and Rules: This task is concerned with pattern detection. One aim is spotting fraudulent behavior by detecting regions of the space, defined by the different types of transactions, where the data points are significantly different from the rest.
5. Retrieval by Content: This finds patterns similar to a pattern of interest in the data set. This task is most commonly used for text and image data sets.
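As an illustration of predictive modeling (task 3), the sketch below fits a one-variable ordinary least-squares line and uses it to predict one variable from another. The function names and the advertising/sales figures are hypothetical; real predictive models usually involve many variables and a library implementation.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor variable.
    Needs at least two distinct x values (otherwise sxx is zero)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the common 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict(model, x):
    """Predict the target variable from a known value of the predictor."""
    a, b = model
    return a + b * x

# Hypothetical data: advertising spend (x) and resulting sales (y).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]
model = fit_simple_linear(spend, sales)
print(model, predict(model, 6.0))
```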
TYPES OF DATA MINING SYSTEMS
Data mining systems can be categorized according to various criteria, as follows [3]:
1. Classification of data mining systems according to the type of data source mined: This classification is according to the type of data handled, such as spatial data, multimedia data, time-series data, text data, World Wide Web data, etc.
2. Classification of data mining systems according to the data model: This classification is based on the data model involved, such as relational database, object-oriented database, data warehouse, transactional database, etc.
3. Classification of data mining systems according to the kind of knowledge discovered: This classification is based on the kind of knowledge discovered or the data mining functionalities, such as characterization, discrimination, association, classification, clustering, etc. Some systems tend to be comprehensive systems offering several data mining functionalities together.
4. Classification of data mining systems according to the mining techniques used: This classification is according to the data analysis approach used, such as machine learning, neural networks, genetic algorithms, statistics, visualization, database-oriented or data warehouse-oriented approaches, etc.
The classification can also take into account the degree of user interaction involved in the data mining process, such as query-driven systems, interactive exploratory systems, or autonomous systems. A comprehensive system would provide a wide variety of data mining techniques to fit different situations and options, and offer different degrees of user interaction.
Data Mining Methods
Some of the popular data mining methods are as follows:
1. Decision Trees and Rules
2. Nonlinear Regression and Classification Methods
3. Example-based Methods
4. Probabilistic Graphical Dependency Models
5. Relational Learning Models
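Example-based methods classify a new case by comparing it with stored cases. A minimal k-nearest-neighbor sketch, assuming numeric feature vectors and Euclidean distance (the data, labels and function name are illustrative only):

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (feature_vector, label) pairs.
    Returns the majority label among the k stored cases closest to query."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-feature examples with class labels.
train = [((1.0, 1.0), "low"), ((1.2, 0.8), "low"), ((0.9, 1.1), "low"),
         ((5.0, 5.0), "high"), ((5.2, 4.9), "high"), ((4.8, 5.1), "high")]
print(knn_classify(train, (1.1, 1.0)))
```

No model is built in advance: all the work happens at query time, which is the defining trait of example-based methods.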
More broadly, well-known data mining methods can be classified as: On-Line Analytical Processing (OLAP), classification, clustering, association rule mining, temporal data mining, time series analysis, spatial mining, web mining, etc. These methods use different types of algorithms and data. The data source can be a data warehouse, database, flat file or text file. The algorithms may be statistical algorithms, decision tree based, nearest neighbor, neural network based, genetic algorithm based, rule based, support vector machines, etc. The selection of a data mining algorithm mainly depends on the type of data used for mining and the expected outcome of the mining process. Domain experts play a significant role in selecting the algorithm for data mining.
A knowledge discovery (KD) process involves preprocessing data, choosing a data mining algorithm, and post-processing the mining results. There are very many choices for each of these stages, and non-trivial interactions between them. Therefore both novices and data mining specialists need assistance in knowledge discovery processes. Intelligent Discovery Assistants (IDAs) [7] help users apply valid knowledge discovery processes. An IDA can provide users with three benefits:
1. A systematic enumeration of valid knowledge discovery processes;
2. Effective rankings of valid processes by different criteria, which help to choose between the options;
3. An infrastructure for sharing knowledge, which leads to network externalities.
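The first two IDA benefits (enumerating valid processes and ranking them) can be sketched as follows. This is not the IDA of [7]; the operator catalogue, the type-compatibility rule and the cost figures are hypothetical placeholders.

```python
from itertools import product

# Hypothetical operator catalogue. Each entry is (name, estimated cost, data type).
# For preprocessing the type is what it outputs; for algorithms, what they require.
PREPROCESSING = [("none", 0, "numeric"), ("discretize", 2, "discrete")]
ALGORITHMS = [("naive_bayes", 1, "discrete"), ("decision_tree", 3, "any"),
              ("linear_svm", 4, "numeric")]
POSTPROCESSING = [("none", 0), ("prune_rules", 1)]

def is_valid(pre, alg):
    """An algorithm is applicable only if the preprocessed data has the type it needs."""
    return alg[2] == "any" or alg[2] == pre[2]

def enumerate_processes():
    """Systematically list every valid preprocessing -> algorithm -> postprocessing chain."""
    return [(pre, alg, post)
            for pre, alg, post in product(PREPROCESSING, ALGORITHMS, POSTPROCESSING)
            if is_valid(pre, alg)]

def total_cost(process):
    """One possible ranking criterion: the summed estimated cost of the steps."""
    return sum(step[1] for step in process)

def rank(processes, criterion=total_cost):
    """Rank valid processes by a chosen criterion (lower is better)."""
    return sorted(processes, key=criterion)

for pre, alg, post in rank(enumerate_processes()):
    print(f"{pre[0]} -> {alg[0]} -> {post[0]}")
```

Passing a different `criterion` function (for example, expected accuracy) yields a different ranking of the same valid processes.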
Several other attempts have been made to automate this process and to design a generalized data mining tool that possesses the intelligence to select the data and the data mining algorithms and, to some extent, to automate knowledge discovery.
Data Mining Applications
Data mining enables businesses to understand the patterns hidden inside past purchase transactions, thus helping them plan and launch new marketing campaigns in a prompt and cost-effective way.
In this section, several applications of data mining and the techniques they use are analyzed in turn.
1. Data mining applications in healthcare
Data mining applications in healthcare can have tremendous potential and usefulness [60]. However, the success of healthcare data mining hinges on the availability of clean healthcare data. In this respect, it is critical that the healthcare industry look into how data can be better captured, stored, prepared and mined. Possible directions include the standardization of clinical vocabulary and the sharing of data across organizations to enhance the benefits of healthcare data mining applications.
Future Directions of Healthcare Systems through Data Mining Tools
As healthcare data are not limited to quantitative data but also include free text such as doctors' notes and clinical records, it is necessary to also explore the use of text mining to expand the scope and nature of what healthcare data mining can currently do. In particular, it is useful to be able to integrate data and text mining. It is also useful to look into how images (e.g., MRI scans) can be brought into healthcare data mining applications. It is noted that progress has been made in these areas.
2. Data mining is used for market basket analysis
It provides insight into which product combinations were purchased by customers, when they were bought and in what sequence. This information helps businesses promote their most profitable products to maximize profit. In addition, it encourages customers to purchase related products that they may have missed or overlooked. Retail companies use data mining to identify customers' buying behavior patterns.
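Market basket analysis is commonly expressed as association rules scored by support (the fraction of baskets containing both items) and confidence (the fraction of baskets containing the left-hand item that also contain the right-hand item). A minimal sketch restricted to item pairs, with hypothetical baskets and thresholds; full algorithms such as Apriori extend this to larger itemsets:

```python
from itertools import combinations
from collections import Counter

def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) between pairs of items."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))  # each unordered pair once per basket
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair yields up to two directed rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

baskets = [{"bread", "milk"}, {"bread", "butter", "milk"}, {"beer", "bread"},
           {"butter", "milk"}, {"bread", "butter", "milk"}]
for lhs, rhs, s, c in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```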
3. Data mining is an emerging trend in education systems across the world [57, 58]
In India, a large proportion of parents are uneducated. The main aim of the Indian government is the quality of education, not its quantity. Education systems are changing day by day, and in the 21st century a huge number of universities have been established by order of the UGC. As more universities are established, large numbers of students enroll across the country every day. With such a huge number of higher education aspirants, we believe that data mining technology can help bridge the knowledge gap in higher education systems. The hidden patterns, associations, and anomalies discovered by data mining techniques from educational data can improve decision-making processes in higher education systems. This improvement can bring advantages such as maximizing educational system efficiency; decreasing students' drop-out rate; increasing students' promotion, retention and transition rates; increasing the educational improvement ratio; increasing students' success and learning outcomes; and reducing the cost of system processes. In the current era, KDD and data mining tools are used to extract knowledge that can improve the quality of education. Decision tree classification is used in this type of application.
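Decision tree learners such as ID3 choose each split by information gain. The sketch below computes the gain of each attribute on a small, hypothetical set of student records and returns the best root split; the attribute names and records are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target="result"):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    labels = [r[target] for r in rows]
    gain = entropy(labels)
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

def best_split(rows, attributes, target="result"):
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical student records.
students = [
    {"attendance": "high", "internal": "good", "result": "pass"},
    {"attendance": "high", "internal": "poor", "result": "pass"},
    {"attendance": "low", "internal": "good", "result": "fail"},
    {"attendance": "low", "internal": "poor", "result": "fail"},
    {"attendance": "high", "internal": "good", "result": "pass"},
    {"attendance": "low", "internal": "good", "result": "pass"},
]
print(best_split(students, ["attendance", "internal"]))
```

A full tree learner would apply `best_split` recursively to each subset until the leaves are pure or no attributes remain.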
Data mining is now used in many different areas of manufacturing engineering [59]
In manufacturing, data mining is used to extract knowledge for predictive maintenance, fault detection, design, production, quality assurance, scheduling, and decision support systems. Data can be analyzed to identify hidden patterns in the parameters that control manufacturing processes, or to determine and improve the quality of products. A major advantage of data mining is that the data required for analysis can be collected during the normal operations of the manufacturing process being studied, so it is generally not necessary to introduce dedicated processes for data collection. Since the importance of data mining in manufacturing has clearly increased over the last 20 years, it is now appropriate to critically review its history and application.
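One simple way to use data gathered during normal operation for fault detection is to learn control limits (mean ± 3 standard deviations) from a normal run and flag later readings outside them. This Shewhart-style statistical baseline is a stand-in chosen for illustration, not a method from [59]; the temperature readings are hypothetical.

```python
import statistics

def control_limits(reference, k=3.0):
    """Limits mean ± k·(sample stdev) learned from measurements taken during normal operation."""
    mu = statistics.mean(reference)
    sigma = statistics.stdev(reference)
    return mu - k * sigma, mu + k * sigma

def flag_faults(readings, limits):
    """Return the indices of readings that fall outside the control limits."""
    low, high = limits
    return [i for i, x in enumerate(readings) if x < low or x > high]

normal_run = [50.1, 49.8, 50.3, 50.0, 49.9, 50.2, 50.1, 49.7]  # hypothetical spindle temperatures
limits = control_limits(normal_run)
print(flag_faults([50.0, 50.4, 53.5, 49.9], limits))
```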
Future Directions in Manufacturing Engineering through Data Mining Tools
Manufacturing data mining research often does not consider the quality of the rules or knowledge discovered. The knowledge generated is sometimes cumbersome, and the relationships obtained are too complex to understand. Future research effort is therefore also needed to enhance the expressiveness of the knowledge. The CRISP-DM methodology provides high-level, step-by-step instructions for applying data mining in engineering. Further research is needed to develop generic guidelines for the variety of data and problem types commonly faced by the manufacturing engineering industry.
Data mining applications can be generic or domain specific.
A generic application is required to be an intelligent system that can take certain decisions on its own, such as selection of data, selection of the data mining method, and presentation and interpretation of the results. Some generic data mining applications cannot take these decisions on their own but guide users in selecting data, selecting the data mining method and interpreting the results. The multi-agent based data mining application [8, 10] is capable of automatically selecting the data mining technique to be applied. The multi-agent system is used at different levels [8]: first at the level of concept hierarchy definition, and then at the result level to present the best-adapted decision to the user. This decision is stored in a knowledge base for use in later decision-making. The Multi Agent System Tool used for generic data mining system development [10] uses different agents to perform different tasks.
The growth of the insurance industry depends entirely on the ability to convert data into knowledge, information or intelligence about customers, competitors and markets. Data mining has been applied in the insurance industry only recently, but it has brought tremendous competitive advantages to the companies that have implemented it successfully. The data mining applications in the insurance industry are listed below:
1. Data mining is applied in claims analysis, such as identifying which medical procedures are claimed together.
2. Data mining makes it possible to forecast which customers will potentially purchase new policies.
3. Data mining allows insurance companies to detect risky customers' behavior patterns.
4. Data mining helps detect fraudulent behavior.
A multi-tier data mining system has been proposed to enhance the performance of the data mining process [9]. It has basic components such as a user interface, data mining services, data access services and the data. Three different architectures are presented for the data mining system, namely one-tier, two-tier and three-tier architectures.
A generic system is required to integrate as many learning algorithms as possible and to decide the most appropriate algorithm to use. CORBA (Common Object Request Broker Architecture) has useful features for this: it makes the integration of different applications coded in any programming language considerably easier, it allows reusability in a feasible way, and it makes it possible to build large and scalable systems.
The data mining system architecture based on CORBA (a standard of the Object Management Group) [10] has all the characteristics needed to accomplish distributed, object-oriented computation.
A data-centric focus and automated methodologies make data mining accessible to non-experts [11]. High-level interfaces can implement the automated methodologies that hide data mining concepts from the users. A data-centric design hides all the details of the mining methodology and exposes them through high-level, goal-oriented tasks. These goal-oriented tasks are implemented using data-centric APIs. This design makes data mining tasks like other types of queries that users perform on the data.
In data mining, better results can be obtained if a large amount of data is available, which leads to the merging and linking of local databases. A new data mining architecture based on Internet technology addresses this problem [12].
The context factor plays a vital role in the success of data mining. The importance and meaning of the same data differ from one context to another: data that is very important in one context may not be important in another. A context-aware data mining framework filters useful and interesting context factors and can produce accurate and precise predictions using those factors [46].
Data mining helps to determine distribution schedules among warehouses and outlets and to analyze loading patterns.
Data mining makes it possible to characterize patient activities in order to anticipate upcoming office visits.
Data mining helps identify the patterns of successful medical therapies for different illnesses.
The use of data mining techniques in the banking domain is suitable due to the nature and sensitivity of bank data and the real-time, complex decision processes involved. The main concern of a bank's manager is to take good decisions in order to minimize the risk level associated with the bank's activities. It is very important for a bank to know the causes that generate financial crises or imbalances. Lending is one of the riskiest activities in banking, and adequate methods to support the decision-making process are necessary. In [48], the authors present a prototype decision support system based on data mining techniques used in the lending process. The proposed system was designed to assist a customer who applies for credit, and it may represent an extension of e-banking activities [48].
Application of data mining techniques in CRM
The application of data mining techniques in customer relationship management (CRM) is an emerging trend in industry and has attracted the attention of both practitioners and academics. The review in [49] identified eighty-seven articles on the application of data mining techniques in CRM published between 2000 and 2006. It aims to give a research summary of the application of data mining in the CRM domain and of the techniques most often used. Although that review cannot claim to be exhaustive, it provides reasonable insights and shows the incidence of research on this subject. Its results have several important implications: research on the application of data mining in CRM will increase significantly in the future, based on past publication rates and the increasing interest in the area, and the majority of the reviewed articles relate to customer retention [49].
In language research and language engineering, extra-linguistic information about a text is often needed. A linguistic profile containing a large number of linguistic features can be generated automatically from a text file using data mining [14]. This technique was found to be quite effective for authorship verification and recognition. A profiling system using a combination of lexical and syntactic features shows 97% accuracy in selecting the correct author for a text. Linguistic profiling of text has been used effectively to control language quality and for automatic language verification [15]. This method automatically verifies whether a text is of native quality. The results show that language verification is indeed possible.
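Linguistic profiling for authorship verification can be sketched by representing each text as relative frequencies of a fixed feature set and comparing profiles with cosine similarity. Here the features are a small, assumed list of function words and the 0.9 threshold is arbitrary; the systems in [14, 15] use much richer lexical and syntactic features, and this sketch makes no claim to their reported accuracy.

```python
import math
import re
from collections import Counter

# Illustrative feature set: a few common English function words (an assumption).
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "a", "is", "that", "it", "was"]

def profile(text):
    """Relative frequency of each feature word among all tokens of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def verify_author(known_texts, questioned_text, threshold=0.9):
    """Accept the claimed author if the questioned profile is close to the known profile."""
    known = profile(" ".join(known_texts))
    return cosine(known, profile(questioned_text)) >= threshold

print(verify_author(["It was the best of times, and it was the worst of times."],
                    "It was the age of wisdom, and it was the age of foolishness."))
```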
In medical science there is large scope for the application of data mining. Diagnosis of disease, health care, patient profiling and history generation are a few examples. Mammography is the method used in breast cancer detection, and radiologists face many difficulties in detecting tumors. Computer-aided methods could assist medical staff and improve the accuracy of detection [16]. Neural networks with back-propagation and association rule mining have been used for tumor classification in mammograms. Data mining has been used effectively in diagnosing lung abnormalities that may be cancerous or benign [17]. The data mining algorithms significantly reduce patients' risks and diagnosis costs. Using the prediction algorithms, the observed prediction accuracy was 100% for 91.3% of cases.
Health care is one of the most widely used applications of data mining. Medical data is complex and difficult to analyze. The REMIND (Reliable Extraction and Meaningful Inference from Non-structured Data) system [21] integrates the structured and unstructured clinical data in patient records to automatically create high-quality structured clinical data. This high-quality structuring allows existing patient records to be mined to support guideline compliance and to improve patient care [21].
Data mining in distance learning
Data mining in distance learning automatically generates useful information to enhance the learning process, based on the vast amount of data generated by tutors' and students' interactions with a web-based distance-learning environment [18]. The data mining application transforms the data into information and feeds it back to the e-learning environment. This solution transforms large amounts of otherwise useless data into an intelligent monitoring and recommendation system applied to the learning process.
Data mining methods in web-based education
Data mining methods are used in web-based education to improve courseware. Relationships are discovered among the usage data picked up during students' sessions. This knowledge is very useful for the teacher or author of the course, who can decide which modifications will be most appropriate to improve the effectiveness of the course [42].
Data mining methods are also used to provide learners with real-time adaptive feedback on the nature and patterns of their online communication while learning collaboratively [41]. This makes it possible to increase learners' awareness. The application of data mining methods to educational chats is feasible and can improve learning environments.
Data mining for software maintenance
Data mining helps software maintenance engineers comprehend the structure of a software system and assess its maintainability [24]. Clustering algorithms have been used effectively to produce overviews of systems by creating mutually exclusive groups of classes, member data or methods according to their similarities, which reduces the time required to understand the overall system. This method also helps in discovering programming patterns and “unusual” or outlier cases that may require attention.
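Grouping classes by similarity can be sketched with single-linkage clustering over the Jaccard similarity of the members each class references, which yields mutually exclusive groups. The class names, member sets and 0.3 threshold are hypothetical; singleton groups are candidates for the “unusual” cases mentioned above.

```python
def jaccard(a, b):
    """Similarity of two sets: |intersection| / |union| (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_classes(uses, threshold=0.3):
    """uses: class name -> set of members it references.
    Single-linkage grouping via union-find; returns sorted, mutually exclusive groups."""
    names = sorted(uses)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(uses[a], uses[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())

uses = {
    "Invoice": {"amount", "tax", "customer_id"},
    "Payment": {"amount", "customer_id", "date"},
    "Logger": {"level", "message"},
    "AuditLog": {"level", "message", "date"},
    "Config": {"path"},
}
print(cluster_classes(uses))
```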
Credit scoring
Credit scoring has become a very important issue due to the recent growth of the credit industry. The credit departments of banks face huge numbers of consumer credit records to process, and analyzing this amount of data manually is impossible in both economic and manpower terms. In this study we reviewed papers that have applied data mining methods to the credit risk evaluation problem. The ten data mining techniques most used in the credit risk evaluation context were extracted, and then we searched almost all papers that focused on these ten methods from 2000 to 2011. We conclude that the support vector machine (SVM) has been widely applied in recent years. Because improving the performance of this model requires a method for reducing the feature subset, many hybrid SVM-based models have been proposed. Moreover, hybrid models have attracted attention in the last decade because they combine the advantages of two or more models. Many of these proposed models can only classify customers into two classes, “good” or “bad”.
Several single and hybrid data mining methods have been applied to the credit scoring problem [50], [51], [53], [54], [55]. The most used methods for the credit scoring task are derived from classification techniques. Classification can involve any context in which some decision or forecast is made on the basis of available information. It can be defined as a method that assigns the members of a given set of instances to groups in terms of their characteristics. The classification task is well suited to data mining methods and techniques.
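Since the SVM is reported as the most widely applied method, a minimal linear SVM for the two-class “good”/“bad” setting is sketched below. It minimizes the primal soft-margin objective by batch sub-gradient descent; the feature values are hypothetical standardized scores, and the learning rate, regularization strength and epoch count are untuned assumptions. Dedicated libraries such as LIBSVM would normally be used instead.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Batch sub-gradient descent on the primal soft-margin SVM objective
         lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b)))
    X: list of feature vectors; y: labels +1 ("good") or -1 ("bad")."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def classify(model, x):
    w, b = model
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return "good" if score >= 0 else "bad"

# Hypothetical standardized applicant features (e.g. repayment history, income stability).
X = [[2.0, 2.0], [3.0, 1.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -1.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
model = train_linear_svm(X, y)
print(classify(model, [1.5, 2.5]))
```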
Intrusion detection in networks
Intrusion detection in a network is very difficult and requires close watch on the data traffic, and it plays an essential role in computer security. The classification method of data mining is used to classify network traffic as normal or abnormal [26]. If a TCP header does not belong to any of the existing TCP header clusters, it can be considered an anomaly.
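The cluster-membership test described above can be sketched by summarizing each cluster of normal TCP headers by a centroid and radius, then flagging a header that lies outside every cluster. The numeric header features, cluster contents and slack factor are hypothetical, and in practice the features would need scaling; [26] may use a different representation.

```python
import math

def build_clusters(groups):
    """groups: cluster name -> list of feature vectors observed in normal traffic.
    Each cluster is summarized as (centroid, radius), radius = farthest member distance."""
    clusters = {}
    for name, vectors in groups.items():
        dim = len(vectors[0])
        centroid = [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]
        radius = max(math.dist(v, centroid) for v in vectors)
        clusters[name] = (centroid, radius)
    return clusters

def is_anomaly(header, clusters, slack=1.5):
    """Anomalous if the header falls outside every cluster's (slack-scaled) radius."""
    return all(math.dist(header, c) > slack * r for c, r in clusters.values())

# Hypothetical features: [header length (32-bit words), SYN flag, ACK flag, window size (KB)]
normal_groups = {
    "handshake": [[10, 1, 0, 64], [10, 1, 0, 64], [10, 1, 1, 64]],
    "data": [[5, 0, 1, 128], [5, 0, 1, 127], [5, 0, 1, 129]],
}
clusters = build_clusters(normal_groups)
print(is_anomaly([15, 1, 1, 0], clusters))
```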
Malicious executables
A malicious executable is a threat to a system's security: it can damage a system or obtain sensitive information without the user's permission. Data mining methods have been used to accurately detect malicious executables before they run [25]. The classification algorithms RIPPER and Naive Bayes, as well as a multi-classifier system, are used to detect new malicious executables. A detection rate of 97.76% was reported.
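A Bernoulli naive Bayes classifier over binary features (for example, whether a program imports a given API function) can be sketched as follows. The feature names and training samples are hypothetical and far smaller than any real malware data set; Laplace smoothing (`alpha`) avoids zero probabilities.

```python
import math
from collections import defaultdict

def train_nb(samples, labels, alpha=1.0):
    """Bernoulli naive Bayes. samples: list of sets of features present in each program."""
    vocab = set().union(*samples)
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: defaultdict(int))
    for feats, label in zip(samples, labels):
        class_counts[label] += 1
        for f in feats:
            feature_counts[label][f] += 1
    n = len(samples)
    model = {}
    for label, count in class_counts.items():
        prior = math.log(count / n)
        # Smoothed P(feature present | class); always strictly between 0 and 1.
        probs = {f: (feature_counts[label][f] + alpha) / (count + 2 * alpha) for f in vocab}
        model[label] = (prior, probs)
    return model

def predict_nb(model, feats):
    """Pick the class with the highest log posterior; unseen features are ignored."""
    best_label, best_score = None, -math.inf
    for label, (prior, probs) in model.items():
        score = prior
        for f, p in probs.items():
            score += math.log(p) if f in feats else math.log(1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical imported-API features.
samples = [{"CreateRemoteThread", "WriteProcessMemory"}, {"CreateRemoteThread", "RegSetValue"},
           {"WriteProcessMemory", "RegSetValue", "CreateRemoteThread"},
           {"MessageBox"}, {"MessageBox", "ReadFile"}, {"ReadFile"}]
labels = ["malicious"] * 3 + ["benign"] * 3
model = train_nb(samples, labels)
print(predict_nb(model, {"CreateRemoteThread", "WriteProcessMemory"}))
```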
Sports Data Mining:
Data mining techniques are also applied in sports. Data mining is used not only for business purposes but also in sports. Around the world a huge number of games are played, and national and international games are scheduled every day, so a huge amount of data must be maintained. Data mining tools are applied to provide information as and when it is required. The open source data mining tools WEKA and RapidMiner are frequently used for sports. Users can run their data through one of the built-in algorithms, see what results come out, and then run it through a different algorithm to see if anything different stands out. Because of these programs' open source nature, users are free to modify the source code, provided that the modifications are made available to others [56].
In the sports world, vast amounts of statistics are collected for each player, team, game, and season. Sports organizations can use data mining in the form of statistical analysis, pattern discovery, and outcome prediction. Patterns in the data are often helpful in forecasting future events. Data mining can be used for scouting, performance prediction, player selection, coaching and training, and strategy planning [34]. Data mining techniques are used to determine the best, or most optimal, squad to represent a team in a team sport for a season, tour or game [44]. The ‘Cy Young Award’ [30] has been presented annually to the best pitchers in Major League Baseball. The award is based largely on statistics compiled over the course of the baseball season. A Bayesian classifier has been developed to predict Cy Young Award winners in American Major League Baseball.
The Intelligence Agencies
Intelligence agencies collect and analyze information to investigate terrorist activities. One challenge for law enforcement and intelligence agencies is the difficulty of analyzing the large volume of data involved in criminal and terrorist activities. Data mining makes it easy, convenient and practical for organizations to explore very large databases. Different data mining techniques are used in crime data mining [19,33,37,43]. Entity extraction is used to automatically identify persons, addresses, vehicles, narcotic drugs, and personal property in police narrative reports. Clustering techniques are used to automatically associate different objects, such as persons, organizations and vehicles, in crime records. Deviation detection is applied in fraud detection, network intrusion detection, and other crime analyses that involve tracing abnormal activities. Classification is used to detect email spamming and to find authors who send out unsolicited emails. String comparators are used to detect deceptive information in criminal records. Social network analysis is used to analyze criminals' roles and the associations among entities in a criminal network.
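A string comparator can be sketched with Levenshtein edit distance, normalized to a 0-1 similarity, to flag records that may be deliberate variations of the same name. The record values and the 0.8 threshold are hypothetical; the cited systems may use other comparators.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def similarity(a, b):
    """1.0 for identical strings (case-insensitive), decreasing with edit distance."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b)) or 1
    return 1 - edit_distance(a, b) / longest

def possible_aliases(records, query, threshold=0.8):
    """Records that are near-duplicates of the query name, excluding exact matches."""
    return [r for r in records if r != query and similarity(r, query) >= threshold]

print(possible_aliases(["John Smith", "Jon Smith", "Mary Jones"], "John Smith"))
```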
Internal Revenue Service
The data mining system implemented at the Internal Revenue Service to identify high-income individuals engaged in abusive tax shelters [23] showed significantly good results. The major lines of investigation included visualization of the relationships and data mining to identify and rank possibly abusive tax avoidance transactions.
Data mining techniques can be used effectively to enhance product quality. The data mining technology SAS/EM has been used to discover previously unknown rules that can improve product quality and decrease cost. A regression model and a neural network model applied for this purpose gave accuracy above 80% [31]; the neural network model was found to be better than the regression model.
E-commerce
E-commerce is also one of the most promising domains for data mining [39]. It is ideal because many of the ingredients required for successful data mining are easily available: data records are plentiful, electronic collection provides reliable data, insight can easily be turned into action, and return on investment can be measured. The integration of e-commerce and data mining significantly improves the results and guides users in generating knowledge and making correct business decisions. This integration effectively solves several major problems associated with horizontal data mining tools, including the enormous effort required to preprocess the data before it can be used for mining, and making the results of mining actionable.
Digital Library
A digital library retrieves, collects, stores and preserves digital data. The advent of electronic resources and their increased use in libraries has brought about significant changes in libraries [40]. Data and information are available in different formats, including text, images, video, audio, pictures and maps; therefore the digital library is a suitable domain for the application of data mining.
Retailers have been collecting large amounts of data, such as customer information, related transactions and product information. Such data can significantly improve applications like product demand forecasting, assortment optimization, product recommendation, and assortment comparison across retailers and manufacturers [22]. Keeping the product details database up to date is therefore the main issue, and the main task in such applications is text mining to extract implicit and explicit attributes from product description documents. Two data mining methods, Naive Bayes and Expectation-Maximization, are used in this context.
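The classification step described above can be illustrated with a minimal multinomial Naive Bayes sketch in Python. It is not the implementation of [22]: it assumes that product descriptions have already been split into short phrases, that each phrase is labelled with a hypothetical attribute name such as "material" or "color", and it omits the Expectation-Maximization step that would let unlabelled phrases contribute to training.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)       # documents per label
        self.word_counts = defaultdict(Counter)   # label -> token frequencies
        self.vocabulary = set()
        for doc, label in zip(documents, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_posterior(self, doc, label):
        # log P(label) + sum over tokens of log P(token | label)
        log_prob = math.log(self.class_counts[label] / self.total_docs)
        total_words = sum(self.word_counts[label].values())
        vocab_size = len(self.vocabulary)
        for token in doc.lower().split():
            count = self.word_counts[label][token]
            log_prob += math.log((count + 1) / (total_words + vocab_size))
        return log_prob

    def predict(self, doc):
        return max(self.class_counts, key=lambda label: self.log_posterior(doc, label))


# Hypothetical labelled phrases taken from product descriptions.
phrases = ["100% cotton", "soft cotton fabric", "polyester blend",
           "navy blue", "bright red", "dark blue shade"]
labels = ["material", "material", "material", "color", "color", "color"]
nb = MultinomialNaiveBayes().fit(phrases, labels)
print(nb.predict("organic cotton"), nb.predict("dark navy"))
```

Log-probabilities are summed instead of multiplying raw probabilities to avoid numerical underflow on longer descriptions.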

In another application, data mining can be used effectively to design user interfaces for consumer information systems [35]. Consumers use compensatory and non-compensatory decision strategies when formulating their purchasing decisions. Compensatory strategies are used when the consumer fully rationalizes the decision outcome, whereas non-compensatory strategies are used when the consumer considers only the information that is most meaningful to them at the time of the decision. These strategies are taken into account when designing online shopping support tools and personalizing the user interface. Two data mining methods, cluster analysis and rough sets, are used to obtain the consumer information needed to design customizable and personalized user interface enhancements.
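Rough set analysis, mentioned above, describes a target group of consumers through the groups of objects that cannot be told apart using the available attributes. The following minimal Python sketch computes the lower approximation (customers certainly in the target group) and the upper approximation (customers possibly in it). The attribute names, customer records and target group are hypothetical; this is not the procedure of [35].

```python
from collections import defaultdict


def equivalence_classes(table, attributes):
    """Group object ids that share identical values on the given attributes
    (the indiscernibility relation of rough set theory)."""
    classes = defaultdict(set)
    for obj_id, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj_id)
    return list(classes.values())


def approximations(table, attributes, target):
    """Return (lower, upper) approximations of the target set of object ids."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attributes):
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper


# Hypothetical shoppers; the target is those who used a comparison tool.
customers = {
    "c1": {"compares_prices": "yes", "reads_reviews": "yes"},
    "c2": {"compares_prices": "yes", "reads_reviews": "yes"},
    "c3": {"compares_prices": "yes", "reads_reviews": "no"},
    "c4": {"compares_prices": "no", "reads_reviews": "no"},
    "c5": {"compares_prices": "yes", "reads_reviews": "no"},
}
lower, upper = approximations(customers, ["compares_prices", "reads_reviews"], {"c1", "c2", "c3"})
print(sorted(lower), sorted(upper))
```

Customers c3 and c5 have identical attribute values but only c3 is in the target, so both fall in the boundary region (upper minus lower); the size of that region indicates how well the chosen attributes describe the target behaviour.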

Group work plays an important role in many aspects of life, which makes it important for people to learn to be effective team members. Data mining has been used to identify patterns that distinguish successful groups from less successful ones [20], using algorithms that properly account for the temporal nature of the data and the character of group interaction. The process works in two directions: theories of effective group behavior can drive the data mining and, conversely, the data mining should provide results that are meaningful to groups wishing to improve their effectiveness. A frequent sequential pattern-mining algorithm is used, which addresses the problem of discovering sequences that occur in a database with at least a minimum frequency, called the support.
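The notion of support used in sequential pattern mining can be sketched as follows: a pattern is contained in a sequence if its events occur in the same order, not necessarily adjacently, and its support is the fraction of sequences that contain it. This minimal Python sketch only counts support for given candidate patterns; a complete algorithm such as GSP or PrefixSpan would also generate the candidates. The event names in the example log are hypothetical and are not taken from [20].

```python
def is_subsequence(pattern, sequence):
    """True if the events of pattern occur in sequence in the same order
    (gaps allowed). Each 'in' test consumes the shared iterator."""
    it = iter(sequence)
    return all(event in it for event in pattern)


def support(pattern, database):
    """Fraction of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database) / len(database)


def frequent_patterns(candidates, database, min_support):
    """Keep the candidate patterns whose support reaches min_support."""
    return [p for p in candidates if support(p, database) >= min_support]


# Hypothetical team activity logs, one sequence of events per team session.
sessions = [
    ["ticket_open", "commit", "wiki_edit", "ticket_close"],
    ["ticket_open", "wiki_edit", "commit"],
    ["commit", "ticket_open", "ticket_close"],
    ["ticket_open", "commit", "ticket_close"],
]
candidates = [("ticket_open", "commit"), ("wiki_edit", "ticket_close"), ("commit", "ticket_close")]
print(frequent_patterns(candidates, sessions, min_support=0.5))
```

With a minimum support of 0.5, ("ticket_open", "commit") and ("commit", "ticket_close") each occur in three of the four sessions and are kept, while ("wiki_edit", "ticket_close") occurs in only one and is discarded.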
Because the Internet contains a large number of online documents, automated text and document classification systems capable of organizing and classifying documents are required. There are several data mining methods for text classification, including statistical-based algorithms, Bayesian classification, distance-based algorithms, k-nearest neighbors and decision tree-based methods [28]. Text classification techniques are used in many web applications, including e-mail filtering, mail routing, spam filtering, news monitoring, sorting through digitized paper archives, automated indexing of scientific articles, classification of news stories and searching for interesting information on the WWW.
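As an illustration of the distance-based family listed above, the following minimal Python sketch represents each document as a bag-of-words term-frequency vector and assigns a new document to the class whose centroid is most similar under cosine similarity. It assumes simple whitespace tokenization and uses hypothetical spam-filtering examples; it is not the method evaluated in [28], which used N-gram frequency statistics.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def train_centroids(documents, labels):
    """Sum the term vectors of each class. Cosine similarity ignores vector
    length, so the sum points in the same direction as the mean centroid."""
    centroids = {}
    for doc, label in zip(documents, labels):
        centroids.setdefault(label, Counter()).update(vectorize(doc))
    return centroids


def classify(text, centroids):
    vec = vectorize(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical training messages.
docs = ["win a free prize now", "claim your free money now",
        "meeting agenda for monday", "project report due monday"]
labels = ["spam", "spam", "ham", "ham"]
centroids = train_centroids(docs, labels)
print(classify("free prize inside", centroids), classify("monday project meeting", centroids))
```

A production filter would add stop-word removal and TF-IDF weighting, but the decision rule (nearest class by a similarity measure) stays the same.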
The pharmaceutical industry is well known for performing quantitative analysis for clinical research and market research [32]. In marketing departments, data mining applications are used for sales force planning and for direct marketing to doctors and consumers. Data mining techniques have been applied successfully to a variety of critical business decisions in the pharmaceutical industry, such as forecasting production schedules for manufacturing plants, determining market potential in critical go/no-go decisions on continuing work on development compounds, and making financial projections for stockholders and investors on Wall Street.
Prediction in engineering applications has been treated effectively with a data mining approach [17]. Examples include the cost estimation problem and engineering design problems that involve selecting parameters, actions, components and so on. This selection is often based on prior data, information or knowledge. Numerous models and algorithms have been developed for autonomous prediction based on data with different characteristics. A data mining algorithm applied to a test file with nine features produced 100% correct predictions, and several other applications have been studied in this context.
The Scope of Data Mining
Data mining derives its name from the similarities between
searching for valuable business information in a large
database (for example, finding linked products in gigabytes
of store scanner data) and mining a mountain for a vein of
valuable ore. Both processes require either sifting through
an immense amount of material, or intelligently probing it
to find exactly where the value resides. Given databases of
sufficient size and quality, data mining technology can
generate new business opportunities by providing these
capabilities:
Automated prediction of trends and behaviors.
Data mining automates the process of finding predictive
information in large databases. Questions that traditionally
required extensive hands-on analysis can now be answered
quickly and directly from the data. A typical example of a
predictive problem is targeted marketing. Data mining uses
data on past promotional mailings to identify the targets
most likely to maximize return on investment in future
mailings. Other predictive problems include forecasting
bankruptcy and other forms of default, and identifying
segments of a population likely to respond similarly to
given events.
Automated discovery of previously unknown patterns.
Data mining tools sweep through databases and identify
previously hidden patterns in one step. An example of
pattern discovery is the analysis of retail sales data to
identify seemingly unrelated products that are often
purchased together. Other pattern discovery problems
include detecting fraudulent credit card transactions and
identifying anomalous data that could represent data entry
keying errors.
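The retail example above (products that are often purchased together) can be sketched by counting how often each pair of items appears in the same transaction. This minimal Python sketch counts all pairs exhaustively, which corresponds to the first pass of an Apriori-style analysis; large catalogues would need candidate pruning. The baskets and the 0.5 support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return {(item_a, item_b): support} for item pairs bought together
    in at least min_support (a fraction) of the transactions."""
    pair_counts = Counter()
    for basket in transactions:
        # sorted(set(...)) drops duplicate items and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    n = len(transactions)
    return {pair: count / n for pair, count in pair_counts.items() if count / n >= min_support}


# Hypothetical point-of-sale baskets.
baskets = [
    ["bread", "milk", "diapers"],
    ["beer", "diapers", "bread"],
    ["milk", "diapers", "beer"],
    ["bread", "milk"],
]
print(frequent_pairs(baskets, min_support=0.5))
```

Pairs such as ("beer", "diapers") appear in half of the baskets even though the products look unrelated, which is the kind of pattern the paragraph above describes.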
Data mining techniques can yield the benefits of
automation on existing software and hardware platforms,
and can be implemented on new systems as existing
platforms are upgraded and new products developed.
When data mining tools are implemented on high
performance parallel processing systems, they can analyze
massive databases in minutes. Faster processing means
that users can automatically experiment with more models
to understand complex data. High speed makes it practical
for users to analyze huge quantities of data. Larger
databases, in turn, yield improved predictions.
Databases can be larger in both depth and breadth:
More columns. Analysts must often limit the number of
variables they examine when doing hands-on analysis due
to time constraints. Yet variables that are discarded
because they seem unimportant may carry information
about unknown patterns. High performance data mining
allows users to explore the full depth of a database, without
preselecting a subset of variables.
More rows. Larger samples yield lower estimation errors and
variance, and allow users to make inferences about small but
important segments of a population.
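For the simplest case, the claim about more rows can be made precise. Assuming a mean is estimated from $n$ independent, identically distributed observations with population standard deviation $\sigma$, the variance and standard error of the estimate are:

```latex
\operatorname{Var}(\bar{x}) = \frac{\sigma^{2}}{n},
\qquad
\operatorname{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}
```

Quadrupling the number of rows therefore halves the standard error. The same formula applied to a population segment with $n_s$ records gives an error of $\sigma/\sqrt{n_s}$, which is why a larger database is needed before inferences about small segments become reliable.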
A recent Gartner Group Advanced Technology Research Note listed data mining and artificial intelligence at the top of the five key technology areas that "will clearly have a major impact across a wide range of industries within the next 3 to 5 years." Gartner also listed parallel architectures and data mining as two of the top 10 new technologies in which companies will invest during the next 5 years. According to a recent Gartner HPC Research Note, "With the rapid advance in data capture, transmission and storage, large-systems users will increasingly need to implement new and innovative ways to mine the after-market value of their vast stores of detail data, employing MPP [massively parallel processing] systems to create new sources of business advantage (0.9 probability)."
The most commonly used techniques in data mining are:
• Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.
• Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID).
• Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.
• Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.
• Rule induction: The extraction of useful if-then rules from data based on statistical significance.

Many of these technologies have been in use for more than a decade in specialized analysis tools that work with relatively small volumes of data. These capabilities are now evolving to integrate directly with industry-standard data warehouse and OLAP platforms. The appendix to this white paper provides a glossary of data mining terms.
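The nearest neighbor method described in the list above can be sketched in a few lines of Python. This minimal version assumes numeric features, Euclidean distance as the similarity measure and a simple majority vote among the k nearest historical records; the customer features (age, income in thousands) and response labels are hypothetical. In practice, features on different scales would usually be normalized first, since otherwise the feature with the largest range dominates the distance.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(record, history, k=3):
    """Classify record by majority vote among the k most similar records in
    history, a list of (features, label) pairs. Vote ties go to the label
    that appears first among the distance-sorted neighbors."""
    if k < 1:
        raise ValueError("k must be at least 1")
    neighbors = sorted(history, key=lambda item: euclidean(record, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical historical dataset: (age, income in thousands) -> responded?
history = [((25, 40), "no"), ((30, 42), "no"),
           ((45, 90), "yes"), ((50, 95), "yes"), ((48, 85), "yes")]
print(knn_classify((47, 88), history, k=3), knn_classify((28, 41), history, k=3))
```

For the record (28, 41), the two nearest neighbors are labelled "no" and the third is labelled "yes", so the majority vote returns "no".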
CONCLUSION
In this paper we have briefly reviewed various data mining applications. This review should help researchers focus on the various issues of data mining. In future work, we will review various classification algorithms and the significance of the evolutionary computing (genetic programming) approach in designing efficient classification algorithms for data mining. Most previous studies of data mining applications in various fields use a variety of data types, ranging from text to images, stored in a variety of databases and data structures. Different data mining methods are used to extract patterns, and thus knowledge, from these varied databases. Selecting the data and methods for data mining is an important task in this process and requires domain knowledge. Several attempts have been made to design and develop a generic data mining system, but no system has been found to be completely generic. Thus, for every domain, the assistance of a domain expert is mandatory. The system should guide domain experts to apply their knowledge effectively when using data mining systems to generate the required knowledge. Domain experts are required to determine the variety of data that should be collected in the specific problem domain, select specific data for data mining, clean and transform the data, extract patterns for knowledge generation, and finally interpret the patterns and the generated knowledge.
Most domain-specific data mining applications show accuracy above 90%, whereas generic data mining applications have limitations. From the study of various data mining applications, it is observed that no so-called generic application is 100% generic. Intelligent interfaces and intelligent agents make an application generic to some extent, but they have limitations. Domain experts play an important role in the different stages of data mining. The decisions at different stages are influenced by factors such as domain and data details, the aim of the data mining, and the context parameters.
Domain-specific applications are aimed at extracting specific knowledge. Domain experts guide the system by considering the user’s requirements and other context parameters. The results yielded by domain-specific applications are more accurate and useful. Therefore, we conclude that domain-specific applications are better suited to data mining. From the above study, it appears very difficult to design and develop a data mining system that can work dynamically for any domain.
REFERENCES
[1] Introduction to Data Mining and Knowledge Discovery, Third Edition, ISBN: 1-892095-025, Two Crows Corporation, 10500 Falls Road, Potomac, MD 20854 (U.S.A.), 1999.
[2] Larose, D. T., “Discovering Knowledge in Data: An Introduction to Data Mining”, ISBN 0-471-66657-2, John Wiley & Sons, Inc., 2005.
[3] Dunham, M. H., Sridhar, S., “Data Mining: Introductory and Advanced Topics”, Pearson Education, New Delhi, ISBN: 81-7758-785-4, 1st Edition, 2006.
[4] Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C. and Wirth, R., “CRISP-DM 1.0: Step-by-step data mining guide”, NCR Systems Engineering Copenhagen (USA and Denmark), DaimlerChrysler AG (Germany), SPSS Inc. (USA) and OHRA Verzekeringen en Bank Groep B.V. (The Netherlands), 2000.
[5] Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P., “From Data Mining to Knowledge Discovery in Databases”, AI Magazine, American Association for Artificial Intelligence, 1996.
[6] Tan Pang-Ning, Steinbach, M., Vipin Kumar, “Introduction to Data Mining”, Pearson Education, New Delhi, ISBN: 978-81-317-1472-0, 3rd Edition, 2009.
[7] Bernstein, A. and Provost, F., “An Intelligent Assistant for the Knowledge Discovery Process”, Working Paper of the Center for Digital Economy Research, New York University; also presented at the IJCAI 2001 Workshop on Wrappers for Performance Enhancement in Knowledge Discovery in Databases.
[8] Baazaoui, Z., H., Faiz, S., and Ben Ghezala, H., “A Framework for Data Mining Based Multi-Agent: An Application to Spatial Data”, Proceedings of World Academy of Science, Engineering and Technology, Volume 5, ISSN 1307-6884, April 2005.
[9] Rantzau, R. and Schwarz, H., “A Multi-Tier Architecture for High-Performance Data Mining”, A Technical Project Report of ESPRIT project, The consortium of the CRITIKAL project, Attar Software Ltd. (UK), Gehe AG (Germany), Lloyds TSB Group (UK), Parallel Applications Centre, University of Southampton (UK), BWI, University of Stuttgart (Germany), IPVR, University of Stuttgart (Germany).
[10] Botia, J. A., Garijo, M., Velasco, J. R., Skarmeta, A. F., “A Generic Data Mining System: Basic Design and Implementation Guidelines”, A Technical Project Report of CYCYT project of Spanish Government, 1998. Web Site: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.1935
[11] Campos, M. M., Stengard, P. J., Boriana, L. M., “Data-Centric Automated Data Mining”, Web Site: www.oracle.com/technology/products/bi/odm/pdf/automated_data_mining_paper_1205.pdf
[12] Sirgo, J., Lopez, A., Janez, R., Blanco, R., Abajo, N., Tarrio, M., Perez, R., “A Data Mining Engine based on Internet”, Emerging Technologies and Factory Automation, Proceedings ETFA '03, IEEE Conference, 16-19 Sept. 2003. Web Site: www.citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.8955
[13] Bianca V. D., Philippe Boula de Mareüil and Martine Adda-Decker, “Identification of foreign-accented French using data mining techniques”, Computer Sciences Laboratory for Mechanics and Engineering Sciences (LIMSI). Web Site: www.limsi.fr/Individu/bianca/article/Vieru&Boula&Madda_ParaLing07.pdf
[14] Halteren, H. van, “Linguistic Profiling for Author Recognition and Verification”, Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics USA, Barcelona, Spain, Article No. 199, 2004.
[15] Halteren, H. V., Oostdijk, N., “Linguistic profiling of texts for the purpose of language verification”, The ILK research group, Tilburg centre for Creative Computing and the Department of Communication and Information Sciences of the Faculty of Humanities, Tilburg University, The Netherlands. Website: www.ilk.uvt.nl/~antalb/textmining/LingProfColingDef.pdf
[16] Antonie, M. L., Zaiane, O. R., Coman, A., “Application of Data Mining Techniques for Medical Image Classification”, Proceedings of the Second International Workshop on Multimedia Data Mining (MDM/KDD 2001) in conjunction with ACM SIGKDD conference, San Francisco, August 26, 2001.
[17] Kusiak, A., Kernstine, K. H., Kern, J. A., McLaughlin, K. A., and Tseng, T. L., “Data Mining: Medical and Engineering Case Studies”, Proceedings of the Industrial Engineering Research 2000 Conference, Cleveland, Ohio, pp. 1-7, May 21-23, 2000.
[18] Luis, R., Redol, J., Simoes, D., Horta, N., “Data Warehousing and Data Mining System Applied to E-Learning”, Proceedings of the II International Conference on Multimedia and Information & Communication Technologies in Education, Badajoz, Spain, December 3-6, 2003.
[19] Chen, H., Chung, W., Qin, Y., Chau, M., Xu, J. J., Wang, G., Zheng, R., Atabakhsh, H., “Crime Data Mining: An Overview and Case Studies”, A project under NSF Digital Government Programme, USA, “COPLINK Center: Information and Knowledge Management for Law Enforcement”, July 2000 - June 2003.
[20] Kay, J., Maisonneuve, N., Yacef, K., Zaiane, O., “Mining patterns of events in students’ teamwork data”, Proceedings of the ITS (Intelligent Tutoring Systems) 2006 Workshop on Educational Data Mining, pages 45-52, Jhongli, Taiwan, 2006.
[21] Rao, R. B., Krishnan, S. and Niculescu, R. S., “Data Mining for Improved Cardiac Care”, SIGKDD Explorations, Volume 8, Issue 1.
[22] Ghani, R., Probst, K., Liu, Y., Krema, M., Fano, A., “Text Mining for Product Attribute Extraction”, SIGKDD Explorations, Volume 8, Issue 1.
[23] DeBarr, D., Eyler-Walker, Z., “Closing the Gap: Automated Screening of Tax Returns to Identify Egregious Tax Shelters”, SIGKDD Explorations, Volume 8, Issue 1.
[24] Kanellopoulos, Y., Dimopulos, T., Tjortjis, C., Makris, C., “Mining Source Code Elements for Comprehending Object-Oriented Systems and Evaluating Their Maintainability”, SIGKDD Explorations, Volume 8, Issue 1.
[25] Schultz, M. G., Eskin, Eleazar, Zadok, Erez, and Stolfo, Salvatore J., “Data Mining Methods for Detection of New Malicious Executables”, Proceedings of the 2001 IEEE Symposium on Security and Privacy, IEEE Computer Society, Washington, DC, USA, ISSN: 1081-6011, 2001.
[26] Cai, W. and Li, L., “Anomaly Detection using TCP Header Information”, STAT753 Class Project Paper, May 2004. Web Site: http://www.scs.gmu.edu/~wcai/stat753/stat753report.pdf
[27] Nandi, T., Rao, C. B. and Ramchandran, S., “Comparative genomics using data mining tools”, Journal of Bio-Science, Indian Academy of Sciences, Vol. 27, No. 1, Suppl. 1, pp. 15-25, February 2002.
[28] Khreisat, L., “Arabic Text Classification Using N-Gram Frequency Statistics: A Comparative Study”, Proceedings of The 2006 International Conference on Data Mining, DMIN'06, pp. 78-82, Las Vegas, Nevada, USA, June 26-29, 2006.
[29] Onkamo, P. and Toivonen, H., “A survey of data mining methods for linkage disequilibrium mapping”, Human Genomics, Henry Stewart Publications 1473-9542, Vol. 2, No. 5, pp. 336-340, March 2006.
[30] Smith, L., Lipscomb, B., and Simkins, A., “Data Mining in Sports: Predicting Cy Young Award Winners”, Journal of Computer Science, Vol. 22, pp. 115-121, April 2007.
[31] Deng, B., Liu, X., “Data Mining in Quality Improvement”, Proceedings of the Twenty-Seventh Annual SAS Users Group International Conference 2002, SAS Institute Inc., Cary, NC, USA, ISBN 1-59047-061-3. Web Site: http://www2.sas.com/proceedings/sugi27/Proceed27.pdf
[32] Cohen, J. J., Olivia, C., Rud, P., “Data Mining of Market Knowledge in The Pharmaceutical Industry”, Proceedings of the 13th Annual Conference of North-East SAS Users Group Inc., NESUG 2000, Philadelphia, Pennsylvania, September 24-26, 2000.
[33] Elovici, Y., Kandel, A., Last, M., Shapira, B., Zaafrany, O., “Using Data Mining Techniques for Detecting Terror-Related Activities on the Web”. Web Site: www.ise.bgu.ac.il/faculty/mlast/papers/JIW_Paper.pdf
[34] Solieman, O. K., “Data Mining in Sports: A Research Overview”, A Technical Report, MIS Masters Project, August 2006. Web Site: http://ai.arizona.edu/hchen/chencourse/OsamaDM_in_Sports.pdf
[35] Maciag, T., Hepting, D. H., Slezak, D., Hilderman, R. J., “Mining Associations for Interface Design”, Lecture Notes in Computer Science, Springer Berlin / Heidelberg, Volume 4481, pp. 109-117, June 26, 2007.
[36] Foster, D. P. and Stine, R. A., “Variable Selection in Data Mining: Building a Predictive Model for Bankruptcy”, Journal of the American Statistical Association, Alexandria, VA, USA, vol. 99, ISSN 0162-1459, pp. 303-313, January 15, 2004.
[37] Kraft, M. R., Desouza, K. C., Androwich, I., “Data Mining in Healthcare Information Systems: Case Study of a Veterans’ Administration Spinal Cord Injury Population”, IEEE, Proceedings of the 36th Hawaii International Conference on System Sciences, 0-7695-1874-5/03, 2002.
[38] Kusiak, A., Kernstine, K. H., Kern, J. A., McLaughlin, K. A., and Tseng, T. L., “Data Mining: Medical and Engineering Case Studies”, Proceedings of the Industrial Engineering Research 2000 Conference, Cleveland, Ohio, pp. 1-7, May 21-23, 2000.
[39] Ansari, S., Kohavi, R., Mason, L., and Zheng, Z., “Integrating E-Commerce and Data Mining: Architecture and Challenges”, Proceedings of IEEE International Conference on Data Mining, 2001.
[40] Jadhav, S. R., and Kumbargoudar, P., “Multimedia Data Mining in Digital Libraries: Standards and Features”, Proceedings of conference Recent Advances in Information Science and Technology READIT 2007, pp. 54-59, Organized by Madras Library Association - Kalpakkam Chapter & Scientific Information Resource Division, Indira Gandhi Center for Atomic Research, Department of Atomic Energy, Kalpakkam, Tamilnadu, India, 12-13 July 2007.
[41] Anjewierden, A., Kollöffel, B., and Hulshof, C., “Towards educational data mining: Using data mining methods for automated chat analysis to understand and support inquiry learning processes”, International Workshop on Applying Data Mining in e-Learning, ADML'07, Vol. 305, pp. 23-32, Sissi, Lassithi, Crete, Greece, 18 September 2007.
[42] Romero, C., Ventura, S. and De-Bra, P., “Knowledge Discovery with Genetic Programming for Providing Feedback to Courseware Authors”, Kluwer Academic Publishers, Printed in the Netherlands, 30/08/2004.
[43] Chen, H., Chung, W., Xu, Jennifer J., Wang, G., Qin, Y., Chau, M., “Crime Data Mining: A General Framework and Some Examples”, Technical Report, Published by the IEEE Computer Society, 0018-9162/04, pp. 50-56, April 2004.
[44] Chodavarapu, Y., “Using data-mining for effective (optimal) sports squad selections”. Web Site: http://insightory.com/view/74//using_datamining_for_effective_(optimal)_sports_squad_selections
[45] Jensen, Christian S., “Introduction to Temporal Database Research”. Web site: http://www.cs.aau.dk/~csj/Thesis/pdf/chapter1pdf
[46] Vajirkar, P., Singh, S., and Lee, Y., “Context-Aware Data Mining Framework for Wireless Medical Application”, Lecture Notes in Computer Science (LNCS), Volume 2736, Springer-Verlag, ISBN 3-540-40806-1, pp. 381-391.
[48] Industrial Engineering and Engineering Management, 2007 IEEE International Conference, ISBN: 978-1-4244-1529-8, Print ISBN: 978-1-4244-1529-8, INSPEC Accession Number: 9822324.
[49] Expert Systems with Applications 36 (2009) 2592-2602, www.elsevier.com/
[50] Neelamadhab & Rasmita, “Data warehousing and OAPL, MRDM technology in the decision support system in the 21st century”, VSRD Technical Journal.
[51] Zurada, J., and Lonial, S., 2005, “Comparison of the Performance of Several Data Mining Methods for Bad Debt Recovery in the Healthcare Industry”, The Journal of Applied Business Research, 21(2), 37-53.
[52] Chye Koh, H., Chin Tan, W., and Peng Goh, C., 2006, “A Two-step Method to Construct Credit Scoring Models with Data Mining Techniques”, Journal of Business and Information, 1, 96-118.
[53] Kirkos, E., Spathis, C., and Manolopoulos, Y., 2007, “Data Mining techniques for the detection of fraudulent financial statements”, Expert Systems with Applications, 32(4), 995-1003.
[54] Atish P, S., and Huimin, Z., 2008, “Incorporating domain knowledge into data mining classifiers: An application in indirect lending”, Decision Support Systems, 46(1), 287-299.
[55] Yeh, I. C., and Lien, C. H., 2009, “The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients”, Expert Systems with Applications, 36(2), 2473-2480.
[56] Robert P. Schumaker, Osama K. Solieman, Hsinchun Chen, Springer.
[57] “Educational Data Mining: An Emerging Trends in Education”, International Journal of Advanced Research in Computer Science.
[58] Surjeet Kumar Yadav, Brijesh Bharadwaj, Saurabh Pal, “A Comparative Study for Predicting Student’s Performance”, International Journal of Innovative Technology and Creative Engineering (ISSN: 2045-711), Vol. 1, No. 12, December 2011.
[59] J. A. Harding, M. Shahbaz, Srinivas, A. Kusiak, Journal of Manufacturing Science and Engineering, November 2006, Vol. 128, p. 969.
[60] Hian Chye Koh, School of Business, SIM University, Singapore.