International Journal On Advanced Computer Theory And Engineering (IJACTE)
________________________________________________________________________________________________
Data Mining With Big Data: A Survey
Subik Pokharel
Information Technology, Sikkim Manipal Institute of Technology, Sikkim (INDIA)
Email: subikpokharel93@gmail.com
Abstract— Big Data is a new term for data sets so large and
complicated that they become difficult to process using
traditional data management tools. Big Data is now growing
rapidly in all science and engineering domains, including
the biological, physical, and biomedical sciences. Big Data
mining is the art of extracting useful information from
these data sets, which was not possible before because of
their volume, variability, and velocity. Big Data is
becoming one of the most exciting and challenging
opportunities of the coming years.
Keywords— Big Data, Big Data mining, Data mining
I. INTRODUCTION
In the last few years we have witnessed a technology
revolution that serves millions of people while generating
tremendous amounts of data from various sensors and devices,
in different formats, and from independent or connected
applications. This data is referred to as Big Data. With the
help of Big Data, many things once considered impossible
become feasible, such as preventing the spread of disease,
fighting crime, personalizing healthcare, quickly
identifying business opportunities, and protecting the
homeland. As discussed by The Economist [2], “Managed well,
the data can be used to unlock new sources of economic
value, provide fresh insights into science and hold
governments to account”. Enormous amounts of data are
generated every day in every field: for example, Google
processes nearly 1 billion queries per day, Twitter has
nearly 250 million tweets per day, YouTube has more than
4 billion views per day, and the situation is similar
elsewhere. DBMSs, once very successful, can no longer meet
the increasing demands of Big Data. These challenges call
for a new stack of highly scalable computing models, tools,
frameworks, and platforms. Data mining has opened many new
challenges and opportunities for mining Big Data.
II. DATA MINING
Data mining, regarded as a branch of computer science and
artificial intelligence, is the process of discovering
interesting knowledge, such as patterns, associations,
changes, anomalies, and significant structures, from large
amounts of data stored in databases, data warehouses, or
other information repositories. In modern business, data
mining is seen as an important tool for transforming data
into business intelligence that gives an informational
advantage. Association rule mining, sequential pattern
mining, clustering, and classification are among the
techniques used in data mining, and different algorithms
have been developed for each of them.
III. BIG DATA
Big Data is the term for data sets so large and complicated
that they become difficult to process using traditional data
management tools or processing applications [4].
There are two types of big data: structured data and
unstructured data.

Structured data consist of numbers and words that can be
easily categorized and analyzed. Sources of such data
include network sensors embedded in electronic devices,
smart phones, and global positioning system (GPS) devices.
Structured data also include sales figures, account
balances, and transaction data.

Unstructured data contain more complex information, such as
customer reviews from commercial websites, photos and other
multimedia, and comments on social networking sites. These
data cannot easily be separated into categories or analyzed
numerically.
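The distinction can be made concrete with a small Python sketch. The records, field names, and review texts below are invented for illustration and are not taken from any data set discussed in this paper. Structured records can be aggregated directly by a known field, whereas unstructured text must first be turned into features (here, a deliberately naive word count) before any numerical analysis is possible.

```python
from collections import Counter

# Hypothetical structured records: fixed fields with known types.
transactions = [
    {"account": "A1", "amount": 120.0},
    {"account": "A2", "amount": 80.5},
    {"account": "A1", "amount": 39.5},
]


def total_by_account(records):
    """Aggregate structured records directly by a known field."""
    totals = {}
    for rec in records:
        totals[rec["account"]] = totals.get(rec["account"], 0.0) + rec["amount"]
    return totals


# Hypothetical unstructured data: free text with no predefined fields.
reviews = [
    "Great phone, great battery",
    "Battery died after a week",
]


def word_counts(texts):
    """Naive feature extraction: lowercase, strip punctuation, count words."""
    counts = Counter()
    for text in texts:
        for word in text.lower().split():
            counts[word.strip(".,!?")] += 1
    return counts


account_totals = total_by_account(transactions)
review_words = word_counts(reviews)
```

Real text mining would use proper tokenization and far richer features; the point here is only that the unstructured input needs an extra step before it can be counted or compared.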
In 1998 the term ‘Big Data’ appeared for the first time, in
a Silicon Graphics (SGI) slide deck by John Mashey titled
"Big Data and the Next Wave of InfraStress". Big Data mining
was relevant from the beginning: the first book mentioning
‘Big Data’ is a data mining book by Weiss and Indurkhya [6],
which also appeared in 1998. However, the first academic
paper with the words ‘Big Data’ in the title appeared a bit
later, in 2000, written by Diebold [7].
_______________________________________________________________________________________________
ISSN (Print): 2319-2526, Volume -4, Issue -3, 2015
Doug Laney was the first to describe the 3 V’s of Big Data
management, which are as follows:

Volume: It refers to the amount of data, which increases
continuously every day and ranges from terabytes to
zettabytes.

Variety: It refers to the different types of data and data
sources, such as text, images, blogs, video, sensor data,
etc.

Velocity: It refers to data in motion. Data arrives
continuously as a high-speed stream and must be processed
quickly enough to meet demand and the challenges that lie
ahead on the path of growth and development.
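Velocity implies that data often cannot be stored and revisited at leisure; it has to be summarized as it arrives. As a minimal sketch, assuming a hypothetical stream of numeric sensor readings, the following Python generator maintains a running mean in constant memory. The function name and the sample values are illustrative only.

```python
def running_mean(stream):
    """Yield the mean of all values seen so far, one value at a time.

    Only a count and the current mean are kept, so memory use does not
    grow with the length of the stream.
    """
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        mean += (value - mean) / count  # incremental mean update
        yield mean


# Hypothetical sensor readings arriving one by one.
readings = iter([10.0, 20.0, 30.0, 40.0])
means = list(running_mean(readings))
```

The same pattern (update a small summary per item, never hold the whole stream) underlies many stream mining algorithms, although real systems also have to handle late, missing, or out-of-order data.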
Nowadays, there are two more V’s:

Variability: It refers to changes in the structure of the
data and in how users want to interpret that data.

Value: It refers to whether the data being used is valuable
to society or not.
There are many applications of Big Data, for example in
business, technology, health, smart cities, online
transactions, and education.
Some of the features of Big Data are:

The size of Big Data is huge.

The data keeps increasing and changing over time.

The data comes from many different sources.

It is hard to handle because it is complex in nature.

It is free from the influence, guidance or control of
anyone.
IV. CHALLENGES IN BIG DATA
Data visualization is becoming an increasingly important
component of analytics in the age of big data. Many
challenges must be addressed to realize the full potential
of Big Data, and meeting them will be difficult. Some of the
challenges in Big Data are given below:
A. Meeting the need for speed: Today we must not only find
and analyze the data but also do so quickly. For example, an
organization has to make its decisions more rapidly in an
era of competition. One solution lies in hardware: some
organizations use increased memory and powerful processing
to crunch large volumes of data quickly.
B. Understanding the data: Big Data refers to a huge amount
of data. To work on such data, it is very important to
understand it and put it into the right shape before
performing mining operations. For example, if the data comes
from social media content, we need to know, in a general
way, who the user is.
C. Displaying meaningful results: Mining Big Data yields
hidden information or patterns. Plotting this information as
points on a graph becomes very difficult at Big Data scale.
Grouping the data into smaller groups can therefore be
helpful.
D. Privacy: Big Data contains a huge amount of data, which
is usually not stored in one place. For mining purposes the
data has to be transported from one place to another, so
privacy plays an important role in Big Data mining.
Presently, parallel computing based algorithms such as
MapReduce are used to mine information from Big Data.
V. FORECAST TO THE FUTURE
Since the petabyte era is almost at its end and we are
entering the exabyte era, data will play a vital role in
decision making in the near future. In the coming years, the
challenges in Big Data will grow as the data grows. The
following are some of the challenges that researchers may
have to deal with during the next few years:

The optimal architecture of an analytical system that deals
with historic data and real-time data at the same time is
still unclear. The Lambda architecture, proposed by Nathan
Marz, is an interesting approach to the problem of computing
arbitrary functions on arbitrary data in real time.

Big Data involves a huge amount of data, so it is important
to achieve statistical significance and not be fooled by
randomness.

Data mining techniques are used to extract patterns or
hidden information from Big Data, and many of these
techniques are not trivial to parallelize. Hence a lot of
research is needed in this field.

Since technologies are evolving rapidly, research has to
keep pace with them as well.

Big Data deals with huge amounts of data stored in different
places or warehouses, and the amount is increasing day by
day. Compression is therefore an important factor in storing
this data.
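Section IV names MapReduce as the model currently used to mine Big Data, and the list above notes that many mining techniques are hard to parallelize. As a point of reference, the following minimal single-process Python sketch shows the shape of a task that does fit the model: a word count split into map, shuffle, and reduce steps. It is not the API of Hadoop or any other framework; the function names and input documents are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each map call is independent of the others, which is what would let
    # a real framework run them on different machines.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical input, standing in for two data partitions.
docs = ["big data mining", "mining big data sets"]
counts = word_count(docs)
```

Word counting parallelizes easily because every map call and every per-key reduce is independent. Mining algorithms that need repeated passes over shared state do not decompose this cleanly, which is the difficulty the list above refers to.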
VI. CONCLUSION
Big Data is going to be more diverse, larger, and faster in
the coming years. This paper discussed the term ‘Big Data’,
its challenges, and its forecast for the future. The coming
years are going to be challenging for researchers working on
‘Big Data’, as well as for organizations.
ACKNOWLEDGMENT
I would like to thank Mr. Ashis Datta and Mr. Joyashri
Datta for their support and guidance throughout this period.
I will always be thankful for their support. Without their
guidance, this work would not have been possible.
REFERENCES
[1] Elisa Bertino, “Big Data – Opportunities and
Challenges,” IEEE 37th Annual Computer Software and
Applications Conference, 2013.
[2] “Data, data everywhere,” The Economist, 25 February
2010, available at
http://www.economist.com/node/15557443.
[3] Wei Fan and Albert Bifet, “Mining Big Data: Current
Status, and Forecast to the Future,” Vol. 14, Issue 2,
2013.
[4] Bo Li, “Survey of Recent Research Progress and Issues in
Big Data,” December 10, 2013, available at
http://www.cse.wustl.edu/~jain/cse57013/index.html.
[5] F. Diebold, “On the Origin(s) and Development of the
Term ‘Big Data’,” PIER Working Paper Archive, Penn
Institute for Economic Research, Department of
Economics, University of Pennsylvania, 2012.
[6] S. M. Weiss and N. Indurkhya, Predictive Data Mining: A
Practical Guide, Morgan Kaufmann Publishers Inc., San
Francisco, CA, USA, 1998.
[7] F. Diebold, “‘Big Data’ Dynamic Factor Models for
Macroeconomic Measurement and Forecasting,” Discussion
Read to the Eighth World Congress of the Econometric
Society, 2000.
[8] Rohit Pitre and Vijay Kolekar, “A Survey Paper on Data
Mining With Big Data,” International Journal of
Innovative Research in Advanced Engineering (IJIRAE),
Volume 1, Issue 1, April 2014.
[9] Manika Verma and Dr. Devarshi Mehta, “A Comparative
Study of Techniques in Data Mining,” International
Journal of Emerging Technology and Advanced
Engineering, Volume 4, Issue 4, April 2014.
[10] A. N. Pathak, Manu Sehgal and Divya Christopher, “A
Study on Selective Data Mining Algorithms,”
International Journal of Computer Science Issues,
Vol. 8, Issue 2, March 2011.
[11] Bharti Thakur and Manish Mann, “Data Mining for Big
Data: A Review,” International Journal of Advanced
Research in Computer Science and Software Engineering,
Volume 4, Issue 5, May 2014.
[12] Dunren Che, Mejdl Safran and Zhiyong Peng, “From Big
Data to Big Data Mining: Challenges, Issues, and
Opportunities,” Springer Berlin Heidelberg, 2013.
[13] Juha K. Laurila, Daniel Gatica-Perez, Imad Aad, Jan
Blom, Olivier Bornet, Trinh-Minh-Tri Do, Olivier
Dousse, Julien Eberle and Markus Miettinen, “The Mobile
Data Challenge: Big Data for Mobile Computing
Research,” unknown.
[14] Subana Shanmuganathan, “From Data Mining and Knowledge
Discovery to Big Data Analytics and Knowledge
Extraction for Applications in Science,” Journal of
Computer Science, 2014.
[15] Vitthal Yenkar and Prof. Mahip Bartere, “Review on Data
Mining with Big Data,” IJCSMC, Vol. 3, Issue 4, April
2014, pp. 97-102.
[16] Manish Kumar Kakhani, Sweeti Kakhani and S. R. Biradar,
“Research Issues in Big Data Analytics,” International
Journal of Application or Innovation in Engineering &
Management (IJAIEM), Volume 2, Issue 8, August 2013.