Survey
ISSN 2319-7080
International Journal of Computer Science and Communication Engineering, Volume 5, Issue 1 (February 2016 issue)
www.ijcsce.org

ANALYSIS OF DATA MINING TRENDS, APPLICATIONS, BENEFITS AND ISSUES

Dinesh Bhardwaj1, Sunil Mahajan2
1,2 Assistant Professor, Department of Computer Science & Information Technology, SSM College, Dinanagar (Punjab), India
1 dkbh28@gmail.com, 2 sanunil2003@gmail.com

ABSTRACT: In recent times, information technology has come to play a very important role in every aspect of human life. It is essential to gather data from different sources. This data can be stored and maintained to generate information and knowledge. Data mining has become an essential factor in various fields, including business, education, health care, finance and science. Data mining is part of the knowledge discovery process and offers a new way to look at data. Knowledge Discovery in Databases is the process of finding knowledge in massive amounts of data, and data mining is the core of this process. Data mining can be used to mine understandable, meaningful patterns from large databases, and these patterns may then be converted into knowledge. Data mining works with a data warehouse, and the whole process is divided into a plan of actions performed on the data: selection, transformation, mining and interpretation of results. In this paper, we review different types of applications of data mining and explain the areas where data mining concepts are used and the issues that arise.

KEYWORDS: Data Mining, Knowledge discovery, Trends in data mining.

I. INTRODUCTION

Simply storing information in a data warehouse does not provide the benefits an organization is seeking. Data mining is the process of extracting previously unknown information from large databases and using it to make organizational decisions [1]. A key feature of this definition is that data mining is concerned with the discovery of hidden, unexpected patterns in data. Data mining began its life in specialist applications such as geological research and meteorological research. More recently it has been applied in a number of areas of industry and commerce [2]. Generating information requires a massive collection of data. The data can range from simple forms, such as numerical data, figures and text documents, to more complex forms, such as spatial data, multimedia data and hypertext documents. With large amounts of data stored in databases, files and other repositories, it is increasingly important to develop powerful tools for the analysis and interpretation of such data and for the extraction of interesting knowledge and patterns that could help in decision making. Data mining is a set of activities or tools used to find new, hidden, unexpected or unusual patterns in data [3]. Data mining brings many advantages when used in specific areas. Besides advantages, data mining also has its own disadvantages, such as privacy, security and misuse of information. This paper also discusses data mining techniques such as predictive modeling, data mining tools, applications and trends in data mining, major issues in data mining, notable uses of data mining, and the conclusion.

II. KNOWLEDGE DISCOVERY PROCESS

The various steps of the process are:
Data cleaning: Remove noise, that is, unwanted data.
Data integration: Combine multiple data sources.
Data selection: Select the data relevant to the task from the database.
Data transformation: Convert the data into an appropriate form that is easy to mine.
Data mining: Apply a process such as association, regression or classification to extract data patterns.
Pattern evaluation: Evaluate the output of the data mining process and identify the interesting patterns.
Knowledge representation: Use various techniques to present the mined knowledge to the user [5].

Fig 1: Data mining process [4]

Fig 2: Steps in Data Mining process [6]

III. ARCHITECTURE OF DATA MINING

Data mining is described as a process of discovering or extracting interesting knowledge from large amounts of data stored in multiple data sources such as file systems, databases and data warehouses. This knowledge brings many benefits to business strategy, scientific and medical research, governments and individuals. The architecture contains modules for secure, thread-safe communication, database connectivity, organized data management and efficient data analysis for generating a global mining model [7].

Fig 3: Architecture of data mining [7]

IV. DATA MINING ADVANTAGES AND DISADVANTAGES

1. Advantages
The advantages of using data mining in applications such as banking, manufacturing and production, marketing and health care are as follows [8]:
1) Banking: Data mining supports the banking sector by searching large databases to discover previously unknown patterns and by automating the process of finding predictive information. Data mining helps to forecast levels of bad loans and fraudulent credit card use, to predict credit card spending by new customers, and to predict which kinds of customers will best respond to new loans offered by the banks.
2) Marketing: Data mining helps the marketing sector by classifying customer demographics, which can be used to predict which customers will respond to a mailing or buy a particular product. This is very helpful for business growth.
3) Health Care: Data mining supports the health care sector by correlating the demographics of patients with critical illnesses, developing better insights into symptoms and their causes, and learning how to provide proper treatments.
4) Insurance: Data mining assists the insurance sector in predicting fraudulent claims and medical coverage costs, classifying the important factors that affect medical coverage, and predicting which customers will buy new policies [9].

2. Disadvantages of Data Mining
The disadvantages of data mining are as follows [10]:
1) Privacy Issues: One of the disadvantages is the personal privacy issue. In recent years, with the boom of the internet, concerns about privacy have increased tremendously. Because of these concerns, individuals such as internet users, employees and customers are afraid that unknown persons may gain access to their personal information and use it in an unethical way that may cause them harm.
2) Security Issues: Another major disadvantage is the security issue, which is always a major concern in information technology. Companies hold a lot of personal information about employees and customers, including social security numbers, birthdates and payroll data, and some of it is also available online. However, they often do not have sufficient security systems in place to protect this information. There have been many cases where hackers accessed and stole the personal data of customers [10].

V. CHALLENGES OF DATA MINING

Data mining faces many challenges, including the following [11]:
Scalability
Complex and heterogeneous data
Data quality
Data ownership and distribution
Dimensionality
Privacy preservation [12]

VI. DATA MINING TECHNIQUES

Data mining techniques and methods draw on related disciplines and technologies from the following areas [13]:

(1) Statistical Methods
Data mining often involves a certain degree of statistical processing, such as data sampling and modeling to determine assumptions and control error. Many statistical methods, including descriptive statistics, probability theory, regression analysis and time series analysis, play an important role in data mining.

(2) Decision Tree
The decision tree method is mainly used for data classification. It is generally divided into two stages: tree construction and tree pruning. First, the training data are used to generate a test function, according to different [...] Compared with other classification methods, decision tree classification is faster, is more easily converted into simple and easy-to-understand classification rules, and is easily converted into database queries. It can give very good classification results, especially in high-dimensional problem areas.

(3) Neural Network
An artificial neural network mimics the structure of a biological neural network. It is a nonlinear prediction model that learns through training, and in data mining it can be used for classification, clustering, feature extraction and other operations.

(4) Genetic Algorithm
A genetic algorithm is an optimization technique that uses concepts from biological evolution to carry out a series of searches over a problem and finally arrive at an optimized solution. To implement a genetic algorithm, the problem to be solved is first encoded (the encodings are called chromosomes) and an initial population is generated. The fitness of each individual is then calculated, and chromosome replication, exchange (crossover) and mutation operations are applied to generate new individuals. This process is repeated until the best, or a sufficiently good, individual is found.
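The genetic algorithm loop described above (encode, generate an initial population, calculate fitness, then replicate, exchange and mutate until a good individual is found) can be sketched in a few lines of Python. This is only an illustrative sketch and not a method taken from the cited works: the bit-string encoding, the tournament selection, the elitism step, the parameter values and the toy fitness function `one_max` (which simply counts the 1 bits) are all assumptions made for the example.

```python
import random


def tournament(population, scores, rng):
    """Replication step: pick the fitter of two randomly chosen individuals."""
    a = rng.randrange(len(population))
    b = rng.randrange(len(population))
    return population[a] if scores[a] >= scores[b] else population[b]


def run_genetic_algorithm(fitness, n_bits=16, pop_size=30, generations=60,
                          crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Evolve bit-string chromosomes and return the best one found."""
    rng = random.Random(seed)
    # Encoding and initial population: random bit strings (assumed encoding).
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Calculate the fitness of every individual.
        scores = [fitness(individual) for individual in population]
        # Elitism (an assumption): carry the best individual over unchanged.
        next_population = [best[:]]
        while len(next_population) < pop_size:
            parent1 = tournament(population, scores, rng)
            parent2 = tournament(population, scores, rng)
            # Exchange (single-point crossover) with some probability.
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_bits)
                child = parent1[:cut] + parent2[cut:]
            else:
                child = parent1[:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best


def one_max(chromosome):
    """Toy fitness function: the number of 1 bits in the chromosome."""
    return sum(chromosome)


best = run_genetic_algorithm(one_max)
print(best, one_max(best))
```

In a data mining setting, the chromosome would instead encode a candidate solution, for example a subset of attributes or the parameters of a rule, and the fitness function would score that candidate against the data. Because the best individual is carried over in every generation, the fitness of the returned chromosome never falls below that of the best individual in the initial population.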
In data mining, tasks are often expressed as search problems, and the powerful search capability of the genetic algorithm is used to find the optimal solution.

(6) Fuzzy Set
Fuzzy sets are an important way of representing and processing uncertainty in data. Fuzzy set theory uses degrees of membership to describe the intermediate transition between differing states, which is a way of describing linguistic fuzziness in precise mathematical terms [14]. Fuzzy sets can not only deal with incomplete, noisy or imprecise data, but can also, in the development of data uncertainty models, provide more flexible and smoother performance than traditional methods [15].

VII. DATA MINING TOOLS

A. Categories of Data Mining Tools
Most data mining tools can be classified into three categories: traditional data mining tools, dashboards and text-mining tools [16].

A. Traditional Data Mining Tools
Traditional mining programs help companies to establish data patterns and trends by using various complex algorithms and techniques. Some of these tools are installed on desktop computers to monitor the data and highlight trends, and others capture information residing outside a database. The majority of these programs are available in Windows and UNIX versions. However, some software specializes in one operating system only. In addition, some may work with only one database type. Most of the software, however, can handle any data using online analytical processing or a similar technology [16].

B. Dashboards
A dashboard is normally installed on a computer to monitor information in a database. It reflects data changes and updates, displaying them on screen in the form of a chart or table. It enables the user to see how the business is performing. Historical data can be referenced and checked against the current status in order to see changes in the business. In this way, dashboards are very easy to use and appeal to managers because they provide an overview of the company's performance.

C. Text-Mining Tools
The third type of data mining tool is called a text-mining tool because of its ability to mine data from different kinds of text, from Microsoft Word and Acrobat PDF documents to simple text files. These tools scan the content and convert the selected data into a format that is compatible with the tool's database, without the need to open different applications [17].

Current open tools: The following are open source tools [18]:

B. Weka
Weka is Java-based software capable of working under various operating systems. It contains tools for data preprocessing, classification, regression, clustering, association rules and visualization. The algorithms can either be applied directly to a dataset or called from a user's Java code [19].

C. Orange
Orange is open source data mining and visualisation software with an active community, and it supports both novices and experts in their analyses. It works on various platforms, including Windows, Mac OS X and GNU/Linux, and it is packed with data analytics features. It enables the design of data analysis processes through user-friendly visual programming or Python scripting. Hence, it can be scripted for the respective data mining tasks. It implements most major data mining algorithms and offers different visualisations, from scatter plots, bar charts and trees to dendrograms, networks and heatmaps. It has specialised add-ons such as Bioorange for bioinformatics [20].

VIII. APPLICATIONS IN DATA MINING

There is large scope for the application of data mining in different areas, as follows:

1) In Medical Science: In medical science there is large scope for the application of data mining. Diagnosis of disease, health care, patient profiling and history generation
are a few examples. Mammography is the method used in breast cancer detection. Radiologists face many difficulties in detecting tumors, which is why CAM (Computer Aided Methods) could help the medical staff [21].

2) In Web Education: In the 21st century, beginners are using data mining techniques, which are among the best learning methods of this era. This makes it possible to increase the awareness of learners. Web education is growing rapidly, and the application of data mining methods to educational chats is both feasible and can improve learning environments in the 21st century [22].

3) Malicious Executables as a Threat: A malicious executable is a threat to a system's security: it damages a system or obtains sensitive information without the user's permission. Data mining methods are used to accurately detect malicious executables before they run [23].

4) Sports Data Mining: Data mining and its techniques are also applied in sports, not only for business purposes. Around the world there are a huge number of games, with national and international games scheduled every day, for which a huge amount of data must be maintained [24].

IX. TRENDS IN DATA MINING

Table 1: Data Mining Trends Comparative Statements [25]

Data Mining Trends | Algorithms/Techniques Employed | Data Formats | Computing Resources
Past | Statistical and machine learning techniques | Numerical data and structured data stored in traditional databases | Evolution of 4G PL and various related techniques
Current | Statistical, machine learning, artificial intelligence and pattern recognition techniques | Heterogeneous data formats, including structured, semi-structured and unstructured data | High-speed networks, high-end storage devices, and parallel and distributed computing, etc.
Future | Soft computing techniques such as fuzzy logic, neural networks and genetic programming | Complex data objects, including high-dimensional data, high-speed data streams, sequences, noise in time series, graphs and multi-instance objects | Multi-agent technologies and cloud computing

X. FUTURE WORK

Today, competition is one of the most important challenges facing all organizations and industries with respect to data mining. To address the issues explained above, the following problems should be widely studied [26]:
a) Privacy and accuracy are a pair of conflicting goals; improving one usually incurs a cost in the other. How to apply various optimizations to achieve a trade-off should be researched in depth.
b) In distributed privacy-preserving data mining, efficiency is an essential issue. We should try to develop more efficient algorithms and achieve a balance between disclosure cost, computation cost and communication cost.
c) Side effects are unavoidable in the data sanitization process. How to reduce their negative impact on privacy preservation needs to be considered carefully. We also need to define metrics for measuring the side effects that result from data processing [27].

XI. CONCLUSION

Data mining has become an important tool that can extract useful information from the huge amounts of data available today.
In this paper we reviewed the various data mining trends and applications, from the inception of data mining to its future. This review focuses on the hot and promising areas of data mining. It may also help to extract information from the Internet, which has become part of our lives. The ability to automate data mining techniques and the added value of using them make data mining attractive in different areas, especially science and business areas with huge amounts of data. In both the scientific and the industrial world, its applications have become widespread. Privacy protection certainly deserves a solid amount of attention, but it should not lead to an exaggerated apprehension of data mining. After all, the possibilities and opportunities of data mining are too valuable, for example in the development cycle of new medicines. These techniques are still the subject of further research, but we expect that they will rapidly make the transition into business environments.

REFERENCES

[1] Michael Goebel et al., "A Survey of Data Mining and Knowledge Discovery Software Tools", Department of Computer Science, University of Auckland, SIGKDD Explorations, ACM SIGKDD, June 1999.
[2] Aparna S. Varde, "Challenging Research Issues in Data Mining, Databases and Information Retrieval", Department of Computer Science, Montclair State University, Montclair, NJ, USA.
[3] Dr. A. Bharati et al., "A Survey on Crime Data Analysis of Data Mining Using Clustering Techniques", International Journal of Advance Research in Computer Science and Management Studies, Volume 2, Issue 8, August 2014, ISSN: 2327782.
[4] Monika D. Khatri et al., "History and Current and Future Trends of Data Mining Techniques", International Journal of Advance Research in Computer Science and Management Studies, Volume 2, Issue 3, March 2014, ISSN: 2321-7782.
[5] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, second edition, University of Illinois at Urbana-Champaign.
[6] Anand V. Saurkar et al., "A Review Paper on Various Data Mining Techniques", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 4, Issue 4, April 2014, ISSN: 2277 128X.
[7] Mafruz Zaman Ashrafi, David Taniar, Kate A. Smith, "Data Mining Architecture for Clustered Environments", PARA '02: Proceedings of the 6th International Conference on Applied Parallel Computing, Advanced Scientific Computing, pp. 89-98, Springer-Verlag, London, UK, 2002.
[8] Dileep Kumar Singh et al., "Data Security and Privacy in Data Mining: Research Issues & Preparation", International Journal of Computer Trends and Technology, Volume 4, Issue 2, 2013, ISSN: 2231-2803.
[9] Yujie Zheng, "Clustering Methods in Data Mining with its Applications in High Education", 2012 International Conference on Education Technology and Computer (ICETC2012), IPCSIT vol. 43 (2012), IACSIT Press, Singapore.
[10] Richard A. Johnson and Dean W. Wichern, Applied Multivariate Statistical Analysis (5th Ed.), 2003.
[11] Guttman L., "The quantification of a class of attributes: A theory and method of scale construction", in The Committee on Social Adjustment (ed.), The Prediction of Personal Adjustment, New York: Social Science Research Council, 1941.
[12] Karimella Vikram and Niraj Upadhayaya, "Data Mining Tools and Techniques: a review", Computer Engineering and Intelligent Systems, Vol. 2, No. 8, 2011, pp. 31-39.
[13] (2006) "Advantages & Disadvantages of Data Mining?" [online].
[14] Jiawei Han and Jing Gao, "Research Challenges for Data Mining in Science and Engineering", Chapter 8, pp. 1-8.
[15] Kusiak, A., Kernstine, K.H., Kern, J.A., McLaughlin, K.A., and Tseng, T.L., "Data Mining: Medical and Engineering Case Studies", Proceedings of the Industrial Engineering Research 2000 Conference, Cleveland, Ohio, pp. 1-7, May 21-23, 2000.
[16] Romero, C., Ventura, S. and De-Bra, P., "Knowledge Discovery with Genetic Programming for Providing Feedback to Courseware Authors", Kluwer Academic Publishers, Printed in the Netherlands, 30/08/2004.
[17] Neelamadhab Padhy et al., "The Survey of Data Mining Applications and Feature Scope", International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol. 2, No. 3, June 2012, DOI: 10.5121/ijcseit.2012.2303.
[18] Cai, W. and Li L., "Anomaly Detection using TCP Header Information", STAT753 Class Project Paper, May 2004. Nandi, T., Rao, C. B. and Ramchandran, S., "Comparative genomics using data mining tools", Journal of Bio-Science, Indian Academy of Sciences, Vol. 27, No. 1, Suppl. 1, pp. 15-25, February 2002.
[19] Robert P. Schumaker, Osama K. Solieman, Hsinchun Chen, Springer.
[20] Content Technology and its Applications, Volume 4, Number 9, December 2010.
[21] Anmol Kumar et al., "Data Mining: Various Issues and Challenges for Future: A Short Discussion on Data Mining Issues for Future Work", International Journal of Emerging Technology and Advanced Engineering (ISSN 2250-2459 (Online)), Volume 4, Special Issue 1, February 2014, International Conference on Advanced Developments in Engineering and Technology (ICADET-14), India.
[22] Sangeeta Goele, Nisha Chanana, "Data Mining Trend In Past, Current And Future", International Journal of Computing & Business Research, in Proc. I-Society 2012, 2012.