Download Print this article - Informatics Journals

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Cluster analysis wikipedia , lookup

Transcript
ISSN 2454-3349
Int. Journal of Philosophies in Computer Science
Vol. 1, No. 1, (2015), pp. 21-30
AN APPLICATION OF PREPROCESSING AND
CLUSTERING IN WEB LOG MINING
S. Uma Maheswari1 and S. K. Srivatsa2
1
SCSVMV University, Kanchipuram, Tamilnadu, India
E-mail: umarunn@gmail.com
2
St. Joseph College of Engineering, Chennai, Tamilnadu, India
E-mail: profsks@rediffmail.com
ABSTRACT
Identifying the user behavior is one of the aspect of web usage mining and hence web logs play an major role to
identify the user behavior. There are many pattern mining methods to identify user behavior. The accuracy &
quality of pattern mining algorithms are improved with the help of preprocessing techniques. Various activities
like identifying the number of unique users, reducing the size of log file, identifying the sessions are done with
the help of preprocessing techniques in the existing algorithms. The newly proposed algorithm is Enhanced
User Behavior (EUB) which identifies user behavior and groups the similar kind of users using clustering
techniques. This paper brings into discussion about the basic concepts of web log preprocessing, various
existing clustering preprocessing techniques and the proposed EUB algorithm.
KEYWORDS
Preprocessing techniques, web mining, web logs, clustering.
1.
INTRODUCTION
Web seems to be too huge for effective data warehousing and data mining. World Wide Web serves
as a huge, widely distributed, global information service center for news, advertisements, consumer
information, financial management, education, e-commerce etc. Information is arranged in proper
hierarchy in the form of websites. The web also contains a rich and dynamic collection of hyperlink
information. Collection of web pages named websites are accessed via hyperlinks. Nowadays internet
plays a vital role for providing information to all kinds of users to obtain their needs. Day - by- day
the usage and accessibility of internet is increasing tremendously. Websites are highly helpful for
providing any kind of information to any kind of users at any time. A web server usually registers a
weblog entry for every access of webpage. Whenever the user interacts with the web site, the
interaction details are automatically recorded in web server in the form of web logs [2]. The web
seems to be two huge for effective data warehousing and mining. Also the complexity of web pages is
far greater than that of any old text documents. Only a small portion of the information on the web is
truly relevant [13]. It is possible to get lots of data on user access patterns and also possible to mine
interesting nuggets of information. It is one of the main applications of data mining, artificial
intelligence and so on to the web data and forecast the user's visiting behaviors and obtains their
interests by investigating the samples [4].Web usage mining involves the analysis and discovery of
user access patterns from Web servers logs in order to better serve the user's needs. In web usage
mining or web log mining, user’s behavior or interests are revealed by applying data mining
techniques on web log file.
The ability to know the patterns of user’s habits and interests helps the operational strategies of
enterprises. It is highly important for the website analyst to know and understand about the user level
Maheswari and Srivatsa
of interest and behavior for variety of reasons. In web usage mining web logs plays an important role
to know about user behavior. In paper [3] the algorithms proposed have done the preprocessing
activities for reducing the size of the log file and to identify the number of unique users and sessions.
According to paper [1] intelligent system web usage preprocessor categorize human and search
engine accesses before applying the preprocessing techniques.
Web usage mining is the third category in web mining. This type of web mining allows for the
collection of web access information for web pages. The usage data that is gathered provides the
companies with the ability to produce results more effective to their businesses and increasing of
sales. Usage data can also be useful for developing marketing skills that will out-sell the competitors
and promote the company’s services or product on a higher level. Usage mining is valuable not only
to businesses using online marketing, but also to e-businesses whose business is based solely on the
traffic provided through search engines. The use of this type of web mining helps to gather the
important information from customers visiting the site. This enables an in-depth log to complete
analysis of a company’s productivity flow.
E-businesses depend on this information to direct the company to the most effective web server for
promotion of their product or service. Web usage mining is one of the categories of data mining
technique that identifies usage patterns of the web data [5]. The projection of these paths helps to log
the user registration information giving commonly used paths the forefront to its access. Therefore, it
is easily determined that usage mining has valuable uses to the marketing of businesses and a direct
impact to the success of their promotional strategies and internet traffic. This information is gathered
on a daily basis and continues to be analyzed consistently. Analysis of this pertinent information will
help companies to develop promotions that are more effective, internet accessibility, inter-company
communication and structure, and productive marketing skills through web usage mining.
The intelligent system web usage preprocessor splits the human and search engine accesses before
using the preprocessing techniques. This can be extended by using some other learning algorithms
also [1]. It can be utilized to user profiling and similar image retrieval by tracing the visitor’s on line
behaviors for effective web usage mining [11]. Many preprocessing techniques can be effectively
applied in web log mining [7]. The preprocessing of web log data for finding frequent patterns using
weighted association rule mining technique can be done to other industrial and social organizations
too [6]. In recent days, the web usage mining has great potential and frequently employed for the
tasks like web personalization, web pages prefetching and website reorganization etc. [12]. So, it is
required to know the users behavior when interaction is made with the web.
Yang and Padmanabhan focused on grouping the customer transactions by using the clustering
technique [14]. The set of transactions in a group has some similarities, so we can easily identified the
customer behaviour and the web site analyst can able to understand the customer expectation and
make the website customer friendly. Mahdaviand and Abolhassani dealt with two types of groups
such as web clustering groups which groups the relative pages from the web server log files, and the
user clustering groups which groups the user who refers the same type of web pages [15]. Guha,
Rastogi and Shim mainly focused on the data preprocessing step to remove the unnecessary data such
as images, extra click events [16]. Das and Vyas [17] have presented a model for web personalization
approach using web mining. The server side and browser side details are taken for consideration.
Suneetha and Krishnamoorti [18] have developed an intelligent recommendation system to determine
pages that are most likely to be visited by the user in future. This assists site owners in optimization,
improving user satisfaction etc.
2.
ON LINE BEHAVIOUR AND TRACING OF VISITORS IN WEB MINING
It must to trace the visitor’s on-line behaviors for website usage analysis. Actually it is an analysis to
get knowledge about how visitors use website which could provide guidelines to website
reorganization and helps to prevent disorientation. It also helps designers in placing the important
information appropriately so that a visitor will look for it. It has to be done for pre-fetching and
caching web pages. Also it must provide adaptive website (personalization). This is represented in the
following Figure 1.
22
Preprocessing and Clustering in Web Log Mining
Website
Reorganizatio
Prefetching web
pages
Figure 1
Personalization
Website usage analysis
Many organizations have been supported by the analysis of user’s browsing patterns for the purpose
of giving personalized recommendations of web pages. Generally the usage based personalized
recommendation gives solutions to many problems occurred in the web [13, 14, 15]. It has created an
interest between the researchers to do research. The recommendation systems lessen information
overload by suggesting pages that fulfills the user’s requirement. In recent days, the web usage
mining has great potential and frequently employed for the tasks like web personalization, web pages
prefetching and website reorganization etc. [16]. Data used in web usage mining are obtained in three
levels such as at server level, client level and proxy level [12]. In server level, the server keeps the
client request details whereas at client level the client itself forwards data about user’s behavior to a
database. At proxy level, the proxy side maintains user behavior information of the users whose web
clients pass through the proxy even though the web data is taken from many users on various web
sites. Generally the web pages, intra page structures, inter page structures and usage data are the input
used in web usage mining. Other forms of web data reside as profiles, registration information and
cookies. Web usage data is collective data about how a user utilizes a web site through his mouse and
keyboard. This data can also be available in form of web server logs, referral logs, registration files
and index server logs and cookies.
The aim of web log file is to create user profile by allowing their browsing similarities with previous
users. Before the data mining process, it is required to clean, condense and transform the raw data of
web log before performing data mining. Web log data can be integrated with web content and web
linkage structure mining to help web page ranking and web document classification. The interaction
details of user with website are recorded automatically in web servers as the form of web logs [2].
Web logs are kept as in form of line of text in web server, proxy server and browser [8]. Other forms
of logs are server access logs, server referrer logs, agent logs, client-side cookies, user profiles, search
engine logs and database logs. These are considered as input for knowing the end user behaviour in
web usage mining. Log files are those files that list the actions that have been occurred [19]. It holds
many parameters which have employed in recognizing user browsing patterns. Some of the
parameters are user name, visiting path, path traversed time stamp, success rate, user agent, URL,
request type, and pages last visited [18]. The information on user’s request from their web browsers is
stored in transfer / access log as shown in Table 1. The recorded two fields of referrer log are URL
and referrer URL as shown in Table 2.
Table 1
Time
Date
Host name
File
requested
Transfer or access log
Amount
of data Transferred
Table 2
URL
Status of report
Referrer log
Referrer URL
The list of errors and requests which have failed are collected in error log. Not only for the page
which holds links to a file that does not exist, but also for the user who is not permitted to access a
particular page, the user request may fail. It is depicted in the following Figure 2.
23
Maheswari and Srivatsa
Request
SERVER
CLIENT
Reply
Figure 2
3.
Error log
BASICS OF P REPROCESSING AND C LUSTERING TECHNIQUES
The data preprocessing is done for the purpose of descriptive data summarization Data cleaning, data
integration and transformation, data reduction, discretization and concept hierarchy generation. Data
in the real world is dirty like incomplete, noisy, and inconsistent. Now-a-days there is no quality data
and no quality mining results are available. So the quality decisions must be based on quality data.
For example, duplicate or missing data may cause incorrect or even misleading statistics. Also the
data warehouse needs consistent integration of quality data. Moreover the data extraction, cleaning,
and transformation comprise the majority of the work of building a data warehouse. That is why it is
important to preprocess the data. The inputs given to web usage mining are list of agents visiting the
websites, duration of session, order in which the agents view the web pages. It is difficult to use the
web logs directly for pattern mining algorithms to extract the features. So the preprocessing
techniques are necessary to make them consistent and complete.
3.1
Basic Structure of Web Log Mining
Preprocessing a web log mining model aims to reformat the original web logs to identify user’s access
sessions. Access log is the input to preprocessing block. A web log is a file to which the web server
writes information each time a user requests a resource from that particular site. The basic structure of
web log mining is depicted in the following Figure 3.
Access Log
Data Cleaning
User Identification
Session Identification
Path Completion
Session File
Figure 3
Basic structure of web log mining
Web logs are kept as in form of line of text [8]. They are such as web server, proxy server, and
browser. It holds all the log files and gives more complete and accurate user’s interaction information
along with the website. The World Wide Web consortium retains a standard format to use for the web
server log files. It stores client IP address, request date, request time, page requested, hyper text
transfer protocol code, bytes transferred, user agent and referrer. The proxy server behaves like a
mediator between the browser and the web server. It gets hyper text transfer protocol request from
24
Preprocessing and Clustering in Web Log Mining
user and it passes them to the web server. Web logs for the particular user are maintained in the
browser machine. Browsers are programmed and scripting languages are employed in it to collect the
client side data. In general, three types of web log formats [8] are available. The formats are namely
World Wide Web consortium extended log file format. It is a default log file format on the Internet
information server. Here the fields are shown along with the space. The time is represented as the
Greenwich Mean Time. This format can be altered by the administrators to add or remove fields
based on the information required. National center for supercomputing application (NCSA) common
log file format stores user name, date, time, request type, hyper text transfer protocol status code, and
the number of bytes. NCSA has a fixed format and it is not changed to remove or add by
administrators. Microsoft internet information server (IIS) log file are also employed in American
Standard Code for Information Interchange format and it is not customized. Here fields are
represented by using comma.
3.2
Preprocessing Steps
Preprocessing [9] is an important activity in web usage mining and treated as a key to success. These
techniques eliminate the unwanted information from the web logs and facilitate the effective pattern
mining. The steps involved in data preprocessing consists of data cleaning, user and session
identification, and path completion. These are explained below.
3.2.1
Data Cleaning
Data collection [6] is the initial step in web log preprocessing. Irrelevant records are eliminated
during data cleansing. Data cleaning [10] is the process of removing noisy and irrelevant data that are
not helpful for mining the knowledge from the web logs.
3.2.2
User and Session Identification
The task of user and session identification is to find out the different user sessions from the original
web access log. Session identification [7] is the process of dividing the individual user access logs
into sessions. In general, a referrer based method is used for identifying sessions.
3.2.3
Path Completion
Path completion should be used acquiring the complete user access path. The incomplete access path
of every user session is recognized based on user session identification. If in a start of user session,
referrer as well URL has data value, delete value of referrer by adding a dash ‘-’. Web log
preprocessing helps in removal of unwanted click streams from the log file and also reduces the size
of original file by 40 – 50%.
3.3
Clustering Techniques
Clustering of data in a large dimension space is of great interest in many data mining applications. It
is a technique to group together a set of items having similar characteristics. In addition, it can be
performed on either the users or the page views. Clustering analysis in web usage mining intends to
find the cluster of user, page, or sessions from web log file, where each cluster represents a group of
objects with common interesting or characteristic. User clustering is designed to find user groups that
have common interests based on their behaviors, and it is critical for user community construction.
Page clustering is the process of clustering pages according to the user’s access over them. Such
knowledge is especially useful for inferring user demographics in order to perform market
segmentation in e-Commerce applications or provide personalized web content to the users. On the
other hand, clustering of pages will discover groups of pages having related content. This information
is useful for the Internet search engines and Web assistance providers. In both applications,
permanent or dynamic HTML pages can be created that suggest related hyperlinks to the user
according to the user’s query or past history of information needs. The intuition is that if the
probability of visiting page, given page has also been visited, is high, then maybe they can be grouped
into one cluster.
Several clustering algorithms can be applied for clustering in large multimedia databases. The
effectiveness and efficiency of the existing algorithms, however, are somewhat limited. It is because,
25
Maheswari and Srivatsa
clustering in multimedia databases requires clustering of high dimensional feature vectors and it often
contains large amounts of noise. In this paper, we therefore introduce a new kernel density estimation
based algorithm for clustering in large multimedia databases called DENCLUE (Density based
clustering).
Clusters can then be identified by determining density attractors and clusters of arbitrary shape can be
easily described by a simple equation of the overall density function. The advantages of our kernel
density estimation based DENCLUE approach are: (1) it has a firm mathematical basis; (2) it has
good clustering properties in data sets with large amounts of noise; (3) it allows a compact
mathematical description of arbitrarily shaped clusters in high-dimensional data sets; and (4) it is
significantly faster than existing algorithms. A comparison with k-Means, DBSCAN, and BIRCH
shows the superiority of our proposed approach.
4.
EXISTING PREPROCESSING ALGORITHM
Now-a-days the usage of internet services has been increased tremendously. Internet surfing or
website surfing is highly necessary in order to make surfing practice to develop relationship between
the users. Hence this activity triggers the users to do analysis by gathering information and capture the
user level of interest from the web log files. The measures used to retrieve the user behavior are the
web logs. Mining the user behavior from the web log is called web log mining. It mines the user’s
level of interest by their frequent visit to the websites. The web logs are updated every time whenever
the user visits a particular web site.
In user interest level preprocessing (UILP) algorithm, website and webpage navigation behavior are
considered as the basic source for identifying the interest of the user. In UILP, data cleaning method
is used to remove the noisy and irrelevant information from the web log. This is one of the features in
identifying the user level of interest. The second feature used is based on site topology and cookies.
Frequency value, session identification, path completion are also identified using this UILP algorithm
[11].User’s interest level is identified mainly based on their website and webpage navigation
behavior. The proposed UILP algorithm considered the following four features to identify the user
interest level. These are listed below:
5.
o
During data cleaning process, explicit image and multimedia requests from users are
considered; those requests are not removed from web logs.
o
Users are identified based on site topology and cookies.
o
Session time is calculated based on the time spent on each website by a particular user.
o
Frequency value is calculated based on the number of web pages visited by the user on
particular website.
PROPOSED METHODOLOGY
In the existing system user interest level is identified using data cleaning technique. From the detailed
analysis of existing preprocessing techniques it is well understood that not only identifying the user
level of interest is the major criteria to be focused, but also grouping the similar kind of users could be
done using any one of the clustering technique. This paper highly concentrates on how similar kind of
user behavior can be identified and grouped using clustering technology. Clustering is the process of
grouping the data into classes or clusters, so that objects in the same cluster have higher similarity in
comparison to one another but are very dissimilar to the objects in other clusters.
Dissimilarities are assessed based on attribute values describing the objects. There are various
methods like partitioning methods, hierarchical methods, density based methods, grid based methods,
for high dimensional data. The proposed system makes use of partition methods to group the similar
kind of users. The most and well known commonly used partition methods are k-means to identify
and group the similar kind of users. Frequent users are grouped based on clustering techniques. Web
logs are processed using enhanced user behavior algorithm based on how frequent the user visits the
websites. Web logs contain the following fields which serve as the input for reducing the density of
log contents in a log file. A sample web log table is presented in the Table 3, where IP_addr is the IP
26
Preprocessing and Clustering in Web Log Mining
address; End_usr is the name of the user; URL is the website address; Session is the duration taken by
the user over session; and Frequency is the umber of times the user visits the website.
Table 3
IP_addr
5.1
End_usr
Fields in a web log
URL
Session
Frequency
Procedure for K-means Algorithm
The following procedure is generally used in K-means algorithm.
1. Arbitrarily choose k objects from the web log.
2. Find the number of users frequently visiting the websites by calculating mean value.
3. Reassign each object to the cluster to which the object is most similar based on mean value.
4. Update the cluster means often.
5. Similarly group dissimilar the objects in another cluster.
5.2
Enhanced user Behaviour Procedure
In the proposed enhanced user behaviour (EUB) procedure, clustering plays a key role to classify web
visitors on the basis of user click history and similarity measure. This algorithm considers of four
entities namely IP address, user name, website name, and frequency of accessed sites. Cookies based
web logs are taken as the input which mainly classify the unique users and helps to create user
clusters. Here, the website and webpage navigation behavior are considered as the basic source for
tracing the visitor’s online behaviour and also to identify the interest of the user in accessing the
various web sites. Hence the EUB algorithm effectively traces the behavior of online users which
supports the website usage analysis. The following procedure is used in the EUB procedure.
1. Choose the input data from the web log.
2. Using the frequency field as the basic constraint, calculate the mean value.
3. Depending on the mean value clusters are formed.
4. Reassign each object to the cluster to which the object is most similar based on the mean
value.
5. Repeat the steps until objects are grouped appropriately.
6. Similarly group dissimilar objects in another cluster.
6.
EXPERIMENTAL SETUP AND C OMPARATIVE R ESULTS
The web log files are gathered from the college web server in the period of 18th October 2012 to 21st
August 2013. The total number of records were collected are 27962 in count. The preprocessing
algorithm is implemented by using jdk 1.6. It helps and supports the execution. EUB algorithm is
evaluated along with the current methods in terms of data cleaning, user identification, and session
identification.
6.1
Statement of Evaluation
Figure 4 shows evaluation data for the data cleaning. The UILP algorithm takes consideration of
explicit user request for image and multimedia files. It eliminates the 5084 image files and 3813
multimedia files. The ISWUP is an intelligent learning algorithm, which removes 5950 image files
and 4784 video files. Also Sudheer Reddy algorithm removes 6330 image files and 5140 video files.
But the EUB algorithm removes 5010 image files and 3421 video files. Removal of auto search
27
Maheswari and Srivatsa
engine request and failure status code is more or less same for all the cases. The EUB algorithm deals
both explicit and implicit user requests.
Figure 4
Data cleaning overall comparison
EUB algorithm proves its efficiency and effectiveness for finding the number of users and unique
users by using clustering concept. Clustering is the process of organizing objects into groups whose
members are similar in some way. A cluster is therefore a collection of objects which are similar
between them and are dissimilar to the objects belonging to other clusters. The proposed algorithm
identifies 1310 users and 615 unique users by implementing the clustering concept. But the UILP
algorithm identifies 1271 users and 550 unique users. The ISWUP identifies 1190 users and 457
unique users. The other existing algorithms identify 1190 users and 428 unique users. This is depicted
in the following Figure 5.
Figure 5
User identification performance comparisons
From Figure 6, it is known that the number of sessions identified by existing and proposed
algorithms. The two measures used are Session identification and frequency calculation. Session is
calculated based on the movement from one website to another website and frequency value is
calculated with respect to the number of pages visited by the user in the particular website.
The session and frequency is considered as an important factor for clustering the users based on their
interesting measure. EUB Algorithm finds 65200 sessions and the UILP identifies 61783. Also the
number of sessions identified by ISWUP and Sudheer are 4760 and 3570 respectively. Hence, by
overall the proposed EUB algorithm significantly maximizes the performance of identifying unique
users and number of sessions.
28
Preprocessing and Clustering in Web Log Mining
70000
60000
50000
40000
30000
20000
10000
0
EUB
Figure 6
7.
UILP
ISWUP
K.Sudheer
Session identification performance comparison
CONCLUSIONS AND FUTURE ENHANCEMENT
Enhanced user behaviour algorithm is formulated to do data cleaning, user identification, session
identification, path completion and also tells the unique users by using the cluster concept. There are
many techniques and suggestions are discussed for pre processing the data taken from the web log
files. The proposed EUB algorithm effectively preprocesses the web log data and identifies user
behavior, and groups the similar kind of users using clustering techniques. It must to trace the
visitor’s online behaviors for website usage analysis. A number of further tasks could be added by
demonstrating the utility of web mining. It can be done by making exploratory changes to web sites.
REFERENCES
[1]
V. V. R. Maheswara Rao, and V. Valli Kumari, “An Enhanced Pre-Processing Research Framework
for Web Log Data using a Learning Algorithm”, Proceedings of the Second International Conference
on Networks & Communications (Netcom 2010), Bangalore, India, pp. 1–15, (2011).
[2]
Sanjay Bapu Thakare, and Sangram Z. Gawali, “A Effective and Complete Preprocessing for Web
Usage Mining”, International Journal on Computer Science and Engineering, Vol. 2(3), pp. 848-851,
(2010).
[3]
T. Hussain, S. Asghar, and S. Fong, “A Hierarchical Cluster Based Preprocessing Methodology for
Web Usage Mining”, Proceedings of the Sixth International Conference on Advanced Information
Management and Service, IEEE Xplore, pp. 472-477, (2010).
[4]
P. Nithya, and P. Sumathi, “An Enhanced Pre-Processing Technique for Web Log Mining by
Removing Web Robots”, Proceedings of the IEEE International Conference on Computational
Intelligence and Computing Research, IEEE Xplore, pp. 1-4, (2012).
[5]
K. Sudheer Reddy, M. Kantha Reddy, and V. Sitaramalu, “An effective Data Preprocessing method for
Web Usage Mining”, Proceedings of the International Conference on Information Communications
and Embedded Systems, IEEE Xplore, pp. 7-10, (2013).
[6]
M. Malarvizhi, S. A. Sahaaya, and Arul Mary, “Preprocessing of Educational Institution Web Log
Data for Finding Frequent Patterns using Weighted Association Rule Mining Technique”, European
Journal of Scientific Research, Vol. 74(4), pp. 617-633, (2012).
[7]
Sheetal A. Raiyani, and Shailendra Jain, “Efficient Preprocessing Technique using Web Log Mining”,
International Journal of Advancements in Research & Technology, Vol. 1(6), pp. 1-5, (2012).
[8]
J. Srivatsava, R. Cooley, M. Deshpande, and P. N. Tan, “Web Usage Mining: Discovery and
Applications of Usage Patterns from Web Data”, SIGKDD Explorations, Vol. 1(2), pp. 12-23, (2000).
29
Maheswari and Srivatsa
[9]
V. Chitraa, and Antony Selvadoss Devamani, “A Novel Technique for Sessions Identification in Web
Usage Mining Preprocessing”, International Journal of Computer Applications, Vol. 34(9), pp. 23-27,
(2011)
[10]
Vijayashri Losarwar, and Madhuri Joshi, “Data Preprocessing in Web Usage Mining”, Proceedings of
the International Conference on Artificial Intelligence and Embedded Systems (ICAIES 2012),
Singapore, (2012).
[11]
R. Suguna, and D. Sharmila, “User Interest Level based Preprocessing Algorithms using Web usage
Mining”, International Journal on Computer Science and Engineering, Vol. 5(9), pp. 815-822, (2013).
[12]
C. P. Sumathi, R. Padmaja Valli, and T. Santhanam, “Automatic Recommendation of Web Pages in
Web Usage Mining”, International Journal on Computer Science and Engineering, Vol. 2(9), pp. 30463052, (2010).
[13]
Jiawai Han, and Micheline Kamber, “Data Mining - Concepts and Techniques”, Second Edition,
Elsevier, USA (2010).
[14]
Yinghui Yang, and B. Padmanabhan, “GHIC: A Hierarchical Pattern-Based Clustering Algorithm for
Grouping Web Transactions”, IEEE Transactions on Knowledge and Data Engineering, Vol. 17(9), pp.
1300-1304, (2005).
[15]
M. Mahdavi, and H. Abolhassani, “Harmony K - means Algorithm for Document Clustering”, Data
Mining and Knowledge Discovery, Vol. 18(3), pp. 370-391, (2009).
[16]
Sudipto Guha, Rajeev Rastogi, and Kyuseok Shim, “CURE: An Efficient Clustering Algorithm for
Large Database”, ACM SIGMOD, Vol. 27(2), pp. 73-84, (1998).
[17]
K. Sharma, G. Shrivastava, and V. Kumar, “Web Mining: Today and Tomorrow”, Proceedings of the
Third International Conference on Electronics Computer Technology, IEEE Xplore, Vol. 1, pp. 399 –
403, (2011).
[18]
K. R. Suneetha, and R. Krishnamoorti, “IRS: Intelligent Recommendation System for Web
Personalization”, European Journal of Scientific Research, Vol. 65(2), pp. 175-186, (2011).
Authors Biography
S. Umamaheswari is working as Associate Professor and Head of the department of IT in C.
Abdul Hakeem College of engineering & Technology. She did B.E. (CSE) in V. R. S. College
of Engineering & Technology and M. Tech. (IT) in Sathyabama University. She is a
distinction holder in both UG and PG. She is doing research in data mining at SCSVMV
University, Kancheepuram. She is having 11 years of experience in teaching. She has
published 8 papers in International Journals and conferences. Her area of interest is data
mining, artificial intelligence and neural networks and guided more number of UG and PG
projects.
S. K. Srivatsa is retired Senior Professor in Anna University who currently working as Senior
Professor in Prathyusha Institute of Technology and Management, Chennai. He received his
bachelor of Electronics and Telecommunication (Honor) from Jadavpur University and
master degree in Electronics & Communication Engineering from Indian Institute of science
and Ph.D. also from Indian Institute of Science. He has produced 25 Ph. D’s. He is author of
440 publications in reputed journal and conference proceedings. He is a life member and
fellow in twenty four registered professional societies.
30