The 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 26-29 August 2012, Kadir Has University, Istanbul, Turkey

On Finding Fine-Granularity User Communities by Profile Decomposition

Seulki Lee, Minsam Ko, Keejun Han, Jae-Gil Lee
Department of Knowledge Service Engineering, KAIST (Korea Advanced Institute of Science and Technology)
{seulki15, minsam.ko, brianhan87}@gmail.com, jaegil@kaist.ac.kr

Table of Contents
• Introduction
• DecompClus Algorithm
• Evaluation
• Related Work
• Conclusion

Community Discovery
Community discovery is one of the most popular tasks in social network analysis. Many real-world applications rely on it, for example:
• Advertising to groups with common interests
• Recommending potential collaborators in workplaces

Relationships in Social Networks
A social network is modeled as a huge graph.
• A node is a user.
• An edge is a relationship between users.
There are two types of relationships in a social network:
• Explicit relationships: follower/following, friend
• Implicit relationships: not declared, but reflected in similar interests. We focus on this type.

Extracting Implicit Relationships
To extract implicit relationships, each user is typically represented by his/her profile, and the similarity between user profiles is measured; the similarity between two users' profiles defines their implicit relationship. The form of the profile depends on the social network and the application.
• In DBLP, the profile is the list of papers the user wrote.
• In Twitter, the profile is the list of tweets the user posted.

Limitation of a Single Profile
Generally, a user is described by only a single profile, which oversimplifies the multiple characteristics of the user. As a result, meaningful communities can be lost. For example, although User A and User B share an interest in photography, the overall similarity between their profiles is not very high, because each profile also contains other, unrelated interests.
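To make this limitation concrete, here is a minimal Python sketch, assuming profiles are bags of terms compared by cosine similarity over raw term counts (the algorithm itself uses TF-IDF weights; raw counts just keep the arithmetic easy to follow). The toy profiles and the helper name `cosine` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy data: each user has one photography interest plus one other interest.
user_a_photo = Counter({"photo": 3, "lens": 2})
user_a_hiking = Counter({"outdoor": 3, "hiking": 2})
user_b_photo = Counter({"photo": 3, "lens": 2})
user_b_art = Counter({"art": 3, "museum": 2})

# A single, whole profile merges all of a user's interests.
profile_a = user_a_photo + user_a_hiking
profile_b = user_b_photo + user_b_art

overall = cosine(profile_a, profile_b)   # diluted by the unrelated interests
shared = cosine(user_a_photo, user_b_photo)  # compares only the shared interest
print(f"whole profiles: {overall:.2f}")            # whole profiles: 0.50
print(f"photography sub-profiles: {shared:.2f}")   # photography sub-profiles: 1.00
```

The identical photography interest yields a similarity of 1.0 when compared on its own, but only 0.5 once each user's other interest is mixed into the same vector.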
DecompClus
We propose DecompClus, a community discovery method based on profile decomposition, which divides each profile into sub-profiles. It works in two steps:
• Step 1: Profile decomposition (profiles → sub-profiles)
• Step 2: Sub-profile clustering (sub-profiles → communities)
[Figure: profiles mixing topics such as "outdoor, hiking", "photo, lens", "photo, color" and "art, museum" are split into topic-specific sub-profiles, which are then clustered into communities]

Overall Procedure of DecompClus
[Figure: overall procedure of DecompClus]

Step 1: Profile Decomposition
A network of unit items (e.g., papers or tweets) is constructed for each user's profile.
• A node (item) is represented by a term vector weighted by TF-IDF.
• An edge is weighted by the cosine similarity between its two nodes.
[Figure: User A's profile as a network of items i1 to i7]
Clustering is then performed on this small network. We adopt a clustering algorithm based on modularity optimization, which tries to detect high-modularity partitions of networks [V. D. Blondel et al., 2008]. Each resulting cluster becomes a sub-profile.
[Figure: User A's profile split into User A's sub-profiles]

Step 2: Sub-Profile Clustering
A network of sub-profiles is constructed by accumulating the sub-profiles of every user.
• A node (sub-profile) is represented by a term vector weighted by TF-IDF.
• An edge is weighted by the cosine similarity between its two nodes.
[Figure: network of sub-profiles belonging to Users A to E]
Clustering is performed on the network of sub-profiles with the same clustering method as in Step 1. Each resulting cluster becomes a user community.
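Both steps share the same machinery: TF-IDF term vectors, cosine-weighted edges, then clustering. The Python sketch below shows Step 1 for one user under stated assumptions: IDF is computed over the items passed in, an edge is kept only when its similarity reaches a hypothetical `threshold`, and, as a deliberate simplification, clusters are the connected components of that thresholded graph rather than the modularity-optimization method of Blondel et al. used in DecompClus. All function names are illustrative.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF vectors (raw count x log(N / df)); IDF is computed over `docs` only (assumption)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity of two sparse weight dictionaries."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def decompose(items, threshold=0.1):
    """Split one user's items (token lists) into sub-profiles, returned as lists of item indices.

    Stand-in clustering: connected components of the item graph whose edges have
    cosine similarity >= threshold (NOT the modularity-based method of the slides).
    """
    vecs = tfidf_vectors(items)
    n = len(items)
    adj = {i: [j for j in range(n) if j != i and cosine(vecs[i], vecs[j]) >= threshold]
           for i in range(n)}
    seen, parts = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            comp.append(i)
            for j in adj[i]:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        parts.append(sorted(comp))
    return parts

# Hypothetical items (e.g. tweets) of one user, already tokenized.
items = [["photo", "lens"], ["photo", "color"], ["outdoor", "hiking"], ["hiking", "trail"]]
parts = decompose(items)
subprofiles = [[t for i in part for t in items[i]] for part in parts]
print(parts)        # [[0, 1], [2, 3]]
print(subprofiles)  # [['photo', 'lens', 'photo', 'color'], ['outdoor', 'hiking', 'hiking', 'trail']]
```

Step 2 could reuse `tfidf_vectors` and the same graph construction, treating the sub-profiles of all users as the documents. Swapping in a real modularity optimizer would change only the component-finding loop.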
[Figure: sub-profiles of Users A to E clustered into Community C1 and Community C2; User A's sub-profiles fall into both]
Because a user can have several sub-profiles, a user can belong to multiple communities (e.g., User A is in both C1 and C2).
• DecompClus thus discovers an overlapping community structure using a non-overlapping clustering method.

Experimental Set-up: Evaluation Methods
• Quantitative evaluation: verify that DecompClus finds more tightly and well-connected communities, using
  - Modularity value
  - Intra-similarity
  - Inter-similarity
• Qualitative evaluation: explain how the communities found by our method differ semantically from those found by the compared method, through
  - Defining the theme of each community
  - Case studies (see the paper)
  - Visualization

Experimental Set-up: Dataset
CiteULike
• A social bookmarking service for scholarly papers
• http://www.citeulike.org/faq/data.adp
Dataset
• # of users = 122
• # of articles = 25,089
• # of unique stemmed tags = 16,161
• Half of the users have more than one interest.
[Figure: distribution of users according to their tags, for tags like 'data_mining%' or 'mining%' or 'knowledge_discovery%'; tags like 'social_network%' or 'socialnetwork%'; and tags like 'recommend%']

Experimental Set-up: Implementation and Baseline
Implementation
• Gephi library: open-source software for visualizing and analyzing large network graphs
Baseline
• Follows almost the same procedure as DecompClus.
• Uses only one overall profile per user, i.e., profiles are not decomposed.
[Figure: Baseline clusters whole profiles (mixing "photo, lens", "outdoor, hiking", "photo, color", "art, museum") directly into communities]

Discovered Communities
Method        Community ID   # of users
Baseline      BC1            57
Baseline      BC2            65
DecompClus    DC1            80
DecompClus    DC2            53
DecompClus    DC3            91
DecompClus    DC4            84
• # of communities: DecompClus finds more communities than Baseline does (four vs. two).
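Step 2 partitions sub-profiles, not users; mapping each cluster back to the owners of its sub-profiles is what yields overlapping user communities. It also explains why community sizes can add up to more than the number of users: the DecompClus sizes in the table sum to 308 for 122 users, whereas the Baseline sizes sum to exactly 122. A minimal Python sketch, with hypothetical user, sub-profile and community IDs:

```python
from collections import defaultdict

def user_communities(subprofile_owner, subprofile_cluster):
    """Map each community to the set of users owning at least one of its sub-profiles."""
    members = defaultdict(set)
    for sp, cluster in subprofile_cluster.items():
        members[cluster].add(subprofile_owner[sp])
    return dict(members)

# Hypothetical Step 1 output: which user owns each sub-profile.
owner = {"a1": "A", "a2": "A", "b1": "B", "c1": "C", "d1": "D", "e1": "E"}
# Hypothetical Step 2 output: which cluster each sub-profile landed in.
cluster = {"a1": "C1", "b1": "C1", "d1": "C1", "a2": "C2", "c1": "C2", "e1": "C2"}

communities = user_communities(owner, cluster)

# Invert the mapping to find users that belong to more than one community.
membership = defaultdict(set)
for community_id, users in communities.items():
    for u in users:
        membership[u].add(community_id)
multi = sorted(u for u, cs in membership.items() if len(cs) > 1)

print(sorted(communities["C1"]), sorted(communities["C2"]))  # ['A', 'B', 'D'] ['A', 'C', 'E']
print(multi)  # ['A']
```

Five users yield community sizes summing to six, because user A is counted in both communities.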
• # of users per community: the communities discovered by DecompClus have more members than those of Baseline, because DecompClus allows a user to belong to multiple communities at the same time.

Quantitative Evaluation
DecompClus achieves better values than Baseline on all three metrics:
• Modularity value: the strength of the division of a network into modules (higher is better)
• Intra-similarity: the average similarity within a community (higher is better)
• Inter-similarity: the average similarity between communities (lower is better)

Metric              Baseline   DecompClus
Modularity          0.0035     0.0734
Intra-similarity    0.3604     0.4534
Inter-similarity    0.0279     0.0133

In DecompClus, the connections between members within a community are denser, while the connections between members of different communities are sparser.

Qualitative Evaluation (1/2)
DecompClus preserves the themes found by Baseline and also finds new communities that Baseline does not find.
Method        ID    Theme
Baseline      BC1   Data mining & Recommendation
Baseline      BC2   Social Network
DecompClus    DC1   Data mining & Recommendation
DecompClus    DC2   Semantic Web (newly found)
DecompClus    DC3   Data mining & Bioinformatics (newly found)
DecompClus    DC4   Social Network

Qualitative Evaluation (2/2)
In DecompClus, a user's minor interests are not absorbed into his/her major interests, so new communities consisting of users' minor interests can be discovered.
[Figure: distribution of articles related to "Semantic web" and distribution of articles related to "Bioinformatics"]

Visualization
The community structure produced by DecompClus is more clearly distinguishable.
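The three metrics can be sketched in Python as follows. Modularity follows Newman's standard definition for a weighted graph, Q = Σ_c [L_c/m − (d_c/2m)²], where m is the total edge weight, L_c the edge weight inside community c, and d_c the summed weighted degree of its nodes. The slides do not say how intra- and inter-similarity are averaged or how overlapping membership is handled, so this sketch assumes a non-overlapping partition, averages intra-similarity per community and then across communities, and averages inter-similarity over all cross-community node pairs. The toy graph and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean

def modularity(weights, community):
    """Newman modularity of a non-overlapping partition.

    weights: {(u, v): w}, each undirected edge listed once, no self-loops.
    community: {node: community_id} for every node.
    """
    m = sum(weights.values())  # total edge weight
    degree = defaultdict(float)
    for (u, v), w in weights.items():
        degree[u] += w
        degree[v] += w
    # Observed fraction of edge weight lying inside communities.
    inside = sum(w for (u, v), w in weights.items() if community[u] == community[v]) / m
    # Expected fraction under the null model: sum over communities of (d_c / 2m)^2.
    total = defaultdict(float)
    for node, k in degree.items():
        total[community[node]] += k
    return inside - sum((t / (2 * m)) ** 2 for t in total.values())

def similarity_lookup(weights):
    """Symmetric similarity function; node pairs without an edge get 0."""
    return lambda u, v: weights.get((u, v), weights.get((v, u), 0.0))

def intra_inter(sim, community):
    """Mean within-community similarity (per community, then averaged) and mean cross-community similarity."""
    groups = defaultdict(list)
    for node, c in community.items():
        groups[c].append(node)
    intra = mean(mean(sim(u, v) for u, v in combinations(nodes, 2))
                 for nodes in groups.values() if len(nodes) > 1)
    inter = mean(sim(u, v) for u, v in combinations(community, 2)
                 if community[u] != community[v])
    return intra, inter

# Hypothetical similarity graph with two clear groups joined by one weak edge.
weights = {("a", "b"): 0.9, ("c", "d"): 0.8, ("b", "c"): 0.1}
community = {"a": 0, "b": 0, "c": 1, "d": 1}
q = modularity(weights, community)
intra, inter = intra_inter(similarity_lookup(weights), community)
print(round(q, 4), round(intra, 2), round(inter, 3))  # 0.4429 0.85 0.025
```

A partition that separates the two groups scores high modularity and intra-similarity with low inter-similarity, which is the pattern the table reports for DecompClus.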
[Figure: community structure drawn with the ForceAtlas2 layout provided by Gephi]

Related Work (1/2)
Comparison with related areas, by number of profiles per user, the node-to-community mapping used in clustering, and the result:
• Non-overlapping community discovery: one profile per user; 1:1 mapping; a user belongs to one community.
• Overlapping community discovery: one profile per user; 1:N mapping; a user belongs to multiple communities.
• DecompClus: multiple sub-profiles per user; 1:1 mapping (sub-profile to community); a user belongs to multiple communities.

Related Work (2/2)
Non-overlapping community discovery
• Newman's method [Newman and Girvan, 2004]
• Multi-level graph partitioning method [Karypis and Kumar, 1995]
• Attribute augmented graph [Zhou et al., 2006]
• Bayesian generative models [Wang, 2006]
Overlapping community discovery
• CPM (clique percolation method) [Palla et al., 2005]
• Connectedness and local optimality [Goldberg et al., 2010]
• Label propagation [Gregory, 2009]

Conclusion
• A novel concept of profile decomposition, which enables us to detect fine-granularity user communities with implicit relationships
• A new approach to discovering overlapping communities with non-overlapping community discovery algorithms
• We demonstrate, using a real data set, that our algorithm effectively discovers user communities from social media data.

THANK YOU!
Case Studies
Case 1: users who become members of multiple communities through profile decomposition
For example, consider user A's profile. Under Baseline, user A belongs only to community BC1 (data mining & recommendation). DecompClus splits the profile into sub-profiles that are assigned to different communities:
• user model, recommender, personalization, user profiling, knn, data mining, … → Community DC1 (data mining & recommendation)
• semantics, semantic web, rdf, ontology, social semantic web, … → Community DC2 (semantic web)
• social network analysis, social search, graphs, … → Community DC4 (social network)
In our data set, there are 99 users (81.1%) like user A.

Case 2: users who become members of communities newly discovered by DecompClus
For example, one of user B's sub-profiles (statistics, cancer, genomics, gene, sequencing, virus, bacteria, database, classification, …) is assigned to Community DC3 (data mining & bioinformatics), a community that Baseline does not find. There are 9 users (7.3%) like user B.