Social Network Analysis - PowerPoint PPT Presentation

Social Network Analysis. Social Network Introduction; Statistics and Probability Theory; Models of Social Network Generation; Networks in Biological Systems; Mining on Social Networks; Summary. Society. Nodes: individuals. Links: social relationships (family/work/friendship/etc.).

“Natural” Networks and Universality • Consider many kinds of networks: • social, technological, business, economic, content,… • These networks tend to share certain informal properties: • large scale; continual growth • distributed, organic growth: vertices “decide” who to link to • interaction restricted to links • mixture of local and long-distance connections • abstract notions of distance: geographical, content, social,… • Do natural networks share more quantitative universals? • What would these “universals” be? • How can we make them precise and measure them? • How can we explain their universality? • This is the domain of social network theory • Sometimes also referred to as link analysis Data Mining: Concepts and Techniques

Some Interesting Quantities • Connected components: • how many, and how large? • Network diameter: • maximum (worst-case) or average? • exclude infinite distances? (disconnected components) • the small-world phenomenon • Clustering: • to what extent do links tend to cluster “locally”? • what is the balance between local and long-distance connections? • what roles do the two types of links play? • Degree distribution: • what is the typical degree in the network? • what is the overall distribution?
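
The first two quantities above can be measured with plain breadth-first search. The following is a minimal sketch, assuming an undirected graph stored as a dict that maps each vertex to the set of its neighbors; the function names and the toy graph are illustrative and not part of the slides. Infinite distances between disconnected components are excluded, which is one of the options the slide raises.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def connected_components(adj):
    """List of vertex sets, one per connected component."""
    seen, components = set(), []
    for u in adj:
        if u not in seen:
            component = set(bfs_distances(adj, u))
            seen |= component
            components.append(component)
    return components


def diameter_stats(adj):
    """(maximum, average) shortest-path length over ordered pairs
    u != v whose distance is finite."""
    finite = []
    for u in adj:
        finite.extend(d for v, d in bfs_distances(adj, u).items() if v != u)
    if not finite:
        return 0, 0.0
    return max(finite), sum(finite) / len(finite)


# Hypothetical toy graph: a path a-b-c plus a separate edge d-e.
toy = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": {"e"}, "e": {"d"}}
components = connected_components(toy)
worst_case, average = diameter_stats(toy)
print(len(components), worst_case, average)
```

Running one BFS per vertex costs O(V·(V+E)), which is fine for small graphs; large networks usually call for sampling a subset of source vertices instead.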

A “Canonical” Natural Network has… • Few connected components: • often only 1 or a small number, independent of network size • Small diameter: • often a constant independent of network size (like 6) • or perhaps growing only logarithmically with network size, or even shrinking? • typically exclude infinite distances • A high degree of clustering: • considerably more so than for a random network • in tension with small diameter • A heavy-tailed degree distribution: • a small but reliable number of high-degree vertices • often of power law form

Zipf’s Law • The same data plotted on linear and logarithmic scales; both plots show a Zipf distribution with 300 data points. [Figure: two panels, one with linear scales on both axes and one with logarithmic scales on both axes]
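
A brief aside on why the logarithmic panel is informative, assuming the usual statement of Zipf's law in which the frequency f of the item of rank r decays as a power of r. The constant C and the exponent s are symbols introduced here for illustration; they do not appear on the slide. Taking logarithms turns the power law into a straight line:

```latex
f(r) = C\, r^{-s}
\quad\Longrightarrow\quad
\log f(r) = \log C - s \log r
```

On log-log axes the data therefore fall on a line of slope -s, with s close to 1 in the classical form of the law.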

The Clustering Coefficient of a Network • Let nbr(u) denote the set of neighbors of u in a graph • all vertices v such that the edge (u,v) is in the graph • The clustering coefficient of u: • let k = |nbr(u)| (i.e., number of neighbors of u) • choose(k,2): max possible # of edges between vertices in nbr(u) • c(u) = (actual # of edges between vertices in nbr(u))/choose(k,2) • 0 <= c(u) <= 1; measure of cliquishness of u’s neighborhood • Clustering coefficient of a graph: • average of c(u) over all vertices u • Example (figure): k = 4, choose(k,2) = 6, c(u) = 4/6 = 0.666…
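
The definition above translates almost line by line into code. This is a minimal sketch, assuming an undirected graph stored as a dict of neighbor sets; the function names and the example graph are illustrative. The slide leaves c(u) undefined when k < 2, so returning 0 in that case is an assumption made here.

```python
from itertools import combinations


def clustering_coefficient(adj, u):
    """c(u) = (edges among neighbors of u) / choose(k, 2)."""
    nbrs = adj[u]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumption: choose(k, 2) = 0, so define c(u) = 0
    possible = k * (k - 1) // 2
    actual = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
    return actual / possible


def average_clustering(adj):
    """Clustering coefficient of the graph: mean of c(u) over all u."""
    return sum(clustering_coefficient(adj, u) for u in adj) / len(adj)


# Hypothetical graph matching the slide's numbers: u has k = 4 neighbors
# joined by 4 edges (a-b, b-c, c-d, d-a), so c(u) = 4/6.
graph = {
    "u": {"a", "b", "c", "d"},
    "a": {"u", "b", "d"},
    "b": {"u", "a", "c"},
    "c": {"u", "b", "d"},
    "d": {"u", "c", "a"},
}
print(clustering_coefficient(graph, "u"))
print(average_clustering(graph))
```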

The Clustering Coefficient of a Network • Clustering: My friends will likely know each other! • Probability to be connected: C ≫ p • C = (# of links between the n neighbors 1, 2, …, n) / (n(n-1)/2) • Networks are clustered [large C(p)] but have a small characteristic path length [small L(p)].

Small Worlds and Occam’s Razor • For small α, should generate large clustering coefficients • we “programmed” the model to do so • Watts claims that proving precise statements is hard… • But we do not want a new model for every little property • Erdos-Renyi → small diameter • α-model → high clustering coefficient • In the interests of Occam’s Razor, we would like to find • a single, simple model of network generation… • … that simultaneously captures many properties • Watts’ small world: small diameter and high clustering

Scale-free Networks • The number of nodes (N) is not fixed • Networks continuously expand through the addition of new nodes • WWW: addition of new nodes • Citation: publication of new papers • The attachment is not uniform • A node is linked with higher probability to a node that already has a large number of links • WWW: new documents link to well-known sites (CNN, Yahoo, Google) • Citation: well-cited papers are more likely to be cited again

Scale-Free Networks • Start with (say) two vertices connected by an edge • For i = 3 to N: • for each 1 <= j < i, d(j) = degree of vertex j so far • let Z = Σ d(j) (sum of all degrees so far) • add new vertex i with k edges back to {1, …, i-1}: • i is connected back to j with probability d(j)/Z • Vertices j with high degree are likely to get more links! • “Rich get richer” • Natural model for many processes: • hyperlinks on the web • new business and social contacts • transportation networks • Generates a power law distribution of degrees • exponent depends on value of k
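
The generation procedure above can be sketched in a few lines. This is an illustrative sketch rather than a reference implementation: the function name, the seed parameter, and the handling of repeated targets are assumptions. The slide does not say whether the k edges of a new vertex must go to distinct vertices; here targets are redrawn until k distinct ones are found, and k is capped at the number of existing vertices. The sketch assumes k >= 1.

```python
import random


def preferential_attachment(n, k, seed=None):
    """Grow a graph on vertices 1..n; each new vertex i links back to
    existing vertices j chosen with probability d(j)/Z."""
    rng = random.Random(seed)
    edges = [(1, 2)]              # start with two vertices joined by an edge
    degree = {1: 1, 2: 1}
    for i in range(3, n + 1):
        candidates = list(degree)                   # vertices 1 .. i-1
        weights = [degree[j] for j in candidates]   # d(j); choices() divides by Z
        want = min(k, len(candidates))              # assumption: distinct targets
        targets = set()
        while len(targets) < want:
            targets.add(rng.choices(candidates, weights=weights)[0])
        degree[i] = 0
        for j in targets:
            edges.append((i, j))
            degree[i] += 1
            degree[j] += 1
    return edges, degree


edges, degree = preferential_attachment(n=50, k=2, seed=0)
# The highest-degree vertices tend to be the oldest ones ("rich get richer").
print(sorted(degree, key=degree.get, reverse=True)[:5])
```

Recomputing the weight list for every new vertex makes this O(N²) overall, which keeps the sketch close to the slide's wording; faster generators typically sample from a list that repeats each vertex once per incident edge.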

Robustness of Random vs. Scale-Free Networks • The accidental failure of a number of nodes in a random network can fracture the system into non-communicating islands. • Scale-free networks are more robust in the face of such failures. • Scale-free networks are highly vulnerable to a coordinated attack against their hubs.

Information on the Social Network • Heterogeneous, multi-relational data represented as a graph or network • Nodes are objects • May have different kinds of objects • Objects have attributes • Objects may have labels or classes • Edges are links • May have different kinds of links • Links may have attributes • Links may be directed, are not required to be binary • Links represent relationships and interactions between objects - rich content for mining

What is New for Link Mining Here • Traditional machine learning and data mining approaches assume: • A random sample of homogeneous objects from a single relation • Real world data sets: • Multi-relational, heterogeneous and semi-structured • Link Mining • Newly emerging research area at the intersection of research in social network and link analysis, hypertext and web mining, graph mining, relational learning and inductive logic programming

Link-Based Object Ranking (LBR) • LBR: Exploit the link structure of a graph to order or prioritize the set of objects within the graph • Focused on graphs with a single object type and a single link type • This is a primary focus of the link analysis community • Web information analysis • PageRank and HITS are typical LBR approaches • In social network analysis (SNA), LBR is a core analysis task • Objective: rank individuals in terms of “centrality” • Degree centrality vs. eigenvector/power centrality • Rank objects relative to one or more relevant objects in the graph vs. rank objects over time in dynamic graphs
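
To make the contrast between degree centrality and eigenvector centrality concrete, here is a small sketch on an undirected graph stored as a dict of neighbor sets. The function names and the example graph are illustrative. Two choices are assumptions made here, not taken from the slides: degree is normalized by n - 1, and the power iteration uses A + I instead of A, which leaves the leading eigenvector unchanged but avoids oscillation on bipartite graphs.

```python
def degree_centrality(adj):
    """Fraction of the other n - 1 vertices each vertex is linked to."""
    n = len(adj)
    return {u: len(adj[u]) / (n - 1) for u in adj}


def eigenvector_centrality(adj, iterations=100):
    """Power iteration on A + I, rescaled so the top score is 1."""
    x = {u: 1.0 for u in adj}
    for _ in range(iterations):
        nxt = {u: x[u] + sum(x[v] for v in adj[u]) for u in adj}
        top = max(nxt.values())
        x = {u: value / top for u, value in nxt.items()}
    return x


# Hypothetical graph: triangle a-b-c with a tail c-d-e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
dc = degree_centrality(graph)
ev = eigenvector_centrality(graph)
# a and d have the same degree, but a's neighbors are better connected.
print(dc["a"], dc["d"])
print(ev["a"] > ev["d"])
```

Degree centrality only counts links, whereas eigenvector centrality also weighs who those links lead to, which is why a outranks d here despite their equal degree.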

PageRank: Capturing Page Popularity (Brin & Page’98) • Intuitions • Links are like citations in literature • A page that is cited often can be expected to be more useful in general • PageRank is essentially “citation counting”, but improves over simple counting • Consider “indirect citations” (being cited by a highly cited paper counts a lot…) • Smoothing of citations (every page is assumed to have a non-zero citation count) • PageRank can also be interpreted as random surfing (thus capturing popularity)

HITS: Capturing Authorities & Hubs (Kleinberg’98) • Intuitions • Pages that are widely cited are good authorities • Pages that cite many other pages are good hubs • The key idea of HITS • Good authorities are cited by good hubs • Good hubs point to good authorities • Iterative reinforcement …
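
The iterative reinforcement between hubs and authorities can be sketched as alternating score updates. This is a minimal illustration with assumed names and an assumed input format (a dict mapping each page to the list of pages it links to, with every target present as a key); normalizing each score vector to unit Euclidean length per round is a common choice and an assumption here.

```python
from math import sqrt


def hits(links, iterations=50):
    """Return (hub, authority) scores after alternating updates."""
    pages = list(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Good authorities are cited by good hubs.
        auth = {p: 0.0 for p in pages}
        for p in pages:
            for q in links[p]:
                auth[q] += hub[p]
        # Good hubs point to good authorities.
        hub = {p: sum(auth[q] for q in links[p]) for p in pages}
        # Rescale so the scores do not grow without bound.
        for scores in (auth, hub):
            norm = sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical web: h1 cites a1 and a2, while h2 cites only a1.
web = {"h1": ["a1", "a2"], "h2": ["a1"], "a1": [], "a2": []}
hub, auth = hits(web)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```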

Block-level Link Analysis (Cai et al. 04) • Most of the existing link analysis algorithms, e.g. PageRank and HITS, treat a web page as a single node in the web graph • However, in most cases, a web page contains multiple semantics and hence it might not be considered as an atomic and homogeneous node • Web page is partitioned into blocks using the vision-based page segmentation algorithm • extract page-to-block, block-to-page relationships • Block-level PageRank and Block-level HITS

Link-Based Object Classification (LBC) • Predicting the category of an object based on its attributes, its links and the attributes of linked objects • Web: Predict the category of a web page, based on words that occur on the page, links between pages, anchor text, html tags, etc. • Citation: Predict the topic of a paper, based on word occurrence, citations, co-citations • Epidemics: Predict disease type based on characteristics of the patients infected by the disease • Communication: Predict whether a communication contact is by email, phone call or mail

Entity Resolution • Predicting when two objects are the same, based on their attributes and their links • Also known as: deduplication, reference reconciliation, co-reference resolution, object consolidation • Applications • Web: predict when two sites are mirrors of each other • Citation: predicting when two citations are referring to the same paper • Epidemics: predicting when two disease strains are the same • Biology: learning when two names refer to the same protein

Entity Resolution Methods • Earlier viewed as a pair-wise resolution problem: pairs resolved based on the similarity of their attributes • Importance of considering links • Coauthor links in bibliographic data, hierarchical links between spatial references, co-occurrence links between name references in documents • Use of links in resolution • Collective entity resolution: one resolution decision affects another if they are linked • Propagating evidence over links in a dependency graph • Probabilistic models interact with different entity recognition decisions

Link Prediction • Predict whether a link exists between two entities, based on attributes and other observed links • Applications • Web: predict if there will be a link between two pages • Citation: predicting if a paper will cite another paper • Epidemics: predicting who a patient’s contacts are • Methods • Often viewed as a binary classification problem • Local conditional probability model, based on structural and attribute features • Difficulty: sparseness of existing links • Collective prediction, e.g., Markov random field model
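
As a rough illustration of the "structural features" mentioned above, the sketch below scores every unlinked pair of vertices by two simple neighborhood measures, common-neighbor count and Jaccard overlap. These particular features, the function names, and the toy graph are assumptions chosen for illustration; the slide does not name specific features. The sketch only ranks candidate pairs. It does not train the binary classifier or the probabilistic models the slide refers to, for which such features would be inputs.

```python
from itertools import combinations


def structural_features(adj, u, v):
    """Neighborhood-overlap features for the candidate link (u, v)."""
    common = adj[u] & adj[v]
    union = adj[u] | adj[v]
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
    }


def rank_candidate_links(adj):
    """Unlinked pairs, most similar neighborhoods first."""
    candidates = [
        (u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]
    ]
    return sorted(
        candidates,
        key=lambda pair: structural_features(adj, *pair)["jaccard"],
        reverse=True,
    )


# Hypothetical graph with edges a-b, a-c, b-c, b-d, c-d, d-e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
ranking = rank_candidate_links(graph)
print(ranking[0], structural_features(graph, *ranking[0]))
```

The sparseness difficulty noted on the slide shows up even here: most vertex pairs in a real network are unlinked, so the candidate list grows roughly quadratically while true future links remain rare.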

Link Cardinality Estimation • Predicting the number of links to an object • Web: predict the authority of a page based on the number of in-links; identifying hubs based on the number of out-links • Citation: predicting the impact of a paper based on the number of citations • Epidemics: predicting the number of people that will be infected based on the infectiousness of a disease • Predicting the number of objects reached along a path from an object • Web: predicting number of pages retrieved by crawling a site • Citation: predicting the number of citations of a particular author in a specific journal