
1 Ranking in Networks
Question: Given a communication network N, how can we discover its important nodes? How should the importance of members of the network be defined?
The answer may help in discovering the most influential member(s) of a social network, key infrastructure nodes in an urban network, super-spreaders of disease, and so on. Polling the members is neither efficient nor accurate.
Examples of networks: the Web; the protein network; the network of scientific cooperation; and so on.
The answer is formalized through the notion of centrality, a real-valued function on the nodes of a graph. The values of this function provide a ranking that identifies the most important nodes. On the other hand, the ranking is often meaningless for the nodes that are not among the most important ones.

2 The word "importance" has a wide range of meanings, which leads to many different definitions of centrality. What network features characterize the importance of a node in a network?
Degree centrality
The degree of a node can be interpreted as its chance of catching whatever is flowing through the network (such as a virus or some information). In a directed network (where ties have direction), two separate measures of degree centrality are defined: the in-degree indeg(v) and the out-degree outdeg(v). In-degree is interpreted as a measure of popularity; out-degree is interpreted as a measure of social involvement.
Graph centralization
Let G be a connected graph, and let $X \subseteq V(G)$ be such that $G[X]$ is also connected. Denote by $\Delta(X)$ the highest degree centrality in X, and define
$$H = \max_{X} \sum_{x \in X} \big(\Delta(X) - \deg(x)\big).$$
The degree centralization of the graph G is defined as
$$C(G) = \frac{\sum_{v \in V(G)} \big[\Delta(V) - \deg(v)\big]}{H},$$
where $V = V(G)$. The value of H is maximized when $G[X]$ contains one central node to which all other nodes are connected (a star graph); in this case $H = (n-1)(n-2)$, where n is the number of nodes.
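Both degree measures and the centralization index can be sketched in a few lines of Python. This is a minimal illustration, assuming an undirected graph stored as a dictionary mapping each node to the set of its neighbours, and a directed graph given as a list of (tail, head) pairs; the function names are hypothetical. The centralization uses the star-graph value $H = (n-1)(n-2)$ as the denominator, so a star scores 1 and a graph in which all nodes have the same degree scores 0.

```python
def in_out_degrees(edges):
    """Return (indeg, outdeg) dicts for a directed graph given as (tail, head) pairs."""
    indeg, outdeg = {}, {}
    for tail, head in edges:
        outdeg[tail] = outdeg.get(tail, 0) + 1
        indeg[head] = indeg.get(head, 0) + 1
        # Make sure every node appears in both dicts, even with degree 0.
        indeg.setdefault(tail, 0)
        outdeg.setdefault(head, 0)
    return indeg, outdeg


def degree_centralization(adj):
    """Degree centralization C(G) of an undirected graph given as {node: set(neighbours)}.

    Uses the star-graph bound H = (n - 1)(n - 2) as the denominator.
    """
    n = len(adj)
    if n < 3:
        raise ValueError("degree centralization needs at least 3 nodes")
    degrees = [len(neighbours) for neighbours in adj.values()]
    max_degree = max(degrees)  # Delta(V)
    return sum(max_degree - k for k in degrees) / ((n - 1) * (n - 2))


star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
complete = {v: {u for u in range(4) if u != v} for v in range(4)}
print(degree_centralization(star))      # 1.0
print(degree_centralization(complete))  # 0.0
```

A path on four nodes, for example, has degrees 1, 2, 2, 1, so its centralization is $(1 + 0 + 0 + 1)/6 = 1/3$: less centralized than a star, more than a complete graph.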

3 Closeness centrality
In connected graphs, dist(x, y) denotes the length of a shortest path from x to y. The farness of x is defined as
$$\mathrm{farness}(x) = \sum_{y \neq x} \mathrm{dist}(x, y),$$
and the closeness of x is defined as
$$\mathrm{cl}(x) = \frac{1}{\mathrm{farness}(x)}.$$
If G is disconnected and the vertices x and y belong to different connected components, then $\mathrm{dist}(x, y) = \infty$.
When a graph is not strongly connected and no path leads from y to x, we set $\mathrm{dist}(y, x) = \infty$ and use the sum of the reciprocals of the distances instead of the reciprocal of the sum of the distances, with the convention $1/\infty = 0$:
$$H(x) = \sum_{y \neq x} \frac{1}{\mathrm{dist}(y, x)}.$$
For undirected graphs, this notion is known as harmonic centrality. A variation of the notion is defined as
$$D(x) = \sum_{y \neq x} \frac{1}{2^{\mathrm{dist}(y, x)}}.$$
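For an unweighted graph, all three quantities follow from one breadth-first search per node. The sketch below assumes an undirected graph given as adjacency lists, so that dist(x, y) = dist(y, x) and a single BFS from x serves every formula; `decay` is a hypothetical name for the variation D(x). Unreachable nodes never appear in the BFS result, which implements the convention $1/\infty = 0$ directly, and `closeness` returns 0 when the farness is infinite.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(adj, x):
    """cl(x) = 1 / farness(x); an unreachable node makes farness infinite, so cl(x) = 0."""
    dist = bfs_distances(adj, x)
    if len(dist) < len(adj):
        return 0.0
    farness = sum(dist.values())  # dist[x] = 0 contributes nothing
    return 1.0 / farness if farness else 0.0


def harmonic(adj, x):
    """H(x): sum of 1/dist over the other nodes; unreachable nodes contribute 0."""
    dist = bfs_distances(adj, x)
    return sum(1.0 / d for y, d in dist.items() if y != x)


def decay(adj, x):
    """D(x): sum of 1/2**dist over the other nodes; unreachable nodes contribute 0."""
    dist = bfs_distances(adj, x)
    return sum(0.5 ** d for y, d in dist.items() if y != x)


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(closeness(path, "b"), harmonic(path, "a"), decay(path, "a"))  # 0.5 1.5 0.75
```

On a graph with two components, `closeness` collapses to 0 for every node, while `harmonic` still distinguishes nodes by what they can reach, which is why the sum of reciprocals is preferred there.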

4 Betweenness centrality
Betweenness is a centrality measure of a vertex within a graph. It quantifies the number of times a node acts as a bridge along the shortest paths between two other nodes. It was introduced as a measure of the control a person has over the communication between other people in a social network. Vertices that have a high probability of lying on a randomly chosen shortest path between two randomly chosen vertices have high betweenness.
The betweenness of a vertex v in a graph G = (V, E) is computed as follows:
1. For each pair of vertices (x, y), compute the shortest paths between them.
2. For each pair of vertices (x, y), determine the fraction of shortest paths that pass through the vertex v.
3. Sum this fraction over all pairs of vertices (x, y).
That is,
$$C_B(v) = \sum_{x \neq v \neq y \in V} \frac{\sigma_{xy}(v)}{\sigma_{xy}},$$
where $\sigma_{xy}$ is the total number of shortest paths from node x to node y and $\sigma_{xy}(v)$ is the number of those paths that pass through v.
The betweenness may be normalized by dividing by the number of pairs of vertices not including v, which is $(n-1)(n-2)$ for directed graphs and $(n-1)(n-2)/2$ for undirected graphs.
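The three steps can be followed almost literally for an unweighted graph. This sketch assumes adjacency lists and relies on a standard counting fact: v lies on a shortest x-y path exactly when dist(x, v) + dist(v, y) = dist(x, y), and then $\sigma_{xy}(v) = \sigma_{xv} \cdot \sigma_{vy}$. It sums over ordered pairs and halves the result for undirected graphs so that each unordered pair counts once. The function names are hypothetical, and the cost is cubic in the number of vertices, so this is for illustration rather than large networks.

```python
from collections import deque


def shortest_path_counts(adj, source):
    """BFS from source: hop distance and number of shortest paths to each reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(adj, directed=False, normalized=False):
    """C_B(v) computed pair by pair, following steps 1-3 literally."""
    nodes = list(adj)
    n = len(nodes)
    # Step 1: shortest-path distances and counts from every vertex.
    info = {s: shortest_path_counts(adj, s) for s in nodes}
    result = {}
    for v in nodes:
        dist_v, sigma_v = info[v]
        total = 0.0
        for x in nodes:
            dist_x, sigma_x = info[x]
            if x == v or v not in dist_x:
                continue
            for y in nodes:
                if y in (x, v) or y not in dist_x or y not in dist_v:
                    continue
                # Step 2: v is on a shortest x-y path iff the distances add up;
                # then sigma_xy(v) = sigma_xv * sigma_vy.
                if dist_x[v] + dist_v[y] == dist_x[y]:
                    # Step 3: accumulate the fraction sigma_xy(v) / sigma_xy.
                    total += sigma_x[v] * sigma_v[y] / sigma_x[y]
        if not directed:
            total /= 2  # (x, y) and (y, x) are the same undirected pair
        if normalized and n > 2:
            pairs = (n - 1) * (n - 2)
            total /= pairs if directed else pairs / 2
        result[v] = total
    return result


square = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(betweenness(square)["b"])  # 0.5
```

In the 4-cycle, the pair (a, c) has two shortest paths and only one of them passes through b, so b receives 1/2.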

5 Computational complexity
Computing the betweenness and closeness centralities of all vertices in a graph involves calculating the shortest paths between all pairs of vertices, which requires $\Theta(|V|^3)$ time with the Floyd-Warshall algorithm. On sparse graphs, however, Johnson's algorithm may be more efficient, taking $O(|V|^2 \log |V| + |V||E|)$ time. For unweighted graphs, the calculations can be done with Brandes' algorithm [19], which takes $O(|V||E|)$ time.
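Brandes' algorithm avoids looping over all pairs: it runs one BFS per source s to count shortest paths and record predecessors, then walks the vertices back from farthest to nearest, accumulating the dependency $\delta_s(v) = \sum_{w : v \in \mathrm{pred}(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \big(1 + \delta_s(w)\big)$. Each source costs $O(|V| + |E|)$, which gives the $O(|V||E|)$ bound. The sketch below covers only the unweighted case and assumes adjacency lists in which every neighbour is also a key; it sums over ordered pairs, so values for an undirected graph should be halved.

```python
from collections import deque


def brandes_betweenness(adj):
    """Unnormalized betweenness of an unweighted graph given as adjacency lists.

    Sums over ordered pairs (s, t); halve the values for an undirected graph.
    """
    cb = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Forward phase: BFS from s, counting shortest paths and recording predecessors.
        stack = []                     # vertices in order of non-decreasing distance
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: accumulate dependencies from the farthest vertices inward.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(brandes_betweenness(path))  # {'a': 0.0, 'b': 2.0, 'c': 0.0}
```

For this undirected path, halving gives $C_B(b) = 1$: the single pair (a, c) has one shortest path, and it passes through b.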

6 PageRank
PageRank is a link analysis algorithm. It assigns a numerical weight to each node of a hyperlinked set of documents, such as the World Wide Web, with the purpose of measuring its relative importance within the set. The numerical weight that it assigns to any given document E is called the PageRank of E and is denoted by PR(E). Other factors, such as Author Rank, can also contribute to the importance of a document.
A hyperlink to a page counts as a vote of support. The PageRank of a page is defined recursively and depends on the number and the PageRank of all pages that link to it (incoming links). The main idea of the ranking: a page that is linked to by many pages with high PageRank receives a high rank itself.

8 The values of PR(v) approximate a probability distribution representing the likelihood that a person randomly clicking on links will arrive at any particular page. Computing PageRank requires several iterations through the collection, adjusting the approximate PageRank values so that they more closely reflect the theoretical true values.
Damping factor
PageRank theory holds that an imaginary surfer who is randomly clicking on links will eventually stop clicking. The probability, at any step, that the person will continue is the damping factor d. Various studies have tested different damping factors, but it is generally assumed that the damping factor is set around 0.85. With damping, the PageRank of a page $p_i$ is
$$PR(p_i) = \frac{1 - d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)},$$
where N is the number of pages, $M(p_i)$ is the set of pages that link to $p_i$, and $L(p_j)$ is the number of outbound links on page $p_j$.
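The iteration described above can be sketched as a power iteration on the damped formula. This is an illustration under stated assumptions, not Google's implementation: pages are given as a dictionary of out-links, iteration starts from the uniform distribution and stops when the total change falls below a tolerance, and pages with no out-links (which the slides do not discuss) are assumed to spread their rank evenly over all pages so that the values keep summing to 1.

```python
def pagerank(links, d=0.85, tol=1e-12, max_iter=200):
    """Iterate PR(p_i) = (1 - d)/N + d * sum(PR(p_j)/L(p_j)) until the values settle.

    links maps every page to the list of pages it links to (its out-links).
    """
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        # Assumption: rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(pr[p] for p in pages if not links[p])
        new = {p: (1 - d) / n + d * dangling / n for p in pages}
        for p_j in pages:
            out = links[p_j]
            for p_i in out:
                new[p_i] += d * pr[p_j] / len(out)  # p_j passes PR(p_j)/L(p_j) to each target
        change = sum(abs(new[p] - pr[p]) for p in pages)
        pr = new
        if change < tol:
            break
    return pr


web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['c', 'a', 'b']
```

Page c ranks first because it is linked from both other pages; a ranks above b because it receives c's entire vote, while b receives only half of a's.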
