DeepRank: improving unsupervised node ranking via link discovery

Abstract

This paper proposes an unsupervised node-ranking model that considers not only the attributes of nodes in a graph but also the incompleteness of the graph structure. We formulate the unsupervised ranking task as an optimization problem and propose a deep neural network (DNN) architecture to solve it. The rich representational capacity of the DNN, together with a novel design of the objectives, allows the proposed model to significantly outperform state-of-the-art ranking solutions.

Notes

Acknowledgements

This material is based upon work supported by the Air Force Office of Scientific Research, AOARD under Award Number FA2386-17-1-4038, and Taiwan Ministry of Science and Technology (MOST) under Grant Number 106-2218-E-002-042.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Here, \(\Delta \) is the difference between the upper bound and the original objective function. For ease of presentation, we use matrix representations to derive \(\Delta \). Let vector \(\varvec{\pi } \in [0, \infty ) ^{N}\) be the ranking score vector for all N nodes, while \(\varvec{1}\) represents an N-dimensional constant vector of all 1's. Matrix \(\varvec{Q} \in [0, \infty ) ^{N \times N}\) denotes a transition matrix where entry \(q_{ij} = n_{i}^{-1}\) if \((i, j) \in E\) and \(q_{ij} = 0\) otherwise, with \(n_{i}\) the out-degree of node i. By the definition of \(\varvec{Q}\), the entry sum of each row of \(\varvec{Q}\) is exactly 1, that is, \(\varvec{Q} \varvec{1} = \varvec{1}\). Having the matrix representations, we derive \(\Delta \) as follows,
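The row-stochastic property \(\varvec{Q} \varvec{1} = \varvec{1}\) can be checked directly. The sketch below (a toy graph of our own choosing, not from the paper) builds \(\varvec{Q}\) from a directed edge list with \(q_{ij} = 1/n_i\) for each link \((i, j)\), where \(n_i\) is the out-degree of node i, and verifies that every row sums to 1:

```python
# Toy example (assumed graph, not from the paper): construct the transition
# matrix Q with q_ij = 1/n_i for each link (i, j), where n_i is the
# out-degree of node i, and verify that Q * 1 = 1 (each row sums to 1).

N = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0)]  # hypothetical toy graph

# n_i: out-degree of each node
out_degree = [0] * N
for i, _ in edges:
    out_degree[i] += 1

# q_ij = 1 / n_i if (i, j) is a link, 0 otherwise
Q = [[0.0] * N for _ in range(N)]
for i, j in edges:
    Q[i][j] = 1.0 / out_degree[i]

row_sums = [sum(row) for row in Q]
print(row_sums)  # every row sums to 1, i.e. Q * 1 = 1
```
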

Appendix D: Proof for the reduction of node ranking

Suppose that for every link \((i, j) \in E\), the inequality \(\frac{\pi _{j}}{m_{j}} \ge \frac{\pi _{i}}{n_{i}}\) holds. Then for any node j, \(\frac{\pi _{j}}{m_{j}} \ge \frac{\pi _{i}}{n_{i}}\) for every direct predecessor i of j. In particular, \(\frac{\pi _{j}}{m_{j}}\) must be no less than the average of \(\frac{\pi _{i}}{n_{i}}\) over all direct predecessors \(i \in P_{j}\). Given \(m_{j} = | P_{j} |\), we have:

\[ \frac{\pi _{j}}{m_{j}} \ge \frac{1}{m_{j}} \sum _{i \in P_{j}} \frac{\pi _{i}}{n_{i}}, \quad \text{i.e.,} \quad \pi _{j} \ge \sum _{i \in P_{j}} \frac{\pi _{i}}{n_{i}}. \]

Appendix E: Introduction of competitors

In social network analysis, centrality methods identify the most important nodes based on the current network structure. We choose two common centrality definitions from Freeman (1978): closeness and betweenness centrality. Closeness centrality regards nodes with shorter path lengths to all other nodes as more important, while betweenness centrality regards nodes that lie on more shortest paths in the network as more important.
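As an illustration (our own sketch, not the paper's code), closeness centrality in Freeman's sense can be computed with a breadth-first search from each node; the toy star graph below is an assumption for demonstration:

```python
# Illustrative sketch: Freeman-style closeness centrality on a small
# undirected toy graph, computed via breadth-first search.
from collections import deque

def closeness(adj, v):
    """Closeness of v: (n - 1) / (sum of shortest-path lengths from v)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(adj) - 1) / total if total else 0.0

# Toy star graph (assumed example): node 0 is adjacent to all others.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
scores = {v: closeness(adj, v) for v in adj}
print(scores)  # the center node 0 obtains the highest closeness
```

Here the center of the star reaches every node in one hop, so it scores 1.0, while the leaves must route through the center and score lower.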

\[ \varvec{\pi } = (1 - d) \frac{\varvec{1}}{N} + d \, \varvec{Q} \varvec{\pi }, \]

where vector \(\varvec{\pi }\) is the ranking score vector of all N nodes, \(\varvec{Q} = [ q_{ij} = \frac{1}{n_{j}} ]\) is the transition matrix, \(\varvec{1}\) is a vector of all 1's, and d is the damping factor, normally set to 0.85.
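The fixed point of this update can be approximated by power iteration. The following sketch (toy graph and iteration count are our assumptions) implements the update directly on an edge list: each node receives \((1 - d)/N\) plus d times the sum of \(\pi _{j} / n_{j}\) over its in-neighbors j, with \(n_{j}\) the out-degree of j:

```python
# Hedged sketch of the PageRank update described above:
#   pi_i <- (1 - d)/N + d * sum over in-neighbors j of (pi_j / n_j),
# where n_j is the out-degree of node j. The toy graph, iteration count,
# and helper name `pagerank` are assumptions for illustration.

def pagerank(edges, N, d=0.85, iters=100):
    out_degree = [0] * N
    for i, _ in edges:
        out_degree[i] += 1
    pi = [1.0 / N] * N  # uniform initialization
    for _ in range(iters):
        nxt = [(1.0 - d) / N] * N
        for i, j in edges:
            nxt[j] += d * pi[i] / out_degree[i]
        pi = nxt
    return pi

# Simple 3-node cycle: by symmetry, all nodes receive equal scores.
scores = pagerank([(0, 1), (1, 2), (2, 0)], N=3)
print(scores)  # approximately [1/3, 1/3, 1/3]
```
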

It is the state-of-the-art semi-supervised solution to node ranking, composed of a supervised part and an unsupervised part. We adopt only its unsupervised component together with node attributes. The objective function of its unsupervised component is simplified as below,

It is the state-of-the-art unsupervised general approach to node ranking with node attributes. We follow the setup in the original paper for parameter setting and selection. The update equation is as below,