Copy or Coincidence? A Model for Detecting Social Influence and Duplication Events

Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):1175-1183, 2013.

Abstract

In this paper, we analyze the task of inferring rare links between pairs of entities that seem too similar to have occurred by chance. Variations of this task appear in such diverse areas as social network analysis, security, fraud detection, and entity resolution. To address the task in a general form, we propose a simple, flexible mixture model in which most entities are generated independently from a distribution but a small number of pairs are constrained to be similar. We predict the true pairs using a likelihood ratio that trades off the entities’ similarity with their rarity. This method always outperforms using only similarity; however, with certain parameter settings, similarity turns out to be surprisingly competitive. Using real data, we apply the model to detect twins given their birth weights and to re-identify cell phone users based on distinctive usage patterns.
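The trade-off between similarity and rarity can be illustrated with a small sketch. This is not the paper's model: it assumes, purely for illustration, that each entity is a single real value drawn from a Gaussian with mean `mu` and standard deviation `sigma`, and that a linked pair follows a bivariate Gaussian with the same marginals and a hypothetical correlation `rho`. The function name and all parameter values are assumptions.

```python
import math


def log_likelihood_ratio(x, y, mu, sigma, rho):
    """Return log[p_linked(x, y) / (p(x) * p(y))] under the ASSUMED model.

    Assumption (not from the paper): independent entities are N(mu, sigma^2);
    a linked pair is bivariate Gaussian with those marginals and correlation rho.
    """
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    # Standardize both values.
    a = (x - mu) / sigma
    b = (y - mu) / sigma
    one_minus_r2 = 1.0 - rho * rho
    # Log densities up to the shared constant -log(2 * pi * sigma^2),
    # which cancels in the ratio.
    log_linked = (-0.5 * math.log(one_minus_r2)
                  - (a * a - 2.0 * rho * a * b + b * b) / (2.0 * one_minus_r2))
    log_independent = -0.5 * (a * a + b * b)
    return log_linked - log_independent


# Made-up numbers: two pairs that are equally similar (50 units apart),
# one near the mean and one far out in the tail.
common_pair = log_likelihood_ratio(3400.0, 3450.0, mu=3400.0, sigma=500.0, rho=0.9)
rare_pair = log_likelihood_ratio(1900.0, 1950.0, mu=3400.0, sigma=500.0, rho=0.9)
print(common_pair, rare_pair)
```

Under these assumptions the two pairs have the same absolute difference, yet the pair in the tail receives the larger score, because a coincidental match between two unusual values is less likely than one between two typical values. Ranking by similarity alone would not separate them.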


@InProceedings{pmlr-v28-friedland13,
title = {Copy or Coincidence? A Model for Detecting Social Influence and Duplication Events},
author = {Lisa Friedland and David Jensen and Michael Lavine},
booktitle = {Proceedings of the 30th International Conference on Machine Learning},
pages = {1175--1183},
year = {2013},
editor = {Sanjoy Dasgupta and David McAllester},
volume = {28},
number = {3},
series = {Proceedings of Machine Learning Research},
address = {Atlanta, Georgia, USA},
month = {17--19 Jun},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v28/friedland13.pdf},
url = {http://proceedings.mlr.press/v28/friedland13.html},
abstract = {In this paper, we analyze the task of inferring rare links between pairs of entities that seem too similar to have occurred by chance. Variations of this task appear in such diverse areas as social network analysis, security, fraud detection, and entity resolution. To address the task in a general form, we propose a simple, flexible mixture model in which most entities are generated independently from a distribution but a small number of pairs are constrained to be similar. We predict the true pairs using a likelihood ratio that trades off the entities’ similarity with their rarity. This method always outperforms using only similarity; however, with certain parameter settings, similarity turns out to be surprisingly competitive. Using real data, we apply the model to detect twins given their birth weights and to re-identify cell phone users based on distinctive usage patterns.}
}

%0 Conference Paper
%T Copy or Coincidence? A Model for Detecting Social Influence and Duplication Events
%A Lisa Friedland
%A David Jensen
%A Michael Lavine
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-friedland13
%I PMLR
%J Proceedings of Machine Learning Research
%P 1175--1183
%U http://proceedings.mlr.press
%V 28
%N 3
%W PMLR
%X In this paper, we analyze the task of inferring rare links between pairs of entities that seem too similar to have occurred by chance. Variations of this task appear in such diverse areas as social network analysis, security, fraud detection, and entity resolution. To address the task in a general form, we propose a simple, flexible mixture model in which most entities are generated independently from a distribution but a small number of pairs are constrained to be similar. We predict the true pairs using a likelihood ratio that trades off the entities’ similarity with their rarity. This method always outperforms using only similarity; however, with certain parameter settings, similarity turns out to be surprisingly competitive. Using real data, we apply the model to detect twins given their birth weights and to re-identify cell phone users based on distinctive usage patterns.

TY - CPAPER
TI - Copy or Coincidence? A Model for Detecting Social Influence and Duplication Events
AU - Lisa Friedland
AU - David Jensen
AU - Michael Lavine
BT - Proceedings of the 30th International Conference on Machine Learning
PY - 2013/02/13
DA - 2013/02/13
ED - Sanjoy Dasgupta
ED - David McAllester
ID - pmlr-v28-friedland13
PB - PMLR
SP - 1175
DP - PMLR
EP - 1183
L1 - http://proceedings.mlr.press/v28/friedland13.pdf
UR - http://proceedings.mlr.press/v28/friedland13.html
AB - In this paper, we analyze the task of inferring rare links between pairs of entities that seem too similar to have occurred by chance. Variations of this task appear in such diverse areas as social network analysis, security, fraud detection, and entity resolution. To address the task in a general form, we propose a simple, flexible mixture model in which most entities are generated independently from a distribution but a small number of pairs are constrained to be similar. We predict the true pairs using a likelihood ratio that trades off the entities’ similarity with their rarity. This method always outperforms using only similarity; however, with certain parameter settings, similarity turns out to be surprisingly competitive. Using real data, we apply the model to detect twins given their birth weights and to re-identify cell phone users based on distinctive usage patterns.
ER -

Friedland, L., Jensen, D. & Lavine, M. (2013). Copy or Coincidence? A Model for Detecting Social Influence and Duplication Events. Proceedings of the 30th International Conference on Machine Learning, in PMLR 28(3):1175-1183.