Automated methods for identity resolution across online social networks

Abstract:

Today, more than two hundred Online Social Networks (OSNs) exist, each offering distinct services to its users, such as easier access to news or better business opportunities. To enjoy each service, a user innocuously registers herself on multiple OSNs. For each OSN, she defines her identity with a different set of attributes, genre of content and friends to suit her purpose in using that OSN. Thus, the quality, quantity and veracity of the identity vary with the OSN. This results in dissimilar identities of the same user, scattered across the Internet, with no explicit links pointing to one another. These disparate, unlinked identities worry various stakeholders. For instance, security practitioners find it difficult to verify attributes across unlinked identities, and enterprises fail to create a holistic overview of their customers.
Research that finds and links the disconnected identities of a user across OSNs is termed identity resolution. Access to unique and private attributes of a user, like ‘email’, makes the task trivial; in the absence of such attributes, however, identity resolution is challenging. In this dissertation, we leverage intelligent cues and patterns extracted from the partially overlapping lists of public attributes of the compared identities. These patterns emerge from consistent user behavior, such as sharing the same mobile number, content or profile picture across OSNs. Translating these patterns into features, we devise novel heuristic, unsupervised and supervised frameworks to search for and link user identities across social networks. The proposed search methods use an exhaustive set of public attributes to look for consistent behavior patterns, and fetch the correct identity of the searched user in the candidate set for an additional 13% of users. An improvement to the proposed search mechanisms further optimizes time and space complexity. The suggested linking method compares past attribute value sets and correctly connects the identities of an additional 48% of users, earlier missed by methods in the literature that compare only current values. Evaluations on popular OSNs like Twitter, Instagram and Facebook demonstrate the significance and generalizability of the linking method.
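As a minimal illustration of why comparing past attribute value sets can recover links that a current-value comparison misses, consider the sketch below. It is not the dissertation's framework: the function names (`jaccard`, `history_features`), the use of Jaccard overlap as the similarity measure, and the sample usernames are all assumptions made for this example.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections of values (0.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def history_features(hist_a, hist_b):
    """Build per-attribute features for a pair of identities.

    hist_a, hist_b: dict mapping attribute name -> list of values, oldest first.
    Only attributes present (and non-empty) on both identities are compared.
    """
    features = {}
    for attr in sorted(set(hist_a) & set(hist_b)):
        values_a, values_b = hist_a[attr], hist_b[attr]
        if not values_a or not values_b:
            continue
        features[attr] = {
            # What a current-value-only method would see.
            "current_match": float(values_a[-1] == values_b[-1]),
            # What a history-aware method can additionally use.
            "history_overlap": jaccard(values_a, values_b),
        }
    return features


# Hypothetical username histories of one user on two OSNs.
twitter = {"username": ["jdoe_91", "jane_runs", "jane_votes"]}
instagram = {"username": ["jane_runs", "janedoe.photo"]}

features = history_features(twitter, instagram)
print(features)
```

In this made-up pair the current usernames differ, so `current_match` is 0.0, while the shared past username gives a non-zero `history_overlap`. Features of this kind could then be passed to a supervised classifier that predicts whether the two identities belong to the same user.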
The proposed search and linking methods are applicable to users who exhibit evolutionary and consistent behavior on OSNs. To understand the dynamics of and reasons for such behavior, we conduct two independent in-depth studies. For evolutionary behavior, specifically of usernames, we observe that changing a username leads to a broken link (404 page) to the user’s profile. Yet, 10% of 8.7 million tracked Twitter users changed their username within two months. Investigation reveals that reasons for change include malign intentions, like fraudulent username promotion, and benign ones, like expressing support for events. We believe that Twitter can monitor frequent username changes, infer malign intentions and suspend accounts if needed. The study of information shared consistently across OSNs, e.g. mobile number, highlights why users share personally identifiable information online and how it can be used with auxiliary information sources to derive details of a user.
In summary, this dissertation exploits previously unused public user information available on a social network for identity resolution via novel methods. The thesis work makes the following advancements: a) proposes search frameworks that aim to fetch the correct identity of a user in the candidate set by searching with public and discriminative attributes, b) proposes a supervised classification framework for linking identities that compares their respective attribute histories in situations where state-of-the-art methods fail to predict the link, c) studies username evolution on Twitter, and d) studies mobile number sharing behavior across OSNs. The proposed methods require no user authorization for data access, yet successfully leverage innocuous public user activity and details to find her accounts across OSNs and help stakeholders gain better insights into a user’s likings or her suspicious intentions.