Abstract

Within the online media universe there are many underlying communities. These may be defined, for example, through politics, location, health, occupation, extracurricular interests or retail habits. Government departments, charities and commercial organisations can benefit greatly from insights about the structure of these communities; the move to customer-centered practices requires knowledge of the customer base. Motivated by this issue, we address the fundamental question of whether a subnetwork looks like a collection of individuals who have effectively been picked at random from the whole, or instead forms a distinctive community with a new, discernible structure. In the former case, the best way to spread a message to the intended user base may be traditional broadcast media (TV, billboard), whereas in the latter case a more targeted approach could be more effective. In this work, we therefore formalize a concept of testing for substructure and apply it to social interaction data. First, we develop a statistical test to determine whether a given subnetwork (induced subgraph) is likely to have been generated by sampling nodes from the full network uniformly at random. This tackles an inverse counterpart to the more widely studied “forward” problem. We then apply the test to a Twitter reciprocated-mentions network in which a range of brand-name-based subnetworks are created via tweet content. We correlate the computed results with the independent views of sixteen digital marketing professionals. We conclude that there is great potential for social-media-based analytics to quantify, compare and interpret online brand allegiances systematically, in real time and at large scale.
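
To make the null hypothesis concrete, the following is a minimal Monte Carlo sketch of one way such a test could be set up. It is an illustration, not the test developed in this work. The choice of statistic (the number of edges in the induced subgraph), the one-sided empirical p-value, the function names and the toy graph are all assumptions introduced here.

```python
import random


def induced_edge_count(adj, nodes):
    """Count undirected edges with both endpoints in `nodes`.

    `adj` maps each node to the set of its neighbours (no self-loops assumed).
    """
    node_set = set(nodes)
    # Each edge is seen once from each endpoint, so halve the total.
    return sum(len(adj[u] & node_set) for u in node_set) // 2


def random_subgraph_p_value(adj, nodes, n_samples=2000, seed=0):
    """Empirical p-value under the null that `nodes` was drawn uniformly at random.

    Assumed statistic: edge count of the induced subgraph. A small p-value
    suggests the subnetwork is denser than a random node sample of equal size.
    """
    rng = random.Random(seed)
    population = sorted(adj)
    k = len(set(nodes))
    observed = induced_edge_count(adj, nodes)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(population, k)  # uniform sample without replacement
        if induced_edge_count(adj, sample) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_samples + 1)


# Hypothetical toy network: a 5-node clique (nodes 0-4) plus a path (nodes 5-39).
adj = {v: set() for v in range(40)}


def add_edge(u, v):
    adj[u].add(v)
    adj[v].add(u)


for u in range(5):
    for v in range(u + 1, 5):
        add_edge(u, v)
for u in range(5, 39):
    add_edge(u, u + 1)

p_clique = random_subgraph_p_value(adj, range(5))
p_scattered = random_subgraph_p_value(adj, [5, 10, 15, 20, 25])
```

In this toy setting the clique should yield a small p-value, while the scattered nodes, which share no edges, give no evidence against random sampling. Other statistics could be substituted for the edge count without changing the overall scheme.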