
Tracking new topics, ideas, and “memes” across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt spikes in the appearance of particular named entities. However, these approaches are less well suited to the identification of content that spreads widely and then fades over time scales on the order of days, the time scale at which we perceive news and events. We develop a framework for tracking short, distinctive phrases that travel relatively intact through on-line text; developing scalable algorithms for clustering textual variants of such phrases, we identify a broad class of memes that exhibit wide spread and rich variation on a daily basis. As our principal domain of study, we show how such a meme-tracking approach can provide a coherent representation of the news cycle: the daily rhythms in the news media that have long been the subject of qualitative interpretation but have never been captured accurately enough to permit actual quantitative analysis. We tracked 1.6 million mainstream media sites and blogs over a period of three months, for a total of 90 million articles, and we find a set of novel and persistent temporal patterns in the news cycle. In particular, we observe a typical lag of 2.5 hours between the peaks of attention to a phrase in the news media and in blogs respectively, with divergent behavior around the overall peak and a “heartbeat”-like pattern in the handoff between news and blogs. We also develop and analyze a mathematical model for the kinds of temporal variation that the system exhibits.

There is a widespread intuitive sense that different kinds of information spread differently on-line, but it has been difficult to evaluate this question quantitatively since it requires a setting where many different kinds of information spread in a shared environment. Here we study this issue on Twitter, analyzing the ways in which tokens known as hashtags spread on a network defined by the interactions among Twitter users. We find significant variation in the ways that widely-used hashtags on different topics spread. Our results show that this variation is not attributable simply to differences in “stickiness,” the probability of adoption based on one or more exposures, but also to a quantity that could be viewed as a kind of “persistence”: the relative extent to which repeated exposures to a hashtag continue to have significant marginal effects. We find that hashtags on politically controversial topics are particularly persistent, with repeated exposures continuing to have unusually large marginal effects on adoption; this provides, to our knowledge, the first large-scale validation of the “complex contagion” principle from sociology, which posits that repeated exposures to an idea are particularly crucial when the idea is in some way controversial or contentious. Among other findings, we discover that hashtags representing the natural analogues of Twitter idioms and neologisms are particularly non-persistent, with the effect of multiple exposures decaying rapidly relative to the first exposure. We also study the subgraph structure of the initial adopters for different widely-adopted hashtags, again finding structural differences across topics. We develop simulation-based and generative models to analyze how the adoption dynamics interact with the network structure of the early adopters on which a hashtag spreads.

...antly better indicators of similarity than others. There has also been work on the temporal patterns of information diffusion, i.e., the rate over time at which different pieces of information are adopted [9, 18, 21, 24, 30]. In this context there have been comparisons between the temporal patterns of expected versus unexpected information [9] and between different media such as news sources and blogs [21]. Our analysis ...

Online content exhibits rich temporal dynamics, and diverse real-time user-generated content further intensifies this process. However, temporal patterns by which online content grows and fades over time, and by which different pieces of content compete for attention, remain largely unexplored. We study temporal patterns associated with online content and how the content’s popularity grows and fades over time. The attention that content receives on the Web varies depending on many factors and occurs on very different time scales and at different resolutions. In order to uncover the temporal dynamics of online content we formulate a time series clustering problem using a similarity metric that is invariant to scaling and shifting. We develop the K-Spectral Centroid (K-SC) clustering algorithm that effectively finds cluster centroids with our similarity measure. By applying an adaptive wavelet-based incremental approach to clustering, we scale K-SC to large data sets. We demonstrate our approach on two massive datasets: a set of 580 million Tweets, and a set of 170 million blog posts and news media articles. We find that K-SC outperforms the K-means clustering algorithm in finding distinct shapes of time series. Our analysis shows that there are six main temporal shapes of attention of online content. We also present a simple model that reliably predicts the shape of attention by using information about only a small number of participants. Our analyses offer insight into common temporal patterns of the content on the Web and broaden the understanding of the dynamics of human attention.
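
The abstract above does not spell out its similarity measure. As a minimal sketch, assuming one natural form of a distance that is invariant to scaling and shifting (minimize the relative residual over a scaling factor and an integer time shift), the idea could look like the following Python. The function names, the zero-padded shift, and the brute-force search over shifts are illustrative assumptions, not the K-SC authors' implementation.

```python
import math


def shift(series, q):
    """Shift a series by q steps (positive = later), padding with zeros."""
    n = len(series)
    if q >= 0:
        return [0.0] * min(q, n) + list(series[: max(n - q, 0)])
    return list(series[-q:]) + [0.0] * min(-q, n)


def invariant_distance(x, y, max_shift=None):
    """Relative residual between x and the best scaled, shifted copy of y.

    For every integer shift q, the scaling alpha that minimizes
    ||x - alpha * y_q|| has the closed form (x . y_q) / ||y_q||^2.
    The smallest residual over all shifts, divided by ||x||, is returned.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("series must have the same length")
    if max_shift is None:
        max_shift = n - 1
    norm_x = math.sqrt(sum(v * v for v in x))
    if norm_x == 0:
        raise ValueError("x must not be all zeros")

    best = float("inf")
    for q in range(-max_shift, max_shift + 1):
        y_q = shift(y, q)
        yy = sum(v * v for v in y_q)
        alpha = sum(a * b for a, b in zip(x, y_q)) / yy if yy > 0 else 0.0
        residual = math.sqrt(sum((a - alpha * b) ** 2 for a, b in zip(x, y_q)))
        best = min(best, residual / norm_x)
    return best


# Two spikes with the same shape but different height and timing.
spike = [0, 1, 4, 1, 0, 0]
late_tall_spike = [0, 0, 0, 3, 12, 3]
flat = [1, 1, 1, 1, 1, 1]

print(invariant_distance(spike, late_tall_spike))  # close to 0
print(invariant_distance(spike, flat))             # clearly larger
```

Under this assumed definition, two series with the same shape but different volume or peak time are treated as near-identical, which is the property a shape-based clustering needs.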

...s difficult because human behavior behind the temporal variation is highly unpredictable. Previous research on the timing of an individual’s activity has reported that human actions range from random [26] to highly correlated [6]. Although the aggregate dynamics of individual activities tends to create seasonal trends or simple patterns, sometimes collective actions of people and the effects of person...

Information diffusion and virus propagation are fundamental processes taking place in networks. While it is often possible to directly observe when nodes become infected, observing individual transmissions (i.e., who infects whom or who influences whom) is typically very difficult. Furthermore, in many applications, the underlying network over which the diffusions and propagations spread is actually unobserved. We tackle these challenges by developing a method for tracing paths of diffusion and influence through networks and inferring the networks over which contagions propagate. Given the times when nodes adopt pieces of information or become infected, we identify the optimal network that best explains the observed infection times. Since the optimization problem is NP-hard to solve exactly, we develop an efficient approximation algorithm that scales to large datasets and in practice gives provably near-optimal performance. We demonstrate the effectiveness of our approach by tracing information cascades in a set of 170 million blogs and news articles over a one-year period to infer how information flows through the online media space. We find that the diffusion network of news tends to have a core-periphery structure with a small set of core media sites that diffuse information to the rest of the Web. These sites tend to have stable circles of influence with more general news media sites acting as connectors between them.

The increasing availability of electronic communication data, such as that arising from e-mail exchange, presents social and information scientists with new possibilities for characterizing individual behavior and, by extension, identifying latent structure in human populations. Here, we propose a model of individual e-mail communication that is sufficiently rich to capture meaningful variability across individuals, while remaining simple enough to be interpretable. We show that the model, a cascading non-homogeneous Poisson process, can be formulated as a double-chain hidden Markov model, allowing us to use an efficient inference algorithm to estimate the model parameters from observed data. We then apply this model to two e-mail data sets consisting of 404 and 6,164 users, respectively, that were collected from two universities in different countries and years. We find that the resulting best-estimate parameter distributions for both data sets are surprisingly similar, indicating that at least some features of communication dynamics generalize beyond specific contexts. We also find that variability of individual behavior over time is significantly less than variability across the population, suggesting that individuals can be classified into persistent “types”. We conclude that communication patterns may prove useful as an additional class of attribute data, complementing demographic and network data, for user classification and outlier detection—a point that we illustrate with an interpretable clustering of users based on their inferred model parameters.

...ve been shown to reproduce some asymptotic statistical properties of communication patterns; however, they also make predictions that are clearly at odds with important features of the empirical data [29, 22]. In particular, priority queuing models fail to account for two important features of human communication patterns: first, that individuals are influenced by daily and weekly cycles of activity; and ...

The temporal communication patterns of human individuals are known to be inhomogeneous or bursty, which is reflected in the heavy-tailed behavior of the inter-event time distribution. Two main mechanisms have been suggested as the cause of such bursty behavior: (a) inhomogeneities due to the circadian and weekly activity patterns and (b) inhomogeneities rooted in human task execution behavior. Here we investigate the roles of these mechanisms by developing and then applying systematic de-seasoning methods to remove the circadian and weekly patterns from the time-series of mobile phone communication events of individuals. We find that the heavy tails in the inter-event time distributions remain robust with respect to this procedure, which clearly indicates that the human task execution based mechanism is a possible cause for the remaining burstiness in temporal mobile phone communication patterns.
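
The abstract does not describe how its de-seasoning is carried out. The Python sketch below assumes one simple approach for illustration: estimate a relative event rate for each bin of a repeating period (for example, hour-of-day bins within a 24-hour period), then stretch or compress time in proportion to that rate so that the periodic pattern is flattened before inter-event times are measured. All function names, the binning scheme, and the piecewise-constant rate are assumptions, not the authors' procedure.

```python
def seasonal_rates(times, period, n_bins):
    """Relative event rate per bin of the period (the mean rate is 1.0)."""
    width = period / n_bins
    counts = [0] * n_bins
    for t in times:
        counts[min(int((t % period) / width), n_bins - 1)] += 1
    mean = len(times) / n_bins
    return [c / mean for c in counts]


def rescale_time(t, rates, period):
    """Map clock time to rescaled time by integrating the relative rate.

    Busy bins are stretched and quiet bins are compressed; because the
    rates average to 1.0, one full period still maps to one full period.
    """
    n_bins = len(rates)
    width = period / n_bins
    cycles, offset = divmod(t, period)
    b = min(int(offset / width), n_bins - 1)
    within = sum(rates[:b]) * width + rates[b] * (offset - b * width)
    return cycles * period + within


def inter_event_times(times):
    """Gaps between consecutive events, after sorting."""
    ordered = sorted(times)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def deseasoned_inter_event_times(times, period, n_bins):
    """Inter-event times measured in rescaled (de-seasoned) time."""
    rates = seasonal_rates(times, period, n_bins)
    return inter_event_times([rescale_time(t, rates, period) for t in times])


# Toy event times in hours: activity is concentrated early in each day.
events = [1.0, 1.5, 2.0, 13.0, 25.0, 25.5, 26.0, 37.0]
print(inter_event_times(events))
print(deseasoned_inter_event_times(events, period=24.0, n_bins=2))
```

Comparing the tail of the two resulting distributions is the kind of check the abstract describes: if long gaps persist after the periodic pattern is removed, seasonality alone does not explain the burstiness.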

...ction. For example we can now study the structure and dynamics of large-scale human communication networks [1, 2, 3, 4] and the laws of mobility [5, 6, 7], as well as the motifs of individual behavior [8, 9, 10, 11, 12, 13, 14]. One of the robust findings of these studies is that human activity over a variety of communication channels is inhomogeneous, such that high activity bursts of rapidly occurring events are separated...

by
Qi Xuan, Mohammad Gharehyazie, Premkumar T Devanbu, Vladimir Filkov
- In the proceedings of 2012 ASE/IEEE International Conference on Social Informatics, 2012

This paper proposes novel quantitative methods to measure the effects of social communications on individual working rhythms by analyzing the communication and code committing records in tens of Open Source Software (OSS) projects. Our methods are based on complex network and time-series analysis. We define the notion of a working rhythm as the average time spent on a commit task and we study the correlation between working rhythm and communication frequency. We build communication networks for code developers, and find that the developers with higher social status, represented by the nodes with a larger number of outgoing or incoming links, always have faster working rhythms and thus contribute more per unit time to the projects. We also study the dependency between work (committing) and talk (communication) activities, in particular the effect of their interleaving. We introduce multi-activity time-series and quantitative measures based on activity latencies to evaluate this dependency. Comparison of simulated time-series with the real ones suggests that when work and talk activities are in proximity they may accelerate each other in OSS systems. These findings suggest that frequent communication before and after committing activities is essential for effective software development in distributed systems.

...native applications of the new model introduced in this paper. 2. MODELING TEMPORAL BEHAVIOR A large body of work suggests that many human behaviors are heavy-tailed and bursty in the temporal domain [2, 21, 17, 1, 6, 18, 22, 15]. They disagree, however, in their explanation of these properties. Some propose a priority-queue where individuals choose high priority tasks over low priority tasks [2, 21], similar to preferential ...

There is substantial interest in the effect of human mobility patterns on opportunistic communications. Inspired by recent work revisiting some of the early evidence for a Lévy flight foraging strategy in animals, we analyse datasets on human contact from real world traces. By analysing the distribution of inter-contact times on different time scales and using different graphical forms, we find not only the highly skewed distributions of waiting times highlighted in previous studies but also a clear circadian rhythm. The relative visibility of these two components depends strongly on which graphical form is adopted and the range of time scales. We use a simple model to reconstruct the observed behaviour and discuss the implications of this for forwarding efficiency.

... physical proximity (i.e., detectability of wireless access points or Bluetooth devices, or closeness of GPS locations [3], [13], [22]) or by telecommunication (i.e., mobile phone call [10] or e-mail [16]), and whether one or both contacting devices are in motion (e.g., both Bluetooth, one Bluetooth and fixed wireless access points, mobile phone and fixed masts). A summary is given in Table I of studi...

...etween check-ins also show a power law behavior with slope ρ ≈ 1. This is fascinating, since it suggests that the mechanisms behind human activity dynamics may be simpler and more general than we know [1, 9]. An application that naturally arises from the analysis we have shown in this section is area classification. Given the large variety of places available and all the information we ...