Sliding Window

A great number of social network datasets have been, and still are, collected through surveys and interviews. For example, an advice network could be collected by asking each individual within a group to name the people they go to for advice. Another, more rigid, method is to give each individual a list of the other people in the group and let them select the people they go to for advice (roster surveys).

In addition to a number of biases (e.g., the informant inaccuracy bias; Bernard et al., 1984; see my thesis for a critique), survey instruments and direct observation methods are generally labour-intensive and difficult to administer. As a result, most networks collected using these methods are of a fairly limited size, often comprising only a few tens (e.g., Bernard et al., 1988) or hundreds (e.g., Fararo and Sunshine, 1964) of people.

Although archival data sources allow larger networks to be collected and, in turn, more robust statistical analysis to be applied, a bias might be introduced into the data if information about the severing of ties is not included: archival data sources have a much better memory than individuals.¹ For a social network, this could mean that social interactions that are no longer relevant to an individual are recorded as being relevant. Moreover, the weight of ties might be overestimated. These issues do not exist when data is collected through surveys, as each individual would only list current or relevant friends with the current tie strength (if they are honest, that is).

In the empirical analysis of the online social network, we studied the network in two ways. First, we assumed that social ties never decay (the cumulative perspective). This means that if a social interaction is recorded on, for example, day 12, it is included in the analysis from that point onwards and always remains included. Second, we followed Kossinets and Watts (2006) and imposed lifespans on the social relationships. This ensured that, if two people did not continue to communicate over time, their tie would be severed. This also applied to the weighted network: if the rate of messages sent from one person to another decreased, the tie would be weakened.
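The two perspectives can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the function name `ties_at`, the event tuples, and the dates are made up, and the tie weight is assumed to be a simple count of the interactions that are still alive.

```python
from datetime import datetime, timedelta

def ties_at(events, now, lifespan=None):
    """Return {(sender, receiver): weight} for the network observed at `now`.

    events   -- iterable of (timestamp, sender, receiver) tuples
    lifespan -- timedelta after which an interaction stops counting;
                None gives the cumulative perspective (ties never decay)
    """
    weights = {}
    for when, sender, receiver in events:
        if when > now:
            continue  # the interaction has not happened yet
        if lifespan is not None and now - when >= lifespan:
            continue  # older than the window, so it has expired
        key = (sender, receiver)
        weights[key] = weights.get(key, 0) + 1
    return weights

# Hypothetical message log: (time sent, sender, receiver)
events = [
    (datetime(2004, 1, 1), "a", "b"),
    (datetime(2004, 1, 2), "a", "b"),
    (datetime(2004, 1, 20), "b", "c"),
    (datetime(2004, 2, 10), "a", "c"),
]
now = datetime(2004, 2, 12)

cumulative = ties_at(events, now)                     # every past tie is kept
windowed = ties_at(events, now, timedelta(weeks=3))   # 3-week lifespan
```

With a 3-week lifespan, only the message sent on 10 February is still alive on 12 February, whereas the cumulative network keeps all three ties and gives the a-to-b tie a weight of 2.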

The length of the lifespan is crucial in determining which past events are taken into account when generating the network structure at a given point in time. The length of the lifespan can be defined by analysing which past events are relevant to the current state of the network (Kossinets and Watts, 2006). An ill-defined lifespan will either break continuous social interactions into independent sets of interactions (if it is too short) or combine two separate interactions into a single one (if it is too long).

To illustrate the difference between imposing a lifespan and not imposing one, the following figure shows results from the online social network where networks are constructed both cumulatively and with sliding windows of 2, 3, and 6 weeks. Both panels in the figure highlight the vulnerability of network measures to the use of a sliding window. Panel a suggests that there is only a small core of users that actively use the virtual community at the end of the observation period. An analysis of the cumulative network at that point would be heavily influenced by the majority of users that only used the network in the first 6 weeks, and would not reflect the current activities in the community. This could bias network measures and, ultimately, the analysis. Panel b shows the evolution of one possible measure, the clustering coefficient. In particular, the clustering coefficient measured on the active core is mostly below the value found in the cumulative network.

The above figure also highlights the sensitivity to sampling time. With shorter lifespans, the network measures become more unstable and more dependent on the time at which the observation is taken. Kossinets and Watts (2006) argued that network measures would remain stable over time. As a result, the average of the measures in a given observation period can be generalised to a longer period of time. The figure, however, suggests that, when social relationships have a lifespan, network measures are not stable. Therefore, it is difficult to infer from network snapshots stable network measures that reflect the network structure over a longer period of time.

By allowing for the severing of ties and sampling the network structure at various times over a longer period (e.g., each day in the observation period as we did for the online social network), the validity and robustness of a network analysis could be improved.
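As a rough sketch of what sampling the network each day could look like, the Python below rebuilds a snapshot for every day and records one measure. It is not the code behind the figure: the function names and the toy events are hypothetical, and the measure is assumed to be the global clustering coefficient of an undirected, binary snapshot.

```python
from datetime import datetime, timedelta
from itertools import combinations

def snapshot(events, now, lifespan):
    """Undirected adjacency sets built from interactions still alive at `now`."""
    neighbours = {}
    for when, i, j in events:
        if when <= now and now - when < lifespan and i != j:
            neighbours.setdefault(i, set()).add(j)
            neighbours.setdefault(j, set()).add(i)
    return neighbours

def global_clustering(neighbours):
    """Fraction of connected triples that are closed (0.0 if there are none)."""
    closed = triples = 0
    for adjacent in neighbours.values():
        for a, b in combinations(sorted(adjacent), 2):
            triples += 1
            if b in neighbours[a]:
                closed += 1
    return closed / triples if triples else 0.0

def daily_series(events, start, end, lifespan):
    """Measure the snapshot once per day from `start` to `end` inclusive."""
    series = []
    day = start
    while day <= end:
        series.append((day, global_clustering(snapshot(events, day, lifespan))))
        day += timedelta(days=1)
    return series

# Hypothetical interactions: a triangle forms early, then activity moves on
events = [
    (datetime(2004, 1, 1), "a", "b"),
    (datetime(2004, 1, 1), "b", "c"),
    (datetime(2004, 1, 2), "a", "c"),
    (datetime(2004, 1, 10), "c", "d"),
]
series = daily_series(events, datetime(2004, 1, 1), datetime(2004, 1, 12),
                      timedelta(days=7))
```

In this toy example the coefficient rises to 1.0 once the triangle closes on 2 January and falls back to 0.0 on 8 January when the first two interactions expire, so a single snapshot taken on either day would give a very different picture.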
_____________________
¹ A number of other limitations, notably validity issues, could also be introduced into the data when using archival data sources.

If you use any of the information in this post, please cite: Panzarasa, P., Opsahl, T., Carley, K.M., 2009. Patterns and dynamics of users’ behavior and interaction: Network analysis of an online community. Journal of the American Society for Information Science and Technology 60 (5), 911-932

Thanks for your comment. The code works by multiplying the window parameter by 60*60*24, so if the window parameter is divided by 60*24, it should work. Do inspect the output to ensure it produces the desired outcome.
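To make the arithmetic concrete, here is a small Python sketch. It assumes the question was about a window measured in minutes, and the two helper functions are made up for illustration; they are not part of tnet.

```python
SECONDS_PER_DAY = 60 * 60 * 24

def window_in_seconds(window_days):
    """Mimic the conversion described above: the window times 60*60*24."""
    return window_days * SECONDS_PER_DAY

def minutes_as_window(minutes):
    """Express `minutes` as the fraction of a day to pass as the window."""
    return minutes / (60 * 24)

# A 30-minute window, passed as a fraction of a day
window = minutes_as_window(30)
seconds = round(window_in_seconds(window))
print(seconds)  # 1800
```

The fraction of a day is a floating-point number, so the value in seconds is rounded here before being compared with whole-second timestamps.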

The link to the standards tnet requires for longitudinal data is dead, and the page on defining longitudinal networks is not finished, but I was wondering if you would be able to post a template or similar to outline how to format longitudinal network data for analysis in tnet?

That's great, thanks!
This may seem like an easy fix, but I can’t find a way of getting e.g. 2008-05-21 16:55:12 into “2008-05-21 16:55:12” (with quotation marks). Excel never preserves the format of the data and just gives a number, e.g. “39620.6798611111”.

What I am hoping to do is take a list of pairwise dated interactions amongst a population of wild animals, and see if they have preferred associates. I aim to determine whether associates are preferred by checking for evidence of reinforcement/reciprocity (I haven’t decided which of the two to use yet) in the formation of new links in the longitudinal network. I don’t have ties reducing in strength, so I hope to use the sliding window to remove interactions that occurred 2-4 days ago (again, precise window to be decided upon).
I have location in space of the individuals as well so I aim to control for that, to see if they are doing more than just interacting with those nearest them, i.e. choosing some and not others. I also have appearance and death dates, so can account for overlap in time as well.

Will probably be coming back to you for help later on, so thanks again in advance.

David

7.davidnfisher | July 17, 2014 at 10:09 am

Ok, I’ve got round the problem with the date formatting, it seems (well, it’s “06/05/2008 15:37:00”, so perhaps the date order needs to be re-arranged...), but now when I read it in, it says the level sets of factors are different. The network is not symmetrical due to the way we record interactions, but it should be for analyses as it’s undirected. Can you symmetrise a longitudinal network?

To sort the factor issue, use stringsAsFactors=FALSE when reading the data.

Once you have (1) added the sliding window with the add_window_l-function and (2) created a static network with the as.static.tnet-function, the network can be converted to an undirected network by using the symmetrise_w-function.
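For readers who want to see what the last step amounts to, here is a minimal Python sketch of symmetrising a static weighted edge list. It is not the tnet implementation: the function `symmetrise` and its `combine` argument are hypothetical, and taking the maximum of the two directed weights is just one possible choice.

```python
def symmetrise(edges, combine=max):
    """Return an undirected version of a weighted, directed edge list.

    edges   -- iterable of (i, j, w) directed ties
    combine -- how to merge the weights of i->j and j->i (max by default;
               sum is another option)
    """
    # Collect the weights recorded in either direction for each pair
    by_pair = {}
    for i, j, w in edges:
        key = (min(i, j), max(i, j))
        by_pair.setdefault(key, []).append(w)
    # Emit every tie in both directions with the same combined weight
    result = []
    for (i, j), weights in sorted(by_pair.items()):
        w = combine(weights)
        result.append((i, j, w))
        result.append((j, i, w))
    return result

# Hypothetical static network: 1->2 recorded three times, 2->1 once, 2->3 five times
static_edges = [(1, 2, 3), (2, 1, 1), (2, 3, 5)]
undirected_max = symmetrise(static_edges)
undirected_sum = symmetrise(static_edges, combine=sum)
```

Which way of combining the two weights is appropriate depends on how the interactions were recorded. If each interaction appears only once, in an arbitrary direction, summing is probably closer to what is wanted than taking the maximum.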