Abstract

Disentangling the effects of selection and influence is one of social science's greatest unsolved puzzles: Do people befriend others who are similar to them, or do they become more similar to their friends over time? Recent advances in stochastic actor-based modeling, combined with self-reported data on a popular online social network site, allow us to address this question with a greater degree of precision than has heretofore been possible. Using data on the Facebook activity of a cohort of college students over 4 years, we find that students who share certain tastes in music and in movies, but not in books, are significantly likely to befriend one another. Meanwhile, we find little evidence for the diffusion of tastes among Facebook friends—except for tastes in classical/jazz music. These findings shed light on the mechanisms responsible for observed network homogeneity; provide a statistically rigorous assessment of the coevolution of cultural tastes and social relationships; and suggest important qualifications to our understanding of both homophily and contagion as generic social processes.

The homogeneity of social networks is one of the most striking regularities of group life (1–4). Across countless social settings—from high school to college, the workplace to the Internet (5–8)—and with respect to a wide variety of personal attributes—from drug use to religious beliefs, political orientation to tastes in music (1, 6, 9, 10)—friends tend to be much more similar than chance alone would predict. Two mechanisms are most commonly cited as explanations. First, friends may be similar due to social selection or homophily: the tendency for like to attract like, or similar people to befriend one another (11, 12). Second, friends may be similar due to peer influence or diffusion: the tendency for characteristics and behaviors to spread through social ties such that friends increasingly resemble one another over time (13, 14). Though many prior studies have attempted to disentangle these two mechanisms, their respective importance is still poorly understood. On one hand, analytically distinguishing social selection and peer influence requires detailed longitudinal data on social relationships and individual attributes. These data must also be collected for a complete population of respondents, because it is impossible to determine why some people become friends (or change their behaviors)* unless we also know something about the people who do not. On the other hand, modeling the joint evolution of networks and behaviors is methodologically much more complex than nearly all past work has recognized. Not only should such a model simulate the ongoing, bidirectional causality that is present in the real world; it must also control for a number of confounding mechanisms (e.g., triadic closure, homophily based on other attributes, and alternative causes of behavioral change) to prevent misdiagnosis of selection or influence when another social process is in fact at work (15).

Using a unique social network dataset (5) and advances in actor-based modeling (16), we examine the coevolution of friendships and tastes in music, movies, and books over 4 years. Our data are based on the Facebook activity of a cohort of students at a diverse US college (n = 1,640 at wave 1). Beginning in March 2006 (the students’ freshman year) and repeated annually through March 2009 (the students’ senior year), we recorded network and profile information from Facebook and supplemented it with academic and housing data from the college (SI Materials and Methods, Study Population and Profile Data). Our research was approved by both Facebook and the college in question; no privacy settings were bypassed (i.e., students with “private” profiles were considered missing data); and all data were immediately encoded to preserve student anonymity. Because data on Facebook are naturally occurring, we avoided interviewer effects, recall limitations, and other sources of measurement error endemic to survey-based network research (17). Further, in contrast to past research that has used interaction “events” such as e-mail or instant messaging to infer an underlying structure of relationships (7, 18), our data refer to explicit and mutually confirmed “friendships” between students. Given that a Facebook friendship can refer to a number of possible relationships in real life, from close friends or family to mere acquaintances, we conservatively interpret these data as documenting the type of “weak tie” relations that have long been of interest to social scientists (19).

Though network homogeneity has been a perennial topic of academic research, prior attempts to separate selection and influence suffer from three limitations that cast doubt on the validity of past findings. These limitations are summarized by Steglich et al. (15), who introduce the modeling framework we use. First, prior approaches to network and behavioral coevolution inappropriately use statistical techniques that assume all observations are independent—an assumption that is clearly violated in datasets of relational data. Second, prior approaches do not adequately control for alternative mechanisms of network and behavioral change that could also produce the same findings. For instance, two individuals who share a certain behavior may decide to become friends for other reasons (e.g., because they have a friend in common or because they share some other behavior with which the first is correlated), and behaviors may change for many other reasons besides peer influence (e.g., because of demographic characteristics or because all individuals share some baseline propensity to adopt the behavior). Third, prior approaches do not account for the fact that the underlying processes of friendship and behavioral evolution operate in continuous time, which could result in any number of unobserved changes between panel waves. In response, Snijders and colleagues (16, 20, 21) propose a stochastic actor-based modeling framework. This framework considers a network and the collective state of actors’ behaviors as a joint state space, and models simultaneously how the network evolves depending on the current network and current behaviors, and how behaviors evolve depending on the same current state. The framework also respects the network dependence of actors, is flexible enough to incorporate any number of alternative mechanisms of change, and models the coevolution of relationships and behaviors in continuous time. In other words, it is a family of models capable of assessing the mutual dependence between networks and behaviors in a statistically adequate fashion (see Materials and Methods and SI Materials and Methods for additional information).

We report the results of three analyses of students’ tastes and social networks. First, we identify cohesive groupings among students’ preferences; second, we model the evolution of the Facebook friend network; and third, we examine the coevolution of tastes and ties using the modeling framework described above. Cultural preferences have long been a topic of academic, corporate, and popular interest. Sociologists have studied why we like what we like (9, 22) and the role of tastes in social stratification (23, 24). Popular literature has focused on the diffusion of consumer behaviors and the role of trendsetters therein (25, 26). To date, however, most evidence of taste-based selection and influence is either anecdotal or methodologically problematic according to the aforementioned standards (15). Further, the majority of research on tastes has relied on closed-ended surveys—commonly only about music, and typically measuring these preferences in terms of genres. In contrast, Facebook provides users with open-ended spaces in which to list their “favorite” music, movies, and books, offering an unprecedented opportunity to examine how tastes are structured as well as how they coevolve with social ties (5). Additionally, unlike experimental studies of influence that use a fixed network structure (14), we also assess the reciprocal impact of tastes on networks; and unlike dynamic studies of diffusion that focus on adoption of a single product or taste (18), we compare how the coevolution process varies across three domains of preferences.

Results

Though our dataset contains a total of 10,387 unique tastes, most of these (64%) are never expressed by more than a single student and cannot plausibly contribute to selection or influence dynamics. We therefore focus only on those tastes that appeared among the 100 most popular music, movie, and book preferences for at least one wave of data, collectively accounting for 49% of students’ actual taste expressions. Fig. 1 presents a visualization of students’ music tastes, where similar tastes are positioned closer together in 3D space. We define the “similarity” between two tastes as the proportion of students they share in common, i.e., the extent to which they co-occur. We then used a hierarchical clustering algorithm to identify more or less cohesive groupings of co-occurring tastes. Rather than assuming a priori that tastes are patterned in a certain way, this inductive approach reveals those distinctions among items that students themselves find subjectively meaningful (SI Materials and Methods, Cluster Analysis of Students’ Tastes).

Visualization of the distribution of students’ music preferences on Facebook, featuring all items that appeared among the 100 most popular music tastes in at least one wave of data (n = 145). Similar tastes appear closer together in 3D space, where similarity is defined as the rate of co-occurrence between two tastes, and coordinates are determined by multidimensional scaling. Node size is proportionate to taste popularity, and colors refer to the five largest clusters of tastes identified by a hierarchical clustering algorithm. Cluster names are generalizations for illustrative purposes only. Visualizations for movies and books are provided in the Supporting Information, as well as animations of all three “taste spaces” for closer inspection.

Next, we examine the determinants of Facebook friend network evolution. Fig. 2 displays select parameter estimates β and 95% confidence intervals for a model of Facebook friend network evolution estimated over the entire 4 years of college (Materials and Methods and SI Materials and Methods, Evolution of Facebook Friendships). We included all those students for whom friendship data were available at all four waves based on students’ privacy settings (n = 1,001). Though we use a very different relationship measure (friendships documented online) compared with traditional surveys, our findings largely coincide with past research (11, 27, 28). The dominant influence on friendship evolution is mere propinquity: the log-odds of two students becoming and remaining Facebook friends increases by 0.98 if the students live in the same building and by 0.56 if they share the same academic field (and thus enroll in many of the same classes). Friendships—even on Facebook—are also powerfully influenced by social proximity; sharing only a single friend in common (triadic closure) increases the log-odds of two students becoming and remaining friends by 0.10—an effect that multiplies with each additional shared friend. Finally, students tend to self-segregate on the basis of gender, racial background, region of origin, and socioeconomic status.

Parameter estimates β and 95% confidence intervals for a stochastic actor-based model of the evolution of Facebook friendships over 4 years (n = 1,001). Significant coefficients are labeled with an asterisk, where a coefficient is considered significant if the 95% confidence interval does not contain β = 0. Coefficients generally correspond to the change in log-odds of a tie being present vs. absent if the given criterion is met (e.g., the friendship is between two friends-of-friends or two students who share the same gender), although the case for socioeconomic homophily—a continuous variable—is more complex (SI Materials and Methods, Evolution of Facebook Friendships).

Finally, to disentangle the importance of selection vs. influence, we combine data on tastes and friendships into a single analysis of network and behavioral coevolution (Materials and Methods and SI Materials and Methods, Coevolution of Tastes and Friendships). Models were again estimated over the entire study period and limited to students for whom both taste and network data were available at all four waves (n = 211 for music, n = 201 for movies, n = 191 for books). Results for selection and influence parameters are presented in Fig. 3. Controlling for peer influence and over a dozen alternative determinants of network evolution, we find that students who like artists in the “lite/classic rock” or “classical/jazz” clusters display a significant tendency to form and maintain friendships with others who express tastes in the same cluster. We also find that students self-segregate on the basis of movie preferences: two students who like movies in the “dark satire” or “raunchy comedy/gore” clusters are significantly more likely than chance to become and remain friends. Social selection effects are not statistically significant for any of the other clusters of music and movie tastes, however—nor are they significant for any of the five book clusters considered here. Meanwhile, results on the behavioral evolution side of the model tell a very different story. Controlling for social selection and several alternative determinants of taste evolution, we find significant evidence for peer influence with respect to only one of the 15 taste clusters: students whose friends express tastes in the “classical/jazz” music cluster are significantly more likely to adopt such tastes themselves. Outside of this finding, preferences do not in any way appear to be “contagious” among Facebook friends over the duration of college. In fact, students whose friends list tastes in the “indie/alt” music cluster are significantly likely to discard these tastes in the future—an instance of peer influence operating in the opposite direction as predicted by prior research.

Parameter estimates β and 95% confidence intervals for selection and influence effects from 15 models of the coevolution of friendships and tastes (n = 211 for music, n = 201 for movies, n = 191 for books). Significant coefficients are labeled with an asterisk, where a coefficient is considered significant if the 95% confidence interval does not contain β = 0. Selection effects measure the tendency for a tie to develop between two students who both express tastes in the given cluster; influence effects measure the tendency for students whose friends express tastes in the given cluster to themselves adopt tastes in that cluster (SI Materials and Methods, Coevolution of Tastes and Friendships).

Discussion

Tastes are central to human identity, and are commonly viewed as an important source of interpersonal affinity and group solidarity. Our findings suggest important qualifications to this perspective: the social impact of a taste may depend first on its medium (e.g., tastes in music and in movies appear to be more consequential than tastes in books), and second on the particular content of the preference. Notably, tastes shared by “everyone” may be so banal that they no longer serve as effective markers of social differentiation: We find the least evidence for social selection, positive or negative, among students who like “pop” music, “hit” movies, and books on “the college bookshelf”. Cultural diffusion—the spread of tastes through social ties—is also an intuitively plausible mechanism commonly invoked to explain changes in fashion. Such claims are rarely substantiated by rigorous empirical research, however; and examples of “successful” diffusion may be more accessible to memory (29), whereas ubiquitous instances of “failed” diffusion are routinely ignored (30). Our findings suggest that friends tend to share some tastes not because they influence one another, but because this similarity was part of the reason they became and remained friends in the first place. Further, the one type of preference that does “spread” among Facebook friends—classical/jazz music—may be especially “contagious” due to its unique value as a high-status cultural signal (31);† whereas students whose friends like “indie” or alternative bands may try to symbolically distance themselves from these peers (32). Future research should focus more on the motives and mechanisms of cultural diffusion, including how the likelihood of transmission varies across different types of preferences, people, and contexts, rather than viewing it as an undifferentiated social process akin to fluid churning through a pipeline (14, 18).

Our analyses are limited in a number of ways. Selection and influence may play very different roles in relationships that are stronger than “Facebook friendship,” and tastes expressed online may reflect not only genuine psychological preference but also “presentation of self” or the desire to fit in. We also do not have data on environmental influences such as concerts, movie nights, or assigned reading that may influence students’ preferences and contribute to network homogeneity. Most importantly, our models of selection and influence focus only on a small subset of students (i.e., those who provided complete taste and network data at all four waves) in a single college cohort. Though the software we use is capable of handling some degree of missing data (33), >70% of the original study cohort provided no tastes at all during their senior year alone (either due to privacy settings or nonreport); and even permitting a single wave of missing data led to intractable models. We therefore acknowledge that our results are not necessarily generalizable to the students who did not report both tastes and ties at all four waves, much less other populations of students elsewhere.‡

Despite these limitations, our models provide an analytically rigorous assessment of a process of long-standing scientific and popular interest—an assessment that we hope will spur additional research in other settings using alternative measures of friendships and tastes. Given that we conduct this assessment in an online context (Facebook) that is increasingly significant for the conduct of everyday life (34), using a relationship type (“weak ties”) considered particularly conducive to the diffusion of information (19), our data show surprisingly little evidence for the common notion that what we like rubs off on those around us. Rather, our findings would support a view of contemporary online interaction as having less to do with influencing our neighbors and more to do with strengthening social ties among those whom we already resemble.

Materials and Methods

Here, we provide an overview of the stochastic actor-based modeling approach. Additional details and full model results are presented in SI Materials and Methods. Further information and context is provided in the comprehensive publications by Snijders and colleagues (15, 16, 20, 21).

As described previously, stochastic actor-based models are the first statistical framework to overcome three significant limitations of prior approaches to social selection and peer influence. These models conceive of global transformations in network structure (and global trends in behavior) as the accumulation of microlevel decisions on the part of individual actors. Though prior approaches to modeling networks and behavior consider each wave of observation as a discrete “event” to be explained directly by the prior wave, panel waves are here considered merely “snapshots” of an underlying process of continuous social change. In other words, the difference between two successive observations could be explained by any number of possible network/behavior trajectories over time. The change process is decomposed into its smallest possible components, or “microsteps.” At any given “moment,” a single probabilistically selected actor is given the opportunity to modify either a social tie (create a tie, dissolve a tie, or do nothing) or her behavior (adopt a taste, discard a taste, or do nothing). No more than one network or behavioral change can be made at any one moment; each actor's decisions thus constitute the surrounding social context in which subsequent decisions by other actors will occur. The network component of the model is also here estimated in such a fashion as to mimic the process whereby Facebook friendships actually develop: a tie is created if and only if a request is sent and then confirmed, and it may be dissolved by either actor at any time.

Though the probability of receiving the opportunity to make a tie change or behavior change can depend on individual attributes or network position (according to the network and behavioral “rate functions,” respectively), we here assume these opportunities are equally distributed for all actors for each distinct transition period between two waves. Therefore, the sole functions that need to be specified are the “objective functions” for network and behavioral change—in other words, the functions that determine the short-term “objectives” each actor will tend to pursue when the opportunity for change arises. The network component of the objective function has the following general shape:

In Eq. 1, is the value of the objective function for actor i depending on state x of the network and state z of all network members’ behavior. Effects correspond to possible reasons an actor might have for changing a network tie (i.e., micromechanisms of network evolution), and weights are effect strengths. Following past research, we consider “relational” effects such as triadic closure (the tendency of friends-of-friends to become friends); “assortative” effects reflecting homophily according to gender, race, socioeconomic status, and region of origin; and “proximity” effects such as coresidence in the same building and sharing the same academic field of study (7, 27, 35). We also control for preferential attachment (the tendency of popular students to become more popular) and the baseline tendency of students from different backgrounds to form more or fewer ties overall. Formulae for all effects are presented in SI Materials and Methods.

For our pure model of network evolution (Fig. 2), Eq. 1 and effects are sufficient because only network evolution is modeled without consideration for students’ coevolving tastes. In other words, the z component of the model is presumed to be absent. To move from this model to our models of network and behavioral coevolution (Fig. 3), we must not only add effects specifying how network evolution depends on students’ preferences (specifically, a “sociality” effect for the tendency of students with certain tastes to form more or fewer ties overall, and the focal social selection effect for the tendency of students with similar tastes to become friends), we must also incorporate a second, behavioral component of the objective function with the following general shape:

Rather than determining the rules by which actors make decisions about their network ties, Eq. 2 governs actors’ choices with respect to a focal behavior z—here, the quantity of “favorites” a student listed in a given taste cluster. Effects now correspond to the various reasons an actor might choose to change her tastes, and are again effect strengths. These effects include two terms (one linear, one quadratic) specifying the baseline distribution of the given taste cluster among the study population: a term controlling for the tendency of students with different demographic characteristics (men compared with women, white students compared with black, Asian, “mixed” race, or Hispanic students, and students from varying socioeconomic backgrounds) to express more or fewer tastes in the given cluster, a term controlling for the tendency of more popular students to express more or fewer tastes in the given cluster, and the focal peer influence effect representing students’ tendency to “assimilate” to the preferences expressed by their friends.

In sum, upon receiving the opportunity to make a change, actors will tend to pursue short-term goals that will maximize the value of the relevant objective function (plus a random residual representing nonmodeled influences). In the case of the network function, they do this by forming a tie, dissolving a tie, or doing nothing; and in the case of the behavioral function, they do this by adopting a taste, discarding a taste, or maintaining their current set of preferences. Because of the complex dependencies between ties and behavior implied by the above processes, these models are too complicated for the calculation of likelihoods or estimators in closed form. Maximum-likelihood estimation has recently been developed for these models, but it is currently feasible only for much smaller networks (36). We therefore estimate parameter values using an approach called “method of moments,” which depends on computer simulations of the change process (21, 37). In short, this approach conditions on the first wave of observation, and it is the subsequent transition periods between waves that are the focus of parameter estimation. For a given set of initial parameter values, the model is implemented as a stochastic simulation algorithm used to generate dynamic network and behavioral data. The simulated data are then compared against the actually observed change patterns, and parameters iteratively adjusted until the observed values for a set of relevant statistics are reproduced well by the simulations according to the final parameter values. T ratios for all parameters, quantifying the deviations between simulated values of the statistics and their observed values, are used to assess model convergence. (Convergence was excellent for all models presented here.) Full model results for the model of network evolution and each of the 15 models of network and behavioral coevolution are presented in the Supporting Information.

Finally, a word on parameter interpretation. As noted previously, the objective functions can be used to compare how attractive various tie and behavioral changes are for a given actor, where the probability of a given change is higher as the objective function for that change is higher (subject to the constraints of the current network/behavior structure as well as random influences). Parameters can therefore be interpreted similarly to those obtained by logistic regression, i.e., in terms of the likelihood of somewhat idealized microsteps. A parameter estimate of 0.56 for the “shared academic field” effect, for instance, means that a tie between two students who share the same major will have a log-probability of being created that is 0.56 greater than the log-probability of an otherwise identical tie between two students who do not share the same major. Interpretation of selection and influence parameters is slightly more complex given that behavior variables are ordinal rather than nominal and (in the case of peer influence) depend not just on the correspondence between two potential friends’ tastes, but on the correspondence between a given student's tastes and the tastes of all of her friends. The effect for social selection is here defined by the product of two potential friends’ tastes, such that a positive effect means that actors who express relatively many tastes in a given cluster will prefer ties to others who also express relatively many tastes in that cluster. The effect for peer influence is here defined by the average value of tastes among a focal student's friends, such that a positive effect means that actors whose friends express relatively many tastes in a given cluster will themselves have a stronger tendency to adopt tastes in that cluster. Formulae for all effects are provided in SI Materials and Methods.

Acknowledgments

We thank Cheri Minton for assistance with data processing; Andreas Wimmer and Nicholas Christakis for their collaboration in compiling the dataset on which this research is based; and two anonymous reviewers for their valuable comments and suggestions. This research was supported by National Science Foundation Grant SES-0819400.

Footnotes

Author contributions: K.L., M.G., and J.K. designed research, performed research, analyzed data, and wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

↵*Throughout, we use the term “behavior” to refer to any type of endogenously changing individual attribute.

↵†An alternative explanation for this finding is that classical/jazz music is a “difficult” genre that one must learn to appreciate—learning that often takes place through friendship ties. We thank an anonymous reviewer for this suggestion.

↵‡An additional question is whether selection and influence dynamics vary over time. Due to the small proportion of students who reported both taste and network data at all four waves, this question is difficult to assess with our dataset and we have here focused on identifying enduring effects that operate throughout the duration of college. However, supplementary analyses suggest that certain selection and influence effects may indeed be particularly pronounced among certain subsets of students and/or during certain phases in the college experience; and in fact, when we limit attention to the first period only (i.e., freshman to sophomore year)—and all students who reported data for this period—we do find some evidence for selection and influence with respect to book tastes during this early phase of college. Full results are presented in SI Materials and Methods, Robustness Checks.

Blood-sucking sand flies from disparate global regions have a predilection for feeding on the marijuana plant (Cannabis sativa), and the findings hint at a potential avenue for controlling sand flies, which can transmit leishmaniasis.