For the convenience of tnet users, I have collected a number of network datasets that were available on the Internet, and made them conform to the required standard. If you have a network that you would like to add to this page or if there are any mistakes or conflicts of interest, please contact me.

Note: Please do cite the mentioned reference if you use a dataset.

To make it easier for other researchers, it is possible to download the networks in their native form and in transformed versions. For example, the Facebook-like Social Network is available as a longitudinal one-mode network (native form) and as a static one-mode network. Two-mode networks are transformed to weighted one-mode networks as described on the projecting two-mode networks onto weighted one-mode networks-page.

For instructions on how to load the datasets in tnet and UCINET, see the end of this page.

The Facebook-like Forum Network was obtained from the same online community as the online social network; however, the focus in this network is not on the private messages exchanged among users, but on users’ activity in the forum. The forum represents an interesting two-mode network among 899 users and 522 topics in that a weight can be assigned to the ties based on the number of messages or characters that a user posted to a topic. When transforming this weighted two-mode network into a one-mode network, I have maintained the users as I believe these are directly responsible for the tie generation. The number of users in this network is smaller than in the online social network as not all users that sent or received private messages participated in the forum. Note that the identification numbers do not match those of the online social network. The two-mode networks are projected onto one-mode networks using the procedure outlined on the projecting two-mode networks onto weighted one-mode networks-page.

Weighted longitudinal two-mode network (weighted by number of characters): tnet-format

The second dataset is Freeman’s EIES networks (Freeman and Freeman, 1979), also used in Wasserman and Faust (1994). This dataset was collected in 1978 and contains three networks of researchers working on social network analysis. The first network contains the personal relationships among 48 of the researchers at the beginning of the study (time 1). The second network contains the personal relationships at the end of the study (time 2). In these two networks, all ties have a weight between 0 and 4: 4 represents a close personal friend of the researcher’s; 3 represents a friend; 2 represents a person the researcher has met; 1 represents a person the researcher has heard of, but not met; and 0 represents a person unknown to the researcher. The third network is different. It is a matrix with the number of messages sent among 32 of the researchers that used an electronic communication tool (frequency matrix).

There are three pieces of information about each of the 32 researchers that were part of the third network (nodal attributes): their name, the main disciplinary affiliation (1: sociology; 2: anthropology; 3: mathematics or statistics; and 4: others), and the number of citations each researcher had in the Social Science Citation Index in 1978.

Freeman, S.C., Freeman, L.C., 1979. The networkers network: A study of the impact of a new communications medium on sociometric structure. Social Science Research Reports 46. University of California, Irvine, CA.

Network 6: The Caenorhabditis elegans worm’s neural network

This dataset contains the neural network of the Caenorhabditis elegans worm (C. elegans). It was studied by Watts and Strogatz (1998). The network contains 306 nodes that represent neurons. Two neurons are connected if at least one synapse or gap junction exists between them. The weight is the number of synapses and gap junctions. This network was obtained from the Collective Dynamics Group’s website. Note: This network contained 14 duplicated ties (i.e., a tie was mentioned twice in the edgelist). In the files available here, each pair of duplicated ties is merged into a single tie, and its weight is the sum of the weights of the two identical ties.

This is the interlocking directorate among 384 public limited companies in Norway (Allmennaksjeselskap or ASA). The list of companies is created by selecting all companies listed as public limited companies on the website of the Norwegian Business Register on August 5, 2009. For each company, we downloaded public announcements containing changes to the boards’ composition since November 1999. From these announcements, we extracted monthly affiliation (or two-mode) networks since May 2002 (see website for choice of cut-off). Corresponding one-mode projections are also available. We strive to keep the data updated by downloading new announcements around the middle of each month.

As we are not including new companies in the list, but remove companies if they file a bankruptcy notice, the dataset is shrinking. This was also the case with the data used in the original paper (Seierstad and Opsahl, 2011). Although the paper is based on August 1, 2009, data, 17 companies had given a bankruptcy notice by this time. Thus, there were only 367 companies with 1,495 directors.

This dataset contains some nodal attributes. The directors’ and companies’ names are known. In addition, for the companies, the city and post code of their registered office are known, while for the directors, the gender is known.

The data files are available through www.boardsandgender.com along with a description of how the data is collected and directors’ gender determined.

Seierstad, C., Opsahl, T., 2011. For the few not the many? The effects of affirmative action on presence, prominence, and social capital of women directors in Norway. Scandinavian Journal of Management 27 (1), 44-54, doi: 10.1016/j.scaman.2010.10.002

Network 8-11: Intra-organisational networks

This dataset contains four intra-organizational networks. Two are from a consulting company (46 employees) and two are from a research team in a manufacturing company (77 employees). These networks were used by Cross and Parker (2004).

In the first network, the ties are differentiated on a scale from 0 to 5 in terms of frequency of information or advice requests (“Please indicate how often you have turned to this person for information or advice on work-related topics in the past three months”). 0: I Do Not Know This Person; 1: Never; 2: Seldom; 3: Sometimes; 4: Often; and 5: Very Often.

In the second network, ties are differentiated in terms of the value placed on the information or advice received (“For each person in the list below, please show how strongly you agree or disagree with the following statement: In general, this person has expertise in areas that are important in the kind of work I do.”). The weights in this network are also based on a scale from 0 to 5. 0: I Do Not Know This Person; 1: Strongly Disagree; 2: Disagree; 3: Neutral; 4: Agree; and 5: Strongly Agree.

In the third network, the ties among the researchers are differentiated in terms of advice (“Please indicate the extent to which the people listed below provide you with information you use to accomplish your work”). The weights are based on the following scale: 0: I Do Not Know This Person/I Have Never Met this Person; 1: Very Infrequently; 2: Infrequently; 3: Somewhat Infrequently; 4: Somewhat Frequently; 5: Frequently; and 6: Very Frequently.

The fourth network is based on the employees’ awareness of each other’s knowledge and skills (“I understand this person’s knowledge and skills. This does not necessarily mean that I have these skills or am knowledgeable in these domains but that I understand what skills this person has and domains they are knowledgeable in”). The weight scale in this network is: 0: I Do Not Know This Person/I Have Never Met this Person; 1: Strongly Disagree; 2: Disagree; 3: Somewhat Disagree; 4: Somewhat Agree; 5: Agree; and 6: Strongly Agree.

This dataset was collected by Davis and colleagues in the 1930s. It contains the observed attendance at 14 social events by 18 Southern women. For a more detailed description, see Davis et al. (1941) or Wasserman and Faust (1994). The first names of the women are also available (1kb).

There are three US airport networks. The first is the network of the 500 busiest commercial airports in the United States. This dataset was used in Colizza et al. (2007). A tie exists between two airports if a flight was scheduled between them in 2002. The weights correspond to the number of seats available on the scheduled flights. Even though this type of network is directed by nature, as a flight is scheduled from one airport to another, the networks are highly symmetric (Barrat et al., 2004). Therefore, the version of this network available here is undirected (i.e., the weight of the tie from one airport towards another is equal to the weight of the reciprocal tie). This network was obtained from the Complex Networks Collaboratory’s website.

The second dataset is the complete US airport network in 2010. This is the network used in the first part of the Why Anchorage is not (that) important: Binary ties and Sample selection-blog post. The data is downloaded from the Bureau of Transportation Statistics (BTS) Transtats site (Table T-100; id 292) with the following filters: Geography=all; Year=2010; Months=all; and columns: Passengers, Origin, Dest. Based on this table, the airport codes are converted into id numbers, and the weights of duplicated ties are summed. In addition, ties with a weight of 0 (cargo only) and self-loops are removed.
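The cleaning steps described above can be sketched in base R. This is only an illustration under stated assumptions: the toy table and its passenger counts are made up, the column names simply mirror the Passengers, Origin, and Dest columns mentioned above, and the scripts actually used to build the dataset may differ.

```r
# Toy stand-in for the downloaded T-100 table (all values are made up).
flights <- data.frame(
  Origin     = c("JFK", "JFK", "LAX", "ANC", "ORD"),
  Dest       = c("LAX", "LAX", "JFK", "ANC", "JFK"),
  Passengers = c(100, 50, 120, 10, 0),
  stringsAsFactors = FALSE
)

# Convert airport codes into consecutive id numbers.
codes <- sort(unique(c(flights$Origin, flights$Dest)))
flights$i <- match(flights$Origin, codes)
flights$j <- match(flights$Dest, codes)

# Sum the weights of duplicated ties (same origin and destination).
net <- aggregate(Passengers ~ i + j, data = flights, FUN = sum)
names(net) <- c("i", "j", "w")

# Remove ties with a weight of 0 (cargo only) and self-loops.
net <- net[net$w > 0 & net$i != net$j, ]
net <- net[order(net$i, net$j), ]
rownames(net) <- NULL
print(net)
```

In this toy example, the two JFK to LAX rows collapse into a single tie, while the ANC self-loop and the zero-passenger ORD to JFK tie are dropped.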

This network is the high-voltage power grid in the Western States of the United States of America. The nodes are transformers, substations, and generators, and the ties are high-voltage transmission lines. This network was originally used in Watts and Strogatz (1998). Although the transmission lines can be directed and differentiated based on their capacity, this information is not available.

To use tnet, you first need to download and install R and then download and install tnet within R (information from tnet’s website). You only need to do these steps once. Every time that you start R, you need to load tnet. You can do this by writing the following command:

library(tnet)

A dataset can be loaded by writing a command similar to:

net <- read.table("<link to dataset>")

where <link to dataset> is the link to the dataset in the above table, e.g. Freeman’s third EIES network can be loaded by the following command:

To load a dataset, you must download and save the dl-file of the dataset you wish to study from the above table to your computer. The network can be imported into UCINET by using the DL import function. You can find this function through the menu: Data > Import > DL. When the function’s dialog box opens, you must select the downloaded file containing the dataset by clicking on “…” after “Input text file in DL format”. The second box can be set to default, but do remember, and change if you wish, the name that appears in the third box as this will be the name of the internal UCINET file.


Hi Tore,
First off, congratulations on the completion of your Ph.D. Now, I have a question for you. Have you ever come across a network dataset that includes geospatial data along with the usual weights and edge lists? I am working on infrastructural networks and this can provide some extra information about nodes that are not connected; however, I am having a hard time finding such a data set.

I just downloaded your Online Social Network dataset in UCINET format. It’s really nice, and thanks for the effort. It is in “edge list” format. Is there any tool that converts from edge list format to “node list” format? Kindly let me know.

Dear Tore, thanks for sharing the data with us. I’m most interested in your bipartite network data. The one you compiled from Newman – how did you actually do that? I could only find a compiled, weighted, one-mode version from him. Do you have the raw data from him? Or could you tell me which author hides behind which ID? Is the first column the author or the paper?
Both pieces of information would be very valuable to me.
The same with the women-event data: do you have a key which woman is what and which event is which? That would be very interesting! Thanks again for sharing,
Nina

Mark Newman sent me the two-mode network, which did contain the names of the authors. I have uploaded this file now (see the network description). Also, in the description of the Southern Women dataset is a link to the women’s first name.

All the two-mode network files list people ids first, and then the paper/event ids.

I find your site extremely helpful. In my PhD I am working on a study of 20 entrepreneurial high tech firms in Iceland and building a framework of their international ventures. As the interviews with the firms have proceeded I have noticed aspects which have drawn my attention to social networks. Especially interesting was that many of the entrepreneurs mentioned the same individuals who were connectors in their initial international ventures. I am also observing their relations on the new social networks on the web and the networks which they talk about in the interviews and the networks on the web are quite different! This would be interesting to observe with SNAS tools. I have now a list of 130 individuals/links (all numbered from 1 to 130). Now I am struggling to set it up in an R-Framework. Any suggestions?

The standard answer is to use an edgelist as most R-packages can easily load this format. An edgelist is a list of the ties between nodes. For example:

1 2
3 1

would be the two ties from node 1 to node 2 and from node 3 to node 1. My package, tnet, uses a third column that differentiates the ties from one another:

1 2 4
3 1 2

so here, the first tie is twice as strong as the second tie.

To load a text file (.txt) with an edgelist, simply write in R:

net <- read.table("filename_of_edgelist_file.txt")
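As a minimal, self-contained sketch of the command above, the following writes the two example ties to a temporary file and reads them back. The temporary file and the column names i, j, and w are illustrative choices, not something tnet requires.

```r
# Write the small weighted edgelist from above to a temporary file
# (a stand-in for your own edgelist file).
path <- tempfile(fileext = ".txt")
writeLines(c("1 2 4", "3 1 2"), path)

# Read it and label the columns: sender (i), receiver (j), tie weight (w).
net <- read.table(path, col.names = c("i", "j", "w"))
print(net)
```

Each row of the resulting data frame is one tie, so the number of rows is the number of ties in the file.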

That said, the standard network analysis packages might not be appropriate for you as they generally are cross-sectional and cannot distinguish between the first tie and the last tie. From your description, you might want to keep this temporal aspect in your analysis. Also, you might want to consider keeping the two-mode structure (i.e., people and companies) instead of simply creating a network among people (see the post on projecting networks).

The datasets are in tnet format. For one-mode networks, this is “sender”, “receiver”, and “tie weight” (for undirected networks, each tie is mentioned twice – one in each direction). For two-mode networks, it is primary nodes, secondary nodes, and tie weight (optional). For longitudinal networks: “time”, “sender”, “receiver”, and “tie weight”.
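To illustrate the one-mode layout for an undirected network, the sketch below takes a toy list of ties that mentions each tie once and produces an edgelist where each tie appears once in each direction. The data and the column names i, j, and w are made up for illustration; this is not code from tnet itself.

```r
# Toy undirected ties, each mentioned once: node, node, tie weight.
ties <- data.frame(i = c(1, 1, 2), j = c(2, 3, 3), w = c(4, 2, 1))

# Mention each tie twice, once in each direction, with the same weight.
net <- rbind(ties, data.frame(i = ties$j, j = ties$i, w = ties$w))
net <- net[order(net$i, net$j), ]
rownames(net) <- NULL

# Sanity check: every tie has a reciprocal tie with an identical weight.
forward  <- paste(net$i, net$j, net$w)
backward <- paste(net$j, net$i, net$w)
is_symmetric <- all(forward %in% backward)
print(net)
print(is_symmetric)
```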

Thanks for letting me know. I just tried all of them, and they seemed to work. As they are all hosted on the same server, they should all work or all not work (e.g., if the server is down). Please email me if you have further issues.

hello,
They are precise data, and thank you. But I can’t download the data, and I don’t know why; I tried many times. So I would appreciate it if you could send me these data in UCINET format, and thanks a lot!

Which datasets do you have an issue with? They seem to be working fine for me. Have you tried to right-click and save as? If this doesn’t work, send me an email and I will send you the specific datasets.

I am a Master’s student working on my thesis, which is about using social networks in promoting healthy behaviors. I need adjacency matrices from Facebook users’ networks. Do you have datasets that include adjacency matrices from Facebook users’ networks?

Thanks for your comments, and for making sure that the data is correct. The C. elegans files on here are direct copies from Duncan Watts’s old group website. I would try to reach out to him if you have a question regarding the data.

Dear Tore:
First of all, thanks for sharing the datasets with us. Can you provide the detailed information for network 14a (e.g. the names or locations of these airports)? I cannot figure out which airport each number in the data set represents.
Best Regards,
Xiaoge Zhang

Unfortunately, this information is not available for the US airport top 500. I would suggest that you use the full network (14b), calculate node strength scores, and then pick the top 500 if you are interested in the network among the top nodes.

The tie weights in the “weighted by number of characters”-networks are the sum of characters across all messages sent from one person to another (or group). This differs from the “weighted by number of messages”-versions, where the tie weight is the number of messages sent.
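The difference between the two weightings can be sketched on a toy message log. The data frame and its column names (sender, receiver, characters) are hypothetical and only serve to show the two aggregations; they do not reflect the raw data behind the networks.

```r
# Toy message log (made-up values): one row per message.
messages <- data.frame(
  sender     = c(1, 1, 2),
  receiver   = c(2, 2, 1),
  characters = c(120, 30, 45)
)

# Weighted by number of characters: sum characters across messages per tie.
by_chars <- aggregate(characters ~ sender + receiver, data = messages, FUN = sum)

# Weighted by number of messages: count the messages per tie.
by_msgs <- aggregate(characters ~ sender + receiver, data = messages, FUN = length)
names(by_msgs)[3] <- "messages"

print(by_chars)
print(by_msgs)
```

Here the tie from user 1 to user 2 has a weight of 150 in the character-weighted version and a weight of 2 in the message-weighted version.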

As you mentioned above, the C. elegans Neural Network dataset contained 14 duplicated ties, and you merged each pair of duplicated ties into one tie by summing their weights. Still, I see these duplicates in the files provided above. So I am just wondering whether the duplicates have been removed from these files or not.

Thanks for your comment. I do not believe there are any duplicated ties in the tnet format edgelist, the dl-format edgelist, or the R-object packaged with the latest version of tnet. Do send me an email if you do find an issue.

Dear Dr Tore,
I’m a PhD student working on models of network formation and I find this website very useful, thank you very much for sharing your datasets. For my thesis I would like to use the dataset of a directed unweighted large network composed by several small separated subnetworks (each subnetwork should have at most 20 nodes), but I’m not able to find a dataset matching these requirements. Do you have any suggestion? Thank you

I am not aware of a real-world network like that. Of course, you can find multiple smaller networks, but they would have different organizing principles. You might want to look into intra-school class networks.

Hi. I want to use the Facebook-like social network, but it is directed, and I want to change it to an undirected network. Is it correct that, if there is
1 3 32, I add 3 1 32 to the matrix, and for the row (3 1 35) in the tnet-format, I sum 32 and 35 and use (32+35) as the tie weight between nodes 1 and 3?
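The summing approach described in this question can be sketched in base R as follows. This is only one possible way of symmetrising a directed edgelist, shown on made-up ties with illustrative column names i, j, and w; whether the sum (rather than, say, the maximum or the mean) is the right choice depends on the analysis.

```r
# Toy directed edgelist (made-up values): sender, receiver, tie weight.
net <- data.frame(i = c(1, 3, 2), j = c(3, 1, 3), w = c(32, 35, 10))

# Key each tie by its unordered node pair and sum the weights per pair.
keyed <- data.frame(lo = pmin(net$i, net$j), hi = pmax(net$i, net$j), w = net$w)
pairs <- aggregate(w ~ lo + hi, data = keyed, FUN = sum)

# Mention each undirected tie twice, once in each direction.
und <- rbind(
  data.frame(i = pairs$lo, j = pairs$hi, w = pairs$w),
  data.frame(i = pairs$hi, j = pairs$lo, w = pairs$w)
)
und <- und[order(und$i, und$j), ]
rownames(und) <- NULL
print(und)
```

With these toy ties, nodes 1 and 3 end up connected with a weight of 32 + 35 in both directions, and the one-directional tie between nodes 2 and 3 keeps its weight of 10 in both directions.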

is it possible to create a unique text file which include different teams by specifying for instance the team number? I would like to calculate betweenness centrality for the nodes within a team. Shall I create a file for each team or is it possible to add a numbering so that t-net calculates centrality for all individual taking into account the team to which they belong?

I am not entirely sure what you would like to do. If you’d like to compute betweenness but only consider a subgraph, you can extract the subgraph and then compute betweenness. If you’d like to compute betweenness when nodes are aggregated into a team, you need to do the aggregation first, and then use the betweenness function.
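As a rough sketch of the first option, a team’s subgraph can be extracted from a single edgelist by keeping only the ties where both nodes belong to that team. The edgelist, the team lookup table, and all names below are hypothetical; the betweenness computation itself would then be run on the extracted edgelist with the function of your choice.

```r
# Toy edgelist (made-up values) and a lookup table assigning nodes to teams.
net  <- data.frame(i = c(1, 2, 3, 4, 1), j = c(2, 3, 4, 5, 5), w = c(1, 2, 1, 3, 1))
team <- data.frame(node = 1:5, team = c("A", "A", "A", "B", "B"),
                   stringsAsFactors = FALSE)

# Nodes that belong to team A.
team_a <- team$node[team$team == "A"]

# Keep only the ties where both the sender and the receiver are in team A.
sub <- net[net$i %in% team_a & net$j %in% team_a, ]
rownames(sub) <- NULL
print(sub)
```

Repeating the last step for each team gives one edgelist per team without having to maintain a separate file for each.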

Thanks for the data sources!
I am currently working with the openflights network (14c) and I found out that there are only 2939 nodes in the edgelist (not the stated 7976). The highest node ID is 7976, but not all IDs from 1 to 7976 are present. There are, however, 30501 edges. It took some time until I figured this out, so maybe other people have the same problem too.

The network has a number of isolates; hence the higher number of nodes than unique node ids in the edgelist.
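A quick way to see this kind of gap in any edgelist is to compare the number of unique node ids with the highest id. The sketch below uses a tiny made-up edgelist rather than the Openflights data, and it assumes that every id from 1 to the highest id is a node in the network.

```r
# Toy edgelist (made-up values) where the ids 3 to 6 never appear.
net <- data.frame(i = c(1, 2, 7), j = c(2, 7, 1), w = c(1, 1, 1))

# Unique node ids that appear in at least one tie.
ids <- sort(unique(c(net$i, net$j)))
n_in_edgelist <- length(ids)
max_id <- max(ids)

# Ids without any tie; these are isolates under the assumption stated above.
isolates <- setdiff(seq_len(max_id), ids)

print(n_in_edgelist)
print(max_id)
print(isolates)
```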

The network is now outdated, so it would be worth getting the data again from Openflights.org. I can see that the link doesn’t work anymore. If you re-collect this dataset, let me know and we can put it up here.