Author: Syrovatskiy Ilya, ODS Slack nickname: bokomaru

Tutorial

"Epidemics on networks with NetworkX and EoN"

With this tutorial, you'll tackle an established problem in graph theory: epidemic dynamics models.

First we'll deal with loading your own data from the VKontakte network using its API, so we will go through some basic principles of requests and authentication. If you don't have an account on this network, I'll give you a ready-made graph of my own friend network (308 people), but with names and IDs changed; some people probably don't want to show their names and IDs to the OpenDataScience community (: . I will also provide a link to this graph, with the info changed for every person. Our main instrument for graph modeling will be the NetworkX library in Python.

Once the graph is created, we are ready to start with something interesting.
We'll go over the basic building blocks of graphs (nodes, edges, etc.) and create a pseudo-random graph with the same depth and number of vertices.

Then we are going to visualize the created graphs - there will be some obvious differences between them.

The next point is the main theme of this tutorial - epidemics on networks. Here you'll learn about different models of how epidemics spread.

Once you know the basics, it's time to go deeper into epidemic modeling. We'll explore the most widespread models with code on two graphs (real and pseudo-random), and for each case compare the results with EoN, a Python library for epidemic modeling.

Once we have covered everything I planned for this tutorial, it will be time to look at the results we got while diving into the world of networks, and then draw a conclusion.

Since we live in the 21st century, almost all people have accounts on different social networks, where they can be closer to their friends wherever they are.
As these networks play a significant part in our lives, analysis in this sphere is an amazing opportunity to learn something interesting about ourselves and our friendships.

The nice thing about graphs is that the concepts and terminology are generally intuitive. Nevertheless, here's some basic lingo:

Graphs are structures that map relations between objects. The objects are referred to as nodes and the connections between them as edges in this tutorial. Note that edges and nodes are commonly referred to by several names that generally mean exactly the same thing:

node == vertex == point
edge == arc == link
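To make this concrete, here is a minimal NetworkX sketch (the node names are made up purely for illustration):

import networkx as nx

G = nx.Graph()                       # an undirected graph
G.add_node("Alice")                  # a node (vertex, point)
G.add_edge("Alice", "Bob")           # an edge (arc, link); "Bob" is added automatically
print(G.number_of_nodes(), G.number_of_edges())  # prints: 2 1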

To work with graphs in our analysis, it's a good idea to use some libraries.

First, the NetworkX library. NetworkX is the most popular Python package for manipulating and analyzing graphs. Several packages offer the same basic level of graph manipulation, but NetworkX is most likely the best of them.

Second, the EoN library. EoN (Epidemics on Networks) is a Python module that provides tools to study the spread of SIS and SIR diseases on networks (I'll define SIR and SIS in chapter 6). EoN is built on top of NetworkX.
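Just to give a taste of the EoN interface before we get to the theory, here is a minimal sketch on a toy random graph (tau and gamma are illustrative transmission and recovery rates; the SIR model itself is explained in chapter 6):

import EoN
import networkx as nx

G_demo = nx.gnm_random_graph(300, 1500)    # a toy graph just for this demo
# tau = per-edge transmission rate, gamma = recovery rate, rho = initially infected fraction
t, S, I, R = EoN.fast_SIR(G_demo, tau=0.3, gamma=1.0, rho=0.01)
print(I.max())                             # peak number of simultaneously infected nodes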

Third, since we want to get our friend list from VK, we have to use its API, which means we need some libraries for making HTTP requests. If you are not a VK user, you can slightly modify the code in this notebook to get your friends from, say, Facebook; I am sure it works much the same way.

If you are NOT a VK user, you can skip this part and jump to loading the already created data for the graph (Lazy fast start). But you may well pick up some really interesting information in this part for your future research. There will be not only work with the API, but also random generation of people that preserves their real relationships!

API stands for Application Programming Interface. In the case of web applications, an API can provide data in a format other than standard HTML, which makes it convenient to use when writing applications. Third-party public APIs most often provide data in one of two formats: XML or JSON.

Various mobile and desktop clients for Twitter and VKontakte are built on top of their APIs, and both services have high-quality, well-documented APIs.

After you create the application, you can find the access token in the Applications section.

Many VK API methods assume the presence of a private token that must be passed as a parameter when executing the request. The process of obtaining a token is described in the documentation: https://vk.com/dev/access_token

Attention! The token is called private for a reason. Anyone possessing it can perform a variety of actions on your behalf. Do not show it to anyone.

In short, you will be given the ID of your application and a list of access rights that you want to grant to the API user. Then you need to pass this data as parameters in a URL of the following format:
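As a hedged example (based on the VK documentation for the implicit flow; substitute your own APP_ID and the scopes you actually need), the authorization URL looks roughly like this:

https://oauth.vk.com/authorize?client_id=APP_ID&display=page&redirect_uri=https://oauth.vk.com/blank.html&scope=friends&response_type=token&v=5.85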

You can experiment here; just look into the API documentation. Requests to the API are really useful: you can build your own web app (using Python and Django), set up correct auth and a connection to the API server, and then get almost all the information you want automatically. For example, you can mine posts, people's profiles, etc., depending on your aims, and then research something amazing about society.

OK, let's continue:

If the token is not correct or has already expired, you will get an error:
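The error comes back as JSON with roughly the following shape (the exact message text may differ; error_code 5 is VK's authorization failure):

{"error": {"error_code": 5, "error_msg": "User authorization failed: invalid access_token."}}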

If one of these limits is exceeded, the server will return the following error: 'Too many requests per second'.

If your app's logic implies many requests in a row, check the execute method.
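A sketch of how such a batched call could look (execute runs VKScript on the server side and can embed up to 25 API calls in one request; TOKEN is assumed to hold your access token):

import requests

res = requests.get("https://api.vk.com/method/execute",
                   params={"code": 'return [API.users.get(), API.friends.get()];',
                           "access_token": TOKEN,
                           "v": "5.85"}).json()
print(res)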

Besides the frequency limits, there are quantitative limits on calling methods of the same type. For obvious reasons, the exact limits are not published.

If a quantitative limit is exceeded, access to a particular method may start requiring a captcha (see captcha_error). After that, the method may be temporarily restricted (in this case the server doesn't answer requests to that particular method but easily processes any other requests).

You can pause any operation in Python using the sleep function from the time module. To do so, pass it the number of seconds for which the program should be suspended:

In [26]:

import time

for i in range(5):
    time.sleep(.5)
    print(i)

0
1
2
3
4

We already saw that we can get response errors in JSON, so you should check everything before and after querying to avoid collecting false or incorrect information.
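One simple pattern is a small wrapper that retries on the rate-limit error and fails loudly on anything else. This is my own hedged sketch, not VK's official recipe; vk_call and the retry policy are names and choices I made up:

import time
import requests

def vk_call(method, **params):
    # hypothetical helper: call a VK method, retrying on 'Too many requests per second'
    params.setdefault("access_token", TOKEN)
    params.setdefault("v", "5.85")
    for attempt in range(5):
        res = requests.get("https://api.vk.com/method/" + method, params=params).json()
        error = res.get("error")
        if error is None:
            return res["response"]
        if error.get("error_code") == 6:   # 6 = too many requests per second
            time.sleep(0.5)
            continue
        raise RuntimeError(error)
    raise RuntimeError("giving up after 5 attempts")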

Also, there are many subtleties to using the API. For example, to get a user's friend list you use the friends.get method, which can return either a simple friend list or detailed information about each friend, depending on whether the fields parameter is specified (if it isn't, the method simply returns a list of IDs). And if the fields parameter is specified, one request cannot return information about more than 5000 people; see the paging sketch below.
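If you ever need more than 5000 detailed records, you could page through the list with the count and offset parameters of friends.get. Here is a sketch reusing the hypothetical vk_call helper from above (the 5000 chunk size mirrors the limit just mentioned):

friends = []
offset = 0
while True:
    # with API version 5.x the response is {'count': ..., 'items': [...]}
    chunk = vk_call("friends.get", user_id=uid, fields="sex", count=5000, offset=offset)
    friends.extend(chunk["items"])
    offset += 5000
    if offset >= chunk["count"]:
        break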

Now that you've created your app and have the app ID and token, you are ready to download your friends.

def get_friends_ids(user_id, fields=""):
    # you can also add the access token to the request, receiving it via OAuth 2.0
    res = requests.get("https://api.vk.com/method/friends.get",
                       params={"user_id": user_id,
                               "fields": fields,
                               "access_token": TOKEN,
                               "version": 5.85}).json()
    if res.get('error'):
        print(res.get('error'))
        return list()
    return res[u'response']

In [9]:

# asking for friends and their gender
# notice that gender is in the format 1=female, 2=male
# uid is supposed to be your user ID here, to get YOUR friends
full_friends = get_friends_ids(uid, ["name", "sex"])

def get_random_people(full_friends, names, surnames):
    n_people = len(full_friends)
    n_m = 0
    n_f = 0
    true_id_f = []
    true_id_m = []
    for friend in full_friends:
        if friend['sex'] == 2:
            n_m += 1
            true_id_m.append(friend['uid'])
        else:
            n_f += 1
            true_id_f.append(friend['uid'])
    print("people number: ", n_people, ", men: ", n_m, ", women: ", n_f)
    # take only the top popular names for both female and male:
    names_f = names.query('sex == "F"')[:n_f].name.values
    names_m = names.query('sex == "M"')[:n_m].name.values
    # take n_people random surnames (np.random.seed, since we draw via np.random below):
    np.random.seed(17)
    rand_indc = np.random.choice(a=range(len(surnames)), size=n_people, replace=False)
    s_names = surnames.surname.values[rand_indc]
    # separate into female/male
    s_names_f = s_names[:n_f]
    s_names_m = s_names[n_f:]
    # we will take random user IDs from here:
    ids = np.random.choice(a=range(1001, 9999), size=n_people, replace=False)
    # separate into female/male
    id_f = ids[:n_f]
    id_m = ids[n_f:]
    random_f = pd.DataFrame(data={'uid': id_f, 'first_name': names_f, 'last_name': s_names_f,
                                  'true_id': true_id_f, 'user_id': id_f, 'sex': 1})
    random_m = pd.DataFrame(data={'uid': id_m, 'first_name': names_m, 'last_name': s_names_m,
                                  'true_id': true_id_m, 'user_id': id_m, 'sex': 2})
    # merge male and female random sets
    random_people = pd.concat([random_f, random_m])
    return random_people
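Before the loading step below makes sense, the generated data has to be saved somewhere. A sketch of how I'd serialize it with plain json (the file names match the ones loaded a couple of cells later; full_graph is assumed to be the friendship adjacency structure built earlier):

import json

with open("full_graph_rand_people.txt", "w") as f:
    f.write(json.dumps(full_graph))
with open("full_friends_rand_people.txt", "w") as f:
    f.write(json.dumps(full_friends))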

Now it's time to load it back, or, as I do, to continue with the newly generated data:

In [12]:

# If you have constructed your own graph without renaming, load it from your storage:
with open("full_graph_depth2.txt") as f:
    full_graph = json.loads(f.read())
with open("full_friends.txt") as f:
    full_friends = json.loads(f.read())

In [15]:

# If you've run every operation step by step with me, load this
# (note that I will keep working with full_graph and full_friends, meaning the sets
# that I generated in the previous steps);
# or, if you skipped everything, this is also for you:
with open("full_graph_rand_people.txt") as f:
    full_graph = json.loads(f.read())
with open("full_friends_rand_people.txt") as f:
    full_friends = json.loads(f.read())

The first thing we will do is create 100 random graphs with the same number of edges and vertices as our real graph and look at the average clustering coefficient.

nx.gnm_random_graph():

Returns a random graph. In the G(n, m) model, a graph is chosen uniformly at random from the set of all graphs with n nodes and m edges.

In [71]:

average_clust_coefs = []
for i in range(100):
    GR = nx.gnm_random_graph(len(G.nodes()), len(G.edges))
    average_clust_coefs.append(nx.average_clustering(GR))
print("The average over average clustering coefficients random graphs: ", np.mean(average_clust_coefs))
plt.hist(list(nx.clustering(GR).values()))
plt.title("Clustering coefficients over the last random Graph")

The average over average clustering coefficients random graphs: 0.07102606178964932

Out[71]:

Text(0.5, 1.0, 'Clustering coefficients over the last random Graph')

As you can see, the average clustering coefficient is around 10 times smaller than in our real graph, although the numbers of nodes and edges are the same.
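To double-check that claim, you can print the real graph's coefficient next to the random average (assuming G still holds the friendship graph built earlier):

print("Real graph: ", nx.average_clustering(G))
print("Random graphs (mean): ", np.mean(average_clust_coefs))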