Re: [igraph] clustering coefficients for bipartite networks

From:

jordi torrents

Subject:

Re: [igraph] clustering coefficients for bipartite networks

Date:

Thu, 24 Feb 2011 11:23:19 +0100

Tamàs, Simone and all,
Sorry for top posting and for resurrecting an old thread. I've been
working on the implementation of the different kinds of bipartite
clustering coefficients proposed by Latapy et al (2008), starting with
the implementation proposed by Simone and refined by Tamàs in this
thread. My implementation (see script below) is orders of magnitude
faster than Simone's because it avoids doing two nested loops over the
nodes and only explores nodes which have common neighbors, as proposed
in the paper. My implementation and Simone's yield the same results
but both differ from the results provided in the paper. I've put
together a test using the smallest network from the paper (arXiv
coauthor network) to illustrate the problem. You can download the test
network in graphml format (with node attribute "type" that identifies
top and bottom nodes) and the script from:
http://www.milnou.net/~jtorrents/bipartite_cc/coauthoring_latapy_2mode.graphmlhttp://www.milnou.net/~jtorrents/bipartite_cc/test_bip_cc.py
The output of the script is:
=====================================================================
$ python test_bip_cc.py
Code based on the paper:
Matthieu Latapy, Clémence Magnien and Nathalie Del Vecchio.
Basic Notions for the Analysis of Large Two-mode Networks.
Social Networks 30 (1), p. 31-48, 2008
Loading coauthorship network from the paper. You can download it from:
http://www.milnou.net/~jtorrents/bipartite_cc/coauthoring_latapy_2mode.graphml
The authors report the results in Table 3 p. 41
cc_dot_top = 0.29; cc_dot_bottom= 0.31
cc_min_top = 0.56; cc_min_bottom= 0.73
cc_max_top = 0.36; cc_max_bottom= 0.33
Testing Simone's implementation of cc_dot:
cc_dot_top = 0.47; cc_dot_bottom= 0.56; time spend = 2417 seconds
Testing my implementation of cc_dot:
cc_dot_top = 0.47; cc_dot_bottom= 0.56; time spend = 3 seconds
Testing my implementation of cc_min:
cc_min_top = 0.70; cc_min_bottom= 0.85; time spend = 3 seconds
Testing my implementation of cc_max:
cc_max_top = 0.53; cc_max_bottom= 0.57; time spend = 3 seconds
The results do not match!
It seems that the problem is in the computation of CC for individual
nodes (see pp. 40-41)
because tests for pairwise bipartite clustering coefficients work with
simple examples:
Testing functions of different kinds of bipartite clustering
coefficients between pairs of nodes
using 3 example graphs from figure 5 p. 40 Latapy et al (2008)
Example 0 correct!
Example 1 correct!
Example 2 correct!
===============================================================================
As you can see the results do not match with those provided in the
paper. My coding experience is quite limited, so maybe I did some
mistakes in the implementation. I'd appreciate if someone could revise
the code and provide any clue on how to fix it.
Thanks in advance.
Salut!
2011/2/2 Tamas Nepusz <address@hidden>:
> Hi,
>
> The ccdot attribute is probably a list at the end of your function, and
> lists cannot be saved as GraphML attributes. If you need them later,
> save the graph in pickle format which preserves all the attributes
> irrespectively of their types. (But of course you cannot load the graph
> from another program, only from Python).
>
> --
> T.
>
> On 02/02/2011 01:00 PM, Simone Gabbriellini wrote:
>> Hi,
>>
>> yes, right, I am now doing it... but a strange thing happens: I initialize
>> all the attributes to 0, but after my function updates them, I save the
>> network as a graphml and those attributes are missing... If I try to save
>> the graphml file before my CCBN function, they are all there, set to 0...
>>
>> Am I deleting them in some way with this code?
>>
>> best,
>> simo
>>
>> Il giorno 01/feb/2011, alle ore 21.20, Tamás Nepusz ha scritto:
>>
>>> Hi,
>>>
>>> Cache the values of g.vs(type=0) and g.vs(type=1) in advance, otherwise you
>>> will be calculating them over and over again needlessly. (The first inner
>>> for loop will calculate g.vs(type=0) as many times as many vertices of
>>> type=0 you have). Similarly, calculate the neighbor sets in advance to
>>> avoid creating them all the time from scratch.
>>>
>>> Also, there's no need to convert set(unei+vei) back to a list, just use
>>> len(set(unei) | set(vnei)).
>>>
>>>> I have come to this solution but I don't know if I can consider it a fast
>>>> one:
>>>>
>>>> def CCBN(self):
>>>> for u in self.g.vs(type=0):
>>>> ccdot = []
>>>> for v in g.vs(type=0):
>>>> unei = g.neighbors(u)
>>>> vnei = g.neighbors(v)
>>>> if len(set(unei) & set(vnei)) > 0:
>>>> ccdot.append(len(set(unei) & set(vnei)) /
>>>> len(list(set(unei + vnei))))
>>>> u['ccdot'] = [float(sum(ccdot)) / len(ccdot) if len(ccdot) > 0
>>>> else 0]
>>>> for u in g.vs(type=1):
>>>> ccdot = []
>>>> for v in g.vs(type=0):
>>> The line above should probably be g.vs(type=0). Also, note that the two for
>>> loops are almost the same, only the vs the for loop is iterating over is
>>> different, so I would probably put the two for loops in an auxiliary
>>> function and just call it twice, once with g.vs(type=0) and once with
>>> g.vs(type=1).
>>>
>>> --
>>> Tamas
>>>
>>>
>>> _______________________________________________
>>> igraph-help mailing list
>>> address@hidden
>>> http://lists.nongnu.org/mailman/listinfo/igraph-help
>>
>> _______________________________________________
>> igraph-help mailing list
>> address@hidden
>> http://lists.nongnu.org/mailman/listinfo/igraph-help
>>
>
> _______________________________________________
> igraph-help mailing list
> address@hidden
>http://lists.nongnu.org/mailman/listinfo/igraph-help
>