4.
Introduction Methodology Data-Set Results Conclusion and FW
Cross-Community Effects II
Although inspired by Kuhn, we expected communities to evolve
in a rather milder form
Instead of paradigm shift, we were looking for community
shift
Community merge is a complementary phenomenon, but
a rather uninteresting one
Thus, we instead investigated shifts followed by merges,
i.e. community shift/merges
Instead of paradigm articulation, we were looking for
community specialization
Co-citation networks of two big camps in CS were analysed:
Semantic Web (solution-driven) and Information Retrieval
(problem-driven) [1]

6.
Initial Expectations & Requirements
The methodology was developed with a set of certain requirements
arising from the nature of the problem:
1 Dynamic data-set represented by snapshots of several
consecutive time-steps
2 Communities have to be identified in the network in each
time-step
3 Authors (nodes in general) have to be uniquely identified
among all time-steps
4 For topical analysis, meta-data (topics) describing the nodes
are necessary

9.
Visualization
To compare and inspect the state of the network in different
time-steps, a proper visualization is very helpful
Nodes that appeared previously should have similar positions
Colours denoting each node's cluster affiliation
should be preserved
As we have not found any existing tool implementing these
requirements, we built our own based on JUNG
Another tool based on Graphviz was built to automatically
create diagrams of ancestors and descendants based on
the respective relations

10.
Topic Detection I
We mined keywords using NLP techniques [3] from the
abstracts or full-texts for almost 70% of the underlying articles
Tokenised and stemmed [6] keywords were then assigned to
each author
Keywords were ranked by their ability to discriminate authors,
combining their frequency (TF) with their uniqueness in the
corpus (IAF): TF-IAF
Each author a in time-step t was thus described by a
bag-of-words vector $k_a^t$
The topical description of cluster c was obtained as the centroid
of its members' vectors
Cosine similarity was used for determining topical similarity of
two clusters
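The steps on this slide can be sketched as follows. This is only an illustration under stated assumptions: the slides do not give the exact IAF formula, so it is assumed here to be log(N / number of authors using the keyword), and the function names tf_iaf, centroid and cosine are hypothetical.

```python
import math
from collections import Counter

def tf_iaf(author_keywords):
    """Weight each author's keywords for one time-step.

    author_keywords: dict author -> list of stemmed keywords.
    Returns dict author -> {keyword: weight}.
    ASSUMPTION: IAF = log(N / number of authors using the keyword).
    """
    n_authors = len(author_keywords)
    # In how many authors' keyword lists does each keyword occur?
    author_freq = Counter(
        k for kws in author_keywords.values() for k in set(kws)
    )
    vectors = {}
    for author, kws in author_keywords.items():
        tf = Counter(kws)
        vectors[author] = {
            k: count * math.log(n_authors / author_freq[k])
            for k, count in tf.items()
        }
    return vectors

def centroid(vectors):
    """Mean of a non-empty list of sparse vectors (dicts)."""
    total = Counter()
    for vec in vectors:
        for k, w in vec.items():
            total[k] += w
    return {k: w / len(vectors) for k, w in total.items()}

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all-zero."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

A cluster's topic would then be centroid([vectors[a] for a in members]), and two clusters would be compared with cosine on their centroids.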

11.
Topic Detection II
Interpretation of a cluster’s topic was based on characterizing
keywords—a union of:
20 highest ranked keywords
20 most frequent keywords
We were particularly interested in cross-community activity
between the IR and SW camps
Whether a community counted as IR- or SW-related was
decided from frequent patterns mined from the publications
Any event detected by the community topic evolution measures
that involved both IR- and SW-related communities was
then considered an inter-camp dynamic
Meta-data was used to assess the quality of the clusterings; WT
was omitted from further analysis
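The union of characterizing keywords described on this slide could be computed roughly as below. This is a sketch only: the function name and the two input dictionaries (a rank score per keyword, for example TF-IAF, and a raw frequency per keyword) are assumptions, not the authors' code.

```python
from collections import Counter

def characterizing_keywords(weights, counts, n=20):
    """Union of the n highest-ranked and the n most frequent keywords.

    weights: dict keyword -> rank score (hypothetical input, e.g. TF-IAF)
    counts:  dict keyword -> raw frequency in the cluster
    """
    top_ranked = sorted(weights, key=weights.get, reverse=True)[:n]
    top_frequent = [k for k, _ in Counter(counts).most_common(n)]
    return set(top_ranked) | set(top_frequent)
```

Because the two lists usually overlap, the result holds between n and 2n keywords.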

12.
Measures
Overlap measures induce a huge number of interactions
between communities
The solution is to apply more specific measures or to use the
simple ones in combination
We developed and/or used two categories of measures:
1 community life-cycle measures for measuring and
explaining the state and evolution of a community
2 community topic evolution measures for revealing
cross-community phenomena such as community shift

14.
Community Topic Evolution Measures
We looked for parallel changes of structure and topic of
communities
Structural and topical measures were combined by
multiplication for simplicity and because the range remains
within [0, 1]
Community shift PS may be detected as the emergence of a
new community that is topically distinct from its ancestor:
$\mathit{PS}(c_i^t, c_j^{t+1}) = \mathrm{dissim}(c_i^t, c_j^{t+1}) \times \mathrm{ancestor}(c_i^t, c_j^{t+1})$
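A minimal sketch of this product follows. Neither factor is defined on this slide, so both are assumptions here: dissim is taken as one minus the cosine similarity of the two topic vectors, and ancestor as the share of the new community's members that came from the candidate ancestor. All function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors given as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def ancestor(prev_members, next_members):
    """ASSUMPTION: fraction of the descendant's members found in the ancestor."""
    if not next_members:
        return 0.0
    return len(prev_members & next_members) / len(next_members)

def community_shift(prev_members, prev_topic, next_members, next_topic):
    """PS = dissim * ancestor; stays in [0, 1] for non-negative topic weights."""
    dissim = 1.0 - cosine(prev_topic, next_topic)  # ASSUMPTION: 1 - cosine
    return dissim * ancestor(prev_members, next_members)
```

With these definitions, PS approaches 1 only when the new community is drawn almost entirely from the ancestor and shares almost no topic weight with it.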

16.
Data-Set
We first picked a set of major conferences in both fields
We then selected publications from these conferences from
DBLP for 2000–2009
A co-citation network of 5772 authors and 817642 edges over
all years was extracted
3-year time-steps with 2-year overlap: 2000–2002,
2001–2003, ...
The total number of articles was 39314, for which we were able to
scrape 22975 abstracts and 3740 full-texts
Nearly 70% coverage by content
We scraped 18313 author-provided keywords for 4102 distinct
articles
Coverage by these high-quality meta-data was 10%
We mined 263742 keywords from abstracts and full-texts
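The windowing and the coverage figures on this slide can be reproduced with a short sketch. The function name time_steps is hypothetical, and the coverage calculation assumes that the scraped abstracts and full-texts belong to distinct articles, which the slide does not state.

```python
def time_steps(first_year, last_year, width=3, stride=1):
    """Inclusive (start, end) year windows that stay within the given range.

    Consecutive windows overlap by width - stride years.
    """
    return [
        (start, start + width - 1)
        for start in range(first_year, last_year - width + 2, stride)
    ]

windows = time_steps(2000, 2009)

# Coverage figures from the slide.
# ASSUMPTION: abstracts and full-texts cover distinct articles.
content_coverage = (22975 + 3740) / 39314
keyword_coverage = 4102 / 39314
```

Under these assumptions, windows starts with (2000, 2002) and (2001, 2003), and the two ratios round to 0.68 and 0.10, in line with the "nearly 70%" and "10%" quoted on the slide.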

17.
Shift of Louvain Community 26
The emergence of Louvain community 26 was identified as an
inter-camp community shift (PS ≈ 0.62) in 2006
It was formed by 80% of community 6 “web IR” and by 20%
of community 5 “SW”
Keywords in 2006 such as “navigation”, “personalization”,
and “semantic web” suggest transdisciplinary topics
Community 15 “SW and IR” exerted a massive influence in 2007,
and the topic changed towards “SW and business processes”
Observed as a low topic drift T ≈ 0.29
IR-related keywords appeared again among the characterizing
keywords in 2008
The topic then stabilized: T ≈ 0.65

20.
Specialization of Infomap Community 9
Initially oriented towards general and core SW-related topics in 2000
Between 2002 and 2004 we identified 3 shifts
One of these shifts gave rise to community 99 “semantic desktop and
personalization”
The community itself then specialized in “SW services”
S, T, and H provided valuable insights
ρ, B, and A did not seem to provide any further insights

23.
Shift/Merge of Community 86
We identified a shift/merge (PS/M ≈ 0.91) of community 86
with community 0
Both communities were concerned with IR-related topics, but
each had its specific theme:
86 being more focused on “development”, “engine”, and
“system”
0 being more focused on “question answering”
90.9% of authors from 86 moved to community 0
Relative density ρ ≈ 0.47 and a high cluster content ratio
H ≈ 1.91 suggest it was topically coherent but structurally
weak
We cannot generalize about the suitability of any life-cycle
measure, as we identified only one shift/merge

25.
Change of Topic of Infomap Community 54
An inter-camp community topic change (PC ≈ 0.58) was identified
for Infomap community 54 between 2005 and 2006
The topic changed from “knowledge management” and
“information extraction” towards “knowledge querying” and
“semantic web”
Zero author entropy A suggests this might have been caused
by new members joining the community
34.5% were completely new, i.e. they did not come from any
previous community
20.7% coming from 54 “knowledge management and
information extraction”
17.2% coming from 29 “ontologies and SW”
6.9% coming from 70 “ontologies and folksonomies”
6.9% coming from 112 “semantic web services”
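A breakdown of member origins like the one above could be computed as in the following sketch. The function name and data layout are hypothetical, and the toy data in the usage note is invented for illustration; it is not the data behind the percentages on the slide.

```python
from collections import Counter

def member_origins(members, previous_assignment):
    """Share of a community's members by the community they came from.

    members: set of authors in the community at time-step t+1
    previous_assignment: dict author -> community id at time-step t;
        authors missing from it are treated as completely new.
    Returns dict origin -> share, where the key None means 'completely new'.
    """
    origins = Counter(previous_assignment.get(author) for author in members)
    return {origin: n / len(members) for origin, n in origins.items()}
```

For example, member_origins({"a", "b", "c", "d"}, {"a": 54, "b": 54, "c": 29}) attributes half of the members to community 54, a quarter to community 29, and a quarter to no previous community.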

27.
Emergence of Intermediary Louvain Community 15
The most complex scenario we investigated
It first emerged as a descendant of community 4 “IR” with
the topic “cross-language IR”, which was identified as a
community shift (PS ≈ 0.55) in 2003
From 2004, this community was under the massive influence of
community 5 “SW”, which caused a change towards
SW-related topics (PC ≈ 0.31)
From 2005, IR-related keywords appeared again among the
characterizing keywords, while those keywords disappeared from
community 5
Therefore, whereas community 5 kept its focus on the core
SW-related topics, it contributed substantially to the formation of a new
interdisciplinary community

30.
Conclusion and Future Work I
We presented a general and scalable methodology for analysing
cross-community phenomena that uniquely combines
topological and content analysis and is supported by dedicated
visualization techniques
Three community topic evolution measures tailored for
identifying phenomena like community shift, shift/merge, and
change of topic were proposed and successfully assessed
Community shift and topic change were detected quite
commonly, which suggests that they are part of many
community life-cycles
Community shift/merge was detected very rarely, which either
means we have to improve the measure or that this is simply a
rare phenomenon
We proposed life-cycle measures characterising the states and
evolution of communities

31.
Conclusion and Future Work II
The assessment showed that average vertex betweenness,
relative density, cluster content ratio, and topic drift offered
valuable insights into the phenomena revealed by community
topic evolution measures
We observed strong shifts (PS → 1) when the shifted
community disappeared in the next time-step
These strong shifts usually had very different but coherent
topics
They might have been the initial sources of new topics or even
research streams
Frequently, a newly emerged community had a quite weak
structure (low ρ, high A) and/or topic (low T), while these
characteristics then improved in the subsequent time-steps
B seems to be a good measure for the identification of
intermediary communities

32.
Conclusion and Future Work III
We intend to cluster the community life-cycles by the
characteristic events expressed by all the measures
We expect this to provide an automated way of extracting
life-cycle taxonomies
The combination of content and structural analysis allowed us
to assess the quality of clusterings obtained solely by inspecting
the structure of the network
We consider this original approach fertile ground for
future research
We plan to use other algorithms, e.g. a co-clustering algorithm
over both content and objects [4]
We will extend the whole work to a larger data-set