Using social network analysis to enhance information retrieval systems

It is an ongoing trend that people increasingly reveal very personal
information on social network sites in particular and on the World
Wide Web in general. As this information becomes more and more
publicly available, the social relationships between people can be
identified, which in turn enables the automatic extraction of social
networks. This trend is further driven and reinforced by recent
initiatives such as Facebook Connect, MySpace's Data Availability
and Google's FriendConnect, which make their social network data
available to anyone.

Furthermore, the current development of the World Wide Web, termed
"Web 2.0" by O'Reilly, enables more and more people to publish
information without in-depth technical knowledge. Blogs, for
example, have gained a lot of attention in recent years. The whole
blogosphere, comprising more than 70 million blogs, forms a
substantial body of information and knowledge. Additionally,
hyperlinks between blogs have been described as expressions of
conversation, affiliation, or readership, implying a form of implicit
social structure. This means that publicly available information is
increasingly annotated with author information, which also allows the
extraction of social networks.
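As a minimal sketch of how author-annotated hyperlinks could be turned into a social network, the following Python function aggregates blog-to-blog links into weighted author-to-author ties. The input format (a `blog_owner` mapping from blog URL to author and a list of `(source, target)` link pairs), the function name `author_link_network` and the example blogs are hypothetical; counting links as tie strength is one simple choice, not a method taken from this paper.

```python
def author_link_network(blog_owner, links):
    """Aggregate blog-to-blog hyperlinks into weighted author-to-author ties.

    blog_owner: {blog_url: author} (hypothetical input format).
    links: iterable of (source_blog, target_blog) hyperlink pairs.
    Returns {(source_author, target_author): number_of_links}.
    """
    edges = {}
    for source, target in links:
        a, b = blog_owner.get(source), blog_owner.get(target)
        # Ignore links touching blogs without author information, and
        # links between blogs of the same author.
        if a is None or b is None or a == b:
            continue
        edges[(a, b)] = edges.get((a, b), 0) + 1
    return edges


owners = {"blog-a.example": "alice", "blog-b.example": "bob",
          "blog-c.example": "carol"}
links = [("blog-a.example", "blog-b.example"),
         ("blog-a.example", "blog-b.example"),
         ("blog-b.example", "blog-c.example"),
         ("blog-c.example", "unknown.example")]
print(author_link_network(owners, links))
# {('alice', 'bob'): 2, ('bob', 'carol'): 1}
```

The resulting edges are directed, since a link from one blog to another does not imply a link back; a symmetric variant would simply merge `(a, b)` and `(b, a)`.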

The developments described above, together with increasing
computing power and a growing amount of freely available
scientific publication data in diverse databases, have led to a
dramatic growth of interest in social network analysis (SNA) and in
network analysis in general. However, little attention has been paid
to the application of SNA in information retrieval systems. Recent
studies suggest that a person's social network has a significant
impact on his or her information acquisition. Additionally, SNA
offers methods for identifying important persons within social
networks, who could have a significant influence on the importance
attributed to certain information. This paper therefore proposes
applying available social network data in the context of information
retrieval systems. It presents an outline of the research design for
exploring meaningful sources for social network extraction and for
assessing the impact of suitable SNA methods and measures in the
context of information retrieval systems. These methods and measures
are evaluated on ScientificCommons.org, a search platform for open
access publications with more than 21 million publications and 8.5
million extracted authors and their co-authorship network.
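The co-authorship network mentioned above can be illustrated with a small Python sketch that builds an undirected, weighted graph from publication author lists and scores each author by normalized degree centrality, one of the simplest SNA measures of a person's importance. It assumes each publication has already been reduced to a list of disambiguated author names; the helper names, the edge weight (number of joint publications) and the sample data are illustrative assumptions, not the ScientificCommons.org implementation.

```python
from itertools import combinations


def coauthorship_network(publications):
    """Undirected co-authorship graph: {author: {coauthor: joint_papers}}.

    publications: iterable of author lists, one list per publication.
    """
    graph = {}
    for authors in publications:
        unique = sorted(set(authors))  # count each pair once per paper
        for a in unique:
            graph.setdefault(a, {})  # solo authors become isolated nodes
        for a, b in combinations(unique, 2):
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return graph


def degree_centrality(graph):
    """Number of distinct co-authors divided by the maximum possible (n - 1)."""
    n = len(graph)
    if n < 2:
        return {author: 0.0 for author in graph}
    return {author: len(nbrs) / (n - 1) for author, nbrs in graph.items()}


pubs = [["Alice", "Bob"], ["Alice", "Carol"], ["Alice", "Bob", "Dave"], ["Eve"]]
net = coauthorship_network(pubs)
print(net["Alice"])  # {'Bob': 2, 'Carol': 1, 'Dave': 1}
print(degree_centrality(net))
# {'Alice': 0.75, 'Bob': 0.5, 'Carol': 0.25, 'Dave': 0.5, 'Eve': 0.0}
```

Degree centrality ignores the edge weights; a weighted variant would sum the joint-publication counts instead of counting distinct co-authors.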

The contribution of this paper consists of an analysis of online
information sources in terms of their suitability for the extraction
of social networks, and a research framework for the analysis and
application of social network methods in information retrieval
systems. The research framework was applied to the co-authorship
network of scientific publications. The co-authorship network was
used to compute different centrality measures for the authors, which
were in turn used to refine the relevance ranking of publications
within information retrieval systems. The performance of the
rankings based on the different centrality measures was evaluated by
measuring the click-through performance of the search results.
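The following Python sketch illustrates the three steps described above under stated assumptions: closeness centrality computed by breadth-first search on an unweighted co-authorship graph, a re-ranking that linearly blends a text relevance score with the highest centrality among a publication's authors, and a per-variant click-through rate computed from a click log. The blending formula, the mixing weight `alpha`, the log format and all function names are hypothetical; the paper does not specify here how centrality is combined with relevance.

```python
from collections import Counter, deque


def closeness_centrality(graph):
    """Closeness of each node within its connected component:
    (number of reachable nodes - 1) / sum of shortest-path distances.

    graph: {node: iterable of neighbours}, unweighted and undirected.
    """
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives hop distances
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


def rerank(results, centrality, alpha=0.3):
    """Order results by (1 - alpha) * text_score + alpha * best author centrality.

    results: list of (pub_id, text_score, authors); alpha is a hypothetical
    mixing weight, not a value reported in the paper.
    """
    def combined(item):
        _, text_score, authors = item
        social = max((centrality.get(a, 0.0) for a in authors), default=0.0)
        return (1 - alpha) * text_score + alpha * social

    return [pub for pub, _, _ in sorted(results, key=combined, reverse=True)]


def click_through_rates(log):
    """Fraction of shown result pages that received a click, per ranking variant.

    log: iterable of (variant, clicked) pairs, one per result page shown.
    """
    shown, clicked = Counter(), Counter()
    for variant, was_clicked in log:
        shown[variant] += 1
        clicked[variant] += int(bool(was_clicked))
    return {v: clicked[v] / shown[v] for v in shown}


graph = {"Alice": ["Bob", "Carol", "Dave"], "Bob": ["Alice", "Dave"],
         "Carol": ["Alice"], "Dave": ["Alice", "Bob"]}
scores = closeness_centrality(graph)
results = [("p1", 0.80, ["Carol"]), ("p2", 0.75, ["Alice", "Bob"])]
print(rerank(results, scores))  # ['p2', 'p1']: central authors lift p2
print(click_through_rates([("text", True), ("text", False),
                           ("centrality", True), ("centrality", True)]))
# {'text': 0.5, 'centrality': 1.0}
```

Using the maximum author centrality is only one design choice; averaging over a publication's authors, for example, would treat multi-author papers differently. Other centrality measures can be swapped in without changing `rerank`.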