Computer Science > Social and Information Networks

Title: Improving Website Hyperlink Structure Using Server Logs

Abstract: Good websites should be easy to navigate via hyperlinks, yet maintaining a
high-quality link structure is difficult. Identifying pairs of pages that
should be linked may be hard for human editors, especially if the site is large
and changes frequently. Further, given a set of useful link candidates, the
task of incorporating them into the site can be expensive, since it typically
involves humans editing pages. In the light of these challenges, it is
desirable to develop data-driven methods for automating the link placement
task. Here we develop an approach for automatically finding useful hyperlinks
to add to a website. We show that passively collected server logs, beyond
telling us which existing links are useful, also contain implicit signals
indicating which nonexistent links would be useful if they were to be
introduced. We leverage these signals to model the future usefulness of as-yet
nonexistent links. Based on our model, we define the problem of link placement
under budget constraints and propose an efficient algorithm for solving it. We
demonstrate the effectiveness of our approach by evaluating it on Wikipedia, a
large website for which we have access to both server logs (used for finding
useful new links) and the complete revision history (containing a ground truth
of new links). As our method is based exclusively on standard server logs, it
may also be applied to any other website, as we show with the example of the
biomedical research site Simtk.

Comments: Proceedings of the 9th International ACM Conference on Web Search and Data Mining 2016