PhD Theses

Recommender Systems have become a pervasive technology in a wide spectrum of everyday applications, and can be said to be familiar to the general public. In situations where there is an information overload, such as e-commerce, streaming platforms or social networks, providing personalized recommendations has proven to be a major source of enhanced functionality, user satisfaction, and revenue improvements. The development of recommendation algorithms and technologies has typically focused on maximizing the prediction accuracy of the user’s interests. However, there is an increasing awareness in the field that there are other properties that have an impact on user satisfaction and business performance. In particular, there are many cases where applying some degree of novelty or diversity may be beneficial for both the users that receive the recommendations and the business that provides them.

In this thesis we develop a principled approach to the evaluation and enhancement of novelty and diversity in Recommender Systems. We consider that improving such fundamental dimensions of the usefulness of recommendations requires taking into account how users explore and perceive recommendations, which problems novelty and diversity solve, and what causes those problems. As our first contribution, we propose a unified framework for the evaluation and enhancement of novelty and diversity in recommendations that generalizes and enhances, on a common basis, many of the proposals previously studied in the state of the art. Special emphasis is placed on the study of diversity within recommendation lists, for which two different contributions are presented. On the one hand, an adaptation of search result diversification metrics and techniques from Information Retrieval is explored to cope with the ambiguity of user interests and tastes. On the other hand, a domain-specific solution for assessing and optimizing the diversity of recommendations is proposed to address users’ need for varied recommendations when genre information about the recommendation domain is available. Finally, we address diversity as an overall quality from the system point of view, and we propose solutions to the problem from this perspective by turning the recommendation task around and recommending users to items.

Our proposals are tested on a common experimental design that considers three different datasets for movie and music recommendation and four well-known baseline recommendation algorithms. The results of our experiments support the validity of our contributions and allow us to analyze, and gain further insight into, their behavior when applied to different settings.
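As a rough illustration of what diversity within a recommendation list can mean when genre information is available, the following sketch computes an average pairwise genre distance and applies a simple greedy re-ranking. It is not the framework proposed in the thesis: the Jaccard distance, the trade-off parameter `lam`, the function names and the toy data are all assumptions chosen for brevity.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of genre labels."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def intra_list_diversity(items, genres):
    """Average pairwise genre distance among the items of one list."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    total = sum(jaccard_distance(genres[i], genres[j]) for i, j in pairs)
    return total / len(pairs)


def greedy_rerank(scores, genres, k, lam=0.5):
    """Pick k items one by one, trading relevance against diversity.

    Each step selects the candidate maximizing
    (1 - lam) * relevance + lam * (distance to the closest selected item).
    """
    selected = []
    candidates = set(scores)
    while candidates and len(selected) < k:

        def objective(item):
            if not selected:
                spread = 1.0  # nothing selected yet: no redundancy penalty
            else:
                spread = min(
                    jaccard_distance(genres[item], genres[s]) for s in selected
                )
            return (1 - lam) * scores[item] + lam * spread

        # sorted() makes tie-breaking deterministic
        best = max(sorted(candidates), key=objective)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy data: baseline relevance scores and item genres.
scores = {"a": 0.9, "b": 0.8, "c": 0.5, "d": 0.4}
genres = {
    "a": {"action"},
    "b": {"action"},
    "c": {"comedy"},
    "d": {"action", "comedy"},
}
top_by_score = sorted(scores, key=scores.get, reverse=True)[:2]
reranked = greedy_rerank(scores, genres, k=2)
print(top_by_score, intra_list_diversity(top_by_score, genres))
print(reranked, intra_list_diversity(reranked, genres))
```

In this toy case the two most relevant items share a single genre, so the accuracy-only list has no genre diversity, while the re-ranked list gives up some relevance for a second genre. How much to give up is exactly what `lam` controls in this sketch.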

Recommender Systems (RS) aim to help users with information access and retrieval tasks by suggesting items (products or services) according to past preferences (interests, tastes) in certain contexts. One of the most studied contexts is the so-called temporal context, which has given rise to an already extensive research area known as Time-Aware Recommender Systems (TARS).

Despite the large number of approaches and advances in TARS, the results and conclusions reported in the literature about how to exploit time information seem to be contradictory. Although several reasons could explain such contradictory findings, in this thesis we hypothesize that TARS evaluation plays a fundamental role. The existence of multiple evaluation methodologies and metrics makes it possible to find some evaluation protocol that is suitable for a particular recommendation approach but unsuitable or unrewarding for others. The problems that arise from this situation are an impediment to fairly comparing the results and conclusions reported in different studies, and complicate the identification of the best recommendation approach for a given task. Moreover, a review of published work shows that most existing TARS have been developed to reduce the error in the prediction of user preferences (ratings) for items. However, the focus of RS is nowadays shifting towards finding (lists of) items relevant to the target user. Also, the use of RS in diverse tasks enables the development of new applications where time context information can serve as a distinctive input.

In this thesis we analyze how time context information has been exploited in the RS literature, in order to a) characterize a robust protocol that allows fair evaluations of new TARS and facilitates comparisons between published performance results; and b) better exploit time context information in different recommendation tasks. To accomplish these goals, we identify key methodological issues regarding the offline evaluation of TARS, and propose a methodological framework that allows the conditions used in the evaluation of TARS to be described precisely. From the analysis of these conditions, we provide a number of guidelines for a robust evaluation of RS in general, and TARS in particular. Moreover, we propose adaptations and new methods for different recommendation tasks, based on the proper exploitation of available time context information. By using fair evaluation settings, we are able to reliably assess the performance of different methods, identifying the circumstances under which some of them outperform the others.

In summary, by means of the proposed methodological characterization and the experiments conducted, we show the importance of using a robust evaluation method to measure the performance of TARS, an issue that had not been addressed in depth so far.
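To make concrete why the choice of offline evaluation conditions matters for time-aware recommendation, the sketch below contrasts two common ways of splitting timestamped ratings into training and test sets. These are generic illustrations, not the methodological framework of the thesis; the tuple layout `(user, item, rating, timestamp)`, the function names and the toy data are assumptions.

```python
from collections import defaultdict


def split_by_cutoff(ratings, cutoff):
    """Global time split: everything before `cutoff` is training data."""
    train = [r for r in ratings if r[3] < cutoff]
    test = [r for r in ratings if r[3] >= cutoff]
    return train, test


def split_last_n_per_user(ratings, n=1):
    """Per-user time split: each user's n most recent ratings are test data."""
    if n < 1:
        raise ValueError("n must be at least 1")
    by_user = defaultdict(list)
    for r in ratings:
        by_user[r[0]].append(r)
    train, test = [], []
    for rows in by_user.values():
        rows = sorted(rows, key=lambda r: r[3])  # oldest first
        train.extend(rows[:-n])
        test.extend(rows[-n:])
    return train, test


# Hypothetical toy data: (user, item, rating, timestamp).
ratings = [
    ("u1", "i1", 5, 10),
    ("u1", "i2", 3, 12),
    ("u2", "i1", 4, 25),
    ("u2", "i3", 2, 30),
]
print(split_by_cutoff(ratings, cutoff=20))
print(split_last_n_per_user(ratings, n=1))
```

With the global cutoff, user `u2` appears only in the test set and has no training data, whereas the per-user split keeps one training rating for every user but mixes time periods across users. The same algorithm can therefore look quite different under the two protocols, which is why stating the split condition explicitly is useful when comparing results.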

Personalised recommender systems aim to help users access and retrieve relevant items from large collections, by automatically identifying products or services of likely interest based on observed evidence of the users’ preferences. For many reasons, user preferences are difficult to guess, and therefore recommender systems show considerable variance in their success ratio when estimating the user’s tastes. In such a scenario, self-predicting the chances that a recommendation is accurate before actually submitting it to a user becomes an interesting capability from many perspectives. Performance prediction has been studied in the context of search engines in the Information Retrieval field, but there is little research on this problem in the recommendation domain. This thesis investigates the definition and formalisation of performance prediction methods for recommender systems. Specifically, we study adaptations of search performance predictors from the Information Retrieval field, and propose new predictors drawing from Information Theory and Social Graph Theory. We show the instantiation of information-theoretical performance prediction methods on both rating and access log data, and the application of social-based predictors to social network structures.

Recommendation performance prediction is a relevant problem per se, because of its potential application to many uses. We primarily evaluate the quality of the proposed solutions in terms of the correlation between the predicted and the observed performance on test data. Given that the evaluation of recommender systems is an open area to a significant extent, the thesis addresses the evaluation methodology as a part of the researched problem. We analyse how the variations in the evaluation procedure may alter the apparent behaviour of performance predictors, and we propose approaches to avoid misleading observations. In addition to the stand-alone assessment of the proposed predictors, we research the use of the predictive capability in the context of the dynamic adjustment of hybrid methods combining several recommenders. We research approaches where the combination leans towards the algorithm that is predicted to perform best in each case, aiming to enhance the performance of the resulting hybrid configuration. The thesis reports positive empirical evidence confirming both a significant predictive power for the proposed methods in different experiments, and consistent improvements in the performance of dynamic hybrid recommenders employing the proposed predictors.
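The two uses of a performance predictor described above, correlation with observed performance and dynamic weighting of a hybrid, can be sketched as follows. This is a minimal illustration under stated assumptions: the predictors of the thesis are not reproduced, Pearson correlation is only one possible choice of correlation coefficient, and the predictor values, per-user performance figures and function names are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # correlation is undefined for constant input
    return cov / (norm_x * norm_y)


def dynamic_hybrid(score_a, score_b, predictor_value):
    """Combine two recommenders' scores for one user-item pair.

    `predictor_value` is an assumed estimate, clipped to [0, 1], of how
    well recommender A is expected to perform for this user; the hybrid
    leans towards A when it is high and towards B when it is low.
    """
    weight = min(1.0, max(0.0, predictor_value))
    return weight * score_a + (1.0 - weight) * score_b


# Hypothetical per-user values: predictor output vs. observed precision.
predicted = [0.2, 0.5, 0.9, 0.4]
observed = [0.1, 0.4, 0.7, 0.5]
print(pearson(predicted, observed))
print(dynamic_hybrid(score_a=4.0, score_b=2.0, predictor_value=0.75))
```

A static hybrid would use the same weight for every user; the dynamic variant differs only in that the weight comes from the predictor, so its benefit depends directly on how well the predictor correlates with actual performance.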

The ever-increasing volume and complexity of information flowing into our daily lives challenge the limits of human processing capabilities in a wide array of information seeking and e-commerce activities. In this context, users need help to cope with this wealth of information, in order to reach the most interesting products, while still getting novelty, surprise and relevance. Recommender systems suggest to users products or services they may be interested in, by taking into account or predicting their tastes, priorities or goals. For that purpose, user profiles or usage data are compared with some reference characteristics, which may belong to the information objects (content-based approach), or to other users in the same environment (collaborative filtering approach). Inspired by Information Retrieval and Machine Learning techniques, both approaches are based on statistical or heuristic models that attempt to capture the correlations between users and objects. Commercial applications like the Amazon online store (www.amazon.com), Google News (news.google.com) or YouTube (www.youtube.com) are examples of significant success stories of recommendation techniques. However, several limitations of current recommender systems remain, such as the sparsity of user preference and item content feature spaces, the difficulty of recommending items to users with few declared preferences, or the lack of flexibility to incorporate contextual factors into the recommendation methods.

Some of these limitations can be related to a limited understanding and exploitation of the semantics underlying both user profiles and item descriptions. In this respect, an enhancement of the semantic knowledge, and its representation, describing interests and contents, is envisioned as a potential direction to deal with those limitations. This thesis explores the development of an ontology-based knowledge model to link the (explicit and implicit) meanings involved in user interests and resource contents. Upon this knowledge representation, several content-based and collaborative recommendation models are proposed and evaluated. The proposed model supports contextual techniques to extend the reach of recommendations and improve their accuracy. A refinement of the collaborative filtering space by semantic layers is proposed to find focused similarities, which enable further and more accurate recommendations.
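The distinction drawn above between the content-based and the collaborative filtering approaches can be illustrated with a small sketch. It shows the two textbook variants only, not the ontology-based models of the thesis; the cosine similarity, the dictionary-based data layout, the function names and the toy data are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def content_based_score(user_profile, item_features):
    """Content-based: compare the user profile with the item's own features."""
    return cosine(user_profile, item_features)


def collaborative_score(target, item, ratings):
    """User-based collaborative filtering: similarity-weighted average of
    the ratings that other users gave to `item`."""
    num = den = 0.0
    for other, their_ratings in ratings.items():
        if other == target or item not in their_ratings:
            continue
        sim = cosine(ratings[target], their_ratings)
        num += sim * their_ratings[item]
        den += abs(sim)
    return num / den if den else 0.0


# Hypothetical toy data.
profile = {"scifi": 1.0, "drama": 0.5}          # user interests
item_features = {"scifi": 1.0}                  # item description
ratings = {
    "ann": {"i1": 5, "i2": 4},
    "bob": {"i1": 5, "i2": 4, "i3": 5},
    "cat": {"i4": 3, "i3": 1},
}
print(content_based_score(profile, item_features))
print(collaborative_score("ann", "i3", ratings))
```

The sparsity problem mentioned above is visible even here: `cat` shares no rated item with `ann`, so her rating of `i3` carries no weight, and a user with no overlapping ratings at all would receive a score of zero.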

Personalization in information retrieval aims at improving the user’s experience by incorporating the user’s subjectivity into the retrieval methods and models. The exploitation of implicit user interests and preferences has been identified as an important direction to enhance current mainstream retrieval technologies and anticipate future limitations as worldwide content keeps growing and user expectations keep rising. Without requiring further effort from users, personalization aims to compensate for the limitations of user need representation formalisms (such as the dominant keyword-based or document-based ones) and to help handle the scale of search spaces and answer sets, under which a user query alone is often not enough information for the system to provide effective results. However, the general set of user interests that a retrieval system can learn over a period of time, and bring to bear in a specific retrieval session, can be fairly vast, diverse, and to a large extent unrelated to a particular user search in progress. This means that even on the basis of correctly learned user preferences, the system could make wrong guesses or become intrusive. Rather than introducing all user preferences en bloc, an optimum search adaptation could be achieved if the personalization system were able to select only those preferences which are pertinent to the ongoing user actions. In other words, although personalization alone is a key aspect of modern retrieval systems, it is the application of context awareness to personalization that can really produce a step forward in future retrieval applications.

Context modeling has long been acknowledged as a key aspect in a wide variety of problem domains, among which Information Retrieval is a prominent one. In this work, we focus on the representation of live retrieval user contexts, based on implicit feedback techniques. The particular notion of context considered in this thesis is defined as the set of themes under which retrieval user activities occur within a unit of time. Our proposal of contextualized personalization is based on the semantic relation between the user profile and the user context. Only those preferences related to the current context should be used, disregarding those that are out of context. The use of semantic-driven representations of the domain of discourse, as a common, enriched representational ground for content meaning, user interests, and contextual conditions, is proposed as a key enabler of effective means for a) a rich user model representation, b) context acquisition at runtime and, most importantly, c) the discovery of semantic connections between the context and concepts of user interest, in order to filter out those preferences that are likely to be intrusive within the current course of user activities.
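The idea of using only those preferences that are semantically related to the current context can be sketched as a simple filter over a concept graph. This is a deliberately simplified illustration and not the model of the thesis: the adjacency-list ontology, the hop limit `max_hops`, the function names and the toy data are all assumptions.

```python
def related_concepts(concept, ontology, max_hops=1):
    """Concepts reachable from `concept` within `max_hops` relations,
    including the concept itself."""
    seen = {concept}
    frontier = {concept}
    for _ in range(max_hops):
        frontier = {n for c in frontier for n in ontology.get(c, ())} - seen
        seen |= frontier
    return seen


def contextualize(preferences, context, ontology, max_hops=1):
    """Keep only the preferences whose concept is semantically connected
    to at least one concept of the current context."""
    in_context = set()
    for concept in context:
        in_context |= related_concepts(concept, ontology, max_hops)
    return {c: w for c, w in preferences.items() if c in in_context}


# Hypothetical toy data: a tiny concept graph, a user profile with
# weighted interests, and the themes of the ongoing session.
ontology = {
    "music": ["jazz", "concert"],
    "jazz": ["music"],
    "sport": ["football"],
    "football": ["sport"],
}
preferences = {"jazz": 0.9, "football": 0.7, "concert": 0.4}
context = {"music"}
print(contextualize(preferences, context, ontology))
```

In this toy session about music, the football preference is left out even though it is a correctly learned interest, which mirrors the goal of avoiding intrusive personalization. A weighted or decaying notion of relatedness would be a natural refinement of the hard hop limit used here.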