The 3rd Gamification in Information Retrieval (GamifIR 2016) Workshop, hosted at SIGIR 2016, started off with a highly insightful and timely keynote by Sebastian Deterding. Sebastian is a senior research fellow at the Digital Creativity Labs at the University of York, a founder and principal designer of the design agency coding conduct, a founder of the Gamification Research Network, and co-editor of “The Gameful World” (MIT Press, 2015).

In his keynote, Sebastian focused on why gamification needs a proper theory. His motivations ranged from being able to ask measurable questions, to building systematic knowledge, and even to collaborating with the commercial sector. He claimed that gamification right now is stuck in Groundhog Day, where study after study repeatedly tries to answer the same vague question: “Does gamification work?” The answer to this question, Sebastian says, “is of course, Mu”. Mu is a Japanese term indicating that the question itself is wrongly put. It is like asking: “Does medicine work?” The question assumes that medicine is a unitary thing, the same for all people and all diseases. A better question to ask is: “What substances work under what conditions?” However, to ask such questions one must have theory.

Theory is a set of statements expressing a nomological network of constructs and their relations, which allows us to describe, explain and control reality. Without theory, there is no giant whose shoulders we can stand on. Without theory, we cannot systematically organise and extend knowledge. Without theory, we cannot ask the grand question, inviting a new science of design: “How do you design features to affect cognitive states, which then affect user behaviour?”

The current state of the art in gamification is a basic psychological mediation model with three black boxes, connected by arrows: a set of design elements, which affects motivation, which in turn affects behaviour.

However, there are a number of issues with this model. For one, the design elements are currently ill-formed taxonomies, much like the old Chinese encyclopaedia with its overlapping categories, in which animals were described as either a) belonging to the emperor, b) embalmed, c) tame, d) sucking pigs, e) sirens, etc. Similarly, current studies of gamification define taxonomies with overlapping categories. For example, badges may be goals, rewards, feedback, progress indicators and challenges, all at once.

A bigger problem, however, is that gamification right now is still stuck on low-level design elements. We tend to reshuffle these existing elements but do not discover new game elements. We also tend to ignore higher abstraction levels, like game design patterns and mechanics (e.g., resource constraints, turns), principles and heuristics (e.g., enduring play, game styles), game models (e.g., challenge, fantasy), and game design methods (e.g., playtesting, values). To address these issues, what the field really needs is an empirically grounded, well-operationalised, well-formed taxonomy of design elements. This is a massive undertaking, but one that we should start working on now.

Treating motivation as another black box is similarly unhelpful. Motivation is not singular by any means: behaviour is energised and directed not by a single grand cause but by multiple, multilevel, co-acting influences. Indeed, gaming motives are many, including curiosity, competence, autonomy, relatedness, meaning, absorption (flow), control beliefs and goal-setting. Different design elements will lead to different motives, and different motives will work for different people. What is needed is an empirical evaluation of which design features map to which motives (or which sets of motives). For example, unless you can actually measure that the way you designed a given badge leads to goal-setting, any claimed relationship is just conjecture. This is another massive undertaking.

Furthermore, how motivation is affected by design elements needs to be looked at in context. Motivational function is an appraisal, which can be subconscious or deliberate, and negative or positive. For example, if you get a reward for something that you are already intrinsically motivated to do, that reward can actually reduce your motivation. Similarly, feedback can be appraised either as controlling or as informing: the former thwarts autonomy, while the latter supports competence. This suggests that the model should be extended by explicitly considering appraisal as mediating the relationship between design elements and motives, as well as the situation, which strongly affects this relationship. Finally, since different motives work for different people, the model also needs to incorporate the person, who may be characterised at the trait or state level. As an example, Sebastian cited the fan fiction adverts of FanLib.com, which wrongly targeted a male audience that cared about competitions, instead of the site’s actual female audience, who cared about a supportive community to aid their writing.

Even with all the elements of the model in place, one needs to look at it not as a causal flow, but as a systemic-dynamic whole. For instance, adding a clock to chess changes the whole dynamic of the game; it turns into a totally different game. In the case of Monopoly, the game is organised such that if you have money, chances are you will get more money, while if you are poor, chances are you will have to pay out more. The result is a slowly widening poverty gap that drags on, making it a frustrating game for anyone apart from the player on the winning streak. This emerges from how the game is designed, from the system and the person together. Flow theory models this as a dynamic person-situation relation that captures when people feel optimally engaged: between frustration and boredom. It allows us to ask whether the activity engages at the right skill level.

Within this model, Sebastian argues, we should not focus on features (e.g., points, badges, etc.), but should look at how features, in relation to dispositions, afford motivational functions. For example, when adding badges to an activity, we need to identify what goals they should be attached to and how attainable those goals are at different skill levels. When adding elements that trigger surprise, we should make sure that the user never learns the rule for triggering them. When motivating curiosity, we need to consider what the user already knows and give hints of what is coming. To motivate through meaningfulness, we need to find out why the user cares about something. In short, we should look at motivational affordances: relations between actor dispositions and environment features that render an action or event functionally significant for a specific motive. This is a huge ask.

Luckily, we are living in an era of massive opportunities that can help us in fixing these issues. At any moment, companies are running millions of A/B tests to see how they can optimise user engagement, e.g., how they can get more users to stay longer. Due to the scale of these experiments, companies want to automate these tests, e.g., CreativeAI’s websites that design themselves. However, in order to know what to test in an A/B test and what to automate, you need theory to know what to optimise in order to avoid getting stuck in local optima. So the question is whether we can create a double loop that includes researchers, who can use the data gathered to inform motivational theories and feature mappings, which can then feed back into the ongoing tests. For example, in flow theory, we need to match skill with difficulty to balance between frustration and boredom, but we don’t know the slope of this function; there is no empirical evidence. Take crowdsourcing as a case in point: typically, a predetermined set of tasks of unknown difficulty is randomly served to workers, resulting in abysmal retention. The challenge here may be to order tasks for workers in such a way as to keep them on the optimal path between frustration and boredom. The real challenge is thus to facilitate collaboration between researchers and the companies that run such tests at scale.
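
The flow-matching idea for crowdsourcing can be sketched in code. The following is a minimal, illustrative scheduler, not something presented at the workshop: it greedily serves each worker the remaining task whose difficulty best matches their estimated skill (plus a small stretch margin), then nudges the skill estimate after each completed task. The difficulty values and the update rule are invented assumptions standing in for a proper learner model.

```python
# Illustrative sketch only: a greedy scheduler that tries to keep a crowd
# worker inside the flow channel between frustration and boredom.
# Difficulty values and the skill-update rule are invented assumptions.

def next_task(tasks, skill, stretch=0.1):
    """Pick the remaining task whose difficulty is closest to the
    worker's estimated skill plus a small stretch margin."""
    return min(tasks, key=lambda t: abs(t["difficulty"] - (skill + stretch)))

def update_skill(skill, difficulty, success, rate=0.2):
    """Move the skill estimate towards the difficulty of a solved task,
    or slightly down after a failure."""
    if success:
        return skill + rate * (difficulty - skill)
    return max(0.0, skill - rate * 0.1)

tasks = [{"id": i, "difficulty": d}
         for i, d in enumerate([0.2, 0.35, 0.5, 0.65, 0.8])]
skill, order, remaining = 0.3, [], list(tasks)
while remaining:
    task = next_task(remaining, skill)
    remaining.remove(task)
    order.append(task["id"])
    skill = update_skill(skill, task["difficulty"], success=True)
```

Under these assumptions the scheduler ramps difficulty up as the skill estimate grows, rather than serving tasks at random; estimating the actual slope of the skill-difficulty function is exactly the missing empirical work the keynote calls for.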

In summary, to get out of Groundhog Day, where we repeat the same studies, we need theory to build systematically on; theory that we construct together. For that, we need better taxonomies and models — all of which will take years to construct and empirically verify. Clearly, this is a massive undertaking, but with a matching massive opportunity if we work together with the commercial sector: we can be at the forefront of design evolution if we loop research into the world’s largest design lab, which is already running every day.

The rest of the day included 7 talks and a discussion session.

Matthew Barr and colleagues, Kay Munro and Frank Hopfgartner, from the University of Glasgow, presented a preliminary analysis of a gamified university library system, LibraryTree. The system aims to trigger users’ extrinsic motivation to increase their interaction with the library by harnessing gaming techniques to reward users and by making interactions more fun. LibraryTree allows students to gain points and badges (stamps) for entering the library building, borrowing and returning books, accessing e-resources or sharing a review of an item they read with friends. Based on their analysis of the first six months of LibraryTree’s operation, with 1,751 registered users, they highlighted significant variations in the manner in which users interact with the system, depending on their contexts (e.g., departments) and levels of expertise. They also noted that, in terms of flow theory, the path between frustration and boredom is in fact non-linear and users tend to bounce around.

Michael Meder, Till Plumbaum and Sahin Albayrak introduced the audience to an enterprise Infoboard experiment. Employees can define topics of interest (persisted search queries) and the Infoboard will continuously search for new information for the user. Similarly to the keynote talk, they too argue that gamification design must be user-specific to be applied successfully. In order to decide which elements to apply and when, however, they propose the use of data mining and machine learning methods to determine the types of users in a company and to learn which game elements best suit them.
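
Their proposal, as described, amounts to clustering users by their interaction patterns and then learning a per-cluster mapping to game elements. A toy sketch of the clustering step might look like the following; the feature vectors (searches run, badges clicked) and the naive k-means are invented illustrations, not taken from the paper.

```python
# Illustrative sketch (not from the paper): cluster users by simple
# interaction counts to discover user types. Stdlib-only naive k-means.
from statistics import mean

def kmeans(points, k=2, iters=10):
    centres = points[:k]                      # naive init: first k points
    for _ in range(iters):
        groups = {i: [] for i in range(k)}
        for p in points:
            # assign each user to the nearest centre (squared distance)
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centres[c])))
            groups[i].append(p)
        # recompute each centre as the mean of its assigned users
        centres = [tuple(mean(dim) for dim in zip(*groups[i]))
                   if groups[i] else centres[i]
                   for i in range(k)]
    return centres, groups

# (searches run, badges clicked) per user -- invented numbers
users = [(20, 1), (22, 0), (18, 2), (3, 9), (5, 8), (4, 10)]
centres, groups = kmeans(users)
```

On this toy data the clustering separates heavy searchers from badge-oriented users; the second step of the proposal would then be to test which game elements work best for each discovered group.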

Ioannis Karatassis and Norbert Fuhr explored the idea of enhancing web search literacy by educating search engine users who lack appropriate strategies for finding relevant results efficiently. In particular, they embarked on gamifying WebSAIL, an ongoing project that focuses on long-lasting enhancements where acquired skills remain sustainable. The gamified system supports different game modes (quiz, search hunt and query tuning), different difficulty levels and goals, and incorporates a wide range of game elements. In a usability evaluation with 15 participants, the gamified application achieved a SUS score of around 90.

Laura Guillot, Quentin Bragard, Ross Smith, Dan Bean and Anthony Ventresque looked at how they could improve Skype translation for online meetings. Their system collects translations during online meetings and asks the crowd to improve them in context. Users play to earn points and rewards by proposing and voting for the most accurate translations in context, with votes weighted by the user’s expertise. Thanks to the displayed statistics, players know that they are part of a real community and that they play for a real goal.

Yuan Jin, Mark J. Carman and Lexing Xie investigated the effect of competition settings with real-time performance feedback on the accuracy of crowd workers’ relevance labels. They found that providing only basic feedback in the form of a leaderboard has little effect on the workers’ relevance judging accuracy. However, a leaderboard appears necessary in order to get the maximum effect when bonuses are paid to the best performing workers.

Andreas Leibetseder and Mathias Lux set the audience’s hearts racing by gamifying fitness. They proposed a novel approach to improving motivation for exergaming, which integrates physical exercise into popular games. In order to evaluate this concept, they conducted a comparative study examining 67 participants’ reactions to testing an ergometer-controlled casual game as well as a modified game. The results indicate a strong tendency for players to prefer the newly introduced approach over the casual fitness game.

Finally, Giorgio Maria Di Nunzio, Maria Maistro and Daniel Zilio presented their gamification of machine learning techniques – with a twist – where humans are tasked with finding the optimal separating line in a classification task. The game consists of separating two sets of coloured points on a two-dimensional plane by means of a straight line that users can rotate or move up or down. They found that the classification accuracy achieved in the game was very high given the small number of labelled objects. On average, the players could beat the ‘goal’ score more easily than expected, which means that the probabilistic classifiers can be trained/validated with just 25% of the original dataset and, in many cases, obtain even better results than cross-validation on the whole dataset. Their paper won the best paper award based on the reviewers’ scores.
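
The core game mechanic can be sketched as follows, with invented toy data: a straight line y = slope·x + intercept labels each point by which side it falls on, the player adjusts the slope (rotate) and intercept (move up or down), and the score is the resulting classification accuracy.

```python
# Minimal sketch of the game's core mechanic; the point data and the
# scoring function are invented illustrations, not from the paper.

def classify(point, slope, intercept):
    """Label a point by which side of the line y = slope*x + intercept
    it lies on."""
    x, y = point
    return "red" if y > slope * x + intercept else "blue"

def accuracy(points, slope, intercept):
    """Fraction of labelled points the line classifies correctly --
    the player's score for a given line position."""
    hits = sum(classify(p, slope, intercept) == label for p, label in points)
    return hits / len(points)

# invented toy data: two roughly separable coloured clusters
points = [((1, 5), "red"), ((2, 6), "red"), ((3, 7), "red"),
          ((4, 1), "blue"), ((5, 2), "blue"), ((6, 1), "blue")]
```

For this toy data, the horizontal line y = 3.5 separates the two clusters perfectly (accuracy 1.0), while a line above all points labels everything "blue" and scores only 0.5; the game rewards players for steering the line towards the former.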

The discussion session touched on the questions of whether gamification needed a wider audience beyond information retrieval, and whether it was perhaps a better fit with machine learning. A consensus among the participants was that, should the workshop run again, it should host more game design experts. There are plenty of applications and areas of research that can be enhanced with theories from gamification, such as understanding the motivation, behaviour and psychological state of the user in interactive, session-based IR. The lifelogging community has gamification as one of its pillars. Recommender systems and bonus programs, citizen science games, and anything that requires long-term engagement can all benefit from gamification.

However, concerns were raised regarding the depth to which crowdsourcing experiment designers, for example, should be experts in game design in order to run an experiment: it could be a lot of extra work when game design is not the primary research focus. Researchers simply may not have the resources to produce a good design. The issue is that there are no platforms available right now; there are no TensorFlows of the gamification world.

About Gabriella Kazai

Gabriella Kazai is VP of Data Science at Lumi, the startup company behind the Lumi Social News app, which provides personalised recommendations of crowd-curated content from across the world's media and social networks; see android.lumi.do. Prior to that, Gabriella worked as a research consultant at Microsoft Bing and at Microsoft Research. Her research interests include recommender systems, machine learning, IR, crowdsourcing, gamification, data mining, social networks and PIM, with influences from HCI. She holds a PhD in IR from Queen Mary University of London. She has published over 90 research papers and organised several workshops (e.g., BooksOnline 2008-2012, GamifIR 2014-2015) and IR conferences (ICTIR 2009, ECIR 2015). She is one of the founders and organisers of the INEX Book Track (since 2007) and of the TREC Crowdsourcing Track (2011-2013). She is also a co-organiser of the News IR Workshop.