Responsible editor:

Jodi Schneider

Submission Type:

Research Paper

Abstract:

The mobile ecosystem is growing dramatically towards an unprecedented scale, with an extremely crowded market and fierce competition among app developers. Today, keeping users engaged with a mobile app is key for its success, since engaged users remain active consumers of services and/or producers of new content. However, users may abandon a mobile app at any time for various reasons, e.g., the success of competing apps, decreased interest in the provided services, etc. In this context, predicting when a user may become disengaged from an app is an invaluable resource for developers, creating the opportunity to apply intervention strategies aimed at recovering from disengagement (e.g., sending push notifications with new content). In this study, we aim at providing evidence that predicting when mobile app users become disengaged is possible with a good level of accuracy. Specifically, we propose, apply, and evaluate a framework to model and predict User Engagement (UE) in mobile applications via different numerical models. The proposed framework is composed of an optimized agglomerative hierarchical clustering model coupled with (i) a Cox proportional hazards, (ii) a negative binomial, (iii) a random forest, and (iv) a boosted-tree model. The proposed framework is empirically validated by means of a year-long observational dataset collected from a real deployment of a waste recycling app. Our results show that in this context the optimized clustering model classifies users adequately and improves UE predictability for all numerical models. Also, the highest levels of prediction accuracy and robustness are obtained by applying either the random forest classifier or the boosted-tree algorithm.

Decision:

Overall Impression: Good
Suggested Decision: Undecided
Technical Quality of the paper: Good
Presentation: Good
Reviewer's confidence: Medium
Significance: Moderate significance
Background: Reasonable
Novelty: Unable to judge
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

The paper tests four models (Cox Proportional Hazards, Negative Binomial, Random Forest, and XGBoost) to predict user engagement with mobile applications. The goal is to provide evidence that user engagement can be predicted fairly accurately, with the wider goal of facilitating interventions to re-engage users or to increase engagement. The models are tested using data on user engagement with a waste recycling app. The paper shows that the Random Forest and XGBoost models are best suited to making accurate predictions in terms of identifying engaged and disengaged users. The Negative Binomial model is the least accurate in predicting the number of user activities before disengagement.

Reasons to accept:

The paper provides evidence that user engagement can be predicted, although I would like a more explicit discussion of the extent to which this evidence has been missing so far, given that this seems to be the main novelty of the paper.
The paper deploys and compares four different models to make predictions on user engagement, each useful in its own right, and it is indeed very insightful to see how these models can be used with this type of data. The authors do admit that the comparison across these four models is not entirely justified, given their different natures; however, this needs further acknowledgement in the discussion section. In fact, only the Random Forest and XGBoost models are reasonably comparable, since both aim at classifying users as engaged/disengaged. The other two models have very different predictive objectives and setups, and hence comparing them to each other and to the two classification models is highly problematic in my eyes.
The authors show the value of clustering users prior to the modelling to obtain better prediction results. However, I would like to know what variables were used for clustering and what the clusters actually mean.

Reasons to reject:

I find the framing of the paper problematic in terms of the wider goal of this research. Why should users be prompted to use an app that they probably consider (temporarily) irrelevant, and why should we try to make people spend even more time glued to their phones? I find these explicitly stated goals highly problematic. It is quite striking that when the authors list all kinds of reasons why people choose to disengage from an app (p. 2), the most obvious reason, that the app has lost its relevance (at least temporarily), is not even listed. Also, I see how this investigation is of interest to industry, but this is supposed to be an academic paper and I would like to see how it is relevant for science. The authors write on p. 2 "...provide a framework for modeling and predicting UE, which can be further extended or used in other scientific studies"; this needs to be significantly expanded.
I am not convinced the four models should be compared at all (see comments above). I think the authors should rather treat the models in their own rights, given they serve different modelling/prediction purposes.
The clustering needs further explanation and interpretation (see comments above).

Further comments:

Please use gender-neutral (possessive) pronouns, e.g. on page 2 instead of "...better suited for his own mobile app", write "...better suited for their own mobile app" or on page 5 instead of "(2) the time of her last event within the app" (which by the way sounds a bit awkward anyway), write "(2) the time of their last event within the app" (as noted, you may want to rephrase the entire statement).
Figure 6, what you claim to be blue (predicted events) appears as black (at least on my screen).

Overall Impression: Weak
Suggested Decision: Undecided
Technical Quality of the paper: Weak
Presentation: Weak
Reviewer's confidence: Medium
Significance: High significance
Background: Incomplete or inappropriate
Novelty: Limited novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

This submission presents a framework to model and predict user engagement with mobile applications. The framework is evaluated using usage data from one particular app, focused on waste recycling. With this successful evaluation, the authors aim to provide evidence that it is possible to predict when users of a mobile application will become disengaged.

Reasons to accept:

The focused topic of modeling and predicting user engagement for mobile applications is timely and relevant for the research communities in Data Science and Human-Computer Interaction. Overall, the presented approach seems to be novel and well suited. Furthermore, the results of the evaluation are promising.

Reasons to reject:

I have strong doubts regarding the data set and features used. The waste recycling app is described only briefly. The authors do not argue why this is a common mobile application. I would recommend discussing this with consideration of the results presented by Müller et al. [1]. I would question whether it is common for mobile applications that gamification aspects (here, earning points) are directly connected to monetary benefits. Here, it is particularly interesting that the granted points can only be redeemed at local shops. Thereby, the user's location becomes an obvious feature for disengagement. Additionally, using both the zip code and the geolocation provides only redundant information. In general, the list of features and calculated variables is fuzzy. The authors claim to use 7 features but present only 6 in a list. Also, it remains unclear how they combined the features into the 122 variables.

The authors do not describe whether the application triggered any notifications. However, Sahami Shirazi et al. describe notifications as an essential element for engaging with mobile applications [2]. Hence, I wonder why the authors did not use the number of notifications or the reaction to notifications as a feature. To be able to understand user engagement or disengagement with the waste recycling app, it would be helpful if the authors also published the application, or at least provided a reference to it.

While the authors motivate their work in the introduction very generally, also looking at specific application domains such as health (reference [9] in the submission), they discuss the limitations of the data set only briefly at the end of the paper.

As described, the features and the data set look more specific than general to me. Hence, the submission would be more substantial if the authors made less general claims and focused particularly on comparable mobile applications. Furthermore, publishing not only the data set but also the application, or providing a reference to the application, would improve the validity.

Comments:

This paper studies user engagement in mobile apps. This is an interesting problem with important practical implications. The paper investigates the predictability of when mobile app users become disengaged from apps and shows that engagement can be predicted with a good level of accuracy. It applies different prediction models and also shows that clustering further facilitates the prediction. The paper is interesting, but there are some limitations.

First, it only shows the predictability while not showing much detail about the prediction itself, namely which features drive the prediction. As a result, the practical implications would be limited. Also, the features used in the prediction might not provide useful indications for engagement management without further investigation, such as significance tests.

Second, the prediction just applies some standard models without much technical novelty (and the practical implications can be limited given the current status of the paper, i.e., the lack of details about the prediction models). It would be helpful to also give more details on the technical barriers of the problem and the solutions.

Third, more features would be helpful, in particular features that can explain user engagement, such as version updates and similar apps in the market. App usage features might just be related to what this paper aims to predict.