Abstract

Background

To ensure the output quality, current crowdsourcing systems highly rely on redundancy of answers provided by multiple workers with varying expertise, however massive redundancy is very expensive and time-consuming. Task recommendation can help requesters to receive good quality output quicker as well as help workers to find their right tasks faster. To reduce the cost, a number of previous works adopted active learning in crowdsourcing systems for quality assurance. Active learning is a learning approach to achieve certain accuracy with a very low cost. However, previous works do not consider the varying expertise of workers for various task categories in real crowdsourcing scenarios; and they do not consider new workers who are not willing to work on a large amount of tasks before having a list of preferred tasks recommended. In this paper, we propose ActivePMFv2, Probabilistic Matrix Factorization with Active Learning (version 2), on a task recommendation framework called TaskRec to recommend tasks to workers in crowdsourcing systems for quality assurance. By assigning the most uncertain task for new workers to work on, this paper identifies a flaw in our previous ActivePMFv1, Probabilistic Matrix Factorization with Active Learning (version 1). Therefore, ActivePMFv2 can give new workers a list of preferred tasks recommended faster than that of ActivePMFv1. Our factor analysis model considers not only worker task selection preference, but also worker performance history. It actively selects the most uncertain task for the most reliable workers to work on to retrain the classification model. Moreover, we propose a generic online-updating method for learning the model, ActivePMFv2. The larger the profile of a worker (or task) is, the less important is retraining its profile on each new work done. In case of the worker (or task) having large profile, our online-updating algorithm retrains the whole feature vector of the worker (or task) and keeps all other entries in the matrix fixed. Our online-updating algorithm runs batch update to reduce the running time of model update.

Results

Complexity analysis shows that our model is efficient and is scalable to large datasets. Based on experiments on real-world datasets, the result shows that the MAE results and RMSE results of our proposed ActivePMFv2 are improved up to 29 % and 35 % respectively comparing with ActivePMFv1, where ActivePMFv1 outperforms the PMF with other active learning approaches significantly as shown in previous work. Experiment results show that our online-updating algorithm is accurate in approximating to a full retrain of the learning model while the average runtime of model update for each work done is reduced by more than 80 % (decreases from a few minutes to several seconds).

Conclusions

To the best of our knowledge, we are the first one to use PMF, active learning and dynamic model update to recommend tasks for quality assurance in crowdsourcing systems for real scenarios.

Keywords

Background

Crowdsourcing is an idea of outsourcing a task to a large group of networked people in the form of an open call to reduce the production cost [1, 2]. In recent years, crowdsourcing systems attract much attentions at present [3, 4]. Some examples of crowdsourcing systems are Amazon Mechanical Turk (or MTurk) [5], CrowdFlower [6], Taskcn [7] and TopCoder [8]. In a crowdsourcing system, the output quality of a completed task in a crowdsourcing system is “the extent to which the provided outcome fulfills the requirements of the requester” [9]. For quality assurance, a requester has to verify the quality of every answer submitted by workers, and it is very time-consuming. Alternatively, requesters highly rely on redundancy of answers provided by multiple workers with varying expertise, but massive redundancy is very expensive and time-consuming. “If we ask 10 workers to complete the same task, then the cost of crowdsourcing solutions tends to be comparable to the cost of in-house solutions” [10]. Therefore, it is important to investigate on how to support task requesters to verify correct answers on crowdsourcing platforms easily and effectively. On the other hand, it is not efficient that the amount of time for a worker spent on selecting a task is comparable with that spent on working on a task, but the monetary reward of a task is just a small amount. To address this problem, task recommendation can help to provide a list of preferred tasks to workers in crowdsourcing systems. However, new workers do not want to work on a large number of tasks or wait for a long time before having a list of preferred tasks recommended. Therefore, it is important to help workers to find their right tasks as quick as possible and minimize the number of task assignments to achieve a target output quality [11]. The worker performance history makes it possible to mine workers’ preference on tasks and to provide an indication of worker quality on tasks. Based on worker performance history, an active learning approach on task recommendation can be used to help requesters to receive good quality output quicker with lower cost, thus achieve quality assurance in crowdsourcing systems. Moreover, by assigning the most uncertain task for new workers to work on, it not only helps new workers having a list of preferred tasks recommended faster, but also improves the output quality of crowdsourcing systems.

Task recommendation can help requesters to receive good quality output quicker as well as help workers to find their right tasks faster. Probabilistic Matrix Factorization (PMF) [12] is the state-of-the-art approach for recommendation systems. A factorization model has to be trained and learned before the model can be applied for prediction. In real-world applications, the performance of a factorization model is highly affected by how the model is updated, and thus dynamic updating a model is very important [13]. When updating a worker’s profile, the profile will not change much if the worker having large profile; while the profile will have great change if the worker having small profile.

Active learning is a learning approach to improve the prediction accuracy with a low cost. By performing active learning in a task recommendation model, it can guarantee the accuracy of recommendations with a very low cost, but it still needs to consider the minimizaton of the user waiting time. The tasks recommended to the new workers have to carefully selected, because new workers are not willing to work on a lot of tasks before having their preferred tasks recommended. Therefore, systems that provide recommendations in large user waiting times are not suitable for real-world applications [14]. Moreover, it does not make sense to retrain the model from scratch whenever a worker of large profile completes a task, because the performance improvement by retraining the model in the case is tiny but the cost of retraing model is high. Furthermore, when a large number of workers are working in the crowdsourcing system at the same period of time, the computational complexity is very high if the model is retrained after each worker completes a task. Batch update provides a method of reducing both user waiting time and computational complexity.

Our contributions are as follows:

First, we propose a way for quality assurance by performing ActivePMFv2, Probabilistic Matrix Factorization with Active Learning (version 2), for task recommendation in crowdsourcing systems, where ActivePMF is an active learning approach on factor analysis based on probabilistic matrix factorization, such that the worker latent feature space, task latent feature space and task category latent feature space are learned. ActivePMF considers the varying expertise of workers for different tasks in real crowdsourcing scenarios. The most informative task and the most skillful worker are selected to learn the factor analysis model.

Second, we first assign all new tasks to the most reliable workers based on the task categories in our proposed ActivePMFv2, so new workers can receive a list of preferred tasks recommended faster than that of ActivePMFv1 [15]. Our proposed ActivePMFv2 also has better output prediction quality than that of ActivePMFv1.

Third, we propose a generic online-updating method for learning a factor analysis model, ActivePMFv2. In our proposed online-updating approach, our online-updating algorithm applies on a learned PMF model without having retrain the whole model. The proposed update methods are generic and appliable for all PMF models.

Fourth, we demonstrate the performance of our proposed ActivePMFv2 by using the real word dataset. The experimental results show that ActivePMFv2 outperforms ActivePMFv1 by 29 % in the MAE results and 35 % in RMSE results, where ActivePMFv1 outperforms PMF with various active learning approaches significantly (PMF is the state-of-the-art approach for recommendation systems).

Fifth, we demonstrate the performance of our online-updating algorithm by using the real world dataset. The experiment results show that the prediction of online-updating ActivePMF on TaskRec model approximates to that of a full retrain of ActivePMF on TaskRec model while the running time of online-updating algorithm is significantly lower than that of a full retrain of the model. By using online-updating algorithm, the average runtime of model update for each work done is reduced by more than 80 % (decreases from a few minutes to several seconds).

Finally, complexity analysis shows that our model is efficient and is scalable to large datasets.

Related work

Crowdsourcing systems

Crowdsourcing is outsourcing a task to a large group of networked people in the form of an open call to reduce the production cost. A crowdsourcing process involves operations of both requesters and workers. A requester submits a task request; a worker selects and completes a task; and the requester only pays the worker for the successful completion of the task. Task recommendation in crowdsourcing is important because of the following reasons:

Motivate workers of diverse background to work on crowdsourcing tasks in long run. Currently, on crowdsourcing sites, most workers only provide moderate contributions [16] and there is a significant population of young and well-educated Indian workers [17]. It can attract more workers to contribute their efforts in long run if a worker find a suitable task on a crowdsourcing site easily.

Improve the quality of work. Workers perform better if they are familiar with the tasks. Chilton et al. [18] showed that task workers only browsed the first few pages on crowdsourcing sites when searching for tasks. The task list for a worker of Amazon MTurk site is usually displayed on hundreds of pages. A worker selects a task from the list of available tasks sorted by a specified feature of tasks such as task creation date and reward amount. When the tasks posted on the first few pages are not suitable for a worker, the worker might choose a task that he does not familiar with and try to complete it to earn the rewards; otherwise, the worker does not select any task. Working with a unfamiliar task might decrease the quality of work.

Recommendation systems

Broadly speaking, recommendation systems are based on either content filtering approach or collaborative filtering approach. The content filtering approach creates a profile for each user or product, for example, a movie, to characterize its nature. The profiles of users and products allow programs to associate users with matching products. The advantage is that it can address the system’s new products and users. However, the profile information might not be available or easy to collect. On the other hand, the collaborative filtering approach relies only on past user behavior. This approach analyzes relationships between users and interdependencies among products to identify new user-item associations. It is generally more accurate than content filtering approach. However, collaborative filtering cannot address the system’s new products and users [19], which is the cold-start problem.

To address the cold-start problem, latent factor models are an alternative approach that can approximate the ratings by characterizing both users and items on a number of factors inferred from the ratings patterns. “Some of the most successful realizations of latent factor models are based on matrix factorization.” [19] Matrix factorization has a lot of applications [20, 21]. Although matrix factorization can solve the cold-start problem, it is not scalable. Probabilistic matrix factorization (PMF) model [22] can scale linearly with the number of observations, and performs very well on large, sparse, and imbalanced datasets.

Recently, several probabilistic matrix factorization methods [23] have been proposed for collaborative filtering approach in recommendation systems. These methods focus on using low-rank approximations to model the user-item rating matrix for making further predictions. The premise behind a low-dimensional factor model is that there is only a small number of factors influencing preferences, and that a user’s preference vector is determined by how each factor applies to that user. The above approaches are used for user recommendation in social tagging systems.

Task recommendation in crowdsourcing systems

A recommendation system can improve the performance of crowdsourcing systems by providing task requesters some output quality controls based on a number of parameters, such as task requirements, task properties, worker interests, worker incentives, and costs [24]. Based on collaborative filtering, Organisciak et al. [25] proposed two approaches for capturing personal preferences in personalized item recommendation in crowdsourcing systems; and they are taste-matching and taste-grokking. Taste-matching uses workers’ taste to infer the requester’s taste where workers and the requester have similar tastes, while taste-grokking uses workers’ explicit prediction on the requester’s taste. Both taste-matching and taste-grokking have better performance than the use of generic workers. Later, Organisciak et al. [26] demonstrated the performance of personalized crowdsourcing in a complexer environment by carrying out case studies on personalized text highlighting in film reviews. The results show that both approaches have better performance than a non-personalized baseline. Besides, the taste-grokking approach performs well in simpler tasks and the taste-matching approach performs well with larger crowds and tasks with latent decision-making variables. Ambati et al. [27] proposed classification based task recommendation approach to recommend tasks to users based on implicit modeling of skills and interests. However, these approaches can not solve cold-start problem. Besides, task recommendation is much difficult than product recommendation, and workers do not have to give ratings to tasks to indicate the extent of their favor of each task. A crowdsourcing system needs some signals indicating types of available tasks, and the number of tasks workers select and complete [28].

Active learning

Active learning is a learning approach to achieve certain accuracy with a very low cost. Broadly speaking, active learning systems are based on either stream-based approach [29] or pool-based approach [30]. The stream-based approach considers one unlabeled instance each time, and decides whether to query its label or ignore it. This approach is useful when unlabeled instance is continuously available but cannot be stored easily, such as sensor data. However, as the stream-based approach relies on a real underlying input distribution, it is difficult to decide whether to query the label of an instance or ignore it at its arrival time. On the other hand, the pool-based approach ranks all unlabeled instances in order of informativeness, and queries the label for the most informative unlabeled instance in the pool. The advantage is that large amount of unlabeled data are available in many domains at present, this approach is very important. However, it is difficult to find a good way to choose good queries from the pool. In both stream-based approach and pool-based approach, a query selection strategy is required to achieve high accuracy with as low labeling cost as possible.

Several query selection strategies are commonly used in active learning systems. For example, uncertainty sampling selects instances, which the current model is the most uncertain about, to query. There are many ways to measure the uncertainty, such as smallest margin [31], least confidence [32] and maximum entropy [33]. Another example of query selection strategies is query by committee [34]. This approach maintains a committee of independently trained classification models, and queries the instance for which the committee models disagree the most. Among all query selection strategies, uncertainty sampling is one of the simplest and widely used strategies in active learning systems [35].

Our motivation is the observation of the increase of difficulty for requesters to obtain good output with low cost. Currently, active learning approach has been applied on a number of quality assurance methods in crowdsourcing systems [35–37] because active learning can achieve certain accuracy by using fewer annotations even in noisy annotation scenarios. Laws et al. [36] demonstrated that actively selecting instances for label query can achieve performance gains in natural language processing tasks. However, they did not consider actively selecting annotators. To further improve the output quality, Yan et al. [35] proposed to use uncertainty sampling to select the most uncertain instance to query, and also uses an optimization formulation to choose multiple confident workers to query from. However, it is costly to pay for multiple labelers. To characterize the strength of each worker and improve the compentency of weak workers, Fang et al. [37] proposed a Self-Taught Active Learning paradigm, where a weak worker can learn complementary knowledge from a strong worker. Each labeler has a knowledge set (a set of confidence scores), which is used for worker selection. By using the knowledge set of the most reliable labeler to replace that of the most unreliable labeler, the most reliable labeler can teach the unreliable labeler. However, the learning curves of different labelers vary in real scenarios. Later, Fang and Zhu [38] proposed to use diversity density to characterize the oracle’s uncertain knowledge. However, an oracle does not exist in real-world applications.

In recent years, a number of research works [39–42] proposed recommendation systems based on a Probabilistic Matrix Factorization (PMF) model to improve the output quality in crowdsourcing systems, where Probabilistic Matrix Factorization (PMF) is the state-of-the-art approach for recommendation systems. Jung and Lease [39] proposed to use a PMF model to infer unobserved labels to reduce the bias of the existing crowdsourced labels, thus improve the quality of labels. Later, Jung [40] proposed to use a PMF model to improve the quality of crowdsourcing tasks. Experimental results proved that the strength of PMF over Singular Value Decomposition (SVD) and baseline methods. However, it is not suitable for a huge number of tasks on crowdsourcing systems in reality. Yuen et al. [41, 42] considered various task categories in real scenarios in crowdsourcing systems and proposed a PMF model for task recommendation in crowdsourcing systems. They proved that considering task categories in PMF can improve the performance. However, it does not consider to reduce the labeling cost by applying active learning approach. Later, Yuen et al. [15] proposed Probabilistic Matrix Factorization with Active Learning (version 1) for task recommendation systems. The model outperforms the PMF model with other active learning approaches, but new workers have to wait for a long time before having a list of perferred tasks recommended due to lack of worker performance history for all new workers.

Task recommendation for new workers in crowdsourcing systems

In crowdsourcing systems, workers prefer to have a list of recommended tasks, but they are not willing to work on a large number of tasks before having a list of preferred tasks recommended. Since new workers have not worked on a lot of tasks yet, it is difficult for a recommendation system to make a better recommendation for new workers due to the small working profiles of new workers. When performing active learning in recommendation systems, besides accuracy of recommendations and minimization of cost, it is also important for new workers to have a list of preferred tasks recommended as soon as they start working in a crowdsourcing system.

Dynamic-updating crowdsourcing systems for real-world scenarios

Some previous works proposed various ways to improve the performance of recommendation systems for real-world scenarios [13, 14]. Rendle et al. [13] proposed an online-updating algorithm for three kernal matrix factorization models, they are linear, logistic and linear non-negative matrix factorization model. They demonstrated that the output quality of their proposed online-updating algorithm approximates to that of fully retraining the models. However, the three kernal matrix factorization models are designed for user-item matrices. Besides, the user preference on item category, which is necessary for real-world applications, is not considered in the models. Karimi et al. [14] proposed an active learning method for aspect model in recommendation systems. They observed that “users are not willing to provide information for a large amount of items, thus the quality of recommendations is affected specially for new users” [14], and a full retrain of an aspect model needs a long time especially for a system of large number of existing users. In their online-updating method, the aspect model is updated to learn user latent factors for new users only and not the other users. Experimental results show that the user waiting time of their method is significantly less than that of bayesian method, but the prediction accuracy of their method is not always better than that of bayesian method.

Our motivation

Our motivation is the observation of the increase of difficulty for workers to find their preferred tasks [18, 43, 44], the increase of demand on task recommendation in crowdsourcing systems [45–47], and no previous works on task recommendation model for crowdsourcing systems that considers dynamic-updating for reducing the user waiting time on model update. To achieve certain output quality with a very low cost, we propose ActivePMFv2, Probabilistic Matrix Factorization with Active Learning (version 2), for task recommendation in crowdsourcing systems to actively select the most uncertain tasks and the most reliable workers for retraining the classification model. To reduce the user waiting time on the learning model update, we propose a generic online-updating method for learning the model, ActivePMFv2.

Task recommendation framework

Our task recommendation framework (TaskRec) is based on matrix factorization method, to perform factor analysis to learn the worker latent feature, the task latent feature and the task category latent feature. To reduce the labeling cost and guarantee the output quality, Probabilistic Matrix Factorization with Active Learning (version 2) selects the most informative task to be learned and selects the best worker to query from.

The problem we study in this paper is how to effectively predict the missing values in the worker-task performance matrix so as to select the most informative task for the best worker to query from to achieve high accuracy with as few instances as possible. We define the problem of quality assurance in crowdsourcing systems as follows:

Definition 1

Quality assurance problem: Given a set of workers WS = \(\{ w_{i} \}^{m}_{i=1}\), a set of tasks VS= \(\{ v_{j} \}^{n}_{j=1}\), a set of ratings R = {rij} associated between worker wi and task vj where \(r_{ij} \in \mathbb {R}^{M\times N}\) and only certain elements of R are initially known. The binary matrix I = {Iij} of the same shape as R represents the known points, so that Iij is 1 if rij is observed and 0 otherwise. The set of (i, j) indexes where Iij = 0 is denoted by PS. Predict the set of the unknown elements of R = {rij} where (i,j)∈PS. The aim of ActivePMF is to query the most informative tasks selected from the set of the unknown elements of R = {rij} where (i,j)∈PS, and to query from the most reliable workers.

To facilitate our discussions, Table 1 defines basic terms and notations used throughout this paper.

Table 1

Basic notations throughout this paper

Notation

Description

WS= \(\{ w_{i} \}^{m}_{i=1}\)

WS is the set of workers, wi is the i-th worker, m is the total number of workers

VS= \(\{ v_{j} \}^{n}_{j=1}\)

VS is the set of tasks, vj is the j-th task, n is the total number of tasks

CS= \(\{ c_{k} \}^{o}_{k=1}\)

CS is the set of task categories, ck is the k-th task category, o is the total number

of task categories

\(l \in \mathbb {R}\)

l is the number of dimensions of latent feature space

\(W \in \mathbb {R}^{l \times m}\)

W is the worker latent feature matrix

\(V \in \mathbb {R}^{l \times n}\)

V is the task latent feature matrix

\(C \in \mathbb {R}^{l \times o}\)

C is the task category latent feature matrix

R = {rij},

R is the worker-task preferring matrix, rij is the extent of the favor of task vj

\(R \in \mathbb {R}^{m \times n}\)

for worker wi

U = {uik},

U is the worker-category preferring matrix, uik is the extent of worker wi’s

\(U \in \mathbb {R}^{m \times o}\)

preference for category ck

D = {djk},

D is the task-category grouping matrix, djk indicates the task category ck that

\(D \in \mathbb {R}^{n \times o}\)

task vj belongs to

(i,j)∈PS

PS is the set of indexes where the rating rij is unknown

N(x|μ,σ2)

Probability density function of the Gaussian distribution with mean μ and variance σ2

Graphical Model for TaskRec. This is the graphical model for TaskRec, a task recommendation framework in crowdsourcing systems

By using a worker-task preferring matrix, we can measure the extend the worker prefer to work the task and provide output that accepted by requesters. Unlike traditional recommendation systems, workers do not have to give ratings to tasks to indicate the extent of their favor of each task. To have ratings on tasks, we transform workers’ behaviors into values as follows:

Worker Behavior

Value

Worker’s work done is accepted by requester.

→

5

Worker’s work done is rejected by requester.

→

4

Worker completes a task and submits the work done.

→

3

Worker selects a task to work on but not complete it.

→

2

Worker browses the detailed information of a task.

→

1

Worker does not browse the detailed information of a task.

→

0

In some cases, the ratings based on value transformation of worker behavior would be inaccurate on reflecting workers’ task preference. For example, a worker’s work done is being accepted, but he might not like the task very much.

Worker-task preferring matrix factorization

We have m workers, n tasks. The worker-task preferring matrix is denoted as R, the element rij in R means the extent of the favor of task vj for worker wi, where values of rij are within the range [0, 1]. Without loss of generality, we first map the ratings that inferred from worker behavior 1,..., 5 to the interval [0, 1] using the function f(x)=x/5. Hence, we are given a partially observed worker-task preferring matrix, R, with m workers and n tasks.

To learn the workers’ preference on the tasks, we employ matrix factorization, more specifically, Probabilistic Matrix Factorization (PMF) [12], to recover the worker-task preferring matrix. Given the partial observed matrix R, we aim at decomposing the matrix R into two l-dimensional low-rank feature matrices, W and V, where \(W \in \mathbb {R}^{l \times m}\) is the latent feature matrix for workers with column vector Wi, and \(V \in \mathbb {R}^{l \times n}\) is the latent feature matrix for tasks with column vector Vj.

To learn the matrices, a Gaussian distribution on the residual of the observed ratings is assumed as [12], and it is defined in Eq. (1):

where N(x|μ,σ2) is the probability density function of the Gaussian distribution with mean μ and variance σ2, and \(I^{R}_{ij}\) is the indicator function that is equal to 1 if the entry rij is observed and equal to 0 otherwise. The Gaussian distribution model can make predictions outside of the range of valid values. The function g(x) is the logistic function g(x)=1/(1+exp(−x)), which makes it possible to bound the range of \({W^{T}_{i}} V_{j}\) within the range [0,1]. Similar to [48], to avoid overfitting, zero-mean spherical Gaussian priors are also placed on the worker and task feature matrices, which are defined in Eq. (2):

Worker-category preferring matrix factorization

We have m workers and o task categories. The worker-category preferring matrix is denoted as U, where the element uik in U represents the extent of worker wi’s preference for task category ck. Workers’ performance histories indicate workers’ preference for task categories, so the meaning of uik can be interpreted as whether the worker wi has completed a task of the category ck where the task is accepted (a binary representation), or how strong the worker wi’s preference is for the task category ck (a real value representation). We represent uik as shown in Eq. (4):

$$\begin{array}{*{20}l} u_{ik} = g(f(w_{i}, c_{k})), \end{array} $$

(4)

where g(.) is the logistic function, and f(wi,ck) represents the number of times worker wi completes a task of the category ck where the task is accepted.

The idea of worker-category preferring matrix factorization is to derive two low-rank l-dimensional matrices W and C, where \(W \in \mathbb {R}^{l \times m}\) and \(C \in \mathbb {R}^{l \times o}\) are the latent feature matrices for workers and task categories, respectively. The column vectors Wi and Ck representing the l-dimensional worker-specific and category-specific latent feature vectors of worker wi and category ck, respectively. We can define the conditional distributions over the observed worker-category preferring matrix in Eq. (5):

where N(x|μ,σ2) is the probability density function of the Gaussian distribution with mean μ and variance σ2, and \(I^{U}_{ik}\) is the indicator function that is equal to 1 if worker wi has at least one completed task of the category ck being accepted and equal to 0 otherwise.

To avoid overfitting, zero-mean spherical Gaussian priors are placed on the worker and the category latent feature matrices, which are defined in Eq. (6):

Task-category grouping matrix factorization

We have n tasks and o task categories. The task-category grouping matrix is denoted as D, where the element djk in D shows the category ck that task vj belongs to. The meaning of djk can be interpreted as whether the task vj belongs to the category ck (a binary representation). We represent djk as shown in Eq. (8):

$$\begin{array}{*{20}l} d_{jk} = f(v_{j}, c_{k}), \end{array} $$

(8)

where f(vj,ck) is an indicator variable with the value of 1 if the task vj belongs to the category ck, and 0 otherwise.

The idea of task-category grouping matrix factorization is to derive two low-rank l-dimensional matrices V and C, where \(V \in \mathbb {R}^{l \times n}\) and \(C \in \mathbb {R}^{l \times o}\) are the latent feature matrices for tasks and task categories, respectively. The column vectors Vj and Ck representing the l-dimensional task-specific and category-specific latent feature vectors of task vj and category ck, respectively. We can define the conditional distributions over the observed task-category grouping matrix in Eq. (9):

where N(x|μ,σ2) is the probability density function of the Gaussian distribution with mean μ and variance σ2, and \(I^{D}_{jk}\) is the indicator function that is equal to 1 if the entry djk is observed and equal to 0 otherwise.

To avoid overfitting, zero-mean spherical Gaussian priors are placed on the task and the category latent feature matrices, which are defined in Eq. (10):

where \(\mathcal {C}\) is a constant independent of the parameters. We can see the Eq. (12) is an unconstrained optimization problem, and maximizing the log-posterior distributions with fixed hyper parameters is equivalent to minimizing the sum-of-squared-errors objective function with quadratic regularized terms in Eq. (13):

where g′(.) is the first-order derivative of the logistic function. To reduce the model complexity, we set θW=θV=θC in our experiments. The training time for our model scales linearly with the number of observations.

We first query all new tasks, which have not been tried by anyone, from the most reliable worker. Then, we query the most uncertain task from all new workers. Next, we query the most uncertain task, and select the most reliable worker for the task to query from. Our proposed ActivePMFv2 is presented in Algorithm 1.

Random new task selection for reliable worker

To learn the most accurate classifier with the least number of work done, we first query all new tasks, as given in Eq. (17), and select the most reliable worker in the task category to query from, as given in Eq. (18).

In Algorithm 1, Step 6 represents the process of new task selection for most reliable worker in the category.

Uncertainty sampling for task selection for randomly selected new worker

The algorithm assumes a particular active learning heuristic specified as an input, and we adopt uncertainty-sampling [30] using the Maximum Difference between predicted rate and observed rate as in Eq. (19) to choose the most uncertain task, that requires minimization of uncertainty. To let new workers having a list of preferred tasks recommended but not having to work on a large amount of tasks beforehand, we randomly select a new worker (if any) to query from, as given in Eq. (20).

In Algorithm 1, Step 8 represents the process of most uncertain task selection for new worker.

Uncertainty sampling for task selection for reliable worker

The algorithm assumes a particular active learning heuristic specified as an input, and we adopt uncertainty-sampling [30] using the Maximum Difference between predicted rate and observed rate as in Eq. (21) to choose the most uncertain task, that requires minimization of uncertainty. To select the most reliable worker for the most uncertain task, we select the worker with the maximum worker-category perferring score where the category that the task belongs to as in Eq. (22).

In Algorithm 1, Step 9 represents the most uncertain task selection for the most reliable worker.

After annotation, the selected task is removed from the unlabeled data set. Next, the selected task and its rate are added to the set of labeled dataset. The model is then retrained on the labeled tasks and the informativeness of the remaining tasks in the unlabeled data set is re-evaluated.

Online-updating on ActivePMFv2 - online-update on active learning for concurrent selection of task and worker

In this section, we present our online-updating approach for learning the matrix factorization model, ActivePMFv2 model. The online-updating approach of ActivePMFv2 is presented in Algorithm 2, while the full retrain approach of ActivePMFv2 is presented in Algorithm 1. The online-updating approach uses the same sampling heuristics and the same active learning approach in query method as that of the full retrain approach. The main difference between the online-updating approach and the full retrain approach is the model update methods they used. In the full retrain approach, it retrains the whole learning model after each work done. In the online-updating approach, it has two main parts: (1) It retrains the learning model in batch mode where model update occurs after a number of work done; (2) For each work done related to a worker (or task) having profile larger than the threshold, it updates the whole feature vector of the worker (or task) and keeps all other entries in the feature matrix fixed. On the other hand, for work done related to a worker (or task) having profile smaller than the threshold, it updates the whole feature matrix.

Partial update

The impact on retraining the whole learning model decreases as the profile size of the worker (or task) increases. Especially when work done by new workers or work done on task having small profile, updating the feature matrix is crucial. For a new worker, each work done by him will result in much change in his task perference in his worker profile; while for a worker that has already completed a lot of tasks, each work done by him will not change much in his worker profile. Updating feature vectors for a worker (or task) having smaller profile results in a much better model. As a result, for a worker (or task) having large profile, we observe that the model learned from retraining the feature vector of the worker (or task) only is approximate to that learned from a full retrain.

As mentioned before, through a Bayesian inference, the posterior distributions of W and V based only on the observed ratings are derived in Eq. (3), while the posterior distributions of W and C based only on the observed ratings are derived in Eq. (7). For partial update on a large worker profile, we only retrain the feature vector of the selected worker \(w_{m^{\prime }}\phantom {\dot {i}\!}\) in worker-task preferring matrix and worker-category preferring matrix as shown in Eq. (23) and Eq. (24) respectively.

As stated previously, through a Bayesian inference, the posterior distributions of W and V based only on the observed ratings are derived in Eq. (3), while the posterior distributions of V and C based only on the observed ratings are derived in Eq. (11). For partial update on a large task profile, we only retrain the feature vector of the selected task \(v_{n^{\prime }}\) in worker-task preferring matrix and task-category grouping matrix as shown in Eq. (25) and Eq. (26) respectively.

In Algorithm 2, Step 12, 14 and 16 represents the process of partial update. For a worker (or task) having profile size larger than the threshold, the algorithm retrains the feature vector of the worker (or task) and keep all other entries in the matrix unchanged; otherwise, the algorithm retrains the feature vectors of all workers and all tasks.

Batch update

The time for retraining a learning model is proportional to the computational complexity of the model and the amount of information stored in the model; while the amount of information depends on the number of workers, the number of tasks and the number of work done. Retraining a large learning model takes a long time. For a large real-world crowdsourcing system, it is inefficient if the whole model is retrained from scratch once a worker completes a task.

In Algorithm 2, Step 24 and 25 represents the process of batch update. When the number of work done is smaller than the batch size, the learning model is not retrained. On the other hand, when the number of work done is larger than the batch size, the algorithm retrains the learning model.

General update problem

Our proposed online-updating approach can also be applied in a general update problem. In a general recommendation system, a new rating ru,i might affect the features of both user u and item i. The partial update method works based on the following conditions: (1) If the profiles of both user and item are large, it could update the feature vectors of both the selected user and the selected item; (2) If the user profile is large (but the item profile is small), it could update the user feature vector only; (3) If the item profile is large (but the user profile is small), it could update the item feature vector only; (4) If the profiles of both user and item are small, it could update the feature vectors of all users and all items. Besides, based on the number of new incoming ratings, the batch update method retrains the whole learning model. A general update algorithm is shown in Algorithm 3.

Complexity analysis

To compute the complexity of our ActivePMFv2, we consider both the computation of the gradient descent methods and the computation of selecting the most uncertain task for the most reliable worker. The main computation of the gradient descent methods is evaluating objective function E and corresponding gradients on variables. Because of the sparsity of matrices R, U, and D, the complexity of evaluating the objective function in Eq. (13) is \(\mathcal {O} \left (n_{R}l + n_{U}l + n_{D}l \right)\), where nR, nU and nD are the number of non-zero entries in matrices R, U, and D respectively, and l is the number of dimensions of latent feature space. By using the similar approach, we can derive the complexities of Eq. (14), Eq. (15) and Eq. (16). For the computation of selecting the most uncertain task for the most reliable worker, the complexity of selecting the most uncertain task in Eq. (21) is \(\mathcal {O} \left (n_{R}l \right)\), the complexity of selecting the most reliable worker in Eq. (22) is \(\mathcal {O} \left (m \right)\), and thus the total complexity of assigning a task to a worker is \(\mathcal {O} \left (m + n_{R}l \right)\). As a result, the total complexity for one iteration is \(\mathcal {O} \left (m + n_{R}l + n_{U}l + n_{D}l \right)\). It means that the complexity is linear with respect to the number of workers and the number of observations in the three sparse matrices. The complexity analysis shows that ActivePMFv2 can scale to very large datasets.

To apply online-updating approach when learning ActivePMFv2 model, we consider both the partial update method and the batch update method. Since only some non-zero entries in matrices is updated in the partial update method, the complexity of evaluating the objective function in Eq. (13) is \(\mathcal {O} \left (z_{R}l + z_{U}l + z_{D}l \right)\), where zR, zU and zD are the number of non-zero entries to be updated in matrices R, U, and D respectively, and l is the number of dimensions of latent feature space. As a result, the total complexity for one iteration is \(\mathcal {O} \left (m + z_{R}l + z_{U}l + z_{D}l \right)\), where zR<<nR, zU<<nU, zD<<nD. Based on our observation, the complexity of learning ActivePMFv2 model can be highly reduced by using the partial update method. Besides, the batch method can greatly reduce the number of iterations, thus further reduce the computational complexity.

Experimental analysis

In this section, our experiments are intended to address the following five research questions:

1.

How is ActivePMFv1 approach compared with PMF with various active learning approaches?

2.

How is ActivePMFv1 approach compared with PMF with various active sampling heuristics?

3.

How is our ActivePMFv2 approach compared with ActivePMFv1?

4.

How is the partial update part of online-updating approach compared with the full retrain when learning ActivePMFv2 model?

5.

How is the batch update part of online-updating approach compared with the full retrain when learning ActivePMFv2 model?

Description of dataset

We use the same dataset as shown in [41]. Our dataset is retrieved from the recent NAACL 2010 workshop on crowdsourcing, which has made publicly available all the data collected as part of the workshop [49]. The data was collected within a month from multiple requesters seeking data for a diverse variety of tasks on MTurk. Table 2 provides some statistics about our dataset. Our dataset is mainly related to tasks for creating speech and language data. The task categorization is shown in Table 3.

Table 2

Statistics of our dataset

Number of workers

1,592

Number of different tasks

6,639

Number of categories

43

Total HITs from all tasks

19,815

Number of ratings

19,815

Max number of HITs of a worker

2,691

Min number of HITs of a worker

1

Average number of HITs of a worker

12.4

1st quartile (25th percentile) of number of HITs of a worker

1

2nd quartile (50th percentile) of number of HITs of a worker

2

3rd quartile (75th percentile) of number of HITs of a worker

5

Table 3

Task categorization by both language and keywords given by MTurk in our dataset

1

English-Afrikaans translations

23

English-Romanian translations

2

English-Azeri translations

24

English-Russian translations

3

English-Bulgarian translations

25

English-Slovak translations

4

English-Bangla translations

26

English-Somali translations

5

English-Bosnian translations

27

English-Albanian translations

6

English-Welsh translations

28

English-Serbian translations

7

English-Spanish translations

29

English-Tamil translations

8

English-Basque translations

30

English-Thai translations

9

English-Farsi translations

31

English-Turkmen translations

10

English-Irish translations

32

English-Tagalog translations

11

English-Hindi translations

33

English-Turkish translations

12

English-Indonesian translations

34

English-Tatar translations

13

English-Korean translations

35

English-Ukrainian translations

14

English-Kurdish translations

36

English-Urdu translations

15

English-Latin translations

37

English-Uzbek translations

16

English-Latvian translations

38

English annotations

17

English-Mongolian translations

39

Spanish annotations

18

English-Maltese translations

40

Arabic annotations

19

English-Nepali translations

41

English relevance judgment

20

English-Punjabi translations

42

English creative writing

21

English-Kapampangan translations

43

English transcription

22

English-Polish translations

Evaluation metrics

For performance comparison with our proposed ActivePMFv2, we implement PMF (Probabilistic Matrix Factorization) with the following Active Learning baselines:

T[Rand]W[Rand]: It assigns a randomly selected task to a randomly selected worker.

T[MaxDiff]W[Rand]: It assigns the task of maximum difference between its observed values and its predicted values to a randomly selected worker.

T[Rand]W[Reli]: It assigns a randomly selected task to the most reliable worker in the task category.

T[MaxDiff]W[Reli]: It assigns the task of maximum difference between its observed values and its predicted values to the most reliable worker in the task category.

T[N]W[Reli]+T[MaxPredictErr]W[Reli]: It has two parts. First, it assigns all new tasks to the most reliable worker in the task categories. Second, it assigns the task of maximum average prediction error of all its predicted values to the most reliable worker in the task category.

T[N]W[Reli]+T[MaxEntropy]W[Reli]: It has two parts. First, it assigns all new tasks to the most reliable worker in the task categories. Second, it assigns the task of maximum Entropy on the posterior variance to the most reliable worker in the task category.

T[N]W[Reli]+T[MaxDiff]W[Reli] (ActivePMFv1): It has two parts. First, it assigns all new tasks to the most reliable worker in the task categories. Second, it assigns the task of maximum difference between its observed values and its predicted values to the most reliable worker in the task category.

T[N]W[Reli]+T[MaxDiff]W[N]+T[MaxDiff]W[Reli] (ActivePMFv2): It has three parts. First, it assigns all new tasks to the most reliable worker in the task categories. Second, for all new workers, it assigns the task of maximum difference between its observed values and its predicted values to a randomly selected new worker. Third, it assigns the task of maximum difference between its observed values and its predicted values to the most reliable worker in the task category.

To compare the prediction quality of our method ActivePMFv2, we use the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE) as the comparison metrics. MAE and RMSE are defined in Eq. (27):

where ri,j denotes the rating that indicates the extent of the favor of task j for worker i, \(\hat {r}_{i, j}\) denotes the predicted rating, and N is the total number of testing ratings.

Performance comparison

To show the prediction performance improvements of ActivePMFv2, we first compare ActivePMFv1 [15] with PMF with various active learning approaches and different active sampling heuristics, where Probabilistic Matrix Factorization (PMF) [12] is the state-of-the-art approach for recommendation systems. Next, we compare our proposed ActivePMFv2 with ActivePMFv1.

From our dataset, we randomly select 80 % of ratings as training data (20 % as initial training set + 60 % as active set), and leave the remaining 20 % as prediction performance testing. The procedure is carried out 10 times independently, and we report the average values in this paper. For the value transformation, we have 10,411 approved tasks (value transformed to 5), 9,399 submitted tasks (value transformed to 3) and only 5 rejected tasks (value transformed to 4). Most rejected tasks are already removed in our dataset. In the comparison, we set θW = θV = θC = 0.00004, set θU = 0.0001 and θD = 0.01. The MAE results and the RMSE results are reported from Tables 4, 5, 6, 7, 8 and 9. MAE measures the average magnitude of the errors in predicted values; while RMSE gives a relative high weight to large errors.

In Tables 4 and 5, we compare among PMF with three sampling heuristics on selecting tasks, which are Maximum Difference (ActivePMFv1), Maximum Prediction Error and Maximum Entropy. The performance results of the three approaches are similar when the number of selected samples is small. However, as the number of selected samples increases, Maximum Difference approach (ActivePMFv1) outperforms the other two approaches.

In Tables 6 and 7, we compare among PMF with different active learning approaches on task selection and worker selection. When the number of selected samples is very small, assigning tasks to workers randomly gives the best performance in both the MAE results and the RMSE results. However, as the number of selected samples increases, assigning new tasks in the first stage can greatly improve the performance in both the MAE results and the RMSE results. Compared with random selection on task and worker to the PMF learning model (i.e. T[Rand]W[Rand]), ActivePMFv1 can greatly improve both MAE and RMSE performance.

In Tables 8 and 9, we compare our proposed ActivePMFv2 with ActivePMFv1. The performance results of the three approaches are similar when the number of selected samples is small. However, as the number of selected samples increases, ActivePMFv2 outperforms ActivePMFv1 in both MAE and RMSE results. Compared with ActivePMFv1, the MAE results and the RMSE results of ActivePMFv2 are improved up to 29 % and 35 % respectively.

In Table 10, we compare the partial update part of our online-updating approach with the full retrain on ActivePMFv2 model learning. The prediction quality of ActivePMFv2 with partial update (threshold t = 0.001) approximates to that of ActivePMFv2 with full retrain, but the average runtime on model update per work done of ActivePMFv2 with partial update (threshold t = 0.001) is greatly reduced by 12 % compared with that of ActivePMFv2 with full retrain.

By using the batch update method, the average runtime on model update per work done can be further reduced. In Table 11, we compare the batch update part of our online-updating approach with the full retrain on ActivePMFv2 model learning. As batch size increases, both the MAE results and the RMSE results also increase, while the average runtime on model update per work done decreases significantly. For instance, compared with full retrain, when batch size is 500 with partial update (threshold t = 0.001), the average runtime on model update per work done is reduced by 99.6 % (decreases from 3.839 min to 0.017 min), but both the MAE results and the RMSE results increases by several times. On the other hand, compared with full retrain, when batch size is 10 with partial update (threshold t = 0.001), the average runtime on model update per work done is reduced by 82.4 % (decreases from 3.839 min to 0.675 min), while the MAE result is just increased by 22.4 % (increases from 0.0156 to 0.0191) and the RMSE result is only slightly increased by 8.2 % (increases from 0.0845 to 0.0914). Therefore, by adjusting the batch size, the average runtime on model update per work done can be reduced significantly, but only very small performance degradation is resulted. In the cases shown in Table 11, batch size 10 is the best choice among all the listed choices.

Table 11

Comparison on a Full-Retrain with Batch Update on Online-Updating Approach on ActivePMFv2 model learning (Feature k = 20; No of Work Done = 11,000)

ActivePMFv2 model

MAE

RMSE

Avg Runtime per Work Done (min)

Full Retrain

0.0156

0.0845

3.839

Online-Updating (Partial = 0.001; Batch = 1)

0.0156

0.0845

3.374

Online-Updating (Partial = 0.001; Batch = 10)

0.0191

0.0914

0.675

Online-Updating (Partial = 0.001; Batch = 50)

0.0313

0.1353

0.142

Online-Updating (Partial = 0.001; Batch = 100)

0.0515

0.2103

0.137

Online-Updating (Partial = 0.001; Batch = 150)

0.0768

0.2977

0.049

Online-Updating (Partial = 0.001; Batch = 200)

0.0513

0.2033

0.038

Online-Updating (Partial = 0.001; Batch = 500)

0.1022

0.3445

0.017

Conclusion

In this paper, we have proposed Probabilistic Matrix Factorization with Active Learning (version 2), ActivePMFv2, on Task Recommendation framework, TaskRec, for quality assurance in crowdsourcing systems. It first randomly assigns new tasks to the most reliable worker in the task categories. Second, it actively selects the most uncertain task, and then request new workers to complete the task. Third, it actively selects the most uncertain task, and then request the most reliable workers to complete the task for retraining the classification model. Experimental results show that our ActivePMFv2 outperforms ActivePMFv1, where the MAE results and the RMSE results of ActivePMFv2 are improved up to 29 % and 35 % respectively. Previous work shows that ActivePMFv1 outperforms the PMF with other active learning approaches significantly.

Moreover, we have proposed online-updating on the task recommendation model to reduce the runtime of retraining the model. In a large-scale crowdsourcing system, it does not make sense to retrain the model from scratch whenever a worker of large profile completes a task, because the performance improvement by retraining the model in the case is tiny but the cost of retraing model is high. Moreover, when a large number of workers are working in the crowdsourcing system at the same period of time, the computational complexity is very high if the model is retrained after each worker completes a task. The larger the profile of a worker (or task) is, the less important is retraining its profile on each new work done. In case of the worker (or task) having large profile, our online-updating algorithm retrains the whole feature vector of the worker (or task) and keeps all other entries in the matrix fixed. Our online-updating algorithm runs batch update to reduce the running time of model update. Experiment results show that our online-updating algorithm is accurate in approximating to a full retrain of the learning model while the average runtime of model update for each work done is reduced by more than 80 % (decreases from a few minutes to several seconds).

Declarations

Acknowledgements

This research was in part supported by grants from the National Grand Fundamental Research 973 Program of China (No. 2014CB340405), the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CUHK 413212), and Microsoft Research Asia Regional Seed Fund in Big Data Research (Grant No. FY13-RES-SPONSOR-036).

Authors’ contributions

All authors made substantial contributions to conception and design of the work. MC carried out experiments, data analysis and data interpretation. Besides, MC participated in drafting and revising the manuscript. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Authors’ Affiliations

(1)

Department of Computer Science and Engineering, The Chinese University of Hong Kong