This short paper describes a demonstrator that complements the paper "Towards Cross-Media Feature Extraction" in these proceedings. The demo exemplifies the use of textual resources, from which semantic information can be extracted, to support the semantic annotation and indexing of associated video material in the soccer domain. Entities and events extracted from textual data are marked up with semantic classes derived from an ontology modeling the soccer domain. We further show how audio-video features extracted by video analysis can be taken into account for the additional annotation of specific soccer event types, and how those different types of annotation can be combined.

Models for collecting and aggregating categorical data on crowdsourcing platforms typically fall into two broad categories: those that assume agents are honest and consistent but have heterogeneous error rates, and those that assume agents are strategic and seek to maximize their expected reward. The former often leads to tractable aggregation of elicited data, while the latter usually focuses on optimal elicitation and does not consider aggregation. In this paper, we develop a Bayesian model in which agents have differing quality of information but also respond to incentives. Our model generalizes both categories and enables the joint exploration of optimal elicitation and aggregation. It allows us to explore, both analytically and experimentally, optimal aggregation of categorical data and optimal multiple-choice interface design.
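To make the aggregation side concrete, here is a minimal sketch of Bayesian aggregation of categorical labels from workers with heterogeneous error rates. It assumes each worker is correct with her own probability and errs uniformly over the remaining classes; this one-parameter confusion model and the function name are illustrative assumptions, much simpler than the paper's model, which also accounts for incentives.

```python
import math

def bayes_aggregate(labels, accuracies, num_classes, prior=None):
    """Posterior over the true class given one label per worker.

    Assumes worker j reports the true class with probability
    accuracies[j] and any single wrong class with probability
    (1 - accuracies[j]) / (num_classes - 1).
    """
    if prior is None:
        prior = [1.0 / num_classes] * num_classes
    log_post = [math.log(p) for p in prior]
    for y, acc in zip(labels, accuracies):
        for k in range(num_classes):
            lik = acc if y == k else (1.0 - acc) / (num_classes - 1)
            log_post[k] += math.log(lik)
    # normalize in log space for numerical stability
    z = max(log_post)
    unnorm = [math.exp(l - z) for l in log_post]
    s = sum(unnorm)
    return [u / s for u in unnorm]

# One accurate worker can outweigh two mediocre ones: with labels
# [0, 1, 1] and accuracies [0.95, 0.6, 0.6], class 0 wins despite
# being the minority vote.
posterior = bayes_aggregate([0, 1, 1], [0.95, 0.6, 0.6], num_classes=2)
```

Note how the posterior departs from majority vote as soon as accuracies are heterogeneous, which is precisely why aggregation and elicitation interact.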

LG has placed its trust in Google Assistant and given it the power to control its smart appliances. While it teamed up with Amazon earlier this year to give its refrigerators built-in access to Alexa, its partnership with Google is much bigger in scale. Now, you can control any of the company's 87 WiFi-connected smart home appliances by barking out orders through a Google Home speaker or through a compatible iOS or Android smartphone. Once you're done setting up voice control through LG's SmartThinQ app, you can use commands within a Home speaker's range or through a phone to tell your fridge to make more ice or to tell your AC to adjust the temperature. If you have an LG washing machine, you can ask Assistant how much time is left before your load is done.

Biased labelers are a systemic problem in crowdsourcing, and a comprehensive toolbox for handling their responses is still being developed. A typical crowdsourcing application can be divided into three steps: data collection, data curation, and learning. At present these steps are often treated separately. We present Bayesian Bias Mitigation for Crowdsourcing (BBMC), a Bayesian model to unify all three. Most data curation methods account for the {\it effects} of labeler bias by modeling all labels as coming from a single latent truth. Our model captures the {\it sources} of bias by describing labelers as influenced by shared random effects. This approach can account for more complex bias patterns that arise in ambiguous or hard labeling tasks and allows us to merge data curation and learning into a single computation. Active learning integrates data collection with learning, but is commonly considered infeasible with Gibbs sampling inference. We propose a general approximation strategy for Markov chains to efficiently quantify the effect of a perturbation on the stationary distribution and specialize this approach to active learning. Experiments show BBMC to outperform many common heuristics.
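The quantity being approximated above can be illustrated with a toy example: how a small perturbation of a Markov chain's transition matrix shifts its stationary distribution. This brute-force sketch simply recomputes the stationary distribution before and after the perturbation; the paper's contribution is an efficient approximation that avoids such recomputation. The two-state chain and the perturbation below are made-up numbers for illustration only.

```python
def stationary(P, iters=5000):
    """Stationary distribution of a row-stochastic matrix by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# a two-state chain and a small perturbation of its first row
P = [[0.9, 0.1],
     [0.2, 0.8]]
P_pert = [[0.85, 0.15],
          [0.2, 0.8]]

pi = stationary(P)            # (2/3, 1/3)
pi_pert = stationary(P_pert)  # (4/7, 3/7)
```

In an active-learning loop, each candidate query corresponds to such a perturbation of the inference chain, so quantifying the shift cheaply is what makes the integration with Gibbs sampling feasible.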

In many machine learning applications, crowdsourcing has become the primary means for label collection. In this paper, we study the optimal error rate for aggregating labels provided by a set of non-expert workers. Under the classic Dawid-Skene model, we establish matching upper and lower bounds with an exact exponent $mI(\pi)$, where $m$ is the number of workers and $I(\pi)$ is the average Chernoff information that characterizes the workers' collective ability. Such an exact characterization of the error exponent allows us to state a precise sample-size requirement $m>\frac{1}{I(\pi)}\log\frac{1}{\epsilon}$ for achieving an $\epsilon$ misclassification error. In addition, our results imply the optimality of various EM algorithms for crowdsourcing initialized by consistent estimators.
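In the special case of homogeneous, symmetric binary workers of accuracy $p$, the Chernoff information has the closed form $-\log(2\sqrt{p(1-p)})$, and the sample-size requirement can be evaluated directly. A minimal sketch (the function names are ours, and this homogeneous setting is only a special case of the heterogeneous $I(\pi)$ in the bound):

```python
import math

def chernoff_info_binary(p):
    """Chernoff information between a symmetric binary worker's label
    distributions (p, 1-p) and (1-p, p); the minimizing exponent is 1/2,
    giving the closed form -log(2 * sqrt(p * (1 - p)))."""
    return -math.log(2.0 * math.sqrt(p * (1.0 - p)))

def workers_needed(p, eps):
    """Smallest integer m satisfying m > (1 / I(pi)) * log(1 / eps)."""
    info = chernoff_info_binary(p)
    return math.floor(math.log(1.0 / eps) / info) + 1

# e.g. workers of accuracy 0.7 targeting 5% misclassification error
m = workers_needed(0.7, 0.05)  # -> 35
```

As expected, the requirement is sharply sensitive to worker quality: raising accuracy from 0.7 to 0.9 drops the needed crowd from 35 workers to 6 for the same error target.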