Collaborative Crowdsourcing for Labeling Machine Learning Datasets

Generating comprehensive labeling guidelines for crowdworkers can be
challenging for complex datasets. Revolt harnesses crowd disagreements
to identify ambiguous concepts in the data and coordinates the crowd to
collaboratively create rich structures for requesters to make post-hoc
decisions, removing the need for comprehensive guidelines and
enabling dynamic label boundaries.

Highlighting can be mentally taxing for learners who are often unsure about how
much information they need to include. We introduce the idea of
intentionally uncertain input in the context of highlighting on mobile devices.
We present a system that uses force touch and fuzzy bounding boxes to support
saving information while users are uncertain about where to highlight.

Clustering with Crowds and Computation

Many crowd clustering approaches have difficulties providing global context to
workers in order to generate meaningful categories. Alloy uses a
sample-and-search technique to provide global context, and
combines the deep semantic knowledge from human computation and the scalability
of machine learning models to create rich structures from unorganized documents
with high quality and efficiency.

Big Picture Thinking in Small Pieces

People often search the web to find solutions to problems beyond factual
questions, such as planning road trips, writing a report, or buying a new
camera. The Knowledge Accelerator uses crowdworkers to synthesize different
information sources on the web in response to a query. We prototyped this
system in order to explore crowdsourcing complex, high-context tasks in a
microtask environment.

Code-switching behavior is common on social media for expressing solidarity or
establishing authority. While past work on automatic code-switching detection
depends on dictionary look-up or named-entity recognition, our recurrent neural
network model, which relies only on raw features, outperformed the top systems in
the EMNLP'14 Code-Switching Workshop with a 17% reduction in error rate.

Learning to Find Translations and Transliterations on the Web

TermMine is an information extraction system that can automatically mine
translation pairs of terms from the web. We used a small set of terms and
translations to gather mixed-code text from the web to train a CRF model that
can identify translation pairs at run-time.

Supersense Tagging Named Entities on Wikipedia

Joseph Chee Chang, Richard Tsai, and Jason S. Chang. PACLIC 2009.

We introduced a method for classifying named entities into broad semantic
categories in WordNet. We extracted rich features from Wikipedia, allowing us
to classify named entities with high precision and coverage. The result is a
large-scale named-entity semantic database with 1.2 million entries and over
95% accuracy, covering 80% of all named entities found on Wikipedia.