Wednesday, January 22, 2014

I have been following with interest the discussion about the future of education.

***

Some people criticize existing educational institutions, indicating that they offer little in terms of real training, and that real learning occurs outside the classroom, by actually doing. "Nobody learns how to build a system in a computer science class." "Nobody learns how to build a company in an entrepreneurship program."

Others are lamenting that by shifting to training-oriented schemes, we are losing the ability to offer deeper education, on topics that are not marketable. Who is going to study poetry if it has no return on investment? Who is going to teach literature if there is no demand for it?

These two criticisms seem to be pushing in two different directions.

***

In reality, we need to address two different needs:

One need is to truly democratize education: to take the content of the top courses and make it accessible and available to everyone. People who want to learn machine learning can now take courses from top professors instead of having to read a book. People can now advance their careers without having to enroll in expensive degree programs.

The other need is to preserve the breadth of education, shielding it from market forces. The goal is to preserve a structure in which students get exposed to diverse fields during their education, whether or not there is market demand for those fields.

***

This tension reminds me of the debate about genetically modified foods.

Mass production of food pretty much solved the problem of world hunger. A few decades ago, famine was a real threat in many areas of the world, due to the inability to produce enough food to feed a growing population: floods, droughts, and diseases disrupted production, resulting in shortages. Today, advances in agriculture allow the abundant production of grains: wheat and rice varieties are now robust, resistant to disease, adaptable to many different climates, and allow us to feed the world.

The advances that solved the problem of world hunger ended up creating other problems. Processed carbohydrates are causing obesity, diabetes, gout, and many other "luxury" diseases in the developed world. The poor in the developed world are not dying because they are hungry. They are dying because they are starved of essential nutrients in their diets.

***

The parallels are striking. The MOOCs, Khan Academies, and Code Academies of the world are the genetically modified foods for those living in the "third world of education". These courses may not be the most nutritious, and they may not provide all the "nutrition" needed for an education. However, the choice for many of these people in the "third world of education" is not Stanford vs. a Coursera MOOC. It is nothing vs. a Coursera MOOC. Given that choice, take the MOOC every time.

Those who live in the "developed world of education" can be pickier. They may have access to the genetically modified MOOCs, but if they can afford it, the organic, artisanal, locally sourced education can be better than the mass-produced MOOC.

Monday, January 20, 2014

A common question that comes up when discussing research in crowdsourcing is how it compares with similar efforts in other fields. Having discussed these a few times, I thought it would be good to collect them all in a single place.

Ensemble learning: In machine learning, you can generate a large number of "weak classifiers" and then build a stronger classifier on top. In crowdsourcing, you can treat each human as a weak classifier and then learn on top. What is the difference? In crowdsourcing, each judgment has a cost. With ensembles, you can trivially create 100 weak classifiers, classify each object, and then learn on top. In crowdsourcing, you pay for every classification decision. Furthermore, you cannot force every person to participate, and you often see heavy-tailed participation: a few humans participate a lot, but from most of them you get only a few judgments.
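To make the contrast with ensembles concrete, here is a minimal sketch (plain Python, with illustrative data) of aggregating worker judgments by majority vote. Unlike an ensemble, we cannot query every "classifier" on every item: each label costs money, and with heavy-tailed participation, items end up with unequal numbers of judgments.

```python
from collections import Counter

def majority_vote(judgments):
    """Aggregate per-item labels from sparse worker judgments.

    judgments: dict mapping item -> list of labels from whichever
    workers happened to judge that item (participation is uneven).
    Returns a dict mapping item -> most common label.
    """
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in judgments.items()}

# Hypothetical data: three workers judged img1, only one judged img2.
judgments = {
    "img1": ["cat", "cat", "dog"],
    "img2": ["dog"],
}
print(majority_vote(judgments))  # {'img1': 'cat', 'img2': 'dog'}
```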

Quality assurance in manufacturing: When factories create batches of products, they also have a sampling process in which they examine the quality of the manufactured products. For example, a factory creates light bulbs and wants 99% of them to be working. The typical process involves setting aside a sample for testing and checking whether it meets the quality requirement. In crowdsourcing, this would be equivalent to verifying, with gold testing or with post-verification, the quality of each worker. Two key differences: The heavy-tailed participation of workers means that gold-testing each person is not always efficient, as you may end up testing a user a lot, and then the user may leave. Furthermore, it is often the case that a sub-par worker can still generate somewhat useful information, while for tangible products, the product is either acceptable or not.
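A minimal sketch of the gold-testing decision described above (the thresholds are illustrative assumptions, not recommendations); note that with heavy-tailed participation, many workers leave before the test ever reaches a verdict:

```python
def gold_test_verdict(results, min_tests=5, min_accuracy=0.8):
    """Decide whether a worker passes gold testing.

    results: list of booleans, True for each gold question answered
    correctly. Returns True/False once enough gold answers have
    accumulated, or None while the worker is still undecided --
    which is where many workers remain when they leave early.
    """
    if len(results) < min_tests:
        return None  # not enough evidence yet
    return sum(results) / len(results) >= min_accuracy

print(gold_test_verdict([True, True, True, True]))          # None
print(gold_test_verdict([True, True, True, True, False]))   # True
```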

Active learning: Active learning assumes that humans can provide input to a machine learning model (e.g., disambiguate an ambiguous example) and that their answers are perfect. In crowdsourcing this is not the case, and we need to explicitly take the noise into account.

Test theory and Item Response Theory: Test theory focuses on how to infer the skill of a person through a set of questions. For example, to create an SAT or GRE test, we need to have a mix of questions of different difficulties, and we need to know whether these questions really separate people of different abilities. Item Response Theory studies exactly these questions, and based on the answers that users give to the tests, IRT calculates various metrics for the questions, such as the probability that a user of a given ability will answer the question correctly, the average difficulty of a question, etc. Two things make IRT inapplicable directly to a crowdsourcing setting: First, IRT assumes that we know the correct answer to each question; second, IRT often requires 100-200 answers to provide robust estimates of the model parameters, a cost that is typically too high for many crowdsourcing applications (except perhaps citizen science and other volunteer-based projects).
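For concreteness, the simplest IRT model (the one-parameter Rasch model) gives the probability of a correct answer as a logistic function of the gap between worker ability and question difficulty; a minimal sketch:

```python
import math

def p_correct(theta, b):
    """Rasch (1PL) model: probability that a worker of ability theta
    answers a question of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability exactly matches difficulty, the odds are even.
print(p_correct(0.0, 0.0))  # 0.5
# A more able worker has a higher chance on the same question.
print(p_correct(2.0, 0.0) > p_correct(0.0, 0.0))  # True
```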

Theory of distributed systems: This part of CS theory is actually much closer to many crowdsourcing problems than many people realize, especially the work on asynchronous distributed systems, which attempts to solve many coordination problems that appear in crowdsourcing (e.g., agreeing on an answer). The work on analysis of Byzantine systems, which explicitly acknowledges the existence of malicious agents, provides significant theoretical foundations for defending systems against spam attacks, etc. One thing I am not aware of is the explicit treatment of noisy agents (as opposed to malicious ones), nor am I aware of any study, within that context, of incentives that affect the way people answer a given question.

Database systems and user-defined functions (UDFs): In databases, a query optimizer tries to identify the best way to execute a given query, trying to return the correct results as fast as possible. An interesting part of database research that is applicable to crowdsourcing is the inclusion of user-defined functions in the optimization process. A UDF is typically a slow, manually coded function that the query optimizer tries to invoke as little as possible. The ideas from UDFs are typically applicable when trying to optimize in a human-in-the-loop-as-UDF approach, with the following caveats: (a) UDFs were assumed to return perfect information, and (b) UDFs were assumed to have a deterministic, or a stochastic but normally distributed, execution time. The existence of noisy results and the fact that execution times with humans can often be long-tailed make the immediate applicability of UDF research to optimizing crowdsourcing operations rather challenging. However, it is worth reading the related chapters about UDF optimization in the database textbooks.
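A classic heuristic from the UDF optimization literature orders expensive predicates by rank, defined as (selectivity - 1) / cost per item, in ascending order, so that cheap and highly selective predicates run first. A minimal sketch with hypothetical numbers, treating the human check as a very expensive "UDF":

```python
def order_predicates(preds):
    """Order predicates by rank = (selectivity - 1) / cost, ascending.

    preds: list of (name, selectivity, cost_per_item) tuples, where
    selectivity is the fraction of items that pass the predicate.
    Cheap, selective predicates get the most negative ranks and run
    first, minimizing how often the expensive ones are invoked.
    """
    return sorted(preds, key=lambda p: (p[1] - 1.0) / p[2])

# Hypothetical pipeline: a cheap automated filter vs. a human check.
preds = [
    ("human_check",  0.5, 100.0),   # slow, costly human judgment
    ("regex_filter", 0.5, 0.001),   # fast machine pre-filter
]
print([name for name, _, _ in order_predicates(preds)])
# ['regex_filter', 'human_check']
```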

(Update) Information Theory and Error Correcting Codes: We can model the workers as noisy channels that take the true signal as input and return a noisy representation of it. The idea of using advanced error correcting codes to improve crowdsourcing is rather underexplored, imho. Instead, we rely too much on redundancy-based solutions, although pure redundancy has been theoretically proven to be a suboptimal technique for error correction. (See an earlier, related blog post.) Here are a couple of potential challenges: (a) the errors of the humans are very rarely independent of the "message", and (b) it is not clear whether we can get humans to properly compute the functions commonly required to implement error correcting codes.
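To see what pure redundancy buys, here is the standard repetition-code calculation: the probability that a majority vote of n workers is wrong, under the assumption (rarely true in practice, as the caveats above note) that worker errors are independent:

```python
from math import comb

def majority_error(p_err, n):
    """Probability that a majority vote of n independent workers,
    each wrong with probability p_err, yields the wrong answer.
    Assumes n is odd and errors are independent of the item."""
    k = n // 2 + 1  # number of wrong votes needed to flip the majority
    return sum(comb(n, i) * p_err**i * (1 - p_err)**(n - i)
               for i in range(k, n + 1))

# One worker with a 20% error rate vs. five such workers:
print(majority_error(0.2, 1))  # 0.2
print(majority_error(0.2, 5))  # about 0.058 -- better, but at 5x the cost
```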

(Update) Information Retrieval and Interannotator Agreement: In information retrieval, it is very common to examine the agreement of annotators labeling the same set of items. My own experience from reading the literature is that the related metrics implicitly assume that all workers have the same level of noise, an assumption that is often violated in crowdsourcing.
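A minimal sketch of the standard two-annotator agreement metric, Cohen's kappa; note how it treats both annotators symmetrically, i.e., as equally noisy:

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labeling the same items.

    a, b: equal-length lists of labels. Returns (p_o - p_e) / (1 - p_e),
    where p_o is observed agreement and p_e is agreement expected by
    chance from each annotator's label frequencies. Assumes p_e < 1.
    """
    assert len(a) == len(b)
    n = len(a)
    labels = set(a) | set(b)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    p_e = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Perfect agreement yields kappa = 1; chance-level agreement yields 0.
print(cohens_kappa(["x", "y", "x", "y"], ["x", "y", "x", "y"]))  # 1.0
print(cohens_kappa(["x", "x", "y", "y"], ["x", "y", "x", "y"]))  # 0.0
```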

What other fields, and what other caveats, should be included in the list?