December 23, 2010

In 1976 John McCarthy, one of the founders of artificial intelligence, wrote a memo discussing the problem of getting a computer to understand a story from the New York Times:

"A 61-year old furniture salesman, John J. Hug, was pushed down the shaft of a freight elevator yesterday in his downtown Brooklyn store by two robbers while a third attempted to crush him with the elevator car because they were dissatisfied with the $1,200 they had forced him to give them."

McCarthy suggested that a real understanding of this story would entail being able to answer questions like:

Who was in the store when the events began? Who had the money at the end? Did Mr. Hug know he was going to be robbed? Does he know now that he was robbed?

Answering these questions is still beyond the state of the art in natural language processing because they require common sense knowledge in addition to the text of the story. In fact, even the problems associated with answering questions based solely on the text of the story are only beginning to be solved on a large scale:

When and where was Mr. Hug pushed? Who forced whom to give $1,200 to whom? Did the money satisfy the robbers?

To achieve an understanding at this level, we need to address linguistic problems like word sense disambiguation ("push" has 15 senses), named entity recognition (Mr. Hug = John J. Hug), anaphora resolution (him = John J. Hug), parsing (who did what to whom?), and semantic relation identification (dissatisfied with $1,200 = $1,200 did not satisfy). The figure below illustrates the main challenge: the ambiguity present in most natural language expressions. Our group studies statistical machine learning methods to address these problems, with the eventual goal of natural language understanding by machines.

Different interpretations of the sentence: "I saw the man on the hill with a telescope."
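
To make the ambiguity concrete, here is a minimal sketch using NLTK with a toy grammar (the grammar is a hypothetical illustration, not part of our system); the chart parser returns one tree per attachment decision:

    # Toy grammar illustrating PP attachment ambiguity (illustrative only).
    import nltk

    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    PP -> P NP
    NP -> Det N | Det N PP | 'I'
    VP -> V NP | VP PP
    Det -> 'a' | 'the'
    N -> 'man' | 'hill' | 'telescope'
    V -> 'saw'
    P -> 'on' | 'with'
    """)

    parser = nltk.ChartParser(grammar)
    sentence = "I saw the man on the hill with a telescope".split()
    for tree in parser.parse(sentence):
        print(tree)  # each tree corresponds to a different attachment

Even this tiny grammar licenses several trees for the sentence, depending on where "on the hill" and "with a telescope" attach.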

November 16, 2010

Naive Bayes and Maximum Entropy models are both popular in NLP applications. In this post I will show that Naive Bayes is really a type of joint Maximum Entropy model (and much easier to compute). Maximum Entropy models aim to find the distribution that maximizes entropy while still obeying certain feature expectation constraints. Maximizing entropy with feature expectation constraints turns out to be equivalent to maximizing likelihood in a log-linear (aka exponential) model. Logistic regression is a specific type of conditional maxent model. Naive Bayes is a specific type of joint maxent model.

Consider a classification problem with a D-dimensional input vector x and a discrete output variable y. Naive Bayes models assume that the individual components of x are independent given the output y:

\[ p(\mathbf{x}, y) = p(y) \prod_{d=1}^D p(x_d|y) \]

The factors on the right hand side are then estimated using maximum likelihood, which boils down to counting frequencies if x and y are both discrete. In comparison, here is a joint maximum entropy model:

\[ p(\mathbf{x}, y) = \frac{1}{Z} \exp \sum_{m=1}^M \lambda_m f_m(\mathbf{x}, y) \]

where the f_m are arbitrary real valued "feature functions" of the input and the output and Z is a normalization constant. This looks nothing like Naive Bayes at first glance, but if we choose feature functions that look at the output y together with at most one component of the input x, e.g. binary indicator features of the form

\[ f_c(\mathbf{x}, y) = [y = c], \qquad f_{d,v,c}(\mathbf{x}, y) = [x_d = v][y = c], \]

the similarity becomes more apparent. In fact if the feature functions are binary, this expression will have exactly the same form as the Naive Bayes expression, and maximizing likelihood will give exactly the same answers. Let us illustrate with an example:

Because of the symmetry of this example, not all lambda coefficients will be distinct; in fact only three distinct lambdas are necessary, as indicated in the last column of the table. The maximum likelihood estimates of the lambdas will satisfy the expectation constraints: the empirical frequency of each feature will be equal to the expected frequency:

\[ \sum_{i=1}^{N} f_m(\mathbf{x}_i, y_i) = N \sum_{\mathbf{x}, y} p(\mathbf{x}, y) f_m(\mathbf{x}, y) \]

Here the sum on the left is over the instances in the dataset, whereas the sum on the right is over all possible x, y pairs. This equation can be derived by setting the derivative of the log likelihood expression to zero. The following parameters satisfy these constraints and thus maximize the likelihood:

\[ \lambda_0 = \lambda_2 = 0, \lambda_1 = \log(3), Z = 32 \]

This joint maximum entropy model gives exactly the same results as the Naive Bayes model.

A conditional maxent model

\[ p(y|\mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp \sum_{m=1}^M \lambda_m f_m(\mathbf{x}, y) \]

will satisfy conditional expectation constraints and maximize conditional likelihood (which can again be derived by setting the derivative of the log conditional likelihood to zero):

\[ \sum_{i=1}^{N} f_m(\mathbf{x}_i, y_i) = \sum_{i=1}^{N} \sum_{y} p(y|\mathbf{x}_i) f_m(\mathbf{x}_i, y) \]

However it is important to understand that this model handles feature interactions better than Naive Bayes not because of any magic associated with maxent models, but as a result of maximizing conditional instead of joint likelihood. When you maximize joint likelihood, maxent gives the same results as Naive Bayes. Of course with maxent, one always has the option of defining features with cross terms (more than one input variable) to handle feature interaction.
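
Here is a minimal numerical sketch of this equivalence (the tiny binary dataset is made up for illustration): fit Naive Bayes by counting, read off the corresponding lambdas, and check that the log-linear form assigns identical joint probabilities.

    # Naive Bayes rewritten as a joint log-linear (maxent) model.
    # The dataset is hypothetical; x has two binary components, y is binary.
    from collections import Counter
    from itertools import product
    from math import exp, log

    data = [((0, 0), 0), ((0, 1), 0), ((1, 1), 0),
            ((1, 0), 1), ((1, 1), 1), ((0, 1), 1)]
    N = len(data)

    cy = Counter(y for _, y in data)
    cxy = [Counter((x[d], y) for x, y in data) for d in range(2)]

    def naive_bayes(x, y):          # p(y) * prod_d p(x_d|y) by counting
        p = cy[y] / N
        for d in range(2):
            p *= cxy[d][(x[d], y)] / cy[y]
        return p

    # Lambdas that make the log-linear model identical:
    lam_y = {y: log(cy[y] / N) for y in (0, 1)}
    lam = {(d, v, y): log(cxy[d][(v, y)] / cy[y])
           for d in range(2) for v in (0, 1) for y in (0, 1)}

    def maxent(x, y):
        return exp(lam_y[y] + sum(lam[(d, x[d], y)] for d in range(2)))

    for x, y in product(product((0, 1), repeat=2), (0, 1)):
        assert abs(naive_bayes(x, y) - maxent(x, y)) < 1e-12

With this choice of lambdas the expectation constraints hold and Z = 1, because the lambdas are taken directly from the Naive Bayes log probabilities; the parameterization above instead fixes some lambdas to zero, giving a different Z for the same distribution.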

November 15, 2010

This workshop aims to foster the development of innovative, targeted, formalism-independent parser evaluation resources and methods that will guide us in building the next generation of parsers.

Under many of our existing evaluation measures, parsing accuracy appears to have plateaued around the 90% mark. To continue making meaningful improvements to parsing technology, we first need to clarify what this 90% represents. Do our evaluations measure semantically relevant syntactic phenomena? Do they accurately represent multiple domains, languages, and formalisms? How relevant are they for downstream tasks? Do they reflect the level of inter-annotator agreement? We also need to identify and understand the "missing 10%": there is a growing awareness in the community that parsers may perform poorly on less frequent but semantically important syntactic phenomena, but in fact we are not even certain whether such crucial phenomena are represented in our current evaluation schemes. We need new ways of highlighting the specific areas where parsers need to improve.

We believe parser evaluation should:

- be relevant for multiple formalisms, languages, and domains
- be targeted towards finding parser weaknesses
- focus on semantically important tasks
- be extrinsic or task-oriented as well as intrinsic
- be based on schemes with high inter-annotator agreement
- show us how we can improve parser training methods

Despite the advances made in cross-framework evaluation, such evaluations still suffer from a loss of accuracy arising from conversion between output formats. One recent answer to this problem is the PETE task. In PETE (Yuret et al., 2010, http://pete.yuret.com) parser evaluation is performed using simple syntactic entailment questions. Given the sentence "The man who stole my car went to jail", the annotator is asked to judge entailments like "The man went to jail" or "My car went to jail". This scheme is formalism-independent, has high inter-annotator agreement, and focuses evaluation on semantically relevant distinctions. A new version of PETE will form the shared task for this workshop.

Another known weakness in existing evaluation measures, including ones based on grammatical relation formalisms, is that they are aggregate measures, in which syntactic phenomena are de facto weighted by frequency rather than by degree of syntactic difficulty or semantic importance. Such measures are therefore likely to have disproportionate contributions from high-frequency, "easy" grammatical phenomena such as determiners and subjects; while frequency weighting is obviously important, it makes it difficult to discern the phenomena where parsers really need to improve.

One answer to this problem is to focus evaluation on syntactic phenomena which we know to be difficult for parsers, such as the unbounded dependency evaluations performed in Rimell et al. (2009) and Nivre et al. (2010). This area is wide open for development: we have known for a long time that parsers have difficulty with phenomena like coordination and PP attachment, but are there other problematic constructions? We should also focus on finding new ways of determining which phenomena are most difficult, and hence where we need to focus parser training efforts. Also crucial is finding ways to measure the importance of parser errors for downstream tasks, especially semantic tasks, and weighting parser performance accordingly.

Third, many evaluations are still intrinsic, and while intrinsic evaluations play an important role -- especially for developing new parsers, and for fine-grained comparisons with previous work -- it is increasingly clear that performance on intrinsic evaluations doesn't always predict task performance.

Recent papers such as Miyao et al. (2008, 2009) and Miwa et al. (2010) focus on task-based evaluation, especially for the biomedical domain. We need more evaluations that focus on a greater range of tasks, languages, and domains (or even subdomains, since the field has barely begun to address how the vocabulary and writing conventions across e.g. biomedical subdomains may affect parsing accuracy).

Finally, unlike work in other NLP subfields, parser evaluation studies almost never discuss the relevance of inter-annotator agreement. It may be that the 90% evaluation plateau reflects the limits of inter-annotator agreement, but we lack a clear picture of how these figures correspond. New, more natural annotation methods may help in this area.

At this workshop we especially encourage papers that consider how techniques and resources from other NLP subfields can be brought to bear on parser evaluation. Perhaps resources annotated with information on compound nouns, subcategorization frames, selectional preferences, or textual entailments may serve as gold standards. Perhaps new gold standards may be created by exploiting shallow parsing or novel approaches to human annotation. Perhaps we can learn something from sentence simplification, semantic parsing, or active learning. Ultimately we are interested in finding new and exciting ways to identify where we need to improve our parsers.

The workshop will have two parts.

Part I: PETE-2 shared task. This will be an updated version of the successful SemEval-2010 shared task on Parser Evaluation using Textual Entailments. As noted in Yuret et al. (2010), two important improvements to the task are re-balancing the composition of syntactic phenomena covered in the task dataset, and automating the entailment generation process. Both of these improvements will be made for the new PETE-2 dataset.

Anyone will be welcome to submit a system to the shared task portion of the workshop, and reports on the shared task will make up part of the workshop program. For teams not wishing to build their own RTE system to interpret their parser output, we will offer a simple system that generates RTE judgments from Stanford Dependency output, based on the top performing systems from SemEval-10 PETE.

Part II: Papers. We invite full-length papers which present evaluation resources, tools, techniques, or ideas; results of new evaluations; or new methods for targeted parser training based on evaluation results. We welcome submissions on all related topics, including but not limited to:

- new formalism-independent evaluation resources
- new domain-specific or cross-domain evaluation resources
- new language-specific or multi-lingual evaluation resources
- new evaluation resources targeted to specific syntactic phenomena
- new approaches to identifying syntactic phenomena that are difficult for parsers
- evaluation schemes that consider semantic relevance
- new extrinsic or task-based evaluations
- schemes for improvement of a parser based on evaluation results
- evaluation techniques that consider inter-annotator agreement
- ideas for bringing insights from other NLP subfields to bear on parser evaluation

Statement of research interests and areas of expertise: Rimell has worked on domain adaptation for parsing and is interested in novel parser evaluation methods. She has worked on the evaluation of a variety of treebank, grammar-based, and dependency parsers on unbounded dependencies and was a contributor to the COLING-08 parser evaluation workshop as well as a member of the top-performing team in the SemEval-10 PETE task. She is also currently working on acquisition of lexical resources and has an interest in their relationship with parsing and parser evaluation.

Statement of research interests and areas of expertise: Yuret has worked on unsupervised parsing and various unsupervised disambiguation problems, including word senses, semantic relations, and morphology. He was the organizer of the SemEval-10 PETE task and is currently co-organizing the next SemEval.

Narratives are ubiquitous. We use them to educate, communicate, convince, explain, and entertain. As far as we know every society has narratives, which suggests they are deeply rooted and serve an important cognitive function: that narratives do something for us. It is clear that, to fully explain human intelligence, beliefs, and behaviors, we will have to understand and explain narrative.

The symposium will bring together researchers with a wide variety of perspectives to share what is known about the fundamentals of the computational modeling of narrative and to explore the forefront of that knowledge. We seek participation from as wide a variety of approaches as possible, including not only AI researchers and technologists, but also psychologists, cognitive scientists, linguists, philosophers, narrative theorists, anthropologists, educators, storytellers, and neuroscientists.

November 03, 2010

Of all the math problems I collect, these are my favorites. They do not require anything more than elementary math, but they do seem to trigger a software bug in most people's brains. For each problem, choose between the different arguments that lead to different answers. (Updated Nov 2010: two siblings problem)

(Two siblings) If you pick a random family with two kids, the probability of both being girls is obviously 1/4 (assuming girls and boys are equally likely). However, simple variations of this problem easily lead to confusion:

Variation 1: If you ask the family whether they have at least one girl, and they say yes, the two girl probability is 1/3.

Variation 2: If you see one of their kids on the street and notice that she is a girl, the two girl probability is 1/2.

You can verify these answers by imagining the sample space of all (say four million) two-child families and assuming equal numbers of boy-boy, boy-girl, girl-boy, and girl-girl families (say one million each). What is tricky to understand is why these two variations have different answers when it seems like they give you the exact same information. Here are some more variations:

Variation 3: If you learn that the older sibling is a girl, the two girl probability is 1/2.

Variation 4: If you learn that the family has one girl named Florida, the two girl probability is approximately 1/2.

Variation 5: If you learn that the family has one girl born on a Wednesday, the two girl probability is 13/27.
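
These answers can be verified by brute force enumeration instead of imagination; here is a sketch that checks Variations 1, 2, and 5 (calling Wednesday day 2 is an arbitrary convention):

    # Enumerate equally likely (sex, weekday) pairs for two-child families.
    from fractions import Fraction
    from itertools import product

    kids = list(product("BG", range(7)))      # (sex, birth weekday)
    families = list(product(kids, kids))      # all equally likely

    def p_two_girls(condition):
        sel = [f for f in families if condition(f)]
        both = [f for f in sel if f[0][0] == "G" and f[1][0] == "G"]
        return Fraction(len(both), len(sel))

    # Variation 1: the family has at least one girl -> 1/3
    print(p_two_girls(lambda f: "G" in (f[0][0], f[1][0])))

    # Variation 5: at least one girl born on a Wednesday (day 2) -> 13/27
    print(p_two_girls(lambda f: ("G", 2) in f))

    # Variation 2: a uniformly chosen child is observed to be a girl -> 1/2
    obs = [(f, i) for f in families for i in (0, 1) if f[i][0] == "G"]
    both = [1 for f, i in obs if f[0][0] == "G" and f[1][0] == "G"]
    print(Fraction(len(both), len(obs)))

The difference between Variations 1 and 2 shows up in the code: a yes answer selects families, while a street sighting selects (family, child) pairs, so a two-girl family is counted twice in Variation 2.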

(Umit - Monty Hall Problem) You are a participant on the game show "Let's Make a Deal." Monty Hall shows you three closed doors. He tells you that two of the closed doors have a goat behind them and that one of the doors has a new car behind it. You pick one door, but before you open it, Monty opens one of the two remaining doors and shows that it hides a goat. He then offers you a chance to switch doors with the remaining closed door. Is it to your advantage to do so?

Argument 1: It does not matter. The probability of finding the car behind either of the remaining two doors was equal in the beginning, and they are still equal now. The fact that you put your hand on one of them cannot increase or decrease its probability of hiding the car.

Argument 2: If we repeated this experiment a million times, you would get the car only one third of the time by sticking with your first door. People who consistently switch would win the other two thirds of the time. Therefore you should switch.

Argument 3: Think about what you would do if there were a thousand doors, rather than three, and Monty Hall opened 998 doors with goats behind them.
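
Argument 2 can be checked empirically with a short simulation sketch (the door labels and trial count are arbitrary):

    # Monty Hall simulation: stick vs. switch.
    import random

    def trial(switch):
        car, pick = random.randrange(3), random.randrange(3)
        # Monty opens a goat door that is neither the pick nor the car.
        opened = random.choice([d for d in range(3) if d not in (pick, car)])
        if switch:
            pick = [d for d in range(3) if d not in (pick, opened)][0]
        return pick == car

    n = 100_000
    print(sum(trial(False) for _ in range(n)) / n)  # ~1/3 when sticking
    print(sum(trial(True) for _ in range(n)) / n)   # ~2/3 when switching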

(Encyclopedia of Bridge) You are South with three small of a suit, and dummy has QJ9. You desperately need a trick from this suit. You lead low to the Queen, and East wins with the King. When you get a second chance, you lead low to the J9 and West plays low. Should you play the Jack or the 9?

Argument 1: If either opponent has A10, it does not matter. If East has the Ace and West the 10, you want to play the 9. If it is the other way around, you want to play the Jack. Both opponents are equally likely to hold the Ace, so it does not matter what you play.

Argument 2: You should play the Jack because East has the Ace only 1/3 of the time. If East had AK, he would play the King to the first trick only half the time. If he had K10, he would always play the King. Since we know he played the King, it is twice as likely that he has the K10 and not AK.

Note: Observe the similarity with the Monty Hall problem.
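
The "twice as likely" step in Argument 2 is just Bayes' rule; here is a minimal sketch of the computation, under the argument's own assumption that East plays the King from AK half the time:

    # Restricted choice: posterior over East's holding after seeing the King.
    priors = {"AK": 0.5, "K10": 0.5}    # the two layouts, equally likely a priori
    p_king = {"AK": 0.5, "K10": 1.0}    # chance East plays the King from each

    joint = {h: priors[h] * p_king[h] for h in priors}
    total = sum(joint.values())
    for h in joint:
        print(h, joint[h] / total)      # AK 1/3, K10 2/3 -> play the Jack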

(Memduh - Two envelope problem) I offer you a choice between two envelopes containing money. One envelope has twice as much money as the other. You pick one, and out comes 10 dollars. Now I give you a chance to switch. Would you like to switch? How much are you willing to pay to switch?

Argument 1: Of course you switch. The expected amount of money in the other envelope is 0.5 × 5 + 0.5 × 20 = 12.5 dollars. In fact you are willing to pay up to 2.5 dollars to switch.

Argument 2: What if I asked you the question before you opened the envelope and saw the 10 dollars? Using the same reasoning, you can assume there are A dollars in the envelope and compute 0.5 × (A/2) + 0.5 × (2A) = 1.25A for the expected amount in the other envelope. So you would switch. Just before you open your new envelope, I ask whether you would like to switch again. What would your answer be?

Note: In fact if I can find two people that believe in Argument 1, I can build a money machine. Just keep giving them two envelopes with 5 and 10 dollars and charge for switching... :^) (Of course I charge them whatever comes out of the first envelope for playing the game, so that it is a zero sum game.)
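
The note is easy to check with a simulation sketch: when the two amounts are a fixed pair (the 5 and 10 dollars from the note), switching gains nothing, and paying anything to switch is a pure loss:

    # Two fixed envelopes: switching does not change the expected payoff.
    import random

    def play(switch):
        envelopes = [5, 10]
        random.shuffle(envelopes)
        return envelopes[1 if switch else 0]

    n = 100_000
    print(sum(play(False) for _ in range(n)) / n)  # ~7.5
    print(sum(play(True) for _ in range(n)) / n)   # ~7.5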

(Neal) I pick two real numbers. You look at one of them. Can you find a strategy that lets you guess whether you are looking at the larger or the smaller number with a success rate greater than 1/2?

Argument 1: Obviously you cannot find such a strategy.

Argument 2: Take a probability distribution that is non-zero over all the real numbers (standard normal for example). Draw a random number from this distribution and respond assuming that the hidden number is equal to your random number. There are three cases: (i) Your random number will be smaller than both my numbers, in which case you have 50% chance of winning. (ii) Your random number will be larger than both my numbers, in which case you have 50% chance of winning. (iii) Your random number will be between my two numbers, in which case you have 100% chance of winning. The average is greater than 50%.

Note: Using a similar argument one can show that you could in fact make a profit in the two envelope problem by employing a mixed strategy.
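
Argument 2 translates directly into a simulation; in the sketch below the two hidden numbers are fixed arbitrarily and the threshold is drawn from a standard normal (both choices are illustrative):

    # Guess whether the shown number is the larger, using a random threshold.
    import random

    a, b = -0.5, 0.8            # the two hidden numbers (any distinct reals)

    def trial():
        shown, hidden = random.sample([a, b], 2)
        t = random.gauss(0, 1)  # random threshold from a standard normal
        guess_shown_is_larger = shown > t
        return guess_shown_is_larger == (shown > hidden)

    n = 100_000
    print(sum(trial() for _ in range(n)) / n)  # noticeably above 0.5

The edge over 50% is exactly the probability that the random threshold falls between the two hidden numbers, so it shrinks (but never vanishes) as the numbers move away from the bulk of the distribution.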

(Alkan) You draw a random line that cuts a circle with unit radius. What is the probability that the chord will be longer than sqrt(3)?

Argument 1: Consider the distance between the midpoint of the chord and the center of the circle. If this distance is less than 1/2, the chord will be longer than sqrt(3). Therefore the answer is 1/2.

Argument 2: Draw a tangent at one of the points the line intersects the circle. Consider the angle between this tangent and the chord. If this angle is between 60 and 120 degrees, the chord will be longer than sqrt(3). Therefore the answer is 1/3.

Argument 3: Consider the midpoint of the chord. If this midpoint is within a concentric circle with half the radius, the chord will be longer than sqrt(3). The area of a circle with half the radius is 1/4th of the original. Therefore the answer is 1/4.
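
The three arguments correspond to three different ways of generating a "random chord", and a simulation sketch makes the disagreement explicit:

    # Bertrand's paradox: three chord-sampling methods, three answers.
    import math, random

    def method1():  # distance of midpoint from center, uniform in [0,1]
        d = random.uniform(0, 1)
        return 2 * math.sqrt(1 - d * d)

    def method2():  # two endpoints uniform on the circle
        s, t = random.uniform(0, 2 * math.pi), random.uniform(0, 2 * math.pi)
        return math.dist((math.cos(s), math.sin(s)),
                         (math.cos(t), math.sin(t)))

    def method3():  # midpoint uniform in the disk
        while True:
            x, y = random.uniform(-1, 1), random.uniform(-1, 1)
            if x * x + y * y <= 1:
                return 2 * math.sqrt(1 - x * x - y * y)

    n = 100_000
    for m in (method1, method2, method3):
        print(sum(m() > math.sqrt(3) for _ in range(n)) / n)  # ~.50 ~.33 ~.25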

October 27, 2010

A. Kabakçıoğlu, D. Yuret, M. Gür, B. Erman. 2010. Phys. Biol. 7 046005. doi: 10.1088/1478-3975/7/4/046005 (PDF, PDF, HTML, Online, Hermite code). Abstract: We develop a general framework for the analysis of residue fluctuations that simultaneously incorporates anharmonicity and mode-coupling in a unified formalism. We show that both deviations from the Gaussian model are important for modeling the multidimensional energy landscape of the protein Crambin (1EJG) in the vicinity of its native state. The effect of anharmonicity and mode-coupling on the fluctuational entropy is on the order of a few percent.

September 19, 2010

- Human language is qualitatively distinct from animal communication systems (ACSs): most words are incomplete by themselves and need to be combined to express meaning, whereas animal calls are self-sufficient.

- A better model for the origins of language may be pidgins (languages created by people who live together but do not share a common language).

- A protolanguage would have words as we know them that combine, but no well-defined rules of syntax or morphology.

- ACSs are all about the here and now, whereas most words refer to things outside the current happenings.

- ACSs are mainly manipulative, whereas language is mainly informative.

- Communicative units come in three flavors: icons resemble the things talked about, indices point to them, and symbols do neither (function words, which do not refer at all, form a separate group). Displacement (referring to things that are not in the here and now) is only possible with symbols and icons. Iconic signs may have been the first displaced ones, paving the way for symbols.

- "Since we usually regard language as no more than the means by which we express our thoughts, it seems natural to think that language should issue from intelligence, rather than vice versa. It seemed equally obvious, to naive observers, that the earth was the center of the universe, and the sun, moon, and planets all went around it." pp.58

- Categories are different from concepts. pp.87. Concepts you can think about or think with, whereas all you can do with categories is to tell whether something belongs in them. pp.205

- The ACSs of ants and bees may be closer to human language because they exhibit displacement (communication about food sources distant in time and space). ch.7

September 04, 2010

I have looked for a way to get alerted about new releases from my favorite authors for a long time. I think Amazon used to support this in the past but they no longer do. Barnes and Noble has writer alerts for only a small list of authors. This is such an obvious feature for a bibliophile that I do not understand why nobody supports it. I was about to write my own code but luckily ran into www.authoralerts.com first. Highly recommended.

August 26, 2010

Mehmet Ali Yatbaz and Deniz Yuret. Coling 2010. pp. 1391--1398. Beijing, China. (PDF, Poster) Abstract: We show that unsupervised part of speech tagging performance can be significantly improved using likely substitutes for target words given by a statistical language model. We choose unambiguous substitutes for each occurrence of an ambiguous target word based on its context. The part of speech tags for the unambiguous substitutes are then used to filter the entry for the target word in the word--tag dictionary. A standard HMM model trained using the filtered dictionary achieves 92.25% accuracy on a standard 24,000 word corpus.

Abstract: Parser Evaluation using Textual Entailments (PETE) is a shared task in the SemEval-2010 Evaluation Exercises on Semantic Evaluation. The task involves recognizing textual entailments based on syntactic information alone. PETE introduces a new parser evaluation scheme that is formalism independent, less prone to annotation error, and focused on semantically relevant distinctions.

Ergun Bicici, Deniz Yuret. Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR. pp. 282--289. July 2010. Uppsala, Sweden. (PDF, Slide, Poster) Abstract: We use L1 regularized transductive regression to learn mappings between source and target features of the training sets derived for each test sentence and use these mappings to rerank translation outputs. We compare the effectiveness of L1 regularization techniques for regression to learn mappings between features given in a sparse feature matrix. The results show the effectiveness of using L1 regularization versus L2 used in ridge regression. We show that regression mapping is effective in reranking translation outputs and in selecting the best system combinations with encouraging results on different language pairs.

April 22, 2010

Smolin's book made me think of a dilemma I face often. I find the current system of scientific funding disturbing. Chief among the "values" of a scientist is absolute honesty. Yet the project proposals we need to fill out periodically ask us to describe in detail what we are going to do for the next three years. I don't know what I am going to do during the next three weeks! It depends on the results I get using my current approach during the next couple of days. Maybe I will have a brilliant idea that will change my whole approach to the problem. Maybe another problem will take me over. Honestly, I don't know. The only thing I can promise is that I will put all my working energy into making progress on the problem that I find most promising at the time. But apparently that is not enough to get funding, and we are forced to either (i) bend the truth, or (ii) tie ourselves to an approach that we will most likely find suboptimal in the near future.

To me the answer is simple: scientists should be funded not on promises about the future (which nobody can honestly make, let alone scientists whose job is to explore the unknown), but on past performance. That leaves the problem of young scientists who have no past. There should be a reasonable amount of seed funding for such people, just enough to make sure an adventurous spirit has enough time to risk his career tackling an important and deep problem.

Smolin's book should be required reading for all who manage scientists and scientific funding. If you are not interested in the debate on string theory, just read the last few chapters on how science works based on a shared ethic, and why we should take a bit more risk on "Seers", who tend to obsess about high-risk problems and may take a long time (sometimes forever) to produce anything valuable.

Chapter 17 proposes the shared ethic among scientists, rather than some abstract "scientific method", as chiefly responsible for the success of science. Chapter 18 draws a distinction between two types of scientists: "Seers" and "Craftspeople". In fact pretty much the whole book is an elaboration of how and why the scientific establishment does not provide enough room for "Seers", who by nature like to obsess about high-risk problems and need much longer incubation times.

I find the shared ethic of science to be one of the most important creations of human culture. I had long held the view that science was about "what is" and not about "what ought to be", and thus that science and ethics had nothing to do with each other. Recently I started to see the ethic of scientists as people, if not the result of their work, as being very relevant. Dennis Overbye describes it best:

"Not only does science not provide any values of its own, say its detractors, it also undermines the ones we already have, devaluing anything it can’t measure, reducing sunsets to wavelengths and romance to jiggly hormones. It destroys myths and robs the universe of its magic and mystery. So the story goes. But this is balderdash. Science is not a monument of received Truth but something that people do to look for truth. That endeavor, which has transformed the world in the last few centuries, does indeed teach values. Those values, among others, are honesty, doubt, respect for evidence, openness, accountability and tolerance and indeed hunger for opposing points of view."

February 22, 2010

The purpose of this post is to encourage participation in the task "Parser Evaluation using Textual Entailments" in the 5th International Workshop on Semantic Evaluations, SemEval-2010 (http://semeval2.fbk.eu/semeval2.php), co-located with ACL-2010, July 15-16, Uppsala.

This shared task should be of interest to researchers working on:

- parsing
- semantic role labeling
- recognizing textual entailments

Parser Evaluation using Textual Entailments (PETE) is a shared task in the SemEval-2010 Evaluation Exercises on Semantic Evaluation. The task involves recognizing textual entailments (RTE) based on syntactic information. Given two text fragments called 'Text' and 'Hypothesis', Textual Entailment Recognition is the task of determining whether the meaning of the Hypothesis is entailed (can be inferred) from the Text. The PETE task focuses on entailments that can be inferred using syntactic information alone.

Text: The man with the hat was tired.

Hypothesis-1: The man was tired. (YES)

Hypothesis-2: The hat was tired. (NO)
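
For intuition, here is a toy sketch of the kind of dependency-based decision a baseline system can make on this example; the dependency triples and the containment heuristic are illustrative assumptions, not any actual submitted system:

    # Toy entailment check: are the hypothesis' core dependencies (subject,
    # object) a subset of the text's dependencies? Triples are hand-written.
    text_deps = {("tired", "nsubj", "man"), ("man", "det", "the"),
                 ("man", "prep_with", "hat"), ("hat", "det", "the")}

    def entails(hyp_deps):
        core = {d for d in hyp_deps if d[1] in ("nsubj", "dobj")}
        return "YES" if core <= text_deps else "NO"

    print(entails({("tired", "nsubj", "man")}))  # YES
    print(entails({("tired", "nsubj", "hat")}))  # NO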

Our goals in introducing this task are:

- To focus parser evaluation on semantically relevant phenomena.
- To introduce a parser evaluation scheme that is formalism independent.
- To introduce a targeted textual entailment task focused on a single linguistic competence.
- To be able to collect high quality evaluation data from untrained annotators.

The following criteria were used when constructing the entailments:

- They should be decidable using only syntactic inference.
- They should be easy to decide by untrained annotators.
- They should be challenging for state of the art parsers.

You can find more details about our entailment generation process in the PETE Guide. You can download the development and test datasets including gold answers and system scores here: PETE_gold.zip. There is no training data. The evaluation is similar to other RTE tasks. There is a Google group semeval-pete for task related messages.

Here are some links for publicly available parsers that can be used in this task. You do not have to use any of these parsers; in fact you do not have to use a conventional parsing algorithm at all -- outside-the-box approaches are highly encouraged. However, getting a quick baseline system using an existing parser may be a good way to start.

a blog post: On the consistency of Penn Treebank annotation (October, 2006).

lre98.pdf: Carroll, J., E. Briscoe and A. Sanfilippo (1998) `Parser evaluation: a survey and a new proposal'. In Proceedings of the 1st International Conference on Language Resources and Evaluation, Granada, Spain. 447-454.

Abstract: We introduce a preprocessing technique for classification problems based on linear transformations. The algorithm incrementally constructs a linear transformation that maximizes the nearest neighbor classification accuracy on the training set. At each iteration the algorithm picks a point in the dataset and computes a transformation that moves the point closer to points in its own class and/or away from points in other classes. The composition of the resulting linear transformations leads to statistically significant improvements in instance based learning algorithms.

Abstract: Sparse feature representations can be used in various domains. We compare the effectiveness of $L_1$ regularization techniques for regression to learn mappings between features given in a sparse feature matrix. We apply these techniques to learning word alignments commonly used for machine translation. The performance of the learned mappings is measured using the phrase table generated on a larger corpus by a state of the art word aligner. The results show the effectiveness of using $L_1$ regularization versus $L_2$ used in ridge regression.

February 19, 2010

Abstract: We introduce a generative probabilistic model, the noisy channel model, for unsupervised word sense disambiguation. In our model, each context C is modeled as a distinct channel through which the speaker intends to transmit a particular meaning S using a possibly ambiguous word W. To reconstruct the intended meaning the hearer uses the distribution of possible meanings in the given context P(S|C) and possible words that can express each meaning P(W|S). We assume P(W|S) is independent of the context and estimate it using WordNet sense frequencies. The main problem of unsupervised WSD is estimating context dependent P(S|C) without access to any sense tagged text. We show one way to solve this problem using a statistical language model based on large amounts of untagged text. Our model uses coarse-grained semantic classes for S internally and we explore the effect of using different levels of granularity on WSD performance. The system outputs fine grained senses for evaluation and its performance on noun disambiguation is better than most previously reported unsupervised systems and close to the best supervised systems.
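
The decision rule in this abstract is a direct application of Bayes' rule: pick the sense S maximizing P(S|C) P(W|S). Here is a minimal sketch with made-up probability tables (the senses and numbers are hypothetical):

    # Noisy channel WSD sketch: argmax_S P(S|C) * P(W|S).
    # All probabilities below are made up for illustration.
    p_s_given_c = {"bank/financial": 0.7, "bank/river": 0.3}  # from the LM
    p_w_given_s = {"bank/financial": 0.4, "bank/river": 0.5}  # from sense counts

    best = max(p_s_given_c, key=lambda s: p_s_given_c[s] * p_w_given_s[s])
    print(best)  # bank/financial: 0.28 beats 0.15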

February 17, 2010

This is for people trying to type Turkish documents on a U.S. keyboard using Emacs. The program provides a turkish-mode in which the correct Turkish accents are added to the ASCII version of the last word typed each time the user hits space. The latest version is available here.