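NLP research: Smaranda and Nina belong to the SALTS lab (Laboratory for the Study of Applied Language Technology and Society) in the School of Communication and Information. They are not in computer science. Smaranda works mainly on natural language processing, including machine translation, information extraction, and semantics. Nina previously worked on computational semantics but now focuses more on cognitive research. Matt Stone is in computer science and works on formal semantics and multimodal communication.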

[https://www.kaggle.com/c/asap-aes/data] : For this competition, there are eight essay sets. Each set of essays was generated from a single prompt. Average response length ranges from 150 to 550 words, depending on the set. Some of the essays depend on source information and others do not. All responses were written by students in Grades 7 to 10. All essays were hand-graded and double-scored. 100 MB
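Because every essay was scored by two raters, a natural first check is inter-rater agreement. The sketch below computes quadratic weighted kappa, the agreement statistic used to evaluate this competition, between two lists of integer scores. It assumes you have already loaded the two human scores for one essay set into Python lists (in the Kaggle training file these are assumed to be columns such as `rater1_domain1` and `rater2_domain1`). Score ranges differ between sets, so compute it per set. `quadratic_weighted_kappa` is a hypothetical helper written from the standard definition, not part of the dataset release.

```python
def quadratic_weighted_kappa(r1, r2, min_score, max_score):
    """Quadratic weighted kappa between two equal-length lists of integer scores.

    Hypothetical helper; scores must lie in [min_score, max_score] and max_score > min_score.
    """
    if len(r1) != len(r2):
        raise ValueError("rating lists must have the same length")
    n_cats = max_score - min_score + 1
    n = len(r1)
    # Observed matrix: observed[i][j] = number of items rater 1 scored i and rater 2 scored j
    observed = [[0] * n_cats for _ in range(n_cats)]
    for a, b in zip(r1, r2):
        observed[a - min_score][b - min_score] += 1
    # Marginal score histograms for each rater
    hist1 = [sum(row) for row in observed]
    hist2 = [sum(observed[i][j] for i in range(n_cats)) for j in range(n_cats)]
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_cats):
        for j in range(n_cats):
            weight = (i - j) ** 2 / (n_cats - 1) ** 2
            expected = hist1[i] * hist2[j] / n
            num += weight * observed[i][j]
            den += weight * expected
    if den == 0:
        return 1.0  # both raters gave one identical constant score
    return 1.0 - num / den


# Toy scores (invented), on a 1-4 scale
rater1 = [2, 3, 4, 4, 1]
rater2 = [2, 3, 3, 4, 1]
print(round(quadratic_weighted_kappa(rater1, rater2, 1, 4), 3))  # 0.918
```

A value of 1 means perfect agreement, 0 means chance-level agreement, and negative values mean systematic disagreement; the quadratic weights penalize a two-point disagreement four times as heavily as a one-point one.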

ASAP Short Answer Scoring Kaggle

[https://www.kaggle.com/c/asap-sas/data] : Each of the data sets was generated from a single prompt. Selected responses have an average length of 50 words. Some of the responses depend on source information and others do not. All responses were written by students primarily in Grade 10. All responses were hand-graded and double-scored. 35 MB

[http://trec.nist.gov/data/reuters/reuters.html] : A large collection of Reuters news stories for use in research and development of natural language processing, information retrieval, and machine learning systems. This corpus, known as "Reuters Corpus, Volume 1" or RCV1, is significantly larger than the older, well-known Reuters-21578 collection heavily used in the text classification community. To obtain it, you must sign an agreement and send it by post. 2.5 GB

[https://www.crowdflower.com/data-for-everyone/] : Before the 2015 Super Bowl, there was a great deal of chatter around deflated footballs and whether the Patriots cheated. This data set looks at Twitter sentiment on important days during the scandal to gauge public sentiment about the whole ordeal. 2 MB

Twitter sentiment analysis: Self-driving cars

[https://www.crowdflower.com/data-for-everyone/] : Contributors read tweets and classified them as very positive, slightly positive, neutral, slightly negative, or very negative. They were also asked to mark whether the tweet was not relevant to self-driving cars. 1 MB
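Because the labels form an ordered five-point scale plus a separate "not relevant" flag, one common preprocessing step is to map them to integers and discard irrelevant tweets before averaging. This is a minimal sketch; the label strings in `LABEL_TO_SCORE` are assumptions taken from the wording of the description, and the actual file may encode them differently (for example as numbers).

```python
# Assumed label strings, taken from the dataset description; the real file may differ.
LABEL_TO_SCORE = {
    "very negative": 1,
    "slightly negative": 2,
    "neutral": 3,
    "slightly positive": 4,
    "very positive": 5,
}
NOT_RELEVANT = "not relevant"  # assumed spelling of the relevance flag


def to_scores(labels):
    """Map five-point labels to 1..5, skipping tweets marked not relevant.

    Raises KeyError on any label not in LABEL_TO_SCORE.
    """
    scores = []
    for label in labels:
        key = label.strip().lower()
        if key == NOT_RELEVANT:
            continue  # tweet is not about self-driving cars
        scores.append(LABEL_TO_SCORE[key])
    return scores


def mean_sentiment(labels):
    """Average score of the relevant tweets, or None if none are relevant."""
    scores = to_scores(labels)
    return sum(scores) / len(scores) if scores else None


print(to_scores(["Very positive", "Not relevant", "neutral"]))  # [5, 3]
```

Keeping the ordinal scores, rather than collapsing them to positive/neutral/negative, preserves the intensity information the annotators provided.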

[https://www.kaggle.com/crowdflower/twitter-airline-sentiment] : A sentiment analysis job about the problems of each major U.S. airline. Twitter data was scraped from February 2015, and contributors were asked first to classify tweets as positive, negative, or neutral, and then to categorize the reason for each negative tweet (such as "late flight" or "rude service"). 2.5 MB
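The two-stage annotation means a negative reason is only meaningful for tweets labeled negative. The sketch below tallies reasons per airline with the standard `csv` module. The column names (`airline`, `airline_sentiment`, `negativereason`) are assumptions about the Kaggle CSV, and the rows in `SAMPLE` are invented for illustration; the reason strings simply reuse the examples from the description.

```python
import csv
import io
from collections import Counter, defaultdict


def negative_reasons_by_airline(rows):
    """Count negative-reason categories per airline, for tweets labeled negative only.

    Assumes each row is a dict with 'airline', 'airline_sentiment', and
    'negativereason' keys (assumed column names).
    """
    counts = defaultdict(Counter)
    for row in rows:
        if row["airline_sentiment"] != "negative":
            continue  # reasons are only assigned in the second stage, to negative tweets
        reason = row["negativereason"] or "unlabeled"
        counts[row["airline"]][reason] += 1
    return counts


# Invented toy rows; not taken from the real dataset.
SAMPLE = """tweet_id,airline,airline_sentiment,negativereason,text
1,United,negative,late flight,delayed again
2,United,positive,,great crew
3,Delta,negative,rude service,the agent was rude
4,United,negative,late flight,still waiting
"""

result = negative_reasons_by_airline(csv.DictReader(io.StringIO(SAMPLE)))
print(dict(result["United"]))  # {'late flight': 2}
```

Filtering on the first-stage label before counting avoids treating the empty reason field of positive and neutral tweets as a category of its own.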