Using online corpus tools to check intuitions

[This is the draft of a paper I wrote for a presentation at an Interpretation/Translation Conference in fall 2010. This may or may not have been based on an MA assignment.]

Abstract
Interpretation and translation students typically want and expect exact answers about the differences between two apparently similar words. Teachers, in turn, might rely on intuitions and feelings to answer tricky questions that come up in interpretation and translation classes. Intuition is not always helpful, however, and can sometimes cause confusion or supply inaccurate information. Because it relies on only one person, often the teacher, an intuitive answer runs the risk of being nothing more than a personal opinion. Online corpus tools offer a way to check, support, or contradict such intuitions. With them, teachers and students can find accurate information about how words are used and which words they are likely to co-occur with. These corpus tools are extremely powerful and relatively easy to use, and students and teachers should be able to use them for basic research into words and how they are used. This paper displays the information that can be gathered from such tools. The question “What is the difference between ‘trouble’ and ‘problems’?” is used to demonstrate the tools and the information they can provide. The interactive presentation will demonstrate these tools with a new set of similar questions. Additionally, advice and warnings on using these tools, as well as an evaluation of each tool, will be provided during the presentation.

Introduction
As noted above, translation and interpretation students often want to know as much as possible about words. Firth wrote, “You shall know a word by the company it keeps,” meaning that the words that typically appear around a word tell us a great deal about it (Firth, 179). Firth argued that the meaning of a word is related as much to how it combines with other words in real use, its collocations, as to the meaning it possesses in and of itself. Sinclair defines collocations as combinations of words that show a tendency to co-occur, that is, to occur near each other in natural language (Sinclair 1991). In Firth’s words, “Collocations of a given word are statements of the habitual or customary places of that word” (Firth 181). The concept has become more widely recognized in recent years as teachers have grown more cognizant of the importance of collocations and share their insights on how words tend to collocate. Teachers are often excellent sources of information about collocations. Teachers’ intuitions are not always reliable, however, and sometimes we need an alternative source of information. This is where online corpus tools can be of great use. A corpus is defined as a “collection of texts, written or spoken, which is stored on a computer” (O’Keefe 1). Corpora can tell us about word frequency, grammar, vocabulary, relational language, idioms, and chunks. Online tools that use corpus data can be an excellent way to check the intuitions of teachers. In addition to providing empirical and quantitative data that might elude intuition, corpus tools can also supply examples that might take even the most capable teachers a long time to create. Now that the benefits of online corpora have been outlined, the sections that follow deal with such tools and how to use them.

Procedure
What follows is a study of the difference between “trouble” and “problems” using four online tools. These tools can be used to gather useful information about the meaning and uses of words. The information provided by each of the websites and the conclusions that can be drawn from it are also highlighted. Space constraints prevent a thorough evaluation of the websites themselves, but this will be covered in the presentation itself.

In order to get an example of teacher intuitions, four native English-speaking teachers were asked, “What is the difference between trouble and problems?” The teachers, each with a Master’s degree in TESOL or Applied Linguistics, were not given time to think about the answer and were expected to answer spontaneously. The answers that they gave were (hopefully) not terrible, but they might not be the type of answers that translation and interpretation students would want. One teacher said, “trouble is more general.” Another mentioned that trouble sounds more serious because we can solve problems. A third surmised that the meaning is the same but that it is a question of collocation, echoing the Firthian tradition. The final teacher said that this was a good question, that the words are similar in meaning, and that it is easy to see how one could get the words confused. He continued, “Of course, problem and trouble are certainly used for bad things. I think that I would often use problem to talk about certain specific issues but would use trouble in a more general way. I might say ‘He is having trouble speaking only English in class because he has a problem remembering vocabulary.’” As we can see from the answers, the teachers relied on their intuitions, experiences, and feelings to answer the question. To check these intuitions, and to show the type of information available online for teachers and students, the question of trouble vs. problems was examined in greater detail through online corpus tools.

The first step online was to check the “Just the Word” website, which rated problem and trouble as very similar in meaning. This is probably a big part of the confusion. Hitch, snag, bother, complication, difficulty, nuisance, pest, inconvenience, puzzle, plague, and (surprisingly for me) enigma were also listed as similar words for problem. Many of these words might be connected to the math problem aspect while others show a limited time frame. The list of similar words for trouble was much longer and more varied. Examples of words rated as most similar to trouble are (in alphabetical order because they were all rated as most similar) adversity, affliction, agitation, bother, catastrophe, commotion, difficulty, dilemma, disaster, distress, disturbance, fix, hardship, hassle, inconvenience, jam, mess, misery, misfortune, nuisance, ordeal, pain, pest, pickle, plague, plight, predicament, quandary, scrape, sorrow, suffering, torment, trial, tribulation, upset, vexation, woe, and worry. Words like adversity, misery, bother, torment, hassle, suffering, ordeal, hardship, trial, and tribulation suggest that trouble is something that might have to be endured. Other words like disaster, plague, misfortune, and catastrophe convey the sense that trouble is something that just springs up with bad luck. Pickle, jam, mess, fix, and hassle give the meaning of a tough situation to be in.

The next step, also on “Just the Word,” was to check for common grammatical patterns and collocations. By far the most common combination for trouble, with 1246 instances, was “trouble + be” as in, “The trouble is that I am not a smoker.” This was listed in the “N* subj V” category, which means that the noun being searched, trouble, is the subject of the verb. Another combination, “V obj N*”, which means that trouble is the object of the verb, provided some telling and useful results. “Have trouble” was the most common in this grouping with 902 entries. “Take trouble”, “cause trouble”, and “get into trouble” were next with 270, 248, and 230 entries respectively. The preposition that followed trouble most often was “with”, as in “You’re going to get me into trouble with the boss.” This example sentence is especially helpful because it also shows “trouble” with the preposition most likely to precede it, “into.” There were 471 examples of “into trouble” and 813 for “trouble with.” “By” also followed “trouble”, with 200 entries.
As with “trouble”, the most common grammatical structure for “problem” is “problem + be” in the “N* subj V” section. There were 6418 such examples. For comparison, the second most common verb to match with “problem” as the subject was “arise” with 641 entries. With “problem” as the object of the verb, “have” was the most common collocation by a wide margin, with 2756 entries. Other verbs that frequently appeared with “problem” as the object were “solve” with 1673 entries, “cause” with 879, “deal with” at 569, “present” with 439, “overcome” at 427, “pose” at 406, “resolve” amounting to 366, “discuss” with 274, and “address” at 258 listings. The adjectives that modified “problem” the most were “economic”, “big”, “particular”, “main”, “social”, “serious”, and “major”, in increasing order, with 307, 355, 405, 413, 454, 542, and 688 examples. The combination “problem with” shows the word in question with the preposition most likely to follow it, which tallied 2389 entries. The prepositions most likely to appear in front of “problem” are “with” and “to” with 2329 and 2827 entries, respectively. The noun and preposition combination most likely to precede “problem” is “solution to” with 727 entries.

We can see from this data that problems are things to have, talk about, categorize, rank, face, and perhaps eventually solve. This contrasts with trouble, which we are most likely to cause or get into and then have. Examples like “major problem”, “serious problem”, and “social problem” seem to show the tendency of “problem” to appear in more academic and formal texts.

The second site checked was “Sketch Engine.” We learned above that a very common construction using trouble is “to have trouble”. “Sketch Engine” shows that the two most common, and unfortunate, things to have trouble with are the law and the police. Teething, which is a relatively rare word, strongly collocates with trouble. In descending order, other modifiers that collocate with trouble with strong statistical significance are serious, hamstring, only, deep, real, terrible, financial, crowd, relegation (likely related to football/soccer, as Sketch Engine uses the British National Corpus), further, dire, and heart. These can be compared with the modifiers of problem, ranked by statistical significance with serious being the most significant: serious, major, intractable, real, teething, pressing, particular, methodological, main, biggest, technical, severe, behavioral, environmental, practical, fundamental, insoluble, free-rider, marital, and potential. It is interesting to note that teething collocates strongly with both trouble and problem. Trouble modifies spot, brewing, maker, and shooting in a statistically significant fashion. In the same way, solving, solver, area, behavior, drinker, drug, page, administrator, and report are words that are modified by problem. “Spell trouble” is a very strongly collocated verb-noun combination. “Ask for trouble” is a strong collocation as well.

The “Sketch Difference” function on Sketch Engine shows that problem is a much more common word than trouble, with 55,745 instances to 9,441 for trouble. This function also allows users to note that the word “cause”, as in “cause trouble” and “cause problems”, is commonly used with both words. Another word that is used with both trouble and problem is “run”, as in “run into problems” or “run into trouble.” Although less statistically significant than the previous examples, the word “financial” also collocates nearly equally with both problem and trouble.

Sketch Engine also shows that solve, approach, deal, cope, grapple, and solution are all words that strongly collocate with problem but not trouble. This conveys the idea that trouble is not something that we can solve. Words that strongly collocate with “trouble” but not “problem” include brew, ask, head, spell, and smell. Perhaps we can surmise that trouble has a smell, while problems do not. Interestingly, the given examples for “head”, “brew”, and “ask” all used the verbs progressively, as in “You are heading for trouble.”

COCA, the Corpus of Contemporary American English, can be used to uncover the words most commonly used around “trouble” and “problem”. The first search was for words that collocated very strongly with “trouble” but not “problem”, within four words of the target word on either side.

Words that collocated very strongly with trouble but not problem were:

1) heap (34 instances to 0)
2) worth
3) daddy
4) paradise
5) toil
6) sleeping
7) walking
8) civilization
9) communicating
10) drawing
11) spots
12) brewing
13) trouble
14) breathing
15) signs
16) sign
17) spot
18) expense
19) hint
20) lot

Words that collocated strongly with problem but not trouble include:

1) solving (1643 instances to 0)
2) solution
3) refugee
4) solvers
5) solver
6) drinkers
7) example
8) magnitude
9) racism
10) warming
11) corruption
12) crux
13) disposal
14) seriousness
15) identification
16) representation
17) severity
18) substance
19) attempt
20) scale

The results, which take into account words up to four places away from the target word, were illuminating. Searching in this way reveals words that collocate strongly with trouble but not problem and vice versa, which makes the differences in how the two words are used clear.
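
The kind of search described above, collecting the words found within four places of a target word and keeping those that occur with one word but not the other, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample text is invented, the function names are hypothetical, and the sketch does not reproduce how COCA itself computes its results.

```python
import re
from collections import Counter

# Invented sample text for illustration only; it is not corpus data.
SAMPLE = (
    "Trouble was brewing at the border. "
    "She had trouble sleeping last night. "
    "The crux of the problem is cost. "
    "We need a solution to the problem soon."
)


def window_collocates(text, node, span=4):
    """Count the words found within `span` tokens on either side of `node`.

    Simplification: the window ignores sentence boundaries.
    """
    tokens = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    return counts


def exclusive_collocates(text, node_a, node_b, span=4):
    """Return the collocates seen with only one of the two node words."""
    a = window_collocates(text, node_a, span)
    b = window_collocates(text, node_b, span)
    only_a = {word: n for word, n in a.items() if word not in b}
    only_b = {word: n for word, n in b.items() if word not in a}
    return only_a, only_b


only_trouble, only_problem = exclusive_collocates(SAMPLE, "trouble", "problem")
print(sorted(only_trouble))
print(sorted(only_problem))
```

In this toy text “the” appears near both words and so drops out of both lists, while words such as “brewing” and “solution” remain on one side only. A real corpus tool would also apply frequency thresholds and statistical ranking, which this sketch leaves out.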

MI refers to Mutual Information, a score that measures how strongly a word is associated with its collocates. An MI of over three usually shows a “semantic bonding” between the words. Ranked by MI score, without filtering against the other word, the words most likely to precede “trouble” and “problem” are:

Rank  Trouble          MI      Problem          MI
1)    having           8.57    internalizing    10.38
2)    potential        7.00    creative         8.34
3)    had              5.26    cooperative      7.15
4)    have             4.78    reducing         6.26
5)    no               4.21    total            6.25
6)    has              4.10    target           5.55
7)    little           3.94    identify         5.48
8)    any              3.82    potential        5.43
9)    much             3.75    biggest          5.09
10)   other            3.07    reduce           5.06

Similarly, the words that tend to follow “trouble” and “problem” are:

Rank  Trouble          MI      Problem          MI
1)    spots            12.51   solvers          15.56
2)    brewing          12.42   solver           15.53
3)    sleeping         11.63   solving          14.49
4)    breathing        11.55   drinkers         12.56
5)    communicating    10.16   drinker          11.72
6)    recruiting       9.97    gamblers         11.57
7)    spot             9.72    behaviors        11.01
8)    understanding    9.03    real-estate      9.32
9)    maker            8.77    behavior         9.32
10)   makers           8.52    severity         9.17
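
To give a sense of what an MI score measures, the following sketch uses a common textbook formulation of mutual information: the base-2 logarithm of how often two words actually co-occur, divided by how often they would be expected to co-occur by chance. The function name and all of the counts are hypothetical, and the corpus tools discussed here may calculate MI somewhat differently (for example, by adjusting for the size of the search window).

```python
import math


def mutual_information(pair_freq, node_freq, collocate_freq, corpus_size):
    """Textbook-style MI: log2(observed co-occurrences / expected by chance).

    Assumption: this is a generic formulation, not the exact computation
    used by any particular corpus website.
    """
    expected = node_freq * collocate_freq / corpus_size
    return math.log2(pair_freq / expected)


# Hypothetical counts in an imaginary one-million-word corpus.
CORPUS_SIZE = 1_000_000
NODE_FREQ = 1_000  # occurrences of the target word

# A rare collocate that appears next to the target word 8 times.
rare_mi = mutual_information(8, NODE_FREQ, 50, CORPUS_SIZE)

# A very frequent word that appears next to the target word 40 times.
common_mi = mutual_information(40, NODE_FREQ, 20_000, CORPUS_SIZE)

print(round(rare_mi, 2))
print(round(common_mi, 2))
```

With these invented numbers the rare collocate scores well above the threshold of three, while the frequent word does not, even though it co-occurs with the target five times as often. This is why a list ranked by MI tends to favor distinctive but relatively rare partners over everyday words.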

Disregarding statistical relevance, here are the words that follow “trouble” and “problem” most frequently:

Rank  Trouble          Instances  Problem          Instances
1)    spots            208        solving          1605
2)    sleeping         195        behavior         445
3)    breathing        116        behaviors        400
4)    spot             74         area             272
5)    understanding    71         solver           116
6)    brewing          31         solvers          105
7)    reading          31         areas            78
8)    walking          29         drinkers         78
9)    right            23         scores           63
10)   raising          18         is               62

The notion that problems are there to be solved comes out very clearly when the collocates are examined. Problem is certainly a much more common word than trouble overall, and it is even more common in academic contexts. The scientific and mathematical uses of the term might be the reason for this.

The final step was to match the results to a dictionary. For “trouble” the online Cambridge Advanced Learner’s Dictionary offered the following choices to search:

trouble (difficulties), trouble (inconvenience), trouble (worry), double trouble, trouble spot, be asking for it/trouble, lay up trouble for yourself, be looking for trouble, be a recipe for disaster/trouble/success, store up trouble/problems, be asking for trouble, spell trouble, get somebody into trouble, more trouble than it’s worth

The following definitions for trouble were supplied by the regular Learner’s Dictionary:

1 PROBLEMS [C,U] problems, difficulties, or worries [+ doing sth] We had trouble finding somewhere to park.

2 the trouble with sb/sth used to say what is wrong with someone or something

3 NOT WORKING [U] a problem that you have with a machine or part of your body

4 FIGHTING [U] a situation in which people are fighting or arguing

5 DIFFICULT SITUATION [U] a difficult or dangerous situation

6 PUNISHMENT [U] when you have done something wrong and are likely to be punished

7 EXTRA WORK [U] when you use extra time or energy to do something [+ to do sth]

There were considerably fewer choices in the Advanced Learner’s Dictionary for problem, even though it is a more common word than trouble. The options were:
problem, drinking problem, drink problem, have a problem with something/somebody, No problem., A problem shared is a problem halved.

The Cambridge Learner’s Dictionary provided the following definitions:

1 DIFFICULT SITUATION [C] a situation that causes difficulties and that needs to be dealt with

2 MATHEMATICS [C] a question that you use mathematics to solve

3 have a problem with sth/sb to find something or someone annoying or offensive

4 No problem. INFORMAL

a AFTER QUESTION something that you say to mean you can or will do what someone has asked you to do

b AFTER THANKS something that you say when someone has thanked you for something

The information provided here seems to match with that gleaned from the other websites. This is good news. It is also likely a result of the Cambridge dictionaries using corpora to inform their dictionary entries.

Conclusion
As can be seen above, online corpus tools can provide a plethora of information to students and teachers. Using such tools can be extremely helpful in providing a more accurate, broader, and clearer picture of what words mean and how they are used. While this was an extensive process that provided a great deal of information, it need not be so time-consuming. Corpus tools such as these can aid teachers and students of interpretation and translation as they try to reach a better understanding of specific words and find better ways to use them. Of course, corpus tools are not a panacea and need to be treated as just another tool at the disposal of the teacher, to be used critically in ways that can best help students.