The Associated Press announced that it is changing its style guide to drop the phrase “illegal immigrant” while retaining phrases like “illegal immigration” and “entering the country illegally.”

Immigrant advocates have been fighting for this change for a long time. Part of the thinking is that the phrase “illegal immigrant” is essentializing; it links the concept of illegality to the people themselves.

A quick look at COCA shows that in American discourse generally, immigrants are conventionally represented as illegal.

collocate       frequency    mutual information
illegal         2887         8.30
new             1032         1.67
legal           649          4.81
other           562          0.80
undocumented    537          9.42

Collocates for lemmatized immigrant on COCA using a span of 4 to the left and right

It is telling not only that illegal is so much more frequent than the other collocates, but also that its Mutual Information value is so high. Mutual Information is a measure of association. In a corpus, two words might co-occur often simply because both words are themselves frequent. Less frequent words co-occur less often in absolute terms, yet they can have a high degree of association: when they appear, they appear together. Mutual Information accounts for these differences and gives us a measure of association: how much more often do these words appear in the same neighborhood than chance would predict? The usual cut-off for significance is MI>3.
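To make that contrast concrete, here is a minimal sketch of one common way to compute Mutual Information for a collocate: the base-2 log of observed co-occurrences over the number expected by chance, with the expectation scaled by the size of the search window. This is an assumption about the general approach, not a reproduction of COCA's exact calculation. The function name and every count below are hypothetical, chosen only to illustrate the point, and are not taken from the table above.

```python
import math


def mutual_information(pair_count, node_count, collocate_count, corpus_size, span):
    """Base-2 mutual information for a node word and a collocate.

    pair_count      -- times the collocate appears within the window around the node
    node_count      -- total occurrences of the node word in the corpus
    collocate_count -- total occurrences of the collocate in the corpus
    corpus_size     -- total number of words in the corpus
    span            -- window size in words (4 left + 4 right = 8)
    """
    # Co-occurrences we would expect if the two words were unrelated.
    expected = node_count * collocate_count * span / corpus_size
    return math.log2(pair_count / expected)


# Hypothetical toy corpus: one million words, node word occurs 1,000 times.
CORPUS_SIZE = 1_000_000
NODE_COUNT = 1_000
SPAN = 8

# A very frequent collocate: co-occurs 40 times, but that is exactly
# what chance predicts for a word this common.
common = mutual_information(40, NODE_COUNT, 5_000, CORPUS_SIZE, SPAN)

# A rare collocate: co-occurs only 20 times, but it occurs just 25 times
# in the whole corpus, so nearly every use is next to the node word.
rare = mutual_information(20, NODE_COUNT, 25, CORPUS_SIZE, SPAN)

print(f"common collocate: MI = {common:.2f}")
print(f"rare collocate:   MI = {rare:.2f}")
```

In this toy case the frequent collocate has the higher raw count but an MI of zero, while the rare collocate clears the MI>3 threshold easily. That is the pattern the table shows for undocumented: fewer hits than new or legal, but a much stronger association.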

So illegal and immigrant (MI=8.30) have a very high degree of association. This is somewhat surprising given that illegal doesn’t seem like a particularly specialized modifier. Undocumented has a higher degree of association (MI=9.42). Its use, however, is far more restricted. It appears only with nouns related to the movement of people across borders: workers, students, aliens, people, migrants, etc.

Another interesting feature of this debate is how recent the practice of representing immigrants as illegal is. Despite a long history of contentious discourse around immigration in the US, the results from COHA show that this specific construction is quite new.