The following are the four criteria used to determine whether a use of copyrighted material qualifies as fair use under U.S. copyright law:

Each criterion below is followed by what favors fair use status and by how the Corpus of Contemporary American English meets it.

The amount and substantiality of the portion taken
What favors fair use status: small portions of the original text, rather than full-text access

Under no circumstances do end users have access to entire texts (e.g., newspaper, magazine, or journal articles, or short stories). All access is through the web interface, and most of what users see are frequency charts showing how often words or phrases occur in different parts of the corpus. Access to small portions of the original text is secondary, rather than the central feature of the interface.

Access to the original text itself is limited to very short Keyword in Context (KWIC) displays, in which users see only a handful of words to the left and right of the searched word(s). In addition, all access is logged, and users can perform only a limited number of searches per day. As a result, it would be difficult for end users to re-create even one paragraph of an original text, and virtually impossible to re-create an entire page, much less an entire article.

This "snippet defense" (which relies on giving users only small snippets of the original text through the web interface) is the same one used by Google Books for its use of millions of copyrighted works. In addition, we have consulted two lawyers who specialize in Internet copyright law (names available upon request). Both have stated that because of the limited access we provide to end users, as well as our status with regard to the other three criteria listed here, we are clearly in accord with the provisions of the fair use statute.

The purpose and character of the use
What favors fair use status: academic, non-commercial use

Our use of the texts is strictly for academic research and is purely non-commercial.

The nature of the copyrighted work
What favors fair use status: non-creative works

The corpus does contain some creative works (e.g., short stories and small sections of novels), but more than 80% of it consists of transcripts of TV shows and articles from newspapers, magazines, and academic journals.

The effect of the use upon the potential market
What favors fair use status: little or no effect on the copyright holder

Because access through our web interface is so limited (see the first criterion above), it is extremely unlikely that anyone would use this corpus as a substitute for other means of accessing the original texts. Other sources make these texts available as complete articles that are meant to be read in their entirety; reading them that way is impossible with our interface.

Our interface and other sources of these texts serve two completely different audiences. Our interface is designed for linguists and language learners who want to see the frequency of words, phrases, synonyms, and so on; it is wholly inadequate for anyone who wishes to read the entire text of an article. As a result, there is little or no competition between our service and those provided by others, and therefore virtually no market impact.

In addition to copyright, there are also licensing issues concerning the sources from which we obtained some of the texts in the corpus. We were careful, however, to retrieve the materials over a long period (four years, 2005 to 2008) so as not to violate licensing agreements that limit how much material can be retrieved in a given timeframe.