A review of pornography use research: Methodology and results from four sources

Michael Gmeiner, Joseph Price, Michael Worley

Brigham Young University, Provo, Utah, United States

Abstract

The widespread electronic transmission of pornography allows for a variety
of new data sources that can objectively measure pornography use. Recent
studies have begun to use these data to rank-order US states by per capita
online pornography use and to identify the determinants of pornography use
at the state level. The aim of this paper is to compare two previous
methodologies for evaluating pornography use by state and to measure online
pornography use using multiple data sources. We find that state-level
rankings from Pornhub.com, Google Trends, and the New Family Structures
Survey are significantly correlated with each other. In contrast, rankings
based on data from a single large paid subscription pornography website have
no significant correlation with rankings based on the other three data
sources. Since so much online pornography is accessed for free, research
based solely on paid subscription data may yield misleading conclusions.

Introduction

While most researchers would agree that pornography has become more
pervasive in recent decades, accurately measuring the level of pornography
use in the population remains an empirical challenge for social scientists.
The technologies used to access pornography have changed over time, making
it almost impossible to measure the same metric of pornography use
consistently. High-speed internet, which has penetrated markets gradually
over the last fifteen years, enables unprecedented affordability, anonymity,
and ease of access in pornography consumption (Cooper, 1998), contributing
to the apparent general rise in pornography use (Wright, 2011). Hertlein and
Stevenson (2010) identify further features of broadband internet pornography
that have contributed to the industry’s growth: closer approximation to the
physical world, acceptability, ambiguity, and accommodation between one’s
“real” and “ought” selves.

Past approaches to measuring pornography use have relied heavily on survey
data (see Buzzell, 2005). The electronic nature of online pornography,
however, increasingly makes possible a number of alternative methods for
obtaining reliable proxies of pornography use, including those based on
subscription or online search data. An objective measure based on
subscription or search data is advantageous because survey-based data
generally suffer from social desirability bias: respondents may underreport
activities that violate social norms (Fisher, 1993). In addition,
subscription data do not depend on an individual’s opinion about what
constitutes pornography, which is a natural limitation of subjective survey
questions about pornography use.

Two
recent studies have tapped into innovative sources of data about
online pornography use. Edelman (2009) uses subscription data from a
single top-ten provider of paid pornographic content to create a
ranking of which states use the most online pornography and
correlates these with several state-level measures of social or
religious attitudes. MacInnis and Hodson (2014) use Google Trends
search term data as a proxy for pornography use and examine the
relationship between state-level pornography use and measures of
religiosity and conservatism. They find that states with more
right-leaning ideological attitudes have higher rates of
pornography-related Google searches.

This paper assesses some of the claims made in past studies about the rank
order of states and the relationship between state-level pornography use and
various state-level social measures. We also provide a framework that
researchers can use to assess the representativeness of new state-level or
even county-level datasets on pornography use. Edelman (2009) was a pioneer
in accessing the subscription data of a single provider of paid pornographic
content, and this use of individual consumer data from private companies
will become a useful tool for gathering data on hard-to-measure behavior.
Key to the future use of this type of rich data will be identifying the
degree to which data from a single firm can provide the same insights as a
nationally representative sample.

In this paper, we expand on the data used in these two recent studies and
combine them with two additional data sources. Since each of the four data
sources yields a measure of the level of pornography use, we assess the
validity of each source by comparing its state-level ranking with the
rankings we obtain from the other sources.

Data

Our
paper draws on four data sources that include information on
state-level variation in pornography use. The first two data sources
are nationally representative samples while the last two are based
on paid subscriptions or page views connected to a specific provider
of pornographic content. In each data source our measure of
pornography use is based on circumstances in which individuals seek
out pornographic content rather than accidentally viewing
pornography.

Our first dataset is based on a nationally representative sample of 2,988
respondents in the New Family Structures Survey (NFSS). The data collection
was conducted by Knowledge Networks (KN), a research firm with a record of
generating high-quality data. Knowledge Networks recruited members of its
panel randomly through telephone and mail surveys, and households were
provided with internet access if needed. This panel has the advantage of not
being limited to current internet users or computer owners, and it does not
accept self-selected volunteers.

The NFSS includes a question about whether the respondent intentionally
viewed pornography in the previous year. This type of question has the
advantage of capturing pornography use through whatever medium the
individual uses to access it. Other nationally representative samples, such
as the General Social Survey, also include pornography questions. We use the
NFSS because it can be easily accessed by other scholars and includes state
identifiers in its publicly available form. In contrast, state identifiers
are available only in the confidential version of the General Social Survey.
For the analysis in this paper, we use the forty-six states in the NFSS
survey with at least 50 respondents.
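The sketch below shows one way to compute a state-level NFSS measure like the one described above. It assumes individual records that pair a state with a yes/no answer to the previous-year question. The record layout and the name `state_shares` are hypothetical. The shares are unweighted, because the text does not say whether NFSS survey weights were applied.

```python
from collections import defaultdict


def state_shares(records, min_respondents=50):
    """Share of respondents in each state who report intentionally viewing
    pornography in the previous year.

    records: iterable of (state, viewed) pairs, where viewed is a bool.
    This record layout is hypothetical; it is not the NFSS file format.
    States with fewer than `min_respondents` respondents are dropped.
    Shares are unweighted (an assumption: survey weights are ignored).
    """
    counts = defaultdict(lambda: [0, 0])  # state -> [yes count, total count]
    for state, viewed in records:
        counts[state][1] += 1
        if viewed:
            counts[state][0] += 1
    return {
        state: yes / total
        for state, (yes, total) in counts.items()
        if total >= min_respondents
    }


# Toy data: state "A" has 60 respondents, state "B" only 10.
records = [("A", True)] * 30 + [("A", False)] * 30 + [("B", True)] * 10
print(state_shares(records))  # {'A': 0.5}
```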

The second data source, Google Trends, functions as a time series index of
the volume of searches entered into Google in a specific geographic area.
These data have proven useful in economics and medicine, for example in
predicting influenza outbreaks (Carneiro & Mylonakis, 2009) and forecasting
short-term economic indicators such as consumer confidence or unemployment
(Choi & Varian, 2012). Preis, Moat, and Stanley (2013) quantify trading
behavior using Google Trends, showing that search volume for certain terms
is linked to rises and falls in stock prices. The adult entertainment
industry can likewise be examined using Google Trends search data, to the
extent that its important features are reflected in search volume.

The most important challenge in using Google Trends data is selecting the
specific terms on which to draw data. The terms selected must be an actual
indicator of pornography use for our analysis to be useful. Ho and Watters
(2004) analyzed structural trends in pornographic websites. As part of their
analysis, they created a list of terms that appear frequently on
pornographic websites and are often absent from non-pornographic websites.
The top four terms were “porn”, “xxx”, “sex”, and “f***”. Using Google
Trends search statistics, we find that searches for these four terms are
highly correlated. In contrast, searches for the term “pornography” are
uncorrelated with searches for any of these four terms, and “pornography” is
likely to be used by people seeking information about pornography rather
than by people accessing actual pornographic content.

There is also a distinction between “hard” and “soft” pornography, with
“soft” generally referring to media that are sexual in nature but do not
depict penetration. The four terms listed above capture only users seeking
hard content, but we still consider this an effective analysis for two
reasons. First, many viewers do not consider soft pornography to be
pornography, and as a result it is pervasive even in mainstream media,
including television and movies. Second, we find that relative searches for
soft pornography terms are minimal compared with searches for hard
pornography terms. We compared relative search values for the terms “porn”
and “nude girls” over 2005-2013. Searches for both terms were normalized so
that the maximum search volume, which occurred for the term “porn”, took the
value 100. Relative to this normalized maximum, “nude girls” never had a
search volume index greater than 6.

The data from Google Trends do not indicate the actual number of searches
for a specific term in a geographic area. Instead, each data point is
normalized by dividing the number of searches for the term by the total
number of searches in that area. The data are therefore adjusted for both
population and differences in overall search volume across states. Google
Trends also eliminates repeated searches by a single individual over a short
period of time, so that a single individual cannot skew the results.
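The two normalization steps can be illustrated with a short sketch. Google does not release raw search counts, so the inputs here are hypothetical. `trends_style_index` is an illustrative name, not part of any Google API. The sketch also leaves out the de-duplication of repeated searches mentioned above.

```python
def trends_style_index(term_searches, total_searches):
    """Google-Trends-style normalization (a sketch of the idea, not Google's
    actual pipeline).

    term_searches, total_searches: dicts keyed by (state, week) with raw
    search counts. Google does not publish raw counts, so these inputs are
    hypothetical.
    """
    # Step 1: divide by all searches in the same state-week, which removes
    # differences in population and overall search activity.
    shares = {key: term_searches[key] / total_searches[key] for key in term_searches}
    # Step 2: rescale so the largest share in the dataset reads 100.
    peak = max(shares.values())
    return {key: round(100 * share / peak) for key, share in shares.items()}


# A large state with more raw searches can still score lower than a small one.
term = {("Big", 1): 2_000, ("Small", 1): 300}
total = {("Big", 1): 1_000_000, ("Small", 1): 100_000}
print(trends_style_index(term, total))  # {('Big', 1): 67, ('Small', 1): 100}
```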

Data are available from Google Trends at the state-week level. We use data
for the year from July 2013 to July 2014. Our observations are adjusted to a
1-100 scale: the state with the highest normalized searches for a specific
term during any one-week period in our dataset has a reading of 100. Using
the data on each term, we construct an index of pornography searches for
each state-week as a weighted sum of the four terms. We weight “porn” and
“sex” more heavily because their relative search volumes are much greater
than those for “f***” and “xxx”. Specifically, each term is weighted by its
mean relative search volume over the past year. We then use this weighted
search index to rank states and to model the geographic pattern of the adult
entertainment industry.
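The sketch below gives one reading of the weighting step. It assumes each term already has a 0-100 state-week index and that each term's weight is its relative search volume. The weights in the example are made up, and `weighted_search_ranking` is a hypothetical helper, not the authors' code.

```python
from statistics import mean


def weighted_search_ranking(term_indices, weights):
    """Rank states by a weighted combination of per-term search indices.

    term_indices: {term: {state: [weekly 0-100 index values]}}
    weights: {term: relative search volume of that term}; the numbers used
    below are hypothetical, not the paper's weights.
    With a balanced panel (same weeks for every term and state), averaging
    over weeks and weighting across terms are both linear, so averaging
    first gives the same score as weighting each state-week and then
    averaging.
    """
    total_weight = sum(weights[t] for t in term_indices)
    states = next(iter(term_indices.values())).keys()
    scores = {
        s: sum(weights[t] * mean(term_indices[t][s]) for t in term_indices) / total_weight
        for s in states
    }
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


term_indices = {
    "porn": {"A": [80, 90], "B": [100, 100]},
    "sex": {"A": [60, 60], "B": [70, 70]},
    "xxx": {"A": [100, 100], "B": [20, 20]},
    "f***": {"A": [50, 50], "B": [10, 10]},
}
weights = {"porn": 10, "sex": 8, "xxx": 1, "f***": 1}  # hypothetical weights
ranking, scores = weighted_search_ranking(term_indices, weights)
print(ranking, scores)  # ['B', 'A'] {'A': 74.0, 'B': 79.5}
```

With equal weights, state A would rank first (73.75 versus 50). In this toy example, the choice of weights alone reverses the ordering.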

One of the advantages of using data from Google Trends rather than
website-specific subscription data is that Google Trends includes
information on individuals seeking out both free and paid adult
entertainment. Doran (2008) notes that about 80-90% of visitors to
pornographic websites access only free pornographic material, suggesting
that analysis of paid adult entertainment may obscure actual patterns of
pornography consumption in general.

Our third data source, used in a recent study by Edelman (2009), records the
number of subscriptions to one of the ten largest providers of paid
pornographic content. Edelman’s analysis of this dataset was a novel
contribution to the literature, since previous studies of pornography use
had examined only survey data. The specific data used were the zip codes
associated with all credit card subscriptions between 2006 and 2008. This
content provider operates hundreds of sites covering a broad range of adult
entertainment. Edelman (2009) acknowledges, however, that “it is difficult
to confirm rigorously that this seller is representative.”

Although the source of this subscription data is a top-10 seller of adult
entertainment, subscription rates are very low relative to the patterns of
pornography use we observe in survey data such as the NFSS, where 47% of
adults report using pornography in the last year. The state with the most
subscriptions per broadband household is Utah, with 5.47 subscriptions for
every 1,000 households with broadband. The lowest is Montana, with 1.92
subscriptions for every 1,000 households with broadband. These low rates
suggest that the market share of individual content providers is small,
making it difficult to know whether data from one provider can support an
accurate cross-state comparison. As mentioned above, the vast majority of
individuals who access pornography online use only free content rather than
a paid site such as those studied by Edelman (Doran, 2010).
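The rates above are simple per-1,000 ratios. The sketch below uses a hypothetical count; only the published rates for Utah and Montana come from the text. For scale, 47% in the NFSS corresponds to roughly 470 users per 1,000 adults. Adults and broadband households are different units, though, so this is only an order-of-magnitude contrast.

```python
def per_thousand(subscriptions, broadband_households):
    """Subscriptions per 1,000 broadband households."""
    return 1000 * subscriptions / broadband_households


# Hypothetical counts chosen to reproduce Utah's published rate.
print(per_thousand(547, 100_000))  # 5.47

# Published rates: Utah (highest) relative to Montana (lowest).
print(round(5.47 / 1.92, 2))  # 2.85
```

Even the highest-rate state is below 1% of broadband households, far from the 47% self-reported use in the NFSS.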

Our fourth data source is page view data from Pornhub.com, which was the
third largest online host of adult entertainment in the United States at the
time. We use the Pornhub data because of its size and availability: Pornhub
publicly released page views per capita for 2013, reported separately by
state. The Pornhub data are similar to Edelman’s data in that they are a
provider-side, objective measure of pornography use. However, they record
page views rather than subscribers, so they should reflect both the
intensity of use per person and how widespread use is in the population. The
data also have the relative advantage of including both paid and unpaid use.

Assessing the representativeness of new data sources

The big data revolution is beginning to dramatically expand the types of
data sources that can be used to measure and study behaviors such as
pornography use. The subscription data used by Edelman (2009) represent the
type of large dataset that will increasingly become available to scholars.
An important first step in using this type of proprietary data will be
assessing the degree to which data from a single provider are representative
of the general population of interest. In this section, we provide a
framework for assessing the representativeness of a dataset. The dataset can
be compared either to the patterns observed in another dataset that is known
to be nationally representative, or to a combination of other data sources
that collectively are likely to represent the true underlying pattern of
behavior.
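The sketch below shows one possible way to put this framework into practice:

1. Standardize each source across states.
2. Average the z-scores of every source except the one being assessed.
3. Correlate the candidate source with that composite.

The composite-of-others benchmark and the helper names are illustrative assumptions, not a procedure the paper specifies.

```python
from statistics import mean, pstdev


def zscores(values):
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


def pearson(x, y):
    # With population z-scores, r is the mean product of the z-scores.
    zx, zy = zscores(x), zscores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def agreement_with_others(sources, candidate):
    """Correlate one source with the average z-score of all other sources.

    sources: {name: [measure per state]}, all lists aligned on the same
    states in the same order. The composite-of-others benchmark is a
    hypothetical way to operationalize the framework, not the paper's code.
    """
    others = [zscores(v) for name, v in sources.items() if name != candidate]
    composite = [mean(col) for col in zip(*others)]
    return pearson(sources[candidate], composite)


sources = {  # hypothetical values for five states
    "survey": [0.40, 0.45, 0.50, 0.55, 0.60],
    "search": [60, 70, 65, 80, 90],
    "views": [1.0, 1.2, 1.1, 1.5, 1.4],
    "paid": [3.0, 2.0, 5.0, 1.0, 4.0],
}
for name in sources:
    print(name, round(agreement_with_others(sources, name), 2))
```

Standardizing first keeps a source measured in large units (such as page views) from dominating the composite.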

In Table 1 we list the top ten and bottom ten states for pornography use
based on each of the four sources: NFSS, Google Trends, paid subscription
data, and Pornhub. Mississippi ranks among the top four states in
pornography use in all four datasets, and Idaho ranks among the bottom ten
states on three of the four measures. In contrast, other states such as
Arkansas and Utah rank in the top ten on some measures but in the bottom ten
on others. These results suggest that identifying the state with the highest
rate of pornography use from a single data source can be problematic.
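The comparisons in this paragraph can be checked directly against Table 1. The lists below are transcribed from the table, and the helper functions are illustrative.

```python
# Top-ten lists from Table 1 (highest rank first) and bottom-ten lists
# (10th lowest first, lowest last).
TOP10 = {
    "NFSS": ["Nevada", "Mississippi", "Tennessee", "Kansas", "Missouri",
             "Wyoming", "Washington D.C.", "Oklahoma", "Illinois", "Indiana"],
    "Google Trends": ["Mississippi", "Texas", "Arkansas", "Louisiana", "Tennessee",
                      "Oregon", "Kentucky", "Michigan", "Missouri", "Georgia"],
    "Paid Subscription": ["Utah", "Alaska", "Mississippi", "Hawaii", "Washington D.C.",
                          "Oklahoma", "Arkansas", "North Dakota", "Louisiana", "Florida"],
    "Pornhub": ["Kansas", "Nevada", "Illinois", "Mississippi", "Georgia",
                "Texas", "Missouri", "Oklahoma", "Colorado", "Kentucky"],
}
BOTTOM10 = {
    "NFSS": ["New Hampshire", "New Jersey", "Virginia", "New York", "Idaho",
             "New Mexico", "Colorado", "Vermont", "Utah", "Delaware"],
    "Google Trends": ["Maine", "New Jersey", "Connecticut", "Maryland", "Utah",
                      "Washington D.C.", "Vermont", "Massachusetts", "New Hampshire",
                      "Delaware"],
    "Paid Subscription": ["Michigan", "Wyoming", "Connecticut", "Delaware", "New Jersey",
                          "Oregon", "Ohio", "Tennessee", "Idaho", "Montana"],
    "Pornhub": ["South Carolina", "Vermont", "Arkansas", "South Dakota", "West Virginia",
                "Wyoming", "Montana", "Maine", "Idaho", "Utah"],
}


def top_k_everywhere(top, k):
    """States ranked within the top k by every source."""
    return sorted(set.intersection(*(set(ranks[:k]) for ranks in top.values())))


def mixed_states(top, bottom):
    """States in the top ten of at least one source and the bottom ten of another."""
    ever_top = set().union(*top.values())
    ever_bottom = set().union(*bottom.values())
    return sorted(ever_top & ever_bottom)


print(top_k_everywhere(TOP10, 4))  # ['Mississippi']
print(mixed_states(TOP10, BOTTOM10))
# ['Arkansas', 'Colorado', 'Michigan', 'Oregon', 'Tennessee', 'Utah', 'Washington D.C.', 'Wyoming']
print(sum("Idaho" in b for b in BOTTOM10.values()))  # 3
```

Eight states appear in the top ten of one source and the bottom ten of another. Arkansas and Utah are not isolated cases.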

Table 1. Rank Order of States Based on Four Different Data Sources, Controlled for Broadband Internet Access.

Rank              NFSS               Google Trends      Paid Subscription  Pornhub
                  2012               2013-2014          2006-2008          2013
1 (highest)       Nevada             Mississippi        Utah               Kansas
2                 Mississippi        Texas              Alaska             Nevada
3                 Tennessee          Arkansas           Mississippi        Illinois
4                 Kansas             Louisiana          Hawaii             Mississippi
5                 Missouri           Tennessee          Washington D.C.    Georgia
6                 Wyoming            Oregon             Oklahoma           Texas
7                 Washington D.C.    Kentucky           Arkansas           Missouri
8                 Oklahoma           Michigan           North Dakota       Oklahoma
9                 Illinois           Missouri           Louisiana          Colorado
10                Indiana            Georgia            Florida            Kentucky
42 (10th lowest)  New Hampshire      Maine              Michigan           South Carolina
43                New Jersey         New Jersey         Wyoming            Vermont
44                Virginia           Connecticut        Connecticut        Arkansas
45                New York           Maryland           Delaware           South Dakota
46                Idaho              Utah               New Jersey         West Virginia
47                New Mexico         Washington D.C.    Oregon             Wyoming
48                Colorado           Vermont            Ohio               Montana
49                Vermont            Massachusetts      Tennessee          Maine
50                Utah               New Hampshire      Idaho              Idaho
51 (lowest)       Delaware           Delaware           Montana            Utah

Notes: Pornhub data do not include Washington D.C. and hence rank only to 50.
The NFSS sample excludes states with fewer than 50 respondents. For datasets
without the full set of 51 states/DC, 51 refers to the lowest rank and 42
refers to the 10th lowest rank.

In Table 2, panel A, we estimate the correlation between each pair of data
sources. We use the actual measures of pornography use from each source
rather than the ordinal rankings derived from them and reported in Table 1.
The paid subscription data have, by far, the weakest correlation with the
other three sources and are even negatively correlated with the NFSS survey
data. Their correlation is -0.0358 with the NFSS, 0.076 with Google Trends,
and 0.0066 with Pornhub. None of these correlations is statistically
significant: the corresponding t-statistics are all less than 0.6, which
corresponds to one-sided (directional) p-values greater than .3. In
contrast, the other three sources are notably correlated with one another.
Google Trends and Pornhub have a correlation of .487, NFSS and Google Trends
have a correlation of .655, and Pornhub and NFSS have a correlation of .551.
All of these correlations are statistically significant. Their t-statistics
are 3.78 (Google Trends and Pornhub), 5.68 (NFSS and Google Trends), and
4.28 (Pornhub and NFSS), all corresponding to one-sided p-values of less
than .0004.
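For reference, the conventional t-test for a Pearson correlation uses t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below assumes this is the test behind the reported t-statistics. The text does not state how many states underlie each pair, and the number varies: the NFSS covers 46 states and the Pornhub data omit Washington D.C. A one-sided p-value could then be obtained from Student's t distribution, for example with `scipy.stats.t.sf(t, n - 2)`.

```python
import math


def correlation_t_stat(r, n):
    """t-statistic for H0: rho = 0, given a Pearson correlation r from n pairs.

    t = r * sqrt(n - 2) / sqrt(1 - r**2), with n - 2 degrees of freedom.
    """
    if not -1 < r < 1 or n < 3:
        raise ValueError("need -1 < r < 1 and n >= 3")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Illustrative values only: r = 0.5 across 50 states.
print(round(correlation_t_stat(0.5, 50), 6))  # 4.0
```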

In panel B we report correlations using the ordinal rankings created from
each data source. The correlations among the NFSS, Google Trends, and
Pornhub rankings are comparable in magnitude and significance to those in
panel A, as is the correlation between the Google Trends and paid
subscription rankings. The panel is notable because, with ordinal rankings,
the paid subscription data correlate more strongly with the Pornhub and NFSS
data, although these correlations remain statistically insignificant. The
two panels therefore support similar conclusions. The larger coefficients
for the paid subscription data are worth noting, even though they are
insignificant and notably weaker than the correlations of the other sources
with each other. We believe that correlations based on the actual measures
of pornography use represent the industry better than those based on
ordinal rankings. Correlations on the actual measures account for the size
of the differences in pornography use rather than only the ordering of the
states.
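The difference between the two panels is the difference between Pearson and rank (Spearman) correlation. The minimal sketch below computes Spearman correlation as the Pearson correlation of ranks, with ties averaged. On hypothetical data, it shows how a single extreme value can pull the Pearson coefficient down while leaving the rank correlation unaffected.

```python
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# One extreme value pulls the Pearson correlation down,
# while the rank correlation only sees the ordering (hypothetical data).
x = [1, 2, 3, 4, 100]
y = [2, 4, 6, 8, 10]
print(round(pearson(x, y), 3), spearman(x, y))  # 0.725 1.0
```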

Table 2. Correlation between the Four Data Sources.

                  Paid Subscription  NFSS               Google Trends
A. Continuous measures
NFSS              -0.0358
                  (0.25)
Google Trends     0.0760             0.6547
                  (0.52)             (5.68)
Pornhub           0.0066             0.5510             0.4867
                  (0.05)             (4.28)             (3.78)
B. Rank correlations
NFSS              0.2670
                  (1.838)
Google Trends     0.0821             0.6886
                  (0.577)            (6.299)
Pornhub           0.2424             0.5344             0.4490
                  (1.749)            (4.194)            (3.518)

Notes: Correlation coefficients between the state-level measures of
pornography use from each data source, controlled for broadband internet
access. T-statistics are provided in parentheses.

The significant correlations among the three sources other than the paid
subscription data, despite the different variables they measure (search
volume, page views, and proportion of pornography viewers), suggest that
they are measuring a real underlying pattern of variation in pornography use
across states. This pattern is not correlated with the subscription data
used by Edelman (2009).

Sensitivity of estimates to data source used

To illustrate the importance of accounting for differences in state
pornography rates across data sources, we replicate the results of a recent
study. That study found that more religious and more conservative states
were more likely to search for sexual content on Google (MacInnis & Hodson,
2014). We examine whether its conclusions hold for the measures of
pornography use from the other data sources described in this paper. The
results of this replication are given in Table 3. We standardized the
pornography use, religiosity, and conservatism measures by subtracting the
mean and dividing by the standard deviation. This converts each measure into
a Z-score and allows comparisons across the different pornography use
measures.
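Standardizing both variables also explains why the "No controls" coefficients in Table 3 can be read as correlations. In a simple regression of one z-score on another, the slope equals Pearson's r. The sketch below uses hypothetical state values and illustrative helper names. Using the population or the sample standard deviation does not change the standardized slope, because the same scaling applies to both variables.

```python
from statistics import mean, pstdev


def zscore(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


religiosity = [1, 2, 3, 4, 5]  # hypothetical state values
porn_use = [4, 2, 8, 6, 10]    # hypothetical state values

print(ols_slope(religiosity, porn_use))  # 1.6 (raw units)
print(round(ols_slope(zscore(religiosity), zscore(porn_use)), 10))  # 0.8, the correlation
```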

Table 3. Correlations between State-Level Religiosity or Conservatism and Each Metric of Pornography Use.

                     Religiosity                 Conservatism
                     No controls   Controls      No controls   Controls
Google Trends        0.610***      0.223         0.479***      0.266*
                     (.163)        (.176)        (.151)        (.146)
NFSS                 0.213         -0.0782       0.215         0.0879
                     (.195)        (.310)        (.200)        (.304)
Pornhub              0.129         0.0930        -0.0732       -0.0265
                     (.153)        (.232)        (.163)        (.207)
Paid Subscriptions   0.299         0.487         0.167         0.221
                     (.192)        (.314)        (.183)        (.254)

Notes: N = 50. Controls include state population, state GDP, the percentage
of individuals below the poverty line, and internet use both in-home and
out-of-home. Data on conservatism and religiosity by state in 2013 are drawn
from Gallup, GDP by state from the U.S. Census Bureau, and number in
poverty, internet use, and population from the Census. Measures of
pornography, conservatism, and religiosity are all normalized to have a
standard deviation of one. Robust standard errors are in parentheses. ***,
**, and * indicate statistical significance at the 1%, 5%, and 10% levels,
respectively.
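The specification in the table notes (standardized variables, state-level controls, robust standard errors) can be sketched as ordinary least squares with heteroskedasticity-robust errors. The paper does not say which robust estimator it used. This sketch assumes HC1, the small-sample-corrected variant that Stata's `robust` option reports. It is written in pure Python to stay self-contained; the data are hypothetical and `ols_hc1` is an illustrative name.

```python
import math


def invert(a):
    """Gauss-Jordan inverse of a small square matrix (list of row lists)."""
    n = len(a)
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def ols_hc1(y, regressors):
    """OLS of y on an intercept plus the given regressor columns.

    Returns (coefficients, HC1 robust standard errors). Coefficient 0 is the
    intercept; the rest follow the order of `regressors`.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in regressors] for i in range(n)]
    k = len(X[0])
    Xt = [list(c) for c in zip(*X)]
    xtx_inv = invert(matmul(Xt, X))
    beta = [row[0] for row in matmul(xtx_inv, matmul(Xt, [[v] for v in y]))]
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(k)) for i in range(n)]
    # "Meat" of the sandwich: X' diag(e^2) X.
    meat = matmul(Xt, [[x * e * e for x in X[i]] for i, e in enumerate(resid)])
    cov = matmul(matmul(xtx_inv, meat), xtx_inv)
    scale = n / (n - k)  # HC1 small-sample correction
    se = [math.sqrt(max(cov[j][j] * scale, 0.0)) for j in range(k)]
    return beta, se


def zscore(values):
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / sd for v in values]


religiosity = zscore([1, 2, 3, 4, 5])  # hypothetical
porn_use = zscore([4, 2, 8, 6, 10])    # hypothetical
beta, se = ols_hc1(porn_use, [religiosity])
print(round(beta[1], 6))  # 0.8: with no controls, equals the correlation
```

Controls enter as extra columns, for example `ols_hc1(y, [religiosity, population, gdp, poverty])`. With controls, the coefficient is no longer a simple correlation, which is why the "Controls" columns of Table 3 can differ sharply from the "No controls" columns.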

In the original study, MacInnis and Hodson (2014) gave results based on
Google Trends data separately for specific search terms such as sex, porn,
and XXX, similar to the terms in our Google Trends measure. The results in
the first row of Table 3 show that, using Google Trends data, we also find a
statistically significant relationship between pornography use and
religiosity or conservatism in most cases. However, the other rows of
Table 3 show a much weaker statistical relationship when any of the other
three data sources is used. These results suggest that if MacInnis and
Hodson (2014) had used any of the other three data sources, they probably
would have reached a different conclusion about the strength of the
relationship they were examining.

The fact that MacInnis and Hodson (2014) find a statistically significant
positive relationship between state-level religiosity and state-level
pornography use is notable. Past studies using individual-level data find
that individuals who regularly attend church are much less likely to use
pornography (Doran & Price, 2014; Patterson & Price, 2012; Stack,
Wasserman, & Kearns, 2004). Group-level relationships that run opposite to
those found at the individual level have also been found between education
and religion (Glaeser & Sacerdote, 2008) and between income and political
affiliation (Glaeser & Sacerdote, 2007).

Discussion

Each of the data sources considered above captures a different
cross-sectional view of the online pornography industry, and each has
important vulnerabilities for researchers interested in general levels of
pornography use by state. NFSS survey data, for example, probably
underreport pornography consumption because of social desirability bias and
respondents’ faulty memory. Google Trends data fail to capture any
pornography use that is accessed through means other than a Google search.
Pornhub and paid subscription data may be limited in their
representativeness, because each measures use of only a single firm in the
industry.

When data from any source are used in research, results must be presented
in the context of the data that led to them. Problems arise when a given
data source is mistakenly interpreted as representing the entire pornography
industry. There are many other settings in which similarly
non-representative data may be erroneously overgeneralized. Researchers must
be aware of the external validity of their findings, and the media and
readers must be careful not to overgeneralize results.

We also recognize a limitation of our data sources: they capture the
pornography industry at different historical moments. These are Google
Trends (2013-2014), paid subscription (2006-2008), Pornhub (2013), and NFSS
(2012). The paid subscription data were collected approximately 6-7 years
before the other sources. This time difference may bias our results.
However, the general trends across the data sources as a whole lead us to
believe that our findings are accurate. For this bias to occur, major shifts
in the relative use of pornography across states between 2006 and 2013
would be needed, which we believe is unlikely.

When attempting to rank-order individuals or groups by some form of
activity, researchers should consult multiple sources, if available, so that
results can be contrasted. If the orderings are similar, their accuracy can
be more readily assumed. If they differ, there is an opportunity to learn
more about the issue. In our particular case, the differences likely arise
because the sources capture different types of pornography use.

Past research on pornography use has examined the degree to which it might
affect important outcomes such as divorce, happiness, worker productivity,
and sexual violence (Bergen & Bogle, 2000; Doran & Price, 2014; Patterson &
Price, 2012; Young & Case, 2004). Such research requires data from a
reliable and generalizable source (or sources). Findings about any such
effects must also be considered in light of individuals’ age, gender, and
sexual identity. These factors are not considered in this paper (Sevcikova
& Daneback, 2014; Stoops, 2015; Traeen & Daneback, 2013; Tripodi et al.,
2015). In such research, state-level pornography use may enter the analysis
as a variable. Given the results of this paper, the data source for such a
variable must be carefully considered in any regression that uses it, and
results must be interpreted in the context of that data source.

Conclusion

Data provided by specific companies have the potential to provide important
insights into public issues. A major challenge is determining when the data
of a single company, even a very large one, can provide insights that are
representative of the entire population. Assuming that relative rates of
pornography use across states did not change substantially from 2006 to
2013, the results of our paper suggest that in some cases the information
from a single company may give a misleading picture of the geographic
patterns of a specific behavior. This can be particularly important for
pornography use, since the vast majority of individuals who access
pornography online access only free content rather than using a paid site
(Doran, 2008).

The results of this paper draw on four different data sources about
pornography use, including two that involve nationally representative data
(Google Trends and NFSS). We find significant correlations among three of
our data sources, suggesting that they all reflect a similar underlying
pattern of pornography use across states. In contrast, the paid
subscription data, the one source that has received a fair amount of media
attention, correlate rather poorly with the other sources. We also show that
the choice of data source can affect the conclusions that studies draw. We
suggest that future studies include sensitivity tests across data sources
when examining issues for which it is challenging to obtain an ideal measure
of the specific behavior.

MacInnis, C., & Hodson, G. (2014). Do American states with more religious or conservative populations search more for sexual content on Google? Archives of Sexual Behavior, 44, 137-147. http://dx.doi.org/10.1007/s10508-014-0361-8