Introduction:
Around 7/7/05, while checking some of my daily news websites, I saw a few articles
that mentioned some interesting statistics on online behavior. As the maintainer of
the Errata > Statistics subsection of Attrition.org, I was interested; furthermore, the
articles all seemed to be referencing the same report. Since multiple articles shared the
same reference, I decided to take a deeper look at what the referenced report said. This
page is an analysis of the Pew Internet and American Life Project's report on Spyware,
which is based on a survey it sponsored that was conducted by Princeton Survey Research
Associates International.

Structure:
Since there are two documents involved I'll look at each separately, although because the
Pew report is based on the Spyware survey some remarks may apply to both.
"Report" refers to the Pew document and "Data" or "Data Report" refers to the Princeton
Survey document.

Full Disclosure:
During my time as an undergraduate student at the University of Mary Washington I took
classes ranging from introductory to advanced statistics as a requirement for completion
of the psychology major. All told I took three semesters of courses dealing explicitly with
statistics and five semesters dealing with the application of statistics to psychology
research. I authored two questionnaires dealing with online behavior that were edited and
approved by faculty and an IRB review board for experimentation on human subjects. I
presented results of this research seven times at various undergraduate research
symposiums. I've also worked part time as a systems-administrator and help desk
technician for six years, and a network security administrator for two. Recently I began
full time work in this field after my graduation from university. It is with these
qualifications in mind that I offer my analysis. Comments, questions and criticisms can
be mailed to Zodiac@attrition , flames piped to /dev/null.

Notes:
The reference numbers refer to the pages as listed in the .PDF, not the printed pages. I felt
since most people would be reading a digital copy this would be a reasonable concession.

Pew Internet and American Life Project's report on Spyware

Overview:
The statistics section in Errata was created in response to media reports of
statistics on computer security or cyber crime that did not reference a credible source, or
any source at all. The Pew report does support its conclusions: it provides its source,
questions, descriptive statistics about the data and some verbatim user responses to
certain questions. While there are some parts of the report I take issue with, it is
ultimately well documented and ethically sound.

Problems:
(p. 3) ... "Although most do not know the source of their woes, tens of millions have
experienced computer problems in the past year that are consistent with problems caused
by spyware or viruses."

These problems are consistent with spyware but not exclusive to it. For example: (p. 3) "52% of
home internet users say their computer has slowed down or is not running as fast as it
used to." (These results reference Q.26 in the data report.) These statistics are a bit
misleading because these problems can be attributed to a variety of sources. For example,
if I download a resource-intensive program that runs in the background and continue
computing as normal, my experience will seem slower than it was before. If I were
presented with this problem as a sys-admin, the first thing I would look at would be the
spyware status of the affected computer, and generally speaking that would be the cause;
however, it wouldn't always be the cause. So how much of this 52% can be
attributed to spyware? Earlier in the section (p. 2) the author references the Online Safety
Study by AOL and the National Cyber Security Alliance, which scanned users' computers for
spyware and adware and reported that 80% of computers were affected.
Combining results from the two studies as a rough estimate, 80% of 52% is approximately 42%,
so I would contend that of the 52% of users that reported problems in the Pew report, not
all of those problems are a result of spyware. Then another problem emerges: are the
definitions of spyware and adware consistent across the two reports? And since the AOL study
was done in October of 2004, has their 80% statistic changed as technology has
improved? The answer just isn't clear.
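
As a rough check on the arithmetic above, here is a minimal sketch in Python. It is my own
illustration, not something taken from either report. Multiplying the two percentages
only works if being infected and reporting a slowdown are unrelated, and if both figures
describe the same population; both are assumptions on my part. The function names are
made up for this example. The second function shows how wide the possible overlap is if
the independence assumption is dropped.

```python
# Back-of-the-envelope sketch; the function names are hypothetical and
# neither report performs this calculation.

def naive_overlap(infected_rate, slowdown_rate):
    """Share of users both infected and reporting slowdowns, ASSUMING the
    two are statistically independent (an assumption, not a finding)."""
    return infected_rate * slowdown_rate


def overlap_bounds(infected_rate, slowdown_rate):
    """Smallest and largest possible overlap when nothing is assumed about
    how the two rates relate, other than a shared population."""
    lower = max(0.0, infected_rate + slowdown_rate - 1.0)
    upper = min(infected_rate, slowdown_rate)
    return lower, upper


if __name__ == "__main__":
    infected = 0.80  # AOL/NCSA scan figure referenced on p. 2
    slowdown = 0.52  # Pew slowdown figure quoted from p. 3
    low, high = overlap_bounds(infected, slowdown)
    print(f"naive estimate: {naive_overlap(infected, slowdown):.1%}")
    print(f"possible range: {low:.1%} to {high:.1%}")
```

Without the independence assumption the overlap could sit anywhere from 32% to 52% of
users, which is one more reason to treat the 42% figure as a ballpark and nothing more.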

(p. 8) ... "Either way, adware is used to serve up targeted advertising based on the user's
online behavior, much like a personal assistant who accompanies you in your online
travels, making suggestions about what you might like or where you might find a bargain
elsewhere."

This is a fairly tame metaphor for describing adware. When dealing with spyware and
adware infestations in a corporate environment, all programs, cookies, and registry
entries must go. The simple reason is that any program classified as spyware is
insidious in its very nature, and adware is still enough of a privacy risk that it has no place
in a corporate environment. It is arguable that end users may not need this level of
privacy. If the sole purpose of a home computer is to facilitate online purchases, wouldn't
a program fitting the report's description of adware be welcome? Yes and no. The report
goes on to point out that many users unknowingly install spyware or adware programs, or
will even allow the installation without being fully cognizant of the implications. I offer
that most people are at least concerned about their privacy, and that the current methods
adware companies use to alert users about the privacy risks are insufficient to address that
concern. Even if better methods are developed, it is an incredible amount of trust to place
in a product. Given all of this I would argue that adware currently has no place in a
user's computing environment.

(p. 14) ... "88% of users say they have a good idea of what 'spam' means."

I can say I have a good idea of what quantum theory is all about but that doesn't make it
true. The report references a Wikipedia definition (see points of interest for more on this)
for what spam is, but cross-referencing this statistic with the data report indicates this
definition was not used when the participants were asked the question. They were asked
to rate their level of understanding on a fairly unspecific scale, but their idea of what
constituted spam was not compared to any commonly held definition at all. This isn't
misrepresentation of data, because the report doesn't state that these users do in fact have a
good idea of what those terms mean, only that they believe they do. I think it would
be more appropriate to put this data into a more definite context by explaining its
limitations.

(p. 14) ... "78% of internet users say they have a good idea of what 'spyware' means."

I am unclear on what order the questions were asked in; the report gives them in an order
that is out of sequence with the question numbers. If they were read in the order
given in the report then the validity of questions 51 and 35 is called into question. These
questions rely on a common understanding of what spyware is to be valid, so the
confidence rating of question 51 becomes suspect because the confidence may increase or
decrease given the user's definition of spyware. Question 35 has similar problems
because, given the definition of spyware, participants may have a clearer idea of what the
source of their problem was. If anyone can provide clarification on this point, shoot me
an e-mail. I checked the methodology section of the data report and didn't find anything.

(p. 21) ... "Half of computer fixes are quick and easy, but one in five problems is never
solved."

This is somewhat counterintuitive for me; most spyware problems I've dealt with are not
quick and easy fixes. A fix usually involves deep scanning the computer with multiple
spyware and adware removal tools, a virus scanner and probably some operating system
patches. Given this, are users really fixing the problem or are they only delaying its
reappearance?

This raises a privacy issue that Google dealt with when they debuted Gmail and wanted
to integrate search features that would require what some felt was a breach of privacy.
Methods have been discussed whereby user privacy could reasonably be maintained, but they
rested on the enormous technical and funding resources Google possessed. I would argue
that most firms who provide targeted marketing, not having the aforementioned
resources, could not make a similar guarantee. Therefore I don't think it is a reasonable
possibility that firms could give viable personalized advertising without sacrificing an
unacceptable amount of user privacy. I think, then, that adware still represents a
significant security and privacy risk similar to that of spyware.

Points of Interest:
(p. 11, charts) ...

I think it's interesting to note that users who experienced adware changed their behavior
more than users who experienced spyware. I would say this supports my argument that
adware has no place in a user's computing environment.

(p. 14, footnotes) ...

While I was looking through the document I noticed that the report referenced the
Wikipedia project for definitions. I'm a big fan of Wikipedia, so I thought this was
pretty cool.

(p. 6) ... Acknowledgements

Two executives from targeted advertising companies were consulted and are listed in the
acknowledgements. It is unclear how much their involvement may have influenced the
report in favor of their companies and market. I found nothing explicit, but as noted above
I felt the description of adware was a bit "softer" than perhaps it should have been. Also a
point of interest: Microsoft is currently preparing to buy Claria. Keep in mind that this is
not an indictment of fraud, data manipulation or any other such thing. It is, however, a
point of interest and I leave the reader to draw their own conclusions based on the
information presented in this report.

Conclusions:
I found this report to be generally well documented and based clearly on data found by
its survey. While I took issue with some of the questions and methods of the survey
document and some conclusions drawn by the report, this is very competently done.
Having reviewed similar reports in my academic life, I can say the problems and questions
raised here are not uncommon.

Princeton Survey Research Associates International data report and questionnaire for
the Pew Internet Spyware Survey

Overview:
Upon review of the data report I found the questions to be appropriately worded and the
data provided to be believable as representative of their sample.

Full Disclosure:
The statistical methods listed in the methodology section are out of my range of experience. I
cannot speak to their appropriateness or how accurately the data provided meshes with
their methodology. I leave it up to the statistically canny reader to draw their own
conclusions, and please share them with me.

Problems:
(p. 15, Q61a and Q61b) ... "How often is the virus protection on your main home
computer usually updated?"

Given my experience in the corporate world, the reported numbers don't match what I have
seen. However, upon installation of most virus protection software there is a prompt for how
often the user would like to update the product. If the computer is not on or connected to
the internet at the time the updates are to take place, does the process re-initiate when the
computer is on and connected? I've certainly seen instances where it doesn't, but I'm
unsure of how prevalent that problem is. Further, if the product does not update
automatically I have found that most users simply forget to do it, which calls into
question the results of Q61b. Even in perfect circumstances it is difficult to get an accurate
response to this question because the survey participant has a vested interest in
maintaining a positive self image, so they may not be totally honest when answering.

Points of Interest:
(p.13) ... Results

I found it interesting that respondents rated typical adware behavior close to some of
the more troubling aspects of spyware.

(p. 41) ... Collection methods

This is an interesting section on how telephone survey companies find valid telephone
numbers. It's not something I was familiar with, but cool nonetheless.

Final thoughts:
In my experience I've found that there's very little difference between spyware and
adware. Most spyware that I remove from user computers comes bundled with ads, and I
will often find evidence of password theft, key logging, and transmission of data back to
a central location. I think a worthy project would be to establish something similar to a
spam blacklist, but for spyware. If the programs report back to a central
source, that's something that can be controlled by egress filtering. Sys-logging of blocked
connections to those addresses can greatly simplify the process of finding and eliminating
spyware in an enterprise environment. Maybe some day we'll see ASAB, the Attrition
Spyware Address Blacklist.
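
To make the blacklist idea a little more concrete, here is a minimal sketch in Python of
the log review step. Everything specific in it is an assumption for illustration: no such
blacklist exists, the addresses are reserved documentation ranges, and the
"BLOCK src=... dst=..." log line format is made up (real firewall logs vary by vendor).
The idea is simply to group blocked outbound connections by the internal host that
attempted them, so the likely infected machines stand out.

```python
import ipaddress
import re
from collections import defaultdict

# Hypothetical blacklist entries; these are documentation-only ranges,
# not real spyware reporting servers.
BLACKLIST = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "203.0.113.7/32")]

# Assumed log format: "<timestamp> <host> BLOCK src=<ip> dst=<ip> ...".
LINE_RE = re.compile(r"BLOCK src=(?P<src>\S+) dst=(?P<dst>\S+)")


def is_blacklisted(addr, blacklist=BLACKLIST):
    """True if addr falls inside any blacklisted network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in blacklist)


def suspect_hosts(log_lines, blacklist=BLACKLIST):
    """Map each internal source address to the blacklisted destinations
    it tried to reach, based on blocked-connection log lines."""
    hits = defaultdict(set)
    for line in log_lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not a blocked-connection line
        try:
            if is_blacklisted(match.group("dst"), blacklist):
                hits[match.group("src")].add(match.group("dst"))
        except ValueError:
            continue  # destination field was not a valid IP address
    return dict(hits)


if __name__ == "__main__":
    sample = [
        "Jul  7 10:00:01 fw BLOCK src=10.0.0.5 dst=192.0.2.44 dpt=80",
        "Jul  7 10:00:02 fw BLOCK src=10.0.0.9 dst=198.51.100.1 dpt=80",
        "Jul  7 10:00:03 fw ALLOW src=10.0.0.5 dst=198.51.100.1 dpt=443",
    ]
    for host, destinations in suspect_hosts(sample).items():
        print(host, sorted(destinations))
```

In practice the list would be fed from whatever shared source the blacklist project
published, and the hosts that show up here are the ones I would visit first with the
removal tools.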