My previous pieces in this series have covered what to look for when reading research, as well as a beginners’ glossary of research terms.

This time, I want to demonstrate how careless representation of data can lead to extremely misleading claims, even if well intentioned.

The example I am using is an older infographic created by RAINN (Rape, Abuse and Incest National Network). [1]

I should make it clear that whilst RAINN have since provided an updated version, it still operates on the same principles as the one examined here.

On their website, they list reports from both the FBI and the Department of Justice as the sources for their data. [1] This gives us a good opportunity to track down the data used.

Having understood the claim and learned where the sources come from, we should look at their methodology.

32 are reported to the police

The data for this claim was taken from the National Crime Victimization Survey, 2008 – 2012. [2] The infographic tells us that 68 out of 100 rapes (or 68 per cent) weren’t reported to law enforcement over this time.

The method used to determine this estimate was to first survey 90,000 households on self-reports of crime victimization. [13] Once this data is collected, a series of complex calculations provides estimates for the larger population. Finally, this estimate is cross-referenced with the data collected by police forces on the total number of reports lodged with them for each type of crime. The difference between the survey estimate and the police records is then considered the “estimated unreported incidents” of a crime. You can see a full breakdown of their methodology below. [3]
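
As a rough sketch of that subtraction, the calculation looks like this. The figures below are purely hypothetical placeholders (chosen so the unreported share comes out at the infographic’s 68%), not the survey’s actual weighted totals:

```python
# Hypothetical sketch of the "estimated unreported" calculation:
# the survey-based estimate of total incidents, minus incidents
# actually recorded by police. Placeholder figures, not real totals.
survey_estimate = 100_000  # weighted estimate from the household survey
police_records = 32_000    # total reports lodged with police

estimated_unreported = survey_estimate - police_records
unreported_rate = estimated_unreported / survey_estimate * 100

print(f"Estimated unreported incidents: {estimated_unreported:,}")
print(f"Unreported share: {unreported_rate:.0f}%")  # 68% with these figures
```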

So far, so good.

However, there are several caveats to consider:

i) It’s worth bearing in mind that the total estimate RAINN uses in their initial calculation is NOT a total estimate for “rape”, but a total estimate for “rape/sexual assault”. [4] It is possible that, since many states use varying definitions of “rape” and “sexual assault”, the survey simply wished to create a broader category in which to collate the data. However, this is purely speculation on my part. The reality is that rape is not the same as sexual assault, and conflating the totals only muddies the waters.

ii) Based on their own phrasing, it would seem RAINN is aware of this distinction and has ignored it. Their accompanying text refers to “sexual assault”, while the infographic refers to “rape”, despite the fact that the data source they used does not separate the two. [1][4]

iii) A final note of caution here: you cannot treat a “report” as a “confirmed incident”, which RAINN have done in their total estimates. We have no way of knowing how many cases had little or no evidence, how many allegations were withdrawn, or how many may have been deliberately false (for example).

It’s worth noting that for all of their claims, they compare the combined survey estimate for “rape/sexual assault” against the total estimates for arrests, referrals to prosecutors, convictions and those who spend a single day in prison.

As for the Department of Justice’s definition of “sexual assault”, according to their website it is:
“…any type of sexual contact or behavior that occurs without the explicit consent of the recipient. Falling under the definition of sexual assault are sexual activities as forced sexual intercourse, forcible sodomy, child molestation, incest, fondling, and attempted rape.” [5]

Whilst all of those things should be treated with the utmost seriousness, conflating a broad spectrum of sexual offences with rape is simply not helpful in any way.

7 Lead to an arrest

The second source used by RAINN is the FBI’s Uniform Crime Reports, focusing on arrest data covering 2006-2010. I have included a link to the summary of their 2010 data below. [6]

Their claim that 7 out of 100 incidents lead to an arrest comes from collecting the total arrest figures for rape over a 5-year period from the FBI databases and comparing them to the Justice Department’s total estimated rape/sexual assault numbers.

To use an example based on just one year’s data, in 2010 there were an estimated 188,380 “rape/sexual assault” incidents according to the Justice Department. [7] According to the FBI, in 2010 there were an estimated 20,088 arrests for “forcible rape”. [6] Therefore:

20,088 ÷ 188,380 = 0.106636, and 0.106636 × 100 = 10.66%

By this calculation, the total estimated arrests for “forcible rape” (by the FBI’s figures) amounted to close to 11% of all estimated incidents of rape/sexual assault (by the Justice Department’s figures) for 2010.

There is a serious issue here. The FBI and the Justice Department record rape and sexual assault differently. Whereas the Justice Department combines “rape/sexual assault” into a single category, the FBI has separate categories for “forcible rape” and “sexual assault (excluding forcible rape and prostitution)”. Yet RAINN’s analysis looked specifically at arrest figures for “forcible rape” whilst ignoring “sexual assault” arrests in the process.

The only reason for doing this is to inflate the number of “rapes” in one category whilst minimizing the number of arrests in the other. In this way, RAINN has managed to present a misleading narrative which does not draw an accurate comparison between data sources.

There is another point to consider here, out of interest. The FBI have their own estimate of rapes reported to law enforcement, which comes in at 84,767 incidents in 2010 alone. [8] Had RAINN compared the total estimated rapes reported to law enforcement with the total estimated arrests from the same database, the calculation would have looked like this:

20,088 ÷ 84,767 = 0.236979, and 0.236979 × 100 = 23.70%

We can see when comparing these two data sources that almost 1 in 4 estimated reports of rape to law enforcement led to an arrest in 2010. This is far from definitive and ignores estimates of unreported crimes. However, it also paints a very different picture and could even be used to suggest that law enforcement is more proactive than RAINN’s infographic implies.
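
The two comparisons above can be reproduced directly from the 2010 figures quoted in the text:

```python
# 2010 figures as quoted in the text above.
fbi_rape_arrests = 20_088        # FBI: estimated arrests for "forcible rape"
doj_rape_sa_incidents = 188_380  # DoJ: estimated "rape/sexual assault" incidents
fbi_reported_rapes = 84_767      # FBI: estimated rapes reported to law enforcement

rate_vs_doj = fbi_rape_arrests / doj_rape_sa_incidents * 100
rate_vs_fbi = fbi_rape_arrests / fbi_reported_rapes * 100

print(f"Arrests as a share of DoJ incident estimate: {rate_vs_doj:.2f}%")  # ~10.66%
print(f"Arrests as a share of FBI reported rapes:    {rate_vs_fbi:.2f}%")  # ~23.70%
```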

One final point to consider here is that there is a discrepancy in the “rape/sexual assault” numbers available from the Justice Department. The chart of estimates for 2010-2011 [9] and the chart for 2009-2010 [10] give figures for 2010 that differ by 80,190 estimated incidents. At this time, I am unable to confirm how this discrepancy occurred, or whether RAINN factored these differences into their infographic.

If anyone is curious, I have included a link to another piece, “Victimizations Not Reported to the Police, 2006-2010”, which offers some explanations as to why crimes were not reported. [11]

3 were referred to prosecutors

This data is relatively simple to work out. RAINN cite “Uniform Crime Reports, Offenses Cleared Data” from the FBI over a 5 year period. I have provided a link to the 2010 data. [12]

An offense can be “cleared” in one of several ways, but the term typically refers to a case being “closed”, usually by being turned over for prosecution. RAINN appear to have taken the overall clearance rate for each year (for example, in 2010, 40.3% of forcible rape offenses were “cleared”), worked out a broad average, and applied it to their arrest figure.

The clearance rates for 2006-2010 according to the FBI stood at:

40.9, 40.0, 40.4, 41.2 & 40.3 per cent

RAINN then took roughly 40% of the infographic’s total estimated arrests (in this case, 7) and referred to the result (a total of 3) as “cases referred to prosecution”, which is a simple but mostly reasonable assumption to have made.
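
That averaging step can be sketched as follows, using the clearance rates listed above:

```python
# FBI clearance rates for forcible rape, 2006-2010, as listed above (per cent).
clearance_rates = [40.9, 40.0, 40.4, 41.2, 40.3]

average_rate = sum(clearance_rates) / len(clearance_rates)
arrests = 7  # the infographic's "7 lead to an arrest"
referred = arrests * average_rate / 100

print(f"Average clearance rate: {average_rate:.2f}%")      # 40.56%
print(f"Arrests referred to prosecution: {referred:.2f}")  # ~2.84, shown as 3
```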

Conclusion

To save on word count, I will simply say that the remainder of the infographic is, surprisingly, loosely accurate, though the relevance and use of claiming “the other 98 will walk free” is up for debate.

Unfortunately, RAINN have chosen to conflate “rape/sexual assault” estimates with “forcible rape” data from two different sources. This is then followed by purposefully treating “reported & unreported estimates” as meaning “confirmed incidents of rape”, which the data does not tell us.

To top this off, these looser estimates of possible criminal incidents are then compared to arrest figures based on a much stricter definition, further widening the perceived disparity between criminal activity and the actions taken by the justice system.

This is precisely why we must take time to understand what data is used to make a claim and HOW the claim is made, not simply repeat a claim uncritically. Misleading claims, even when well intentioned, can have disastrous consequences and counterproductive real-world results.

As stated in my previous piece, Egalitarian Feminism wants to help people learn how to read research papers, what to look for, and from there how to interpret the data.

Continuing with that theme, I have compiled a basic glossary of some frequently used terms and their definitions to help people understand the language used. This list is not exhaustive by any means, but hopefully people will find it useful.

Aggregate:

Definition:
A total created from multiple smaller units. To take an example, the population
of a country is the aggregate of all the cities, towns, villages, rural areas
and so forth.

Attrition:

Definition:
The rate at which participants drop out of a study over an extended period of time. A study with a high attrition rate risks creating significant bias in the results, potentially threatening the overall quality of the research.

Bias:

Definition:
Bias is a form of influence in a set of data which produces lopsided or misleading results. As a result of different biases, a set of data might over-represent or under-represent the larger population. Bias comes in many forms, such as positive and negative response-rate bias in samples, common method bias (see below), and instruction bias (where unclear instructions lead participants to respond differently depending on how they interpret what is being asked), among others.

Chi square:

Definition:
A statistical test used to compare expected data with data that has been collected, usually represented as χ². A large difference between expected and collected results indicates that something may have caused the discrepancy. A suitably large difference allows researchers to reject the null hypothesis (see further down).
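
As a small hypothetical illustration (a six-sided die rolled 60 times, so each face is expected 10 times), the χ² statistic is the sum of (observed − expected)² ÷ expected:

```python
# Hypothetical die-rolling data: 60 rolls, each face expected 10 times.
observed = [8, 12, 9, 11, 6, 14]
expected = [10, 10, 10, 10, 10, 10]

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"Chi-square statistic: {chi_square:.1f}")  # 4.2
```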

Coefficient:

Definition:
The number or known factor (usually a constant) by which another value (usually
a variable) is multiplied. As an example, imagine you have a sample of workers
that is 10% of the total population of workers in an area. Having collected
your results from the sample (let’s say how many employees work in sales), you
wish to estimate how the larger population is likely to look if your data is
accurate. You would therefore multiply your variable (sales employees) by your
coefficient (10).
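
In code, the worked example above is a single multiplication (the employee count is hypothetical):

```python
# A 10% sample scaled up to the full population with a coefficient of 10.
sample_sales_employees = 47  # hypothetical count from the 10% sample
coefficient = 10             # the population is ten times the sample

estimated_population_total = sample_sales_employees * coefficient
print(f"Estimated sales employees overall: {estimated_population_total}")  # 470
```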

Common Method Variance/Bias:

Definition:
A term for concerns about how survey data is supplied and interpreted. For example, a survey in which respondents use their own interpretation of terms might receive very different responses depending on who is responding and how they interpret the questions being asked.

Confidence Interval:

Definition:
A term used to express the uncertainty in the researchers’ estimates. A researcher who reports a 60% confidence level is telling you that if you repeated the same sampling method with different samples, you would expect the resulting intervals to contain the true population parameter 60% of the time. The wider the confidence interval, the greater the uncertainty in the results.
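
A minimal sketch of computing a confidence interval for a sample mean, using the common normal approximation (mean ± 1.96 standard errors for 95% confidence; the sample values are hypothetical):

```python
import statistics
from math import sqrt

sample = [23, 19, 25, 22, 21, 24, 20, 26, 22, 23]  # hypothetical measurements

mean = statistics.mean(sample)
std_err = statistics.stdev(sample) / sqrt(len(sample))  # standard error of the mean
lower, upper = mean - 1.96 * std_err, mean + 1.96 * std_err

print(f"Mean: {mean:.1f}")
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```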

Control group:

Definition:
In an experiment, the control group does not receive the treatment or change being tested. Its purpose is to show what would normally happen in a given situation, providing a baseline to compare against what happens when you alter an independent variable. This allows the researchers to determine whether altering a variable has an effect on a test group, as well as demonstrating what that effect is.

Dependent variable:

Definition:
A variable that can be influenced by another variable which researchers can
change. For example, consider two variables “employment” and “age”. Here,
“employment” is the dependent variable. It can be affected by the variable
“age”.

Double blind experiment:

Definition:
An experiment where neither the researcher(s) nor the participants know which group is the control and which is the treatment group. This is often done in psychology studies to further reduce potential bias introduced by the participants and the researchers.

Hypothesis:

Definition:
A testable prediction. For example, “My hypothesis is that if I water my plants regularly and give them lots of sunshine, they will grow healthily”.

Independent Variable:

Definition:
The variable in an experiment that is manipulated by the researchers. It also
refers to a variable that is not affected by, but does affect, a dependent
variable. In the example given under “dependent variable”, “age” would be an
independent variable. “Employment” does not affect “age”, but “age” may have an
impact on “employment”.

Meta-Analysis:

Definition:
A term used to describe the method of combining and analysing data from
multiple studies on the same subject.

Null hypothesis:

Definition:
This term represents the assumption that the variables of an experiment have no effect on the results. In the example given for “hypothesis”, the null hypothesis would be that “regular water and lots of sunshine will not help plants grow more healthily”.

P-Value:

Definition:
The p-value is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. It is usually represented by a lower-case p (for probability). For example, if you see p < 0.05 in a study, this tells you that if chance alone were at work, there would be a 1 in 20 chance or less of seeing results this extreme. Most researchers treat a p-value greater than 0.05 as meaning the results are not statistically significant.
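
One way to build intuition for a p-value is by simulation. The sketch below uses a hypothetical scenario, not data from any study: if 10 coin flips produced 9 heads, how often would a fair coin do at least that well by chance alone? The exact one-sided answer is 11/1024 ≈ 0.011, comfortably below 0.05:

```python
import random

random.seed(42)  # fixed seed so the simulation is repeatable

def simulated_p_value(trials=100_000, flips=10, threshold=9):
    """Estimate P(at least `threshold` heads in `flips` fair coin flips)."""
    hits = 0
    for _ in range(trials):
        heads = sum(random.random() < 0.5 for _ in range(flips))
        if heads >= threshold:
            hits += 1
    return hits / trials

p_value = simulated_p_value()
print(f"Simulated p-value: {p_value:.4f}")  # close to the exact 11/1024
```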

Parameter:

Definition:
“Parameter” refers to a summary – usually a percentage or average – that
describes the entire population.

Population:

Definition:
“Population” refers to any large group of objects or individuals about which
information is desired, such as Germans, flowers or insects.

Random Sampling:

Definition:
A sampling technique where individuals from a population are selected at random, so that every member has an equal chance of being chosen.

Regression Analysis:

Definition:
A method of statistical analysis used to examine relationships between variables. Perhaps the best way to think of this is to picture a scatter chart (a chart where the data points are marked with small dots). The regression line is the line of best fit drawn through those points, summarising the overall trend.
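
A minimal least-squares sketch of that line of best fit, with hypothetical data points:

```python
# Hypothetical scatter data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Ordinary least squares: slope = covariance / variance of x.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

print(f"Line of best fit: y = {slope:.2f}x + {intercept:.2f}")
```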

Sampling error:

Definition:
A term used to describe the degree to which results from a sample differ from the results that would be obtained from the larger population.

Statistically significant:

Definition:
A term used to indicate that a difference in results is unlikely to have occurred by chance alone.

Variable:

Definition:
A characteristic or trait that varies between any group of objects or people.
Race, gender, age and education are all examples of variables.

Weighted sample:

Definition:
A correcting technique used to adjust responses given by survey respondents to
match the larger population. Typically this is done when certain demographics
(based on race, age, etc) are over or under-represented in a survey and the
researchers wish to have their results reflect the larger population.
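
A small sketch of that correction, with hypothetical numbers: suppose women are 50% of the population but only 30% of respondents, and the two groups answer a question differently on average:

```python
# Hypothetical survey: each group's responses are re-weighted so the
# result reflects the population mix rather than the sample mix.
groups = {
    "women": {"sample_share": 0.30, "population_share": 0.50, "mean_answer": 6.0},
    "men":   {"sample_share": 0.70, "population_share": 0.50, "mean_answer": 4.0},
}

unweighted = sum(g["sample_share"] * g["mean_answer"] for g in groups.values())
weighted = sum(g["population_share"] * g["mean_answer"] for g in groups.values())

print(f"Unweighted mean: {unweighted:.2f}")  # 4.60, skewed toward the men
print(f"Weighted mean:   {weighted:.2f}")    # 5.00, reflects the population
```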

A word of caution: Correlation Vs Causation

A trap many
people (and even some researchers) can fall into is to confuse correlation in
data with causation. Although a trend may exist, it does not mean that one
causes the other.

Correlation
can be understood as recognising when two or more variables show a tendency to fluctuate
together, but a change in one does not necessarily cause a change in another.

Causation
can be understood as recognising when two or more variables show a tendency to
fluctuate together and a change in one WILL cause a change in another.

As an
example, we could look at profits from ice cream sales and warm weather. As the
weather becomes warmer, we would expect to see companies sell more ice creams,
thus make more profit. The correlation in this example is that ice cream sales
increase with rising temperatures. The causation is that selling more ice cream
leads to a rise in profits.
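
The strength of such a relationship is usually measured with a correlation coefficient, which runs from -1 to +1. Below is a quick sketch with hypothetical temperatures and sales; note that a value near +1 only tells us the two tend to rise together, not that one causes the other:

```python
from math import sqrt

temps = [15, 18, 21, 24, 27, 30]        # hypothetical temperatures (Celsius)
sales = [110, 135, 155, 180, 210, 230]  # hypothetical daily ice cream sales

# Pearson correlation: covariance divided by the product of the spreads.
n = len(temps)
mean_t = sum(temps) / n
mean_s = sum(sales) / n
cov = sum((t - mean_t) * (s - mean_s) for t, s in zip(temps, sales))
corr = cov / sqrt(sum((t - mean_t) ** 2 for t in temps)
                  * sum((s - mean_s) ** 2 for s in sales))

print(f"Correlation: {corr:.3f}")  # close to +1: they fluctuate together
```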

Ending Notes

Hopefully you have found the above piece useful and it leaves you feeling more confident in being able to read research papers for yourself.

In the near
future, I will begin tackling claims made by different people as well as
research pieces of interest. If there are any pieces you wish examined in more
detail, please leave a comment below or contact me on twitter (@DrewRoanEgaFem).

Friday, 8 April 2016

At Egalitarian Feminism, we strongly believe in making evidence-based claims and using good quality research to back our assertions. We believe poor quality research can have the unfortunate effect of increasing people’s fears and creating hostility where it is not warranted. In striving for equality for all regardless of gender, we must make sure that any research we use is as fair and reasonable as possible. We are also keenly aware that research can sometimes be misrepresented in the media, and of how this can affect the assumptions people make.

Therefore,
Egalitarian Feminism will be starting an “EgaFem Analysis” series where
research is scrutinized in detail and any findings are posted here for anyone
to read.

But we want to do more than just analyse data. We want to help people learn what to look for, how to understand academic pieces, and how to interpret research and come to their own conclusions, rather than relying solely on the reports of others.

Here are some points I have found to be highly useful when examining research papers.
Hopefully, you will too:

Do you know where the claim has come from?

Statistics and claims are often thrown around during discussions online and in the media. Ask yourself: “Do I know where this statistic has come from?” Have you seen any source material for it? If you have no idea where the claim has come from, can you honestly say you understand the claim and can trust it?

As a rule
of thumb, if a claim is made but no evidence can be or has been provided for
it, it’s best to assume that it may not be accurate or fully trustworthy.

Don’t rely on the summary. Skim the introduction. Study the method and
the results.

A trap many people fall into is to read the summary and assume it accurately reflects the research material in the study, to the point that they feel safe using it as a source. This is a bad idea. The summary at most gives you a snapshot of what the researchers want you to take away from their study. It tells you nothing about their methods, results, conclusions, errors or disclaimers that might complicate the picture.

Introductions
are usually filler text to explain the background and the necessity for the
study in the first place. That said, it is always worth skimming through an
introduction as it can yield a lot of information about the researchers’ point
of view before conducting the study. Researchers who make bold or extraordinary
claims, especially if those claims lack citations, may be more inclined to
offer up incomplete or misleading research.

It cannot be stressed enough that reading the method and results is essential to understanding any research piece.

Always check the sample size and neutrality.

Sample size and bias can massively alter the quality of the results. A study with a small sample size might produce disproportionately large or small results that do not accurately reflect the wider population. Volunteer participants often introduce more response bias than participants who are randomly sampled.

On rare
occasions, participants in a sample may have been “coached” to elicit certain
responses before participating in the research. Again, this can massively
distort the results and potentially make the research extremely unreliable.

Who funded and conducted the research?

A commonly
overlooked point to consider is the question of where the money for the
research came from. It is not unheard of for researchers to be funded by
specialist or advocacy groups, in order to conduct a study on a particular
topic. Whilst it should be expected that researchers will always remain
impartial, this is not always the case. You might find a study on the “harms of
sugar in the brain” being funded by diabetes research groups, or a study on the
benefits of alcohol being privately funded by alcohol manufacturers. This does
not necessarily invalidate a piece of research, but it may warrant taking the
findings with a pinch of salt.

Of equal interest is the question “Who conducted the research?” Was it conducted by a survey group paid to find data on a particular topic, or by professors with a known history of a particular ideological bias? All people are capable of being influenced by their own biases, and this can be reflected in their research. Once again, it does not necessarily invalidate a piece of research, but it may give a reason to be cautious when repeating the findings.

Are you sure you understand their definitions?

Another
common trap to fall into is to assume the definitions the researchers are using
match up to legal or common use definitions. Equally, some terms that are
frequently used in regular conversations may have a different meaning when
applied in a certain context. Some examples might be:

Confusing “wage” (pay per hour) with
“salary” (pay over the course of a year).

Mixing up “sexual offences” with
“sexual assault”.

A researcher who may be using a
non-legal definition of “discrimination” to include examples that may not match
the legal definition.

Always make
sure that you are clear on exactly what the researchers mean when using
particular phrases.

If the research includes a survey, are those questions clear cut and
straight forward to respond to?

Researchers
have known for a long time that asking a question directly often does not yield
particularly fruitful responses. Instead, questions with more ambiguous wording
often yield a higher response rate, though these sometimes come with a risk of
inaccurate reporting and artificially inflated response rates.

For
example, consider the following two questions:

Have you ever been raped whilst
drunk?

Have you ever had sexual intercourse
when you did not want to whilst under the influence of alcohol?

On the surface, these two may appear to be the same in nature. In reality, the second version may yield a greater response rate, but may also include occasions that were not rape (such as drunken one-night stands, incidences of cheating and so forth).

Have the researchers used all the available data in their conclusions?

Always
check to see if all relevant and available data has been used in the
conclusions of the research. If data has been left out, why? Would it have
affected the results? Was that data potentially relevant to the overall piece?

Whilst it may not necessarily invalidate the research in question, if researchers have chosen to leave out certain results or data sources, it may lead others to misinterpret the information, or to claims being made which lack context. It may even lead the researchers to draw a potentially unreasonable conclusion.

A little critical thinking is a great thing.

One of the
best things you can do with any piece of research is to critically think about the
findings, method and conclusions. Be mindful of potential flaws in the study.
What would you have done differently? Would you have come to the same
conclusions? Have you double checked their mathematics, to see if the numbers
add up properly? Not all criticisms will be reasonable, but you might be
surprised what issues with a research piece you can find if you examine it in
more detail.

I hope you
have found the above points useful in some way and that in turn, you feel more
confident in reading research papers for yourself. My next piece will be a
basic glossary of technical terms you may frequently encounter to help further
deepen your understanding.

If you come
across any piece of research or a claim you would like examined, feel free to
leave a comment and let us know.

Egalitarian Feminism didn't write this petition, but we support it. This petition needs to reach 100,000 UK signatures before September 2016. The sooner the better to send a strong message to the UK Government that this is a serious issue.

If you're not part of the UK, please spread the word and advertise it. The more people who spread it the better. Once the UK changes its law, it will put pressure on other countries to follow suit.

Share it on Twitter, Facebook, Tumblr, and every other social media outlet. But don't stop there! Tell your friends and family. Get the message out in as many ways as possible. And send this to any YouTube personalities, politicians, feminists and Twitter accounts you think would be interested. Check out this page for ideas of who to send it to: http://egafeminist.blogspot.co.uk/2016/01/egafems-campaign-stakeholders.html

Here is a useful image that summarizes the petition. Download and share it - but if you make any changes please ensure the link to the petition is included.

If you have twitter, please retweet this:

Equality means women must be taken equally seriously to men when they're nice AND naughty.

Here is a useful summary on why this is a feminist issue. Please download it and use it to support the campaign and send it to all the feminists you know.

If a feminist believes in true equality, at all levels and not only when it is convenient to women, they should support this petition and help spread the word.

It doesn't matter if they are Opportunity or Outcome Feminists, as this issue impacts both. Women's actions should be recognised equally to men's. Forcing someone to have sex is rape. Women should have equal outcome and opportunity under the law when they choose to commit the same crime.

However, if they believe in female supremacy, that women are incapable of evil or are a 'special case', then they don't believe in equality and can't be feminists. This isn't a 'no true Scotsman' fallacy; the very definition of feminism is the belief that women are equal to men, especially through their actions, and should be treated equally by society and law. If they don't believe women should be equal, they are not feminists.

Only a misogynist would deny women their impact and agency. Only someone who doesn't believe women are capable would claim they cannot choose to commit wrongdoing on account of their gender. If women's wrongdoings are not recognised, if women shouldn't be held equal to men, then why should the good women do be equally recognised? If women are a 'special case' that should be treated carefully, as if they were fragile pieces of glass, then why should women be paid equally? Why should women be given equality when they are good, if they have less responsibility, impact and agency than men?

Either you believe women should be equal, or you don't. You don't get to pick and choose equality only when it benefits women.