Understanding Subjectivity in Data Science


Jul 10, 2018

In a perfect world, data scientists would take subjectivity out of their conclusions when examining findings.

However, bias creeps in before people even realize what's happening. Being aware of possible subjectivity lets researchers work to minimize it and hold off on publishing until they are reasonably confident their work is as objective as possible.

Humans Are Fallible

One common line of thinking is that merely using big data tools in research removes bias. But although those advancements bring numerous streams of information together for analysis, humans are still involved in gathering the data at various stages in the process, and humans are naturally subjective.

A September 2016 study hoped to use big data to build a comprehensive collection of events that affected civilizations around the globe. It found that systems based in the United States and Latin America rarely registered the same events as significant.

That's because even the most robust, intelligent big data tools have humans behind them, especially the people who train the data science platforms. Data science can remove some subjectivity, but people aren't impartial beings; they possess subjectivity innately.

More Data Makes It Easier to Find Supporting Material

People show confirmation bias, which is the tendency to look for things that support a viewpoint while ignoring conflicting information.

Advancements in data science potentially worsen that characteristic by giving individuals access to vast amounts of data, thereby increasing the likelihood they'll find material that supports their opinions if they only look hard enough.

If that happens, they might find information that pushes a study in one direction, while ignoring contrary findings that lead to an entirely different conclusion.
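The effect can be sketched with purely random numbers. The example below is a minimal illustration, not a method from any study mentioned here: it generates one random "target" series and a growing pool of unrelated random "candidate" series, then reports the strongest correlation found by chance alone. The function names, sample size, and seed are all assumptions chosen for the illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_candidates, n_samples=20, seed=0):
    """Largest |correlation| between a random target and unrelated random series.

    Every series is independent noise, so any correlation found is spurious.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    # Searching a larger pool can only raise the best spurious match.
    for n in (10, 100, 1000):
        print(n, round(strongest_chance_correlation(n), 2))
```

Because the same seed is reused, the larger pools contain the smaller ones, so the strongest match never gets weaker as the pool grows. None of the series has any real relationship to the target, yet someone who searches long enough will still find one that appears to support a conclusion.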

Subjective Data Is Variable

People get subjective data by communicating with others. They can also collect it by making assumptions and judgments based on communications. So, the characteristics of the data can vary from person to person or even change based on how one individual feels at a given moment in time.

In contrast, objective data comes from verifiable facts or events, and has consistency even across multiple sources.

Whereas subjective data includes an element of personal interpretation, objective data primarily depends on using accurate details to confirm what happened or what someone assumes.

Subjectivity isn't always negative. For example, someone could subjectively alter mapped data that shows sales territories or a similar kind of information to support a company's needs and make statistics more accessible to stakeholders.

However, people using mapped data to explain things must take care to thoroughly explain variables that might otherwise make individuals reach incorrect conclusions.

They could use explanatory notes on the map itself to clarify the sample size and the techniques used to gather the information. Those additional insights give people the knowledge they need to reach informed conclusions.

Subjectivity May Ignore Crucial Realities

As mentioned earlier, humans tend to focus on material that supports their beliefs and ignore material that doesn't. Sometimes, that problem means data scientists don't notice the hidden findings in their research.

Poorer, underserved communities are particularly at risk of being harmed by that problem. For example, a smartphone app called Street Bump let Boston residents report potholes when they came across them. The idea was that the collective information from the app would let the people responsible for local infrastructure know where the worst issues existed, and that concept makes sense.

However, what about the people from low-income communities who might live in pothole-filled places, but can't afford the smartphone the app requires to facilitate the reporting process?
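A small calculation shows how that gap could distort the picture. The sketch below is a toy model: the neighborhood names, pothole counts, and smartphone ownership rates are made-up numbers, not Street Bump or Boston data, and it assumes the expected number of reports is simply the number of potholes multiplied by the share of residents who own a smartphone.

```python
# Hypothetical neighborhoods: the pothole counts and smartphone ownership
# rates below are made-up numbers for illustration, not Boston data.
NEIGHBORHOODS = {
    "higher_income": {"potholes": 100, "smartphone_rate": 0.9},
    "lower_income": {"potholes": 150, "smartphone_rate": 0.4},
}


def expected_reports(potholes, smartphone_rate):
    """Expected reports if only smartphone owners can report a pothole."""
    return potholes * smartphone_rate


def rank_by(values):
    """Names sorted from largest to smallest value."""
    return sorted(values, key=values.get, reverse=True)


actual = {name: n["potholes"] for name, n in NEIGHBORHOODS.items()}
reported = {
    name: expected_reports(n["potholes"], n["smartphone_rate"])
    for name, n in NEIGHBORHOODS.items()
}

print("ranked by actual potholes:", rank_by(actual))
print("ranked by app reports:    ", rank_by(reported))
```

Under these assumed numbers, the neighborhood with more potholes generates fewer reports, so a ranking built only from app data would point repair crews to the wrong place first.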

Is Big Data Worsening Racial Bias?

Police departments increasingly rely on big data to make predictions about future crime and figure out how to best allocate their officers across multiple neighborhoods.

However, analysts point out racial bias is rampant in law enforcement. Research shows police are less likely to stop white individuals than African-American and Latin-American individuals, even when the groups demonstrate the same behaviors.

Instead of merely trusting that big data findings are accurate, people in law enforcement must evaluate all the characteristics of the input data and determine whether aspects of it may be teaching algorithms to emphasize and perpetuate racial bias.
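One way biased input data could perpetuate itself is through a feedback loop, sketched below as a toy model rather than a description of any real predictive policing system. The district names, rates, and patrol numbers are hypothetical, and the model assumes an incident is recorded only where an officer is present, so recorded incidents equal the true rate multiplied by the patrols assigned.

```python
def reallocate(patrols, true_rates, rounds):
    """Repeatedly reassign a fixed number of patrols in proportion to recorded incidents.

    Assumes recorded incidents = true incident rate * patrols present,
    i.e. an incident is only recorded where an officer is there to see it.
    """
    total = sum(patrols.values())
    for _ in range(rounds):
        recorded = {d: true_rates[d] * patrols[d] for d in patrols}
        total_recorded = sum(recorded.values())
        patrols = {d: total * recorded[d] / total_recorded for d in patrols}
    return patrols


# Hypothetical districts with identical underlying incident rates,
# but a historical patrol allocation that favors district_a.
true_rates = {"district_a": 0.05, "district_b": 0.05}
initial_patrols = {"district_a": 70.0, "district_b": 30.0}

final = reallocate(initial_patrols, true_rates, rounds=10)
print(final)
```

In this model the two districts have the same underlying incident rate, yet the allocation stays at roughly 70 to 30 after every round. The recorded data keeps confirming the original allocation because it reflects where officers were sent, not where incidents actually occur.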

Such a thing could happen even without a big data platform. For example, an officer in a particular jurisdiction could state, "My experience says I arrest more African-Americans than Caucasian individuals for violent crimes," and a colleague might say the same.

However, a closer look at the statistics could show African-Americans account for only a small portion of those crimes, and that the officers relied too heavily on personal experience for their conclusions to be objective.

Working to Reduce Biased Big Data

It's not possible to remove subjectivity from the information data scientists work with every day.

However, a responsible data scientist must realize such biases exist and consciously attempt to minimize them, whether by examining a less obvious aspect of the data or answering the all-important question, "Is there something I'm overlooking?"