Archive for the ‘Bias’ Category

Modern technology has revealed an irrefutable, if unpopular, truth: many of the statues, reliefs, and sarcophagi created in the ancient Western world were in fact painted. Marble was a precious material for Greco-Roman artisans, but it was considered a canvas, not the finished product for sculpture. It was carefully selected and then often painted in gold, red, green, black, white, and brown, among other colors.

A number of fantastic museum shows throughout Europe and the US in recent years have addressed the issue of ancient polychromy. The Gods in Color exhibit travelled the world from 2003 to 2015, after its initial display at the Glyptothek in Munich. (Many of the photos in this essay come from that exhibit, including the famed Caligula bust and the Alexander Sarcophagus.) Digital humanists and archaeologists have played a large part in making those shows possible. In particular, the archaeologist Vinzenz Brinkmann, whose research informed Gods in Color, has done important work, applying various technologies, including ultraviolet light, to antique statues in order to analyze the minute vestiges of paint on them and then recreate polychrome versions.

Acceptance of polychromy by the public is another matter. A friend peering up at early-20th-century polychrome terra cottas of mythological figures at the Philadelphia Museum of Art once remarked to me: “There is no way the Greeks were that gauche.” How did color become gauche? Where does this aesthetic disgust come from? To many, the pristine whiteness of marble statues is the expectation and thus the classical ideal. But the equation of white marble with beauty is not an inherent truth of the universe. Where this standard came from and how it continues to influence white supremacist ideas today are often ignored.

Most museums and art history textbooks contain a predominantly neon white display of skin tone when it comes to classical statues and sarcophagi. This has an impact on the way we view the antique world. The assemblage of neon whiteness serves to create a false idea of homogeneity — everyone was very white! — across the Mediterranean region. The Romans, in fact, did not define people as “white”; where, then, did this notion of race come from?

…

A great post and a reminder that learning history (or current events) through a particular lens isn’t the same as having the only view of history (or current events).

I originally wrote “an accurate view of history…” but that’s not true. At best we have one or more views and, when called upon to act, make decisions based upon those views. “Accuracy” is something that lies beyond our human grasp.

The reminder I would add to this post is that recognition of a lens (in this case, the absence of color in our learning of history) isn’t overcome by naming it and perhaps nodding in agreement: yes, that was a shortfall in our learning.

“Knowing” about the coloration of familiar artwork doesn’t erase centuries of considering it without color. No amount of pretending will make it otherwise.

Humanists should learn about and promote the use of colorization so the youth of today learn different traditions than the ones we learned.

Louise Goldsberry, a Florida nurse, was washing dishes when she looked outside her window and saw a man pointing a gun at her face. Goldsberry screamed, dropped to the floor, and crawled to her bedroom to get her revolver. A standoff ensued with the gunman—who turned out to be an agent with the U.S. Marshals’ fugitive division.

Goldsberry, who had no connection to a suspect that police were looking for, eventually surrendered and was later released. Police claimed that they raided her apartment because they had a “tip” about the apartment complex. But, according to Slate, the reason the “tip” was so broad was because the police had obtained only the approximate location of the suspect’s phone—using a “Stingray” phone tracker, a little-understood surveillance device that has quietly spread from the world of national security into that of domestic law enforcement.

Goldsberry’s story illustrates a potential harm of Stingrays not often considered: increased police contact for people who get caught in the wide dragnets of these interceptions. To get a sense of the scope of this surveillance, CityLab mapped police data from three major cities across the U.S., and found that this burden is not shared equally.

…

How not equally?

Baltimore, Maryland.

The map at Joseph’s post is interactive, along with maps for Tallahassee, Florida and Milwaukee, Wisconsin.

I oppose government surveillance overall but am curious: is Stingray usage a concern mostly of technology/privacy advocates, or is there a broader base for opposing it?

Were you shocked at the disruption in Baltimore? What is more shocking is daily life in Baltimore, a city of 622,000 which is 63 percent African American. Here are ten numbers that tell some of the story.

Two. Over $5.7 million has been paid out by Baltimore since 2011 in over 100 police brutality lawsuits. Victims of severe police brutality were mostly people of color and included a pregnant woman, a 65-year-old church deacon, children, and an 87-year-old grandmother.

Google was built to help people find useful information by surfacing the great content that publishers and sites create. This access to high quality information is what drives people to use the web and for contributors to continue to engage and invest in it.

However, with thousands of new articles published online every minute of every day, the amount of content confronting people online can be overwhelming. And unfortunately, not all of it is factual or true, making it hard for people to distinguish fact from fiction. That’s why last October, along with our partners at Jigsaw, we announced that in a few countries we would start enabling publishers to show a “Fact Check” tag in Google News for news stories. This label identifies articles that include information fact checked by news publishers and fact-checking organizations.

After assessing feedback from both users and publishers, we’re making the Fact Check label in Google News available everywhere, and expanding it into Search globally in all languages. For the first time, when you conduct a search on Google that returns an authoritative result containing fact checks for one or more public claims, you will see that information clearly on the search results page. The snippet will display information on the claim, who made the claim, and the fact check of that particular claim.
…
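For the mechanically curious: publishers signal these fact checks to Google with schema.org ClaimReview markup embedded in the article page. Here is a minimal sketch of what that JSON-LD might look like, built in Python; every field value below is invented for illustration.

```python
# Sketch of the schema.org ClaimReview markup publishers use to signal a
# fact check; every field value here is invented for illustration.
import json

claim_review = {
    "@context": "https://schema.org",
    "@type": "ClaimReview",
    "url": "https://example.com/fact-checks/some-claim",  # hypothetical page
    "claimReviewed": "An example public claim being checked",
    "author": {"@type": "Organization", "name": "Example Fact-Check Org"},
    "reviewRating": {
        "@type": "Rating",
        "ratingValue": 1,          # on the outlet's own scale
        "bestRating": 5,
        "worstRating": 1,
        "alternateName": "False",  # the verdict shown with the result
    },
}

# Embedded in the article as <script type="application/ld+json">...</script>.
print(json.dumps(claim_review, indent=2))
```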

What I find troubling about “fact checking” by Google is that some points of view, such as that of the NYT, are going to be privileged as “facts,” whereas other points of view will not enjoy such a privilege.

I have no doubt that a fact-checking Google at the time would have said it’s a fact that Saddam Hussein possessed weapons of mass destruction, at least until years later, when that had been proven false. Everybody who was anybody said it was a fact. Must be true.

Disclosure: I have contempt for news reports that hype acts of terrorism. Even more so when little more than criminal acts by Muslims are bemoaned as existential threats to Western society. Just so you know I’m not in a position to offer a balanced view of Ronald Bailey’s post.

“It’s gotten to a point where it’s not even being reported. In many cases, the very, very dishonest press doesn’t want to report it,” asserted President Donald Trump a month ago. He was referring to a purported media reticence to report on terror attacks in Europe. “They have their reasons, and you understand that,” he added. The implication, I think, is that the politically correct press is concealing terrorists’ backgrounds.

To bolster the president’s claims, the White House then released a list of 78 terror attacks from around the globe that Trump’s minions think were underreported. All of the attackers on the list were Muslim—and all of the attacks had been reported by multiple news outlets.

For those five years, the researchers found, Muslims carried out only 11 of the 89 attacks, yet those attacks received 44 percent of the media coverage. (Meanwhile, 18 attacks actually targeted Muslims in America.) The Boston marathon bombing generated 474 news reports, amounting to 20 percent of the media terrorism coverage during the period analyzed. Overall, the authors report, “The average attack with a Muslim perpetrator is covered in 90.8 articles. Attacks with a Muslim, foreign-born perpetrator are covered in 192.8 articles on average. Compare this with other attacks, which received an average of 18.1 articles.”
…

While the authors rightly question the equality of terrorist reporting, which falsely creates a link between Muslims and terrorism in the United States, I question the appropriateness of a media focus on terrorism at all.

Aside from the obvious lure that fear sells, and fear of Muslims sells very well in the United States, the human cost from domestic terrorist attacks, not just those by Muslims, hardly justifies crime-blotter coverage.

Consider that in 2014, there were 33,559 deaths due to gun violence and 32 from terrorism.

But as I said, fear sells and fear of Muslims sells very well.

Terrorism, or more properly the fear of terrorism, has been exploited to distort government priorities and to reduce the rights of all citizens. Media participation/exploitation of that fear is a matter of record.

The question now is whether the media will knowingly continue its documented bigotry or choose another course.

No matter what media stream you depend on for news, you know that news has changed in the past few years. There’s a lot more of it, and it’s getting harder to tell what’s true, what’s biased, and what may be outright deceptive. While the bastions of journalism still employ editors and fact-checkers to screen information for you, if you’re getting your news and assessing information from less venerable sources, it’s up to you to determine what’s credible.

Sam Wineburg and Sarah McGrew, a doctoral candidate in education, tested the ability of thousands of students ranging from middle school to college to evaluate the reliability of online news. What they found was discouraging: even social media-savvy students at elite universities were woefully unskilled at determining whether or not information came from reliable, unbiased sources.
…

Wineburg and McGrew arrived at the crisis of “biased” news decades, if not centuries, too late.

There is a documentary by Mark Achbar and Peter Wintonick about Noam Chomsky and Manufacturing Consent; total run time is 2 hours, 40 minutes, and 24 seconds. I read the book and did not watch the video, but the documentary is there if you prefer video.

Herman and Chomsky don’t report some of the earlier examples of biased news.

Egyptian accounts of the Battle of Kadesh claim a decisive victory over the Hittites in 1274 or 1273 BCE, accounts long accepted as literal truth. More recent research treats the Egyptian claims as akin to US claims of winning the war on terrorism.

Winning wars makes good press, but no intelligent person accepts such claims uncritically.

…Slate has created a new tool for internet users to identify, debunk, and—most importantly—combat the proliferation of bogus stories. Conceived and built by Slate developers, with input and oversight from Slate editors, it’s a Chrome browser extension called This Is Fake, and you can download and install it for free either on its home page or in the Chrome web store. The point isn’t just to flag fake news; you probably already know it when you see it. It’s to remind you that, anytime you see fake news in your feed, you have an opportunity to interrupt its viral transmission, both within your network and beyond.
…

In addition to winning the electoral college in a landslide, I won the popular vote if you deduct the millions of people who voted illegally.

That portion of the transcript reads as follows (apologies for the long quote, but I think you will agree it’s all relevant):

…
STEPHANOPOULOS: As I said, President-Elect Trump has been quite active on Twitter, including this week at the beginning of this week, that tweet which I want to show right now, about the popular vote.

And he said, “In addition to winning the electoral college in a landslide, I won the popular vote if you deduct the millions of people who voted illegally.”

That claim is groundless. There’s no evidence to back it up.

Is it responsible for a president-elect to make false statements like that?

PENCE: Well, look, I think four years ago the Pew Research Center found that there were millions of inaccurate voter registrations.

STEPHANOPOULOS: Yes, but the author of this said he — he has said it is not any evidence about what happened in this election or any evidence of voter fraud.

PENCE: I think what, you know, what is — what is historic here is that our president-elect won 30 of 50 states, he won more counties than any candidate on our side since Ronald Reagan.

And the fact that some partisans, who are frustrated with the outcome of the election and disappointed with the outcome of the election, are pointing to the popular vote, I can assure you, if this had been about the popular vote, Donald Trump and I would have been campaigning a whole lot more in Illinois and California and New York.

STEPHANOPOULOS: And no one is questioning your victory, certainly I’m not questioning your victory. I’m asking just about that tweet, which I want to say that he said he would have won the popular vote if you deduct the millions of people who voted illegally. That statement is false. Why is it responsible to make it?

PENCE: Well, I think the president-elect wants to call to attention the fact that there has been evidence over many years of…

STEPHANOPOULOS: That’s not what he said.

PENCE: …voter fraud. And expressing that reality Pew Research Center found evidence of that four years ago.

STEPHANOPOULOS: That’s not the evidence…

PENCE: …that’s certainly his right.

But, you know…

STEPHANOPOULOS: It’s his right to make false statements?

PENCE: Well, it’s his right to express his opinion as president-elect of the United States.

I think one of the things that’s refreshing about our president-elect and one of the reasons why I think he made such an incredible connection with people all across this country is because he tells you what’s on his mind.

STEPHANOPOULOS: But why is it refreshing to make false statements?

PENCE: Look, I don’t know that that is a false statement, George, and neither do you. The simple fact is that…

STEPHANOPOULOS: I know there’s no evidence for it.

PENCE: There is evidence, historic evidence from the Pew Research Center of voter fraud that’s taken place. We’re in the process of investigating irregularities in the state of Indiana that were leading up to this election. The fact that voter fraud exists is…

STEPHANOPOULOS: But can you provide any evidence — can you provide any evidence to back up that statement?

PENCE: Well, look, I think he’s expressed his opinion on that. And he’s entitled to express his opinion on that. And I think the American people — I think the American people find it very refreshing that they have a president who will tell them what’s on his mind. And I think the connection that he made in the course…

STEPHANOPOULOS: Whether it’s true or not?

PENCE: Well, they’re going to tell them — he’s going to say what he believes to be true and I know that he’s always going to speak in that way as president.
….

Just to be clear, I agree with Stephanopoulos and others who say there is no evidence of millions of illegal votes being cast in the 2016 presidential election.

After reading Stephanopoulos press Pence on this false statement by President-elect Trump, can you recall Stephanopoulos or any other major reporter pressing President Obama on his statements about terrorism, such as:

…
Tonight I want to talk with you about this tragedy, the broader threat of terrorism and how we can keep our country safe. The FBI is still gathering the facts about what happened in San Bernardino, but here’s what we know. The victims were brutally murdered and injured by one of their co-workers and his wife. So far, we have no evidence that the killers were directed by a terrorist organization overseas or that they were part of a broader conspiracy here at home. But it is clear that the two of them had gone down the dark path of radicalization, embracing a perverted interpretation of Islam that calls for war against America and the West. They had stockpiled assault weapons, ammunition, and pipe bombs.

So this was an act of terrorism designed to kill innocent people. Our nation has been at war with terrorists since Al Qaeda killed nearly 3,000 Americans on 9/11. In the process, we’ve hardened our defenses, from airports, to financial centers, to other critical infrastructure. Intelligence and law enforcement agencies have disrupted countless plots here and overseas and worked around the clock to keep us safe.

Over the last few years, however, the terrorist threat has evolved into a new phase. As we’ve become better at preventing complex multifaceted attacks like 9/11, terrorists turn to less complicated acts of violence like the mass shootings that are all too common in our society. It is this type of attack that we saw at Fort Hood in 2009, in Chattanooga earlier this year, and now in San Bernardino.

And as groups like ISIL grew stronger amidst the chaos of war in Iraq and then Syria, and as the Internet erases the distance between countries, we see growing efforts by terrorists to poison the minds of people like the Boston Marathon bombers and the San Bernardino killers.

Every U.S. presidential election attracts the world’s attention, and this year’s election will be no exception. The decision between the two major party candidates, Hillary Clinton and Donald Trump, is challenging for a number of voters; this choice is resulting in third-party candidates like Gary Johnson and Jill Stein collectively drawing double-digit support in some polls. Given the plethora of news stories about both Clinton and Trump, November 8 cannot come soon enough for many.

In the Age of Analytics, numerous websites exist to interpret and analyze the stream of data that floods the airwaves and newswires. Seemingly contradictory data challenges even the most seasoned analysts and pundits. Many of these websites also employ political spin and engender subtle or not-so-subtle political biases that, in some cases, color the interpretation of data to the left or right.

Undergraduate computer science students at the University of Illinois at Urbana-Champaign manage Election Analytics, a nonpartisan, easy-to-use website for anyone seeking an unbiased interpretation of polling data. Launched in 2008, the site fills voids in the national election forecasting landscape.

Election Analytics lets people see the current state of the election, free of any partisan biases or political innuendos. The methodologies used by Election Analytics include Bayesian statistics, which estimate the posterior distributions of the true proportion of voters that will vote for each candidate in each state, given both the available polling data and the states’ previous election results. Each poll is weighted based on its age and its size, providing a highly dynamic forecasting mechanism as Election Day approaches. Because winning a state translates into winning all the Electoral College votes for that state (with Nebraska and Maine using Congressional districts to allocate their Electoral College votes), winning by one vote or 100,000 votes results in the same outcome in the Electoral College race. Dynamic programming then uses the posterior probabilities to compile a probability mass function for the Electoral College votes. By design, Election Analytics cuts through the media chatter and focuses purely on data.
…
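The dynamic-programming step in that description is easy to sketch. What follows is a toy illustration of the general approach, not Election Analytics’ actual code: give each state a win probability from a Beta posterior over invented polling numbers, then fold the states in one at a time to build the probability mass function over Electoral College totals.

```python
# Toy sketch of the approach described above: a Beta posterior per state,
# then dynamic programming to build the Electoral College PMF.
# State names, electoral votes, and poll counts are invented for illustration.
import random

# state -> (electoral votes, poll respondents for candidate A, total polled)
states = {
    "StateA": (29, 520, 1000),
    "StateB": (16, 470, 1000),
    "StateC": (10, 505, 1000),
}

def win_probability(a_votes, n, prior_a=1.0, prior_b=1.0, draws=100_000):
    """P(candidate A's true support exceeds 50%) under a Beta posterior."""
    alpha = prior_a + a_votes
    beta = prior_b + (n - a_votes)
    hits = sum(random.betavariate(alpha, beta) > 0.5 for _ in range(draws))
    return hits / draws

# pmf[v] = probability that candidate A holds exactly v electoral votes after
# the states processed so far; winner-take-all per state, as the quote says.
pmf = {0: 1.0}
for ev, a_votes, n in states.values():
    p = win_probability(a_votes, n)
    new_pmf = {}
    for v, prob in pmf.items():
        new_pmf[v] = new_pmf.get(v, 0.0) + prob * (1 - p)      # A loses state
        new_pmf[v + ev] = new_pmf.get(v + ev, 0.0) + prob * p  # A wins state
    pmf = new_pmf

for v in sorted(pmf):
    print(f"{v:3d} electoral votes: {pmf[v]:.3f}")
```

Real forecasts add the poll weighting by age and sample size that the excerpt mentions, but the convolution is the whole trick: billions of possible state outcomes collapse into a few hundred reachable vote totals.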

If you have ever taken a social science methodologies course then you know:

Election Analytics lets people see the current state of the election, free of any partisan biases or political innuendos.

is as false as anything uttered by any of the candidates seeking nomination and/or the office of the U.S. presidency since January 1, 2016.

It’s an annoying conceit when you realize that every poll is biased, however clean the subsequent number crunching may be.

Bias one step removed isn’t the absence of bias, but the concealment of bias.

I’ve been writing about the work of Cathy “Mathbabe” O’Neil for years: she’s a radical data-scientist with a Harvard PhD in mathematics, who coined the term “Weapons of Math Destruction” to describe the ways that sloppy statistical modeling is punishing millions of people every day, and in more and more cases, destroying lives. Today, O’Neil brings her argument to print, with a fantastic, plainspoken, call to arms called (what else?) Weapons of Math Destruction.

Warning: If you read Weapons of Math Destruction, unlike executives who choose models based on their “gut” or “instinct,” you may be charged with constructive knowledge of how your model discriminates against group X or Y.

If, like a typical Excel user, you can honestly say “I type in the numbers here and the output comes out there,” it’s going to be hard to prove any intent to discriminate.

You are no more responsible for a result than a pump handle is responsible for cholera.

Doctorow’s conclusion:

…
O’Neil’s book is a vital crash-course in the specialized kind of statistical knowledge we all need to interrogate the systems around us and demand better.

depends upon your definition of “better.”

“Better” depends on your goals or those of a client.

Yes?

PS: It is important to understand models/statistics/data so you can shape results to fit your definition of “better,” while acknowledging that all results are shaped. The critical question is “What shape do you want?”

…
Much has been made of the tech industry’s lack of women engineers and executives. But there’s a unique problem with homogeneity in AI. To teach computers about the world, researchers have to gather massive data sets of almost everything. To learn to identify flowers, you need to feed a computer tens of thousands of photos of flowers so that when it sees a photograph of a daffodil in poor light, it can draw on its experience and work out what it’s seeing.

If these data sets aren’t sufficiently broad, then companies can create AIs with biases. Speech recognition software with a data set that only contains people speaking in proper, stilted British English will have a hard time understanding the slang and diction of someone from an inner city in America. If everyone teaching computers to act like humans are men, then the machines will have a view of the world that’s narrow by default and, through the curation of data sets, possibly biased.

“I call it a sea of dudes,” said Margaret Mitchell, a researcher at Microsoft. Mitchell works on computer vision and language problems, and is a founding member—and only female researcher—of Microsoft’s “cognition” group. She estimates she’s worked with around 10 or so women over the past five years, and hundreds of men. “I do absolutely believe that gender has an effect on the types of questions that we ask,” she said. “You’re putting yourself in a position of myopia.”
…

Margaret Mitchell makes a pragmatic case for diversity in the workplace, at least if you want to avoid male-biased AI.

Not that a diverse workplace results in an “unbiased” AI; it will produce a biased AI that isn’t solely male-biased.

It isn’t possible to escape bias because some person or persons has to score “correct” answers for an AI. The scoring process imparts to the AI being trained the biases of its judge of correctness.
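That transfer is easy to demonstrate. A toy sketch, with every number invented: a “judge” labels training examples while systematically docking one group, a model is fit to those labels, and the model reproduces the judge’s skew on fresh data.

```python
# Toy illustration: a model trained on a biased judge's labels learns the
# bias. Groups, skills, and the judge's rule are all invented for this sketch.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
group = rng.integers(0, 2, n)    # two groups, equally qualified by design
skill = rng.normal(0.0, 1.0, n)  # true qualification, same distribution

# The "judge" approves based on skill but docks group 1 half a point.
judged_label = (skill - 0.5 * group + rng.normal(0.0, 0.5, n)) > 0

model = LogisticRegression().fit(np.column_stack([skill, group]), judged_label)

# On fresh, identically qualified applicants, the model keeps the skew.
skill_new = rng.normal(0.0, 1.0, 2000)
for g in (0, 1):
    rate = model.predict(np.column_stack([skill_new, np.full(2000, g)])).mean()
    print(f"approval rate, group {g}: {rate:.1%}")
```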

Unless someone wants to contend there are potential human judges without biases, I don’t see a way around imparting biases to AIs.

By being sensitive to evidence of biases, we can in some cases choose the biases we want an AI to possess, but an AI possessing no biases at all isn’t possible.

AIs are, after all, our creations so it is only fair that they be made in our image, biases and all.

This ProPublica story by Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner isn’t short, but it is worth your time not only to read, but also to download the data and test their analysis for yourself.

Especially if you have the misimpression that algorithms can avoid bias. Or that clients will apply your analysis with the caution that it deserves.

Finding a bias in software, like finding a bug, is a good thing. But that’s just one; there is no estimate of how many others may exist.

And as you will find, clients may not remember your careful explanation of the limits to your work. Or apply it in ways you don’t anticipate.

Machine Bias – There’s software used across the country to predict future criminals. And it’s biased against blacks.

Here’s the first story, to lure you deeper into the study:

ON A SPRING AFTERNOON IN 2014, Brisha Borden was running late to pick up her god-sister from school when she spotted an unlocked kid’s blue Huffy bicycle and a silver Razor scooter. Borden and a friend grabbed the bike and scooter and tried to ride them down the street in the Fort Lauderdale suburb of Coral Springs.

Just as the 18-year-old girls were realizing they were too big for the tiny conveyances — which belonged to a 6-year-old boy — a woman came running after them saying, “That’s my kid’s stuff.” Borden and her friend immediately dropped the bike and scooter and walked away.

But it was too late — a neighbor who witnessed the heist had already called the police. Borden and her friend were arrested and charged with burglary and petty theft for the items, which were valued at a total of $80.

Compare their crime with a similar one: The previous summer, 41-year-old Vernon Prater was picked up for shoplifting $86.35 worth of tools from a nearby Home Depot store.

Prater was the more seasoned criminal. He had already been convicted of armed robbery and attempted armed robbery, for which he served five years in prison, in addition to another armed robbery charge. Borden had a record, too, but it was for misdemeanors committed when she was a juvenile.

Yet something odd happened when Borden and Prater were booked into jail: A computer program spat out a score predicting the likelihood of each committing a future crime. Borden — who is black — was rated a high risk. Prater — who is white — was rated a low risk.

Two years later, we know the computer algorithm got it exactly backward. Borden has not been charged with any new crimes. Prater is serving an eight-year prison term for subsequently breaking into a warehouse and stealing thousands of dollars’ worth of electronics.
…

This analysis demonstrates that malice isn’t required for bias to damage lives. Whether the biases are in the software, in its application, or in the interpretation of its results, the end result is the same: damaged lives.

I don’t think bias in software is avoidable, but here no one was even looking.

What role do you think budget justification/profit making played in that blindness to bias?
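If you do take up the invitation above to download the data, a first pass at the headline numbers might look like the sketch below. It assumes ProPublica’s published compas-scores-two-years.csv and its column names; verify both against their repository before trusting the output.

```python
# Rough re-check of ProPublica's headline result: false positive rates by
# race, i.e., defendants labeled high risk who did not reoffend. Assumes
# the published CSV and its column names; verify against the repo.
import pandas as pd

df = pd.read_csv("compas-scores-two-years.csv")

# ProPublica treated decile scores of 5-10 ("Medium"/"High") as high risk.
df["high_risk"] = df["decile_score"] >= 5

for race in ("African-American", "Caucasian"):
    no_recid = df[(df["race"] == race) & (df["two_year_recid"] == 0)]
    fpr = no_recid["high_risk"].mean()
    print(f"{race}: labeled high risk but did not reoffend: {fpr:.1%}")
```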

Internet search rankings have a significant impact on consumer choices, mainly because users trust and choose higher-ranked results more than lower-ranked results. Given the apparent power of search rankings, we asked whether they could be manipulated to alter the preferences of undecided voters in democratic elections. Here we report the results of five relevant double-blind, randomized controlled experiments, using a total of 4,556 undecided voters representing diverse demographic characteristics of the voting populations of the United States and India. The fifth experiment is especially notable in that it was conducted with eligible voters throughout India in the midst of India’s 2014 Lok Sabha elections just before the final votes were cast. The results of these experiments demonstrate that (i) biased search rankings can shift the voting preferences of undecided voters by 20% or more, (ii) the shift can be much higher in some demographic groups, and (iii) search ranking bias can be masked so that people show no awareness of the manipulation. We call this type of influence, which might be applicable to a variety of attitudes and beliefs, the search engine manipulation effect. Given that many elections are won by small margins, our results suggest that a search engine company has the power to influence the results of a substantial number of elections with impunity. The impact of such manipulations would be especially large in countries dominated by a single search engine company.

I’m not surprised by SEME (search engine manipulation effect).

Although I would probably be more neutral and say: Search Engine Impact on Voting.

Whether you consider one result or another as the result of “manipulation” is a matter of perspective. No search engine strives to deliver “false” information to users.

In the novel 1984, George Orwell imagines a society in which powerful but hidden forces subtly shape peoples’ perceptions of the truth. By changing words, the emphases put on them, and their presentation, the state is able to alter citizens’ beliefs and behaviors in ways of which they are unaware.

Now imagine today’s Internet search engines did just that kind of thing—that subtle biases in search engine results, introduced deliberately or accidentally, could tip elections unfairly toward one candidate or another, all without the knowledge of voters.

That may seem an unlikely scenario, but recent research suggests it is quite possible. Robert Epstein and Ronald E. Robertson, researchers at the American Institute for Behavioral Research and Technology, conducted experiments that showed the sequence of results from politically oriented search queries can affect how users vote, especially among undecided voters, and biased rankings of search results usually go undetected by users. The outcomes of close elections could result from the deliberate tweaking of search algorithms by search engine companies, and such manipulation would be extremely difficult to detect, the experiments suggest.
…

Gary’s post is a good supplement to the original article, covering some of the volunteers who are ready to defend the rest of us from biased search results.

Or as I would put it, to inject their biases into search results as opposed to other biases they perceive as being present.

If you are more comfortable describing the search results you want presented as “fair and equitable,” etc., please do so but I prefer the honesty of naming biases as such.


Make your desired bias, direction, etc., a requirement and allow data scientists to get about the business of conveying it.

Data scientists are problem solvers at heart, and we love our data and our algorithms that sometimes seem to work like magic, so we may be inclined to try to solve these problems stemming from human bias by turning the decisions over to machines. Most people seem to believe that machines are less biased and more pure in their decision-making – that the data tells the truth, that the machines won’t discriminate.

…

Renee’s post summarizes a lot of information about bias, inside and outside of data science, and issues this challenge:

Data scientists, I challenge you. I challenge you to figure out how to make the systems you design as fair as possible.

An admirable sentiment, but one hard part is defining “as fair as possible.”

Data science relies on classification, which has as its avowed purpose the separation of items into different categories. Some categories will be treated differently than others. Otherwise there would be no reason to perform the classification.

Another hard part is that employers of data scientists are more likely to say:

Analyze data X for market segments responding to ad campaign Y.

As opposed to:

What do you think about our ads targeting tweens by the use of sexual-content for our unhealthy product A?

Or change the questions to fit those asked of data scientists at any government intelligence agency.

The vast majority of data scientists are hired as data scientists, not amateur theologians.

Competence in data science has no demonstrable relationship to competence in ethics, fairness, morality, etc. Data scientists can have opinions about the same but shouldn’t presume to poach on other areas of expertise.

How would you feel if a competent user of spreadsheets decided to label themselves a “data scientist”?

Keep that in mind the next time someone starts to pontificate on “ethics” in data science.

PS: Renee is in the process of creating and assembling high quality resources for anyone interested in data science. Be sure to explore her blog and other links after reading her post.

The current infestation of incompetents at the Office of Personnel Management is absolutely convinced, judging from their responses to Inspector General reports urging modern project management practices, that no change is necessary.

Personally, I would fire everyone from the elevator operator (I’m sure they probably still have one) to the top and terminate all retirement and health benefits. That would not cure the technology problems at OPM, but it would provide the opportunity for a fresh start at addressing them.

Cognitive biases, self-interest, and the support of other incompetents doom reform at the OPM. You may as well wish upon a star.

Software may appear to operate without bias because it strictly uses computer code to reach conclusions. That’s why many companies use algorithms to help weed out job applicants when hiring for a new position.

But a team of computer scientists from the University of Utah, University of Arizona and Haverford College in Pennsylvania have discovered a way to find out if an algorithm used for hiring decisions, loan approvals and comparably weighty tasks could be biased like a human being.

The researchers, led by Suresh Venkatasubramanian, an associate professor in the University of Utah’s School of Computing, have discovered a technique to determine if such software programs discriminate unintentionally and violate the legal standards for fair access to employment, housing and other opportunities. The team also has determined a method to fix these potentially troubled algorithms.

Venkatasubramanian presented his findings Aug. 12 at the 21st Association for Computing Machinery’s Conference on Knowledge Discovery and Data Mining in Sydney, Australia.

“There’s a growing industry around doing resume filtering and resume scanning to look for job applicants, so there is definitely interest in this,” says Venkatasubramanian. “If there are structural aspects of the testing process that would discriminate against one community just because of the nature of that community, that is unfair.”
…

It’s a puff piece and therefore misses that all algorithms are biased, but some algorithms are biased in ways not permitted under current law.

The abstract for the paper does a much better job of setting the context for this research:

What does it mean for an algorithm to be biased? In U.S. law, unintentional bias is encoded via disparate impact, which occurs when a selection process has widely different outcomes for different groups, even as it appears to be neutral. This legal determination hinges on a definition of a protected class (ethnicity, gender, religious practice) and an explicit description of the process.

When the process is implemented using computers, determining disparate impact (and hence bias) is harder. It might not be possible to disclose the process. In addition, even if the process is open, it might be hard to elucidate in a legal setting how the algorithm makes its decisions. Instead of requiring access to the algorithm, we propose making inferences based on the data the algorithm uses.

We make four contributions to this problem. First, we link the legal notion of disparate impact to a measure of classification accuracy that while known, has received relatively little attention. Second, we propose a test for disparate impact based on analyzing the information leakage of the protected class from the other data attributes. Third, we describe methods by which data might be made unbiased. Finally, we present empirical evidence supporting the effectiveness of our test for disparate impact and our approach for both masking bias and preserving relevant information in the data. Interestingly, our approach resembles some actual selection practices that have recently received legal scrutiny.
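For reference, the legal yardstick behind “widely different outcomes” is the EEOC’s four-fifths rule: if any group’s selection rate falls below 80% of the top group’s rate, disparate impact is suspected. A minimal sketch of that check, with invented counts (the paper’s own test, which infers the protected class from the other data attributes, goes well beyond this):

```python
# Minimal four-fifths (80%) rule check for disparate impact.
# The applicant and selection counts below are invented for illustration.
def disparate_impact(selected, applicants, threshold=0.8):
    rates = {g: selected[g] / applicants[g] for g in applicants}
    top = max(rates.values())
    flagged = {g: r / top for g, r in rates.items() if r / top < threshold}
    return rates, flagged

rates, flagged = disparate_impact(
    selected={"group_a": 45, "group_b": 20},
    applicants={"group_a": 100, "group_b": 100},
)
print("selection rates:", rates)          # group_a: 0.45, group_b: 0.20
print("below 80% of top rate:", flagged)  # group_b at ratio 0.44 -> flagged
```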

If you are a bank, you want a loan algorithm to be biased against people with a poor history of paying their debts. The distinction is that a poor payment history is a legitimate basis for discrimination among loan applicants.

The lesson here is that all algorithms are biased, the question is whether the bias is in your favor or not.

Suggestion: Only bet when using your own dice (algorithm).

Posted in Algorithms, Bias | Comments Off on Non-News: Algorithms Are Biased

“We’re watching you.” This was the warning that the Chicago Police Department gave to more than 400 people on its “Heat List.” The list, an attempt to identify the people most likely to commit violent crime in the city, was created with a predictive algorithm that focused on factors including, per the Chicago Tribune, “his or her acquaintances and their arrest histories – and whether any of those associates have been shot in the past.”

Algorithms like this obviously raise some uncomfortable questions. Who is on this list and why? Does it take race, gender, education and other personal factors into account? When the prison population of America is overwhelmingly Black and Latino males, would an algorithm based on relationships disproportionately target young men of color?

There are many reasons why such algorithms are of interest, but the rewards are inseparable from the risks. Humans are biased, and the biases we encode into machines are then scaled and automated. This is not inherently bad (or good), but it raises the question: how do we operate in a world increasingly consumed with “personal analytics” that can predict race, religion, gender, age, sexual orientation, health status and much more.
…

Jason’s post is a refreshing step back from the usual “machine learning isn’t biased like people are” sort of stance.

Of course machine learning is biased, always biased. The algorithms are biased themselves, to say nothing of the programmers who inexactly converted those algorithms into code. It would not be much of an algorithm if it could not vary its results based on its inputs. That’s discrimination no matter how you look at it.

The difference is that discrimination is acceptable in some cases and not in others. Consider that only women are eligible for birth control pill prescriptions. That’s a reasonable discrimination. Other bases for discrimination, not so much.

And machine learning is further biased by the data we choose to input to the already biased implementation of a biased algorithm.

That isn’t a knock on machine learning but a caveat: when confronted with a machine learning result, look behind the result to the data, the implementation of the algorithm, and the algorithm itself before taking serious action based on the result.

Of course, the first question I would ask is: “Why is this person showing me this result, and what do they expect me to do based on it?”

That they are trying to help me on my path to becoming self-actualized isn’t my first reaction.

Social media like Facebook and Twitter are far too biased to be used blindly by social science researchers, two computer scientists have warned.

Writing in today’s issue of Science, Carnegie Mellon’s Juergen Pfeffer and McGill’s Derek Ruths have warned that scientists are treating the wealth of data gathered by social networks as a goldmine of what people are thinking – but frequently they aren’t correcting for inherent biases in the dataset.

If folks didn’t already know that scientists were turning to social media for easy access to the pat statistics on thousands of people, they found out about it when Facebook allowed researchers to adjust users’ news feeds to manipulate their emotions.

…

Both Facebook and Twitter are such rich sources for heart pounding headlines that I’m shocked, shocked that anyone would suggest there is bias in the data! 😉

Not surprisingly, people participate in social media for reasons entirely of their own and quite unrelated to the interests or needs of researchers. Particular types of social media attract different demographics than other types. I’m not sure how you could “correct” for those biases, unless you wanted to collect better data for yourself.
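To be fair, the standard survey-research answer is post-stratification: reweight each demographic cell by its population share over its sample share. A toy sketch with invented shares; note that it only corrects for the demographics you can measure, not for self-selection in who posts at all, which is the harder problem.

```python
# Toy post-stratification: reweight sample cells to match population shares.
# All shares and responses below are invented for illustration.
population_share = {"18-29": 0.20, "30-49": 0.35, "50+": 0.45}
sample_share     = {"18-29": 0.55, "30-49": 0.30, "50+": 0.15}  # skewed sample

weights = {g: population_share[g] / sample_share[g] for g in population_share}

# Fraction of each cell expressing some opinion, measured in the skewed sample.
observed = {"18-29": 0.70, "30-49": 0.50, "50+": 0.30}

raw = sum(sample_share[g] * observed[g] for g in observed)
adjusted = sum(sample_share[g] * weights[g] * observed[g] for g in observed)
print(f"raw estimate: {raw:.1%}, reweighted estimate: {adjusted:.1%}")
```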

There is a sentence I have heard or read multiple times in my journey into (academic) visualization: visualization is a tool people use when they don’t know what question to ask to their data.

I have always taken this sentence as a given and accepted it as it is. Good, I thought, we have a tool to help people come up with questions when they have no idea what to do with their data. Isn’t that great? It sounded right or at least cool.

But as soon as I started working on more applied projects, with real people, real problems, real data they care about, I discovered all this excitement for data exploration is just not there. People working with data are not excited about “playing” with data, they are excited about solving problems. Real problems. And real problems have questions attached, not just curiosity. There’s simply nothing like undirected data exploration in the real world.

I think Enrico misses the reason why people use/like the phrase: visualization is a tool people use when they don’t know what question to ask to their data.

Visualization privileges the “data” as the source of whatever result is displayed by the visualization.

It’s not me! That’s what the data says!

Hardly. Someone collected the data, not at random, stuffing whatever bits came along into a bag. Someone cleaned the data with some notion of what “clean” meant. Someone chose the data that is now being called upon for a visualization. And those are clumsy summaries that collapse many distinct steps into only three.

To put it another way, data never exists without choices being made. And it is the sum of those choices that influence the visualizations that are even possible from some data set.

I would recast his title to read: The myth of the objective data explorer.

Having said that, I don’t mean that all bias is bad.

If I were collecting data on Ancient Near Eastern (ANE) languages, I would of necessity be excluding the language traditions of the entire Western Hemisphere. It could even be that data from the native cultures of the Western Hemisphere will be lost while I am preserving data from the ANE.

So we have bias and, from someone’s point of view, a bad outcome because of that bias. Was that a bad thing? I would argue not.

It isn’t ever possible to collect all the potential data that could be collected. We all make value judgments about the data we choose to collect and what we choose to ignore.

Rather than pretending that we possess objectivity in any meaningful sense, we are better off to state our biases to the extent we know them. At least others will be forewarned that we are just like them.

Posted in Bias, Data | Comments Off on The myth of the aimless data explorer

Evidence in experimental psychology suggests that most people overestimate their own ability to complete objective tasks accurately. This phenomenon, often called confidence bias, refers to “a systematic error of judgment made by individuals when they assess the correctness of their responses to questions related to intellectual or perceptual problems.”[1] But does this hold up in crowdsourcing?

We ran an experiment to test for a persistent difference between people’s perceptions of their own accuracy and their actual objective accuracy. We used a set of standardized questions, focusing on the Verbal and Math sections of a common standardized test. For the 829 individuals who answered more than 10 of these questions, we asked for the correct answer as well as an indication of how confident they were of the answer they supplied.

We didn’t use any Gold in this experiment. Instead, we incentivized performance by rewarding those finishing in the top 10%, based on objective accuracy.
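The calculation behind such an experiment is simple enough to sketch: compare each worker’s mean stated confidence with their objective accuracy, and the gap is the overconfidence. Toy records below, not CrowdFlower’s data:

```python
# Toy overconfidence calculation: mean stated confidence minus accuracy.
# The records below are invented; real data would have one row per answer.
from collections import defaultdict

answers = [
    # (worker, stated confidence 0-1, answered correctly?)
    ("w1", 0.90, True), ("w1", 0.80, False), ("w1", 0.95, True),
    ("w2", 0.60, True), ("w2", 0.70, True), ("w2", 0.50, False),
]

totals = defaultdict(lambda: [0.0, 0, 0])  # confidence sum, correct, count
for worker, conf, correct in answers:
    t = totals[worker]
    t[0] += conf
    t[1] += int(correct)
    t[2] += 1

for worker, (conf_sum, correct, count) in sorted(totals.items()):
    confidence, accuracy = conf_sum / count, correct / count
    print(f"{worker}: confidence {confidence:.2f}, accuracy {accuracy:.2f}, "
          f"overconfidence {confidence - accuracy:+.2f}")
```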

I am not sure why crowdsourcing would make a difference on the question of overestimation of ability, but now the answer is in: No. But do read the post for the details; I think you will find it useful when doing user studies.

For example, when you ask a user if some task is too complex as designed, are they likely to overestimate their ability to complete it, either to avoid being embarrassed in front of others or to avoid admitting that they really didn’t follow your explanation?

My suspicion is yes, and so in addition to simply asking users if they understand particular search or other functions of an interface, you also need to film them using the interface with no help from you (or others).

You will remember from Size Really Does Matter… that Blair and Maron reported that lawyers overestimated their accuracy in document retrieval by 55%. Of course, retrieval questions are harder to evaluate than those in the CrowdFlower experiment, but it is a bias you need to keep in mind.