Monday, 2 November 2015

Big
data is a perfect representation of the difficulty in governing innovation. A
complex web of technologies creates a seemingly endless chain of questions that
require regulatory attention, if not answers. How should personal data be
collected? What should be collected? How much consent should be required for
the collection? How much of that consent should be based on knowing the end-use
of the data? Do the companies who collect the data even understand how it is
used? Do the people who wrote the algorithms which analyze it even know?


This
last question has become particularly interesting and difficult to answer as
machine learning’s ability to process big data removes the requirement
of explicitly programming the decision of what to do with the information.
Rather, only an objective function (which, in the case of the private sector,
is typically profit) and a method programmed to optimize that function are
required. This is referred to as a "black box": things go into it, things come out of it, but the transmutation itself is inscrutable.
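That division of labour can be caricatured in a few lines of code. In the sketch below (all numbers invented for illustration), no decision rule is ever written down; we specify only an objective function (squared error) and a method to optimize it (gradient descent), and the "rule" falls out:

```python
# A toy "black box": we never program the decision, only an objective
# (mean squared error) and a method to optimize it (gradient descent).
# The data and learning rate here are invented for illustration.

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]  # (input, observed output)
w = 0.0  # the single parameter the machine "learns"

for _ in range(200):
    # gradient of the mean squared error of the model y = w * x
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= 0.05 * grad  # step against the gradient

print(round(w, 2))  # 1.99: a rule that emerged from the objective, not from us
```

Even in this four-line example the learned coefficient is a by-product of optimization rather than an explicit decision; scale the idea up to thousands of features and the inscrutability described above follows.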

Facebook
made a few
headlines when it patented an algorithm which could allow lenders, when
looking at someone’s credit score, to also look at the scores of those in their
friend network on the platform, subsequently bolstering—or lowering—it. This
was not even particularly new technology – Facebook filed for the patent in
2012 but it went unnoticed until this year. Similarly, the
Chicago Police Department created a list of 400 people who were considered
high-risk for committing a violent crime from algorithms based on a Yale
sociologist’s work. ProPublica
found that SAT tutoring packages offered by the Princeton Review resulted
in higher prices being charged to Asian markets based on geography variables
entered into its pricing model. There are many other examples of how big data
inadvertently discriminates on the basis of gender, race, or socio-economic
background. Some of the ways in which this manifests itself initially appear
somewhat benign, such as only 11% of the Google Image search results for “C.E.O.”
being pictures of women, which nevertheless paints an unintentionally misleading
portrait of the position when you consider that 27% of C.E.O.s in the US are women.

Governments
appear to be aware of the need for some oversight, particularly since many of
these groups fall under legal protection in these situations. The Executive
Office of the President of the United States put out a document
entitled “Big Data: Seizing Opportunities, Preserving Values” in May of
2014. In it, the advisory committee tasked with examining the effects of big
data on the American way of life wrote: “The increasing use of algorithms to
make eligibility decisions must be carefully monitored for potential
discriminatory outcomes for disadvantaged groups, even absent discriminatory
intent” (pg 47). If we agree that algorithms which discriminate need some form
of oversight, the next question, naturally, is how?

A
group of computer scientists* focused on discriminatory algorithms has
proposed one method to address them in the US. Their paper examines the
algorithms from a legal framework using a theory of US anti-discrimination law
called disparate
impact. To summarize a highly complex piece of legal theory in the briefest
way possible: disparate impact is used to guard against unintended
discrimination against a protected class, such as race (protected classes are
defined by statutes). Disparate impact occurs when a protected class experiences
an adverse effect; again, this is not an intentional effort to harm a protected
class but an indirect outcome of some policy or, in the age of big data, an
algorithm. (Additionally, in order to actually be illegal, the discrimination
caused by disparate impact must not be a provable, necessary requirement in the
context in which it occurs). ProPublica
also has a good explanation of the legalities of disparate impact in the
context of new technology.

The
authors put forth a mathematical way to determine how well an algorithm can
predict one of these protected classes (“protected attribute”) based on the
data it uses (other “attributes”). They then introduce methods of transforming
datasets so that the algorithm in question cannot predict the protected
attribute, while maintaining the other data necessary for the algorithm to function
to an acceptable degree of accuracy. These methods are then put into practice
on real-life datasets from actual disparate impact cases. This has the
potential to be a particularly effective approach because neither the test to
determine whether an algorithm has the potential to cause disparate impact, nor
the remedy to prevent it from doing so, relies on access to the algorithm
itself (which tends to be proprietary and therefore difficult to obtain).
Instead, it focuses on the dataset used by the algorithm. The authors note that
this paper explores only numerical attributes, and thus other types of
attributes (eg categorical) may prove more of a challenge.
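For a sense of what the certification side of such a test can look like, here is a minimal sketch of the "80% rule" measure of disparate impact that the authors' work builds on: a decision process is suspect when the favourable-outcome rate for the protected group falls below 80% of the rate for everyone else. The loan-approval numbers are invented:

```python
# Disparate impact via the "80% rule": a decision process is suspect when the
# favourable-outcome rate for the protected group falls below 80% of the rate
# for the unprotected group. All numbers below are hypothetical.

def disparate_impact_ratio(outcomes, protected):
    """outcomes: 1 = favourable decision; protected: 1 = protected group."""
    prot = [o for o, p in zip(outcomes, protected) if p == 1]
    unprot = [o for o, p in zip(outcomes, protected) if p == 0]
    return (sum(prot) / len(prot)) / (sum(unprot) / len(unprot))

# Toy lender: approves 4 of 10 protected applicants, 8 of 10 unprotected ones.
outcomes = [1] * 4 + [0] * 6 + [1] * 8 + [0] * 2
protected = [1] * 10 + [0] * 10

ratio = disparate_impact_ratio(outcomes, protected)
print(ratio)        # 0.5
print(ratio < 0.8)  # True: potential disparate impact
```

The paper's stronger result connects this ratio to how well a classifier can predict the protected attribute from the remaining attributes; the appeal in either form is that only the dataset, not the proprietary algorithm, is needed.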

Discrimination
by credit lenders, law enforcement and employers is not new, but big data has,
intentionally or not, enabled new ways to obfuscate it, hidden in rows of code,
and new rows of code may be the only way to catch it.

Wearable technology, enabling the tracking and
logging of human functionality from heart rate to distance run, has a vast array
of implementations and implications. Certainly much has been made about the
rise of self-tracking and the quantification and gamification of everyday
activities and vitals towards some goal: better sleep, losing weight, improved
sex life, etc. No longer constrained by high costs of data storage, the
applications for the quantification and collection of human behaviour and
biological functions seem endless. One of the most obvious and practical
applications of wearable technology is in the workplace as an employee
monitoring tool. This usually falls under two main streams of justification:
health and safety (both the employee’s and others’), and productivity, and has
been employed in the financial services sector, by healthcare providers, and in
grocery warehouses, to name a few.

One example among many is the use of monitoring technology to assess the
fatigue of drivers and heavy machinery operators in industries such as mining. While
actual statistics for the percentage of accidents in industry caused by fatigue
are difficult to estimate, the perception of fatigue as a critical factor in
workplace accidents has been found to be quite strong among workers. Avoiding these types of accidents is ostensibly the
objective of fatigue management systems, a mix of hardware and software designed to 1) predict the onset of fatigue before
it occurs, 2) determine when fatigue has set in, and 3) be able to do something
about it. In their paper The challenges and opportunities of technological approaches to fatigue management, Balkin et al. position technology as an
objective way of determining and even predicting driver fatigue:

With such technologies, current workplace rules
and regulations designed to afford workers ample time for sleep and recovery
could instead be rewritten to require that operators maintain adequate levels
of objectively monitored alertness/performance on the job—a change in
emphasis that would more directly address the issues of ultimate concern.
(pg 2)

A fatigue management system could combine a fitness-for-duty
screening before a driver begins their shift, and a personalized predictive
model which utilizes real-time driver measures from an online monitoring system
as well as environmental factors (eg time of day) and specifics about the job
itself.
Given the physiological elements involved in how an
individual functions under fatigue, the current technology is far from perfect. System developers are thus continuously trying to
achieve the best balance between false positives and fatigue identification
failure. Each element of the system has reliability issues as well as
implementation concerns. For instance, stimulants could be used to pass a
fit-for-duty test, electroencephalography (EEG, which measures brain wave
activity) requires electrodes stuck on one’s head, and less intrusive systems
of eye movement (oculomotor) monitoring tend to be less reliable. And while some of these detection systems are
more accurate than others, they tend not to facilitate actual intervention in
the case of fatigue (what an intervention should look like is another topic of
much discussion).
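The balance between false alarms and missed detections can be made concrete with a toy threshold sweep. The fatigue scores and ground-truth labels below are invented; real systems fuse EEG, oculomotor and other signals, but the trade-off has the same shape:

```python
# Hypothetical per-check fatigue scores from a monitoring system (0 to 1)
# and invented ground truth (1 = driver actually fatigued).
scores =   [0.2, 0.3, 0.35, 0.5, 0.55, 0.6, 0.7, 0.8, 0.9, 0.95]
fatigued = [0,   0,   0,    0,   1,    0,   1,   1,   1,   1]

def rates_at(threshold):
    """Return (false-alarm rate, miss rate) when alarming at score >= threshold."""
    fp = sum(1 for s, y in zip(scores, fatigued) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, fatigued) if s < threshold and y == 1)
    return fp / fatigued.count(0), fn / fatigued.count(1)

for t in (0.4, 0.6, 0.8):
    fa, miss = rates_at(t)
    print(f"threshold={t}: false-alarm rate={fa:.1f}, miss rate={miss:.1f}")
# threshold=0.4: false-alarm rate=0.4, miss rate=0.0
# threshold=0.6: false-alarm rate=0.2, miss rate=0.2
# threshold=0.8: false-alarm rate=0.0, miss rate=0.4
```

Pushing the threshold down catches every fatigued driver but cries wolf; pushing it up stays quiet but misses real fatigue, which is exactly the failure mode that erodes worker trust in these systems.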

In a 2008 report on operator fatigue detection
technologies by the heavy equipment company CATERPILLAR,
22 technologies were assessed based on a number of metrics; however, only
3 of the highest rated were actually commercially available at the
time. Both fatigue research specialists and representatives from a large
mining company were asked to assess the importance of each of the metrics, and
the differences between what each group prioritized are a good argument for
multidisciplinary assessment of the technology. Whereas the research
specialists placed importance on how
the fatigue detection technology worked, the mining experts only cared about what the technology was measuring and
operator acceptance, including how intrusive actual implementation was, how
easy the technology would be to manipulate, and how accepting miners and unions
were.

Of course, even if we agreed that heart rate
measurements were objective, and that the technology used to capture them
accurately was available, the act of implementing that technology is far from
objective. While initially employed as a safety mechanism, what guarantee is
there against scope-creep—that
is, using the technology to monitor and provide feedback on productivity? How then does the constant monitoring of vital
signs and eye blinks actually affect the worker, their relationship with themselves, and their job?

Moore and Robinson provide some insight into this question by looking at "the quantified self"—originally coined in response to the increase in tracking
technologies in our lives—in
the context of the workplace, a development which they ascribe to "neoliberalism
as an affective regime exposing a risk of assumed subordination of bodies to
technologies" (pg 3). The authors posit that a sense of disposability
permeates the modern workplace, true for those in both white-collar and
blue-collar jobs.
Marx’s theory of worker alienation resulting
from assembly line production starts to look quite quaint compared to the monitoring
and reduction of employee performance down to measures of basic biology.

Another
critical perspective is that these monitoring technologies take the place of trust
which can be seen as stemming from respect. Monitoring is an admission of lack
of trust, and so respect is lost too by extension. This then results in a
continuous feedback loop, where further monitoring is then necessary to make up
for the loss of trust, resulting in even less trust, and so on. This could also result in workers feeling less of a
need for self-trust, or as Balkin et al put it: "Over-reliance
would in such cases reflect an inflated trust in the reliability of the system
relative to its actual reliability" (pg 570).
On the
other hand, a certain level of trust is needed for the system to
be effective, and technology with a high rate of false positives could lead to
workers simply ignoring its feedback (putting aside the fact that a fatigued
driver may be more likely to overestimate their abilities and disagree with the
system’s assessment anyway). Others are optimistic that buy-in for monitoring devices might be greater among the younger generations as they are already used to carrying around portable technologies. Although the inertia of the development of these
technologies makes their implementation in almost every facet of life
inevitable, it is worth examining how our current world has shaped our
acceptance of them, and how they have shaped our world.

Monday, 9 December 2013

In Open Access, Accountability, and For-Profit Publishing I wrote about the publisher Elsevier and its continual efforts to restrict access to published research by locking it behind expensive subscriptions, without any explanation as to why access was so prohibitively expensive in the first place (besides making $$$). I mentioned that authors who publish in Elsevier's publications can choose to make their articles openly accessible to everyone, but that

Elsevier's revenue model currently dissuades researchers from sharing by charging authors who wish to make their work open access $3,000 per article (the actual amount varies depending on the journal - it's £400 per page in The Lancet and $5,000 for Cell Press titles).

However, it's been common practice among researchers to disregard this technicality and publish their papers on their own websites as well - free of charge. And Elsevier's legal team must be extra restless right now, because they served up a bunch of take-down notices, nicely summed up over at Sauropod Vertebra Picture of the Week:

Preventing people from making their own work available would be insane, and the publisher that did it would be committing a PR gaffe of huge proportions. Enter Elsevier, stage left. Bioinformatician Guy Leonard is just one of several people to have mentioned on Twitter this morning that Academia.edu took down their papers in response to a notice from Elsevier.

Academia.edu explained its actions with the following notification, laying the responsibility squarely and fairly at the feet of Elsevier:

Hi Guy

Unfortunately, we had to remove your paper, Resolving the question of trypanosome monophyly: a comparative genomics approach using whole genome data sets with low taxon sampling, due to a take-down notice from Elsevier.

Academia.edu is committed to enabling the transition to a world where there is open access to academic literature. Elsevier takes a different view, and is currently upping the ante in its opposition to academics sharing their own papers online.

Over the last year, more than 13,000 professors have signed a petition voicing displeasure at Elsevier’s business practices at www.thecostofknowledge.com. If you have any comments or thoughts, we would be glad to hear them.

The Academia.edu Team

So, there. The battle lines were contractually drawn a long time ago, and big publishing is simply entrenching itself further.

Big thanks to Mike Taylor and SV-POW for bringing this to our attention.

Wednesday, 4 September 2013

Peak Oil Is Dead, Long Live Peak Oil

Less than a month ago, geologist Euan Mearns wrote a piece called "Three Nails in the Coffin of Peak Oil". The article was posted on The Oil Drum, a website that has been a source of peak oil information and debate for nearly a decade, but is now being mothballed and indefinitely put to rest.

This is emblematic of the state of peak oil today. The idea that a peak in oil production rates is imminent seems to appear much less in the media now than during the oil price spike of 2005-2008, and the graph below suggests that public interest in it is waning as well.

When peak oil does appear in the media these days, it is often dismissed or outright ridiculed. A widely cited report from Harvard University put the ivory seal of academia on peak oil's death sentence. The June 2012 report, which can be read in full here, contends that global oil production will rise until 2020 at rates that have not been seen since the 1980s. Most of this growth is attributed to an increase in shale/tight oil production, especially in North America, where Montana and North Dakota could become "a big Persian Gulf producing country within the United States".

Taking a slightly different angle, The Economist recently argued that the world will soon figure out how to reduce its oil dependency through a mix of fuel efficiency improvements and switching to newly abundant natural gas, meaning a peak in demand, rather than supply, is expected. There have even been calls to shut down the U.S.'s Strategic Petroleum Reserve, a massive oil storage facility put in place after the 1973-1974 embargo, implying that the kind of oil worries that started in the '70s have been put to rest once and for all.

This lack of concern with oil supply constraints is generally regarded as good news, although taken from another perspective, the idea that we might manage to keep up (or even increase) the rate at which we're burning oil for decades to come is perhaps not particularly appealing. Some of those who reject the peak oil hypothesis aren't thrilled about it either. One of the most prominent names to have switched sides in the debate, journalist George Monbiot, wrote a piece shortly after the Harvard report came out called "We were wrong on peak oil. There's enough to fry us all". However, while the more severe effects of climate change are still decades away according to most predictions, a dip in the supply of oil could have immediate and far-reaching repercussions: about 95% of global transportation energy is petroleum-based, meaning pretty much everything you eat or use has been moved around using oil at some point in its life. Personally, with my parents living more than 7000 km away from me, any decrease in the availability of affordable transportation would have a significant impact on my life. As a species with a relatively short lifespan, it's not hard to guess how far on the horizon our priorities are going to lie.

Of course, all this pre-supposes that the combination of science and speculation behind this production/demand optimism is accurate. Look for another post coming down the pipeline which critically examines what, exactly, is lending credence to these reports.

Thursday, 2 May 2013

Temporary Respite from the War on Science


The Experimental Lakes Area (ELA) is a series of 58 lakes and a research facility in northwestern Ontario where scientists study the effects of pollutants and other stressors in a naturally-occurring ecosystem. In other words, it's literally a massive wet lab. The world-class testing facility, unique to Canada, allows scientists to introduce chemicals into entire lakes and monitor the effects in a natural ecosystem, rather than in a smaller artificial lab setting. Some of the globally recognized work to come out of the ELA includes the investigation of algae blooms which led to the ban of high-phosphate laundry detergents and phosphorus use at sewage treatment plants in the 1970s.
Last summer, however, the federal government announced it would no longer be funding the facility, effective March 31, 2013.

The reasons for the decision were never clearly articulated, the usual fall guy of budget cuts being touted, alongside claims that the ELA, which was under the operational umbrella of the Department of Fisheries and Oceans, no longer fell under the department's mandate. Since then there has been rampant speculation about the fate of the ELA, and at one point there were reports it would be sold to the International Institute for Sustainable Development.

It would cost the federal government $2 million a year to continue funding the ELA, plus an additional $600,000 in operating costs, but as Greg Rickford, the MP for the area, concedes, this really isn't about the money:

“We do intend to withdraw our role in the ELA. The motivation for that is because the federal government must have flexibility to move certain types of research, including some that has gone at the ELA, to other parts of the country where there is the potential for monitoring new environmental factors that are more proximal to resource development in Western Canada. That research is continuing, it’s just moving to other areas of the country where it’s required.”

As it turns out, this bullshit response can be refuted by the very history of the Experimental Lakes Area itself. In the 1970s, the acid rain research conducted at the ELA was initially funded by the Government of Alberta's Oil Sands Research Program, with the intent of investigating the long-term impacts of developing the oil sands. They were able to secure money for the first three years of the research from the program because, lo and behold, "their freshwater ecosystems had a lot of the same species that we did at ELA and their most sensitive lakes were like ours" (from a great interview with the renowned scientist David Schindler, former ELA director).

It seemed that efforts to save the area had failed, when last week the Government of Ontario announced that it would work with the Government of Manitoba, the federal government, and "other partners" to keep the ELA operational for the rest of 2013. It's still unclear what the long-term plans are, but for now Ontario seems willing to cover operating costs. It's a temporary victory, but a relief for the scientists who currently have federal grant money to study the effects of nanosilver on lakes - $800,000 to be exact, which perhaps the federal government would have been fine writing off as a cost of war.

Conservatives Vote "No" to Science

The mounting frustration regarding the federal government's handling of the ELA, and science in general, reached its apex this March with the following vote put forward during Parliament, which is now in the midst of federal budget discussions. From Vote No. 631 on March 20, 2013:

That, in the opinion of the House: (a) public science, basic research and the free and open exchange of scientific information are essential to evidence-based policy-making; (b) federal government scientists must be enabled to discuss openly their findings with their colleagues and the public; and (c) the federal government should maintain support for its basic scientific capacity across Canada, including immediately extending funding, until a new operator is found, to the world-renowned Experimental Lakes Area Research Facility to pursue its unique research program.

The vote, sponsored by NDP MP Kennedy Stewart, is a particularly loaded piece of political savvy. By tacking on the call for the government to continue its support for the ELA, Stewart guaranteed its defeat at the hands of the 157 Conservatives in Parliament. In short, the Government of Canada voted that it is against science, a very catchy soundbite which fits perfectly into 140 characters and makes for a great screen cap. Of course, this is accomplished through a bit of circular logic not uncommon to political rhetoric. But in light of the policy pushed through by this government, the vote is not an inaccurate representation of its prevailing attitudes. The Unmuzzled Science has a brief takedown of how the Conservatives have opposed each point present in the motion.

Not Scientists, Just Doctors of Spin
The advocacy group Democracy Watch released a report in January of this year, alleging that the Government of Canada has been systematically limiting access to government information, specifically from federal scientists in the departments of Environment, Fisheries and Oceans, and Natural Resources. Spin and the restriction of information is, of course, not unique to this government, but the unabashed, unapologetic way in which the Harper Government does so is downright insulting. Democracy Watch has subsequently filed a complaint along with the University of Victoria's Environmental Law Centre to the Information Commissioner of Canada, asking for an investigation into the government's obstruction of information. Until then, here's hoping that public backlash will continue the stay of execution of further campaigns against science.

Tuesday, 23 April 2013

Or, The Short Version of Why Open Access Matters

In Open Access, Accountability, and For-Profit publishing, I wrote about how the lack of model transparency and access to data prevented researchers from replicating others' results, leading to an over-reliance on the peer-review process to vouch for a piece of research's validity. I also mentioned how peer-reviewers are not required to audit the veracity of a model or verify its results, and that this lack of oversight could result in inaccurate information influencing policymakers.

The 90% debt-to-GDP threshold from Carmen Reinhart and Kenneth Rogoff's 2010 paper "Growth in a Time of Debt" - the finding that economic growth slows dramatically once a country's public debt exceeds 90 percent of GDP - has been one of the most cited stats in the public debate during the Great Recession. Paul Ryan's Path to Prosperity budget states their study "found conclusive empirical evidence that [debt] exceeding 90 percent of the economy has a significant negative effect on economic growth." The Washington Post editorial board takes it as an economic consensus view, stating that "debt-to-GDP could keep rising — and stick dangerously near the 90 percent mark that economists regard as a threat to sustainable economic growth."

Initially it would seem that access to data isn't an issue, as Reinhart and Rogoff provide the historical data they used along with their sources on their website. But publicly-available data is meaningless without context, and the authors don't provide any clarity on which data series and methodology they used. This is why access to the model is just as important as access to the data for repeatability and accountability. (There is some discussion on whether or not "Growth in a Time of Debt" was ever actually peer-reviewed, but as noted above, there's a good chance it would have passed review without anyone ever having to look at the model and datasets anyway.)

When Herndon, Ash and Pollin attempted to replicate the study, they reported:

We were unable to replicate the RR results from the publicly available country spreadsheet data although our initial results from the publicly available data closely resemble the results we ultimately present as correct. Reinhart and Rogoff kindly provided us with the working spreadsheet from the RR analysis. With the working spreadsheet, we were able to approximate closely the published RR results. While using RR's working spreadsheet, we identified coding errors, selective exclusion of available data, and unconventional weighting of summary statistics.

Reinhart and Rogoff's work presented a seemingly straightforward, data-driven analysis drawing a causal link from a country's public debt to its GDP growth, which is great talking-head bait and support for austerity economics. But, as succinctly put by Herndon, Ash and Pollin, "A necessary condition for a stylized fact is accuracy".
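The "unconventional weighting" that Herndon, Ash and Pollin identified is easy to illustrate: Reinhart and Rogoff averaged each country's mean growth rate, so a country with a single year in the high-debt category counted as much as one with two decades of them. A sketch with invented numbers:

```python
# Invented growth rates (%) for country-years above the 90% debt threshold.
growth = {
    "Country A": [-8.0],      # one crisis year in the high-debt category
    "Country B": [2.0] * 19,  # nineteen ordinary years in the same category
}

# Country-weighted (the RR approach, as described by the critique):
# average each country's mean, so A's single year carries half the weight.
country_means = [sum(v) / len(v) for v in growth.values()]
country_weighted = sum(country_means) / len(country_means)

# Observation-weighted: pool every country-year equally.
all_years = [g for v in growth.values() for g in v]
obs_weighted = sum(all_years) / len(all_years)

print(country_weighted)  # -3.0: the lone crisis year dominates
print(obs_weighted)      # 1.5: twenty years of data tell a milder story
```

Neither weighting is inherently wrong, but the choice moves the headline number from mild growth to sharp contraction - and the choice was invisible without access to the working spreadsheet.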