Tag Archives: data mining

Consider the various image-sharing databases online: Facebook’s photo stores, Instagram, Flickr. These contain trillions of photographs, petabytes of fragile digital data, growing daily, without limit; every day, millions of users worldwide upload the images they capture on their phones and cameras to the cloud, there to be stored, processed, enhanced, shared, tagged, commented on. And to be used as learning data for facial recognition software–the stuff that identifies your ‘friends’ in your photos in case you want to tag them.

This gigantic corpus of data is a mere court-issued order away from being used by the nation’s law enforcement agencies to train their own facial surveillance software–to be used, for instance, in public space cameras, port-of-entry checks, correctional facilities, prisons etc. (FISA courts can be relied upon to issue warrants in response to any law enforcement agency requests; and internet service providers and media companies respond with great alacrity to government subpoenas.) Openly used and deployed, that is. With probability one, the NSA, FBI, and CIA have already ‘scraped’, using a variety of methods, these image data stores, and used them in the manner indicated. We have actively participated and collaborated, and continue to do so, in the construction of the world’s largest and most sophisticated image surveillance system. We supply the data by which we may be identified; those who want to track our movements and locations use this data to ‘train’ their artificial agents to surveil us, to report on us if we misbehave, trespass, or don’t conform to whichever spatial or physical or legal or ‘normative’ constraint happens to direct us at any given instant. The ‘eye’ watches; it relies for its accuracy on what we have ‘told’ it, through our images and photographs.

Now imagine a hacktivist programmer who writes a Trojan horse that infiltrates such photo stores and destroys all their data–permanently, for backups are also taken out. This is a ‘feat’ that is certainly technically possible; encryption will not prevent a drive from being formatted; and security measures of all kinds can be breached. Such an act of ‘hacktivism’ would be destructive; it would cause the loss of much ‘precious data’: memories and recollections of lives and the people who live them, all gone, irreplaceable. Such an act of destruction would be justified, presumably, on the grounds that to do so would be to cripple a pernicious system of surveillance and control. Remember that your photos don’t train image recognition systems to recognize just you; they also train them to not recognize someone else as you; our collaboration does not just hurt us, it hurts others; we are complicit in the surveillance and control of others.

I paint this admittedly unlikely scenario to draw attention to a few interesting features of our data collection and analysis landscape: a) we participate, by conscious action and political apathy, in the construction and maintenance of our own policing; b) we are asymmetrically exposed because our surveillers enjoy maximal secrecy while we can draw on none; c) collective, organized resistance is so difficult to generate that the most effective political action might be a quasi-nihilist act of loner ‘civil disobedience’–if you do not cease and desist from ‘collaborating,’ the only choice left to others still concerned about their freedom from surveillance might be to nonconsensually interrupt such collaboration.

On Thursday night, in the course of conversation with some of my Brooklyn College colleagues, I confessed to having internalized a peculiar sort of ‘chilling effect’ induced by a heightened sensitivity to our modern surveillance state. To wit, I said something along the lines of “I would love to travel to Iran and Pakistan, but I’m a little apprehensive about the increased scrutiny that would result.” When pressed to clarify by my companions, I made some rambling remarks that roughly amounted to the following. Travel to Iran and Pakistan–Islamic nations highly implicated in various foreign policy imbroglios with the US and often accused of supporting terrorism–is highly monitored by national law enforcement and intelligence agencies (the FBI, CIA, NSA); I expected to encounter some uncomfortable moments on my arrival back in the US thanks to questioning by customs and immigration officers (with a first name like mine–which is not Arabic in origin but is, in point of fact, a very common and popular name in the Middle East–I would expect nothing less). Moreover, given the data point that my wife is Muslim, I would expect such attention to be heightened (data mining algorithms would establish a ‘networked’ connection between us and given my wife’s own problems when flying, I would expect such a connection to possibly be more ‘suspicious’); thereafter, I could expect increased scrutiny every time I traveled (and perhaps in other walks of life, given the extent of data sharing between various governmental agencies).

It is quite possible that all of the above sounds extremely paranoid and misinformed, and my worries a little silly, but I do not think there are no glimmers of truth in there. The technical details are not too wildly off the mark; the increased scrutiny after travel is a common occurrence for many travelers deemed ‘suspicious’ for unknown reasons; and so on. The net result is a curious sort of self-policing on my part: as I look to make travel plans for the future I will, with varying degrees of self-awareness about my motivations, prefer other destinations and locales. I will have allowed myself to be subject to an invisible set of constraints not actually experienced (except indirectly, in part, as in my wife’s experiences when flying.)

This sort of ‘indirect control’ might be pervasive surveillance’s most pernicious effect.

Note: My desire to travel to Iran and Pakistan is grounded in some fairly straightforward desires: Iran is a fascinating and complex country, host to an age-old civilization, with a rich culture and a thriving intellectual and academic scene; Pakistan is of obvious interest to someone born in India, but even more so to someone whose ethnic background is Punjabi, for part of the partitioned Punjab is now in Pakistan (as I noted in an earlier post about my ‘passing for Pakistani,’ “my father’s side of the family hails from a little village–now a middling town–called Dilawar Cheema, now in Pakistan, in Gujranwala District, Tehsil Wazirabad, in the former West Punjab.”)

Last Friday (July 31st) my wife, my daughter, and I were to fly back from Vancouver to New York City after our vacation in Canada’s Jasper and Banff National Parks. On arrival at Vancouver Airport, we began the usual check-in, got groped in security, and filled out customs forms. The US conducts all customs and passport checks in Canada itself for US-bound passengers; we waited in the line for US citizens. We were directed to a self-help kiosk, which issued a boarding pass for my wife with a black cross across it. I paid no attention to it at the time, but a few minutes later, when a US Customs and Border Protection officer directed us to follow him, I began to. We were directed to a waiting room, where I noticed a Muslim family–most probably from Indonesia or Malaysia–seated on benches. (The women wore headscarves; the man sported a beard but no moustache and wore a skull cap.)

I knew what was happening: once again, my wife had been flagged for the ‘no-fly’ list. The first time this had happened had been during our honeymoon to Spain some eleven years ago; the last time my wife had been flagged was on our return from Amsterdam four years ago. (That’s right; my wife had been allowed to fly to the US from Europe, but her entry into the US was blocked.) On each occasion, she had been questioned–in interrogatory fashion–by a brusque official, and then ‘let go.’ There was no consistency to the checks; sometimes they happened, sometimes they did not. For instance, my wife was not blocked from traveling to–or returning from–India in 2013. At the least, the security system being employed by the Department of Homeland Security was maddeningly inconsistent.

But matters did not end there. It was not clear why my wife had been placed on the ‘no-fly’ list in the first place. Was there something in her background data that matched those of a known ‘terrorist’? This seemed unlikely: she had been born in Michigan, grown up in Ohio, attended Ohio State University, gone to graduate school at the City University of New York, and then law school at Brooklyn Law School before beginning work with the National Labor Relations Board as a staff attorney. (During her college days, she had worked with a students’ group dedicated to justice in Palestine, but that seemed like slim pickings. On that basis, you could indict most Jewish students who attend four-year liberal arts colleges in the US.) But she is Muslim–or, as my wife likes to say, ‘she was born into a Muslim family’–and still retains her Muslim last name after marriage. That could certainly be a problem.

After the first instance of our being detained at an airport, we had expected no more detentions; after all, the US’ security officers would have noticed that a particular passport number, belonging to a particular American citizen, had been incorrectly flagged at a border check; they had ascertained to their satisfaction that all was well; surely, they would now remove that name and number combination from their lists and concentrate on their remaining ‘targets.’ The first check would have acted as a data refinement procedure for the learning data used by their profiling software; it would now work with a cleaner set and generate fewer ‘false positives’–like my wife. That’s how learning data systems are supposed to work; the ‘cleaner’ the learning data, the better the system works.
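The refinement described here can be sketched in a few lines. This is an illustration only, under loud assumptions: the names `Watchlist`, `is_flagged`, and `record_clearance` are hypothetical, the traveler data is made up, and nothing in it reflects how any actual no-fly system is built. It shows a list that matches on last name alone, plus a feedback step that records a cleared name-and-passport combination so that the same combination is not flagged again.

```python
from dataclasses import dataclass, field


@dataclass
class Watchlist:
    """Toy name-matching list with a feedback step (entirely hypothetical)."""
    flagged_last_names: set[str] = field(default_factory=set)
    # (last name, passport number) pairs that an officer has already cleared
    cleared: set[tuple[str, str]] = field(default_factory=set)

    def is_flagged(self, last_name: str, passport_no: str) -> bool:
        key = (last_name.lower(), passport_no)
        if key in self.cleared:
            return False  # a known false positive: do not flag again
        return last_name.lower() in self.flagged_last_names

    def record_clearance(self, last_name: str, passport_no: str) -> None:
        """Feed the result of a manual check back into the list."""
        self.cleared.add((last_name.lower(), passport_no))


# Made-up data: one flagged surname, two travelers who share it.
watchlist = Watchlist(flagged_last_names={"example"})
first_check = watchlist.is_flagged("Example", "P0000001")     # flagged on name alone
watchlist.record_clearance("Example", "P0000001")             # officer clears the traveler
second_check = watchlist.is_flagged("Example", "P0000001")    # same passport, now passes
other_traveler = watchlist.is_flagged("Example", "P0000002")  # different passport, still flagged
```

With the feedback step in place, the second check of the same passport passes quietly, while a different traveler who merely shares the surname is still stopped until they, too, are cleared.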

But that had not happened. Over the course of the past eleven years, my wife was detained again and again, leading up to this last instance last Friday. On each occasion, the same procedure: “Follow me please; sir, you stay right here.” (Mercifully, in Vancouver, perhaps noticing we had a child with us, the border officers allowed me to accompany her to their chambers.) And then, the questioning, which sought to establish her credentials: “What’s your father’s name?” “What’s your mother’s name?” “Where do you work?” and so on. Finally, “Thank you, ma’am. You can go now.” But none of the information gathered in these sessions had any value whatsoever as far as the no-fly profiling system was concerned. That remained magnificently impervious to the empirical particulars of the world outside; as far as it was concerned, my wife was still guilty. Sometimes.

When the interrogation of my wife had ended, I asked the border officer: “How do I get my wife off the list?” His reply: “I don’t know.” I then asked: “Do you have any idea why she was flagged today?” His reply: “She has a pretty common last name.” I stared at him, dumbfounded. When Sinn Féin was rated a quasi-terrorist organization, did the US flag every Irishman at JFK who bore the last name Adams? Could it really be possible that this profiling system was as stupid as this officer was making it out to be? But that hypothesis was not so implausible; there was nothing in my wife’s background that would indicate any reason to place her in the same class as those folks who might be potential 9/11’ers. Moreover, this profiling system remained dumb; it did not ‘learn’; its conditional probabilities stayed the same no matter what its handlers learned about its learning data.

It’s tempting to call this a Kafkaesque situation and let it go at that. (And perhaps throw in a few complaints about the petty harassment this generates; the Muslim family I saw waiting with us missed their flight, and the solitary male was rudely told to move at one point.) But there is more here; this system, this ‘silver bullet’ that is supposed to keep us safe and for which we should be willing to give up our civil liberties, is useless. And dangerously so. Its very strengths, to look for patterns and evidence and generate plausible hypotheses about the guilt of its subjects, are compromised by its design. I’ve speculated about why my wife’s entry in the no-fly list has not been deleted and the only plausible explanation I can come up with is that whoever makes the deletion takes a very tiny risk of being wrong; there is an infinitesimal probability that the ‘innocent’ person will turn out to be guilty, and scapegoats will then be found. Perhaps that fear of being indicted as the ones who let the Trojan Horse through stays their hand.

Whatever the rationale, the end-result is the same: a useless, dangerous, and offensive security system that on a daily basis–I’m quite sure–subjects both citizens and non-citizens of the US to expensive and humiliating delays and interrogations. And makes us safer not at all.

If it’s the first–or sometimes, the second–weekend in July, it’s time for Wimbledon brunch–or breakfast. Today, I hosted a few friends to partake of the pleasures of the 2012 finals. Among them, Roger Federer’s biggest fan, one whose fanhood makes for very interesting watching from up close. I have watched many tennis matches with her in the past five years, and am always struck by her involvement, her anxious following of her favorite, an anxiety compounded and made worse by a tennis match’s fluctuations and the ebbs and flows of its dramatic resolution. It’s been a long time since any sports encounter has done that to me but I remain susceptible under the right sorts of circumstances and thus, sympathetic to her trials and travails. (During the epic Federer-Nadal 2008 final, as it moved into a fifth set and into another cluster of deuces, she had simply stopped watching the television and started doing the dishes instead: the tension had grown to be too much for her. I knew from past experience exactly what she was feeling: a tightening of the gut, a nausea whose phenomenology is distinctive.)

Today, as Andy Murray won the first set, and Roger Federer began his comeback in the second set, I was introduced to a newer palliative for her anxiety. The mundane, domestic, hands-on relief of dishwashing was exchanged for tracking, er, the IBM Data Tracker, which, well let me just let IBM’s marketing folks do the talking from here on:

IBM has mined more than seven years of Grand Slam Tennis data (approximately 39 million data points) to determine patterns and styles for players when they win. This insight is applied to determine the “keys” to the match for each player in a match.

Prior to each match, the system runs an analysis of both competitors’ historical head-to-head match ups as well as stats against comparable player styles, to determine what the data indicates each player must do to do well in the match (SPSS technology)

The system then selects the 3 most significant keys for each player in the match

The Keys to the Match dashboard updates in real-time with current game statistics as the match unfolds

So, at any given moment, the Tracker displays how well the player in question is doing in terms of the ‘three most significant keys:’ conformance with the required value of the key indicates the player is headed for a win (roughly). Thus, then, the reassuring power of the IBM Data Tracker for the bundle-of-nerves fan, wondering whether the object of her attention, her vicarious desires, is performing as he should in order to win. The Data Tracker dips beneath the contingent unpredictable flux, to reach into the hidden order of things and reveal a glorious stability, a movement along a data line that indicates progress, and hopefully, inevitable movement towards the desired endpoint. The analytic grants us the security that, despite all the seeming variance of the surface, the chaos of the visible, there lurks the reassuring solidity of the conforming data point.
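To make ‘conformance’ concrete, here is a minimal sketch of how such a dashboard might score a player against three keys. Everything in it is an assumption for illustration: the key names, target values, and live statistics are invented, and the quoted material says nothing about IBM’s actual method beyond the mining of historical data with SPSS technology.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Key:
    """One 'key to the match': a statistic and the target it must meet."""
    name: str
    target: float
    meets: Callable[[float, float], bool]  # (current value, target) -> bool


def at_least(value: float, target: float) -> bool:
    return value >= target


def at_most(value: float, target: float) -> bool:
    return value <= target


def keys_met(keys: list[Key], live_stats: dict[str, float]) -> dict[str, bool]:
    """Report, for each key, whether the live statistic currently conforms."""
    return {k.name: k.meets(live_stats[k.name], k.target) for k in keys}


# Hypothetical keys and thresholds, not IBM's.
keys = [
    Key("first_serve_pct", 0.65, at_least),
    Key("unforced_errors_per_set", 10.0, at_most),
    Key("net_points_won_pct", 0.60, at_least),
]

# Hypothetical mid-match statistics for one player.
live_stats = {
    "first_serve_pct": 0.68,
    "unforced_errors_per_set": 12.0,
    "net_points_won_pct": 0.71,
}

status = keys_met(keys, live_stats)
met_count = sum(status.values())
print(status, f"{met_count} of {len(keys)} keys met")
```

Re-running `keys_met` as the statistics change is all the ‘real-time’ update amounts to in this sketch; the anxious fan watches the count of conforming keys rather than the rallies themselves.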

The ancient motivation for the statistic, made so starkly manifest in providing therapeutic relief to the sports fan.