Tag Archives: big data

Consider the various image-sharing databases online: Facebook’s photo stores, Instagram, Flickr. These contain trillions of photographs, petabytes of fragile digital data, growing daily without limit. Every day, millions of users worldwide upload the images they capture on their phones and cameras to the cloud, there to be stored, processed, enhanced, shared, tagged, and commented on. And to be used as training data for facial recognition software: the software that identifies your ‘friends’ in your photos in case you want to tag them.

This gigantic corpus of data is a mere court order away from being used by the nation’s law enforcement agencies to train their own facial surveillance software, to be deployed, for instance, in public-space cameras, port-of-entry checks, correctional facilities, and prisons. (FISA courts can be relied upon to issue warrants in response to any law enforcement agency request, and internet service providers and media companies respond with great alacrity to government subpoenas.) Openly used and deployed, that is. With probability one, the NSA, FBI, and CIA have already ‘scraped’ these image stores using a variety of methods, and used them in the manner indicated. We have actively participated and collaborated, and continue to do so, in the construction of the world’s largest and most sophisticated image surveillance system. We supply the data by which we may be identified; those who want to track our movements and locations use this data to ‘train’ their artificial agents to surveil us, and to report on us if we misbehave, trespass, or fail to conform to whatever spatial, physical, legal, or ‘normative’ constraint happens to direct us at any given instant. The ‘eye’ watches; it relies for its accuracy on what we have ‘told’ it through our images and photographs.

Now imagine a hacktivist programmer who writes a Trojan horse that infiltrates such photo stores and destroys all their data permanently, since the backups are taken out too. Such a ‘feat’ is certainly technically possible: encryption will not prevent a drive from being formatted, and security measures of all kinds can be breached. This act of ‘hacktivism’ would be destructive; it would cause the loss of much ‘precious data’: memories and recollections of lives and the people who live them, all gone, irreplaceable. It would be justified, presumably, on the grounds that it would cripple a pernicious system of surveillance and control. Remember that your photos don’t train image recognition systems to recognize just you; they also train them not to recognize someone else as you. Our collaboration does not just hurt us, it hurts others; we are complicit in the surveillance and control of others.

I paint this admittedly unlikely scenario to draw attention to a few interesting features of our data collection and analysis landscape: a) we participate, by conscious action and political apathy, in the construction and maintenance of our own policing; b) we are asymmetrically exposed, because our surveillers enjoy maximal secrecy while we can draw on none; c) collective, organized resistance is so difficult to generate that the most effective political action might be a quasi-nihilist act of loner ‘civil disobedience.’ If you do not cease and desist from ‘collaborating,’ the only choice left to others still concerned about their freedom from surveillance might be to interrupt such collaboration nonconsensually.

Everyone is concerned about ‘algorithms.’ Especially legal academics; law review articles, conferences, and symposia all bear testimony to this claim. Algorithms and transparency; the tyranny of algorithms; how algorithms can deprive you of your rights; and so on. Algorithmic decision making is problematic; so is algorithmic credit scoring, or algorithmic stock trading. You get the picture: something new and dangerous called the ‘algorithm’ has entered the world, and it is causing havoc. Legal academics are on the case (and they might even occasionally invite philosophers and computer scientists to pitch in with this relief effort).

There is a problem with this picture. ‘Algorithms’ is the wrong word to describe the object of legal academics’ concern. An algorithm is “an unambiguous specification of how to solve a class of problems,” or a step-by-step procedure which terminates with a solution to a given problem. These problems can be of many kinds; mathematical or logical ones are not the only ones, for a cake-baking recipe is also an algorithm, as are instructions for crossing a street. Algorithms can be deterministic or non-deterministic; they can be exact or approximate; and so on. But, and this is their distinctive feature, algorithms are abstract specifications; in themselves, they have no concrete implementation.

Computer programs are one kind of implementation of algorithms, but not the only one. The algorithm for long division can be implemented with pencil and paper; it can also be automated on a hand-held calculator; and of course, you can write a program in C or Python or any other language of your choice and then run the program on a hardware platform of your choice. The algorithms that make up TCP can be programmed to run over an Ethernet network; in principle, they could also be carried out by carrier pigeon. Different implementation, different ‘program,’ different material substrate: for the same algorithm. There are good implementations and bad implementations (the algorithm might give the right answer for every input, while a flawed implementation introduces errors and does not); some implementations are incomplete; some are more efficient and effective than others. Human beings can implement algorithms; so can well-trained animals. Which brings us to computers and the programs they run.
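To make the distinction between an algorithm and its implementations concrete, here is a minimal Python sketch of one more implementation of long division. The function name `long_division` is illustrative only, not taken from any library, and the sketch assumes a non-negative dividend and a positive divisor. It follows the pencil-and-paper steps: bring down one digit at a time, then find each quotient digit by repeated subtraction. Python's built-in `divmod` is a different implementation of the same specification, so the two should agree on every valid input.

```python
def long_division(dividend: int, divisor: int) -> tuple[int, int]:
    """Schoolbook long division, step by step, as done with pencil and paper.

    Expects dividend >= 0 and divisor > 0. Returns (quotient, remainder).
    """
    if dividend < 0 or divisor <= 0:
        raise ValueError("expects dividend >= 0 and divisor > 0")
    quotient_digits = []
    remainder = 0
    for ch in str(dividend):
        # "Bring down" the next digit of the dividend.
        remainder = remainder * 10 + int(ch)
        # Count how many times the divisor fits, by repeated subtraction,
        # the way a person checks multiples of the divisor on paper.
        digit = 0
        while remainder >= divisor:
            remainder -= divisor
            digit += 1
        # Write that count as the next quotient digit (always 0..9).
        quotient_digits.append(str(digit))
    # int() drops any leading zeros, e.g. "0619" -> 619.
    return int("".join(quotient_digits)), remainder


if __name__ == "__main__":
    # Two implementations of the same specification: the step-by-step
    # procedure above and Python's built-in divmod.
    print(long_division(7429, 12))  # (619, 1)
    print(divmod(7429, 12))         # (619, 1)
```

Neither implementation changes what long division is; they differ in how, how fast, and how visibly the steps are carried out.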

The reason automation, and the computers that deliver it to us, are interesting and challenging, conceptually and materially, is that they implement algorithms in interestingly different ways via programs on machines. They are faster; much faster. The code that runs on computers can be obscured: human-readable source programs are typically translated into machine-readable binary code before execution, which makes study, analysis, and critique of the underlying algorithm well-nigh impossible, especially when that code is protected by a legal regime as proprietary information. These implementations are relatively permanent, and they can be easily copied. This kind of implementation of an algorithm is shared and distributed; its digital outputs can be stored indefinitely. These affordances are not present in other, non-automated implementations of algorithms.

The use of ‘algorithm’ in the debate over the legal regulation of automation is misleading. It is the automation, the computerized implementation, of an algorithm for credit scoring that is problematic, and it is so because of specific features of that implementation. The credit scoring algorithm is, of course, proprietary; moreover, its programmed implementation is proprietary too, a trade secret. The credit scoring algorithm might be a complex mathematical algorithm readable by a few humans; its machine code is readable only by a machine. Had the same algorithm been implemented by hand, by human clerks sitting in an open office and carrying out their calculations with pencil and paper, we would not have the same concerns. (This process could also be made opaque, but that would be harder to accomplish.) Conversely, a non-algorithmic, non-machinic process, a human one, say, would be subject to the same normative constraints.

None of the concerns currently expressed about ‘the rule/tyranny of algorithms’ would be as salient were the algorithms not being automated on computing systems; our concerns about them would be significantly attenuated. It is not the step-by-step solution to a credit scoring problem, the ‘algorithm’ itself, that is the problem; it is its obscurity, its speed, and its placement on a platform presumed to be infallible, a jewel of socially respected ‘high technology.’

Of course, the claim is often made that algorithmic processes are replacing non-algorithmic (‘intuitive, heuristic, human, inexact’) solutions and processes. That is true, but again, the concern over this replacement would not be the same, qualitatively or quantitatively, were these algorithmic processes not being computerized and automated. It is the algorithm’s ‘disappearance’ into the machine that is the genuine issue at hand here.