Big Data = Big Government

Big data has become all the rage. The Internet and other technologies of data collection and surveillance have produced massive amounts of information. Big business and the scientific community now want to put that information in the hands of government to use to solve all sorts of policy problems. As the TechAmerica Foundation highlights in their new report, Demystifying Big Data, computational technologies now offer governments the capability to understand and manage social and economic processes on scales unimaginable even a few years ago.

I’ll be clear up front: I am not a fan of big data. Much of the criticism of big data is lodged in the problem of privacy. My concerns are different, however. The real problem is the scope of government power. We often talk about big government in terms of the size of government budgets and personnel, but those are just proxies for a more important problem, the exercise of its influence in our lives. Big data will lead government to have far greater influence in our lives, arguably in far more insidious and less visible ways. It is, ultimately, a tool for social control and engineering, which is what makes it appealing to both technocrats in the center left and the law and order crowd in the center right. Community organizers and libertarians, alike, watch out. The state is about to become enormous.

Privacy protection can’t fix this problem. Privacy is about what the government, or a company, knows about you as an individual. If someone were able to access that data and use it for nefarious purposes, like denying insurance or rejecting a job application, they could potentially do significant damage to you and to your family. That’s bad. And it’s why TechAmerica’s report highlights that big data can be fashioned in ways that protect against privacy intrusions. Big data, they suggest, is really about aggregate data, anyway. Nothing should be able to link identifying data to you. If identifying data must remain, access to that information should be severely restricted to prevent abuse.

But privacy protection falls short of addressing several major problems with big data. For example, there are some especially crucial applications of big data that not only require identifying information but that require its constant and pervasive use. Two illustrations. Political parties in the United States have amassed massive databases about individuals designed to allow them to target individual voters with tailored messaging in order to mobilize votes for their candidates and suppress votes for other candidates. Likewise, retail stores are amassing massive databases about individuals in order to be able to tailor marketing and advertising efforts to your individual preferences and tastes. In both cases, identifying data is absolutely essential in order both to construct the profile in the first place (the database must link a large set of purchasing and other decisions to one another and to you) and to make the link between the data profile and you as a target of individually tailored political or economic persuasion. In these cases, protecting the privacy of data (i.e., restricting access to data) is not the primary concern; rather, protecting against inappropriate political and economic manipulation is.

The second and subtler problem is that de-identified, aggregated data is in reality worse from the standpoint of government intrusion into our lives. Here’s why. Data about you as an individual can be used to manipulate, threaten, or sanction you, as an individual. But aggregate data about large groups can be used to manipulate entire populations. The US Constitution, which protects individuals from the arbitrary and capricious exercise of state power, says nothing about social manipulation on grand scales.

Economists and behavioral psychologists, for example, including senior officials in the Obama Administration, have become enamored of the theory of nudge as an instrument of public policy. Nudge hypothesizes that if you structure a decision in a way that takes advantage of people’s emotional or psychological inclinations, you can still give them a free choice while generating better public policy outcomes (read that as societies that comply better with how the government believes they should behave). But let’s look at one of their most common examples. If you structure employee choices to contribute to a 401k so that the default is to contribute, rather than to not contribute, many more people will actually contribute to their voluntary retirement accounts. So, technically, those people all had a choice. They could have opted out. But if the result of government rules is that 50% more people contribute to 401k plans, then the government has, as a matter of fact and very real societal outcome, manipulated the US public into behaving the way the government decided was best.

This is precisely the point of most big data applications for the business of government. Big data wants, for example, to measure health care outcomes comparatively in order to make probabilistic models of which treatments would best work for you, as an individual, and then to usurp the judgment of you and your doctor in favor of their model by refusing to pay for treatment unless the models approve. Similarly, big data wants to shape where and how people drive, how children are educated, how poor mothers take care of themselves during pregnancy, etc. The applications, as TechAmerica suggests, are pervasive across all levels and domains of government concern.

The fact that all of this is designed to structure society more deeply and thoroughly in ways determined by government experts isn’t discussed at all in the TechAmerica report. How could it be? It’s the point. In the spirit of Taylorism and Fordism, big data threatens to use the promise of efficiency and optimization to turn us all into gears in the giant machines that are today’s complex technological societies. It’s not hard to see why. 21st Century societies face significant problems that will require enormous creativity and collective innovation to solve. Big data offers the illusion of rational solutions to these problems through socio-technological optimization, but it is just an illusion. To paraphrase James Scott’s Seeing Like a State, the hubris of those who have believed they can use big data to know and govern society has given rise to some of humanity’s worst technological failures.

The problems with hyper-rationalist social engineering are legion. Numbers can capture patterns of behavior, but they cannot capture the realities of meaning, identity, value, spirit, drive, ambition, love, and hatred that underpin and shape behavior. Communities are collectives, not aggregates of individuals. Local vibrancy overwhelms simplified models. Data—even big data—is inevitably biased toward metrics that are easily and inexpensively measured, or toward metrics that someone has decided to pay to collect. Data is value-laden. But the biggest problem of all is that big data is an instrument of big organizations, whether big government or big companies. It is an instrument of power that threatens not only the freedom of individuals but also the freedom of societies.

Thanks, Cameron. Good ideas, all. Especially the first. Worth thinking about. The basic tendency of big data is to focus on efficiency and management, neither of which encourages freedom. But could one create data infrastructures that privileged the poor and marginalized over the rich and powerful? That turned the lenses of the panopticon back on the state and the powerful corporation? Sure. The challenge is to ensure that they cannot be used even more effectively by the powerful.

I’m a student in a data mining group at UCI. I disagree with you on almost all of your points here.

If privacy solutions don’t apply to big data, what about the two situations of inadequate privacy controls you describe here? Political campaigns and private companies amassing databases to target specific people is an obvious privacy problem.

About the other uses of big data to “manipulate” a populace: What is inherently wrong with trying to change public opinion? How do you draw the line between advertisements (which have been using statistical data for decades) and any conversation between two people? How do you decide what is appropriate “influence” and what is duplicitous “manipulation”? I challenge you to find a way that is not dependent on a violation of privacy.

You tread conspiracy territory with your “fact” that “all of this is designed to structure society more deeply and thoroughly in ways determined by government”. In a sense, this is trivially true. Any government design is an attempt to structure society in a way determined by government. Assuming you mean something extraordinary by it, you should consider Hanlon’s Razor. Caring about logistics and management is not a conspiracy to destroy culture. “The hubris of those who have believed they can use big data to know and govern society has given rise to some of humanity’s worst technological failures.” This sounds amazingly relevant. Please give a detail.

A “promise of efficiency and optimization” frequently pays dividends. See “logistics” in general, or the Green Revolution if you want a concrete example. Big Data is in the spirit of Taylorism and Fordism the same way immunization is in the spirit of bloodletting. Breast cancer screening is not recommended after two additional years because a government bureaucrat wants to spend more on men. It’s because a knowledge of false-positive rates and their costs frees those resources for use on those who need it.

If “numbers cannot capture the realities… that underpin and shape behavior”, what about the established statistical and scientific studies that accurately reflect behavior? You yourself described some of the most common psychological advertising gimmicks earlier in this article. That aside, math is respected for a reason. Efficiency frequently means the difference between wasted resources and saved lives. These techniques apply to healthcare resources as well as they do to agriculture. Maybe you meant “Big Data Numbers” don’t capture valid information the way other scientific or statistical data has been collected. This still contradicts all of successful data mining, some of which you describe in your article.

The fact that data samples contain bias is true. It is also a subject of extreme importance to those who use the data. There is a reason people don’t arbitrarily interpret everything they see. The discipline of making rational decisions based on biased and limited information is what statistics is all about. Here I think you are getting at a very important point: Who in power is accountable for the decisions they make? How are important choices justified? Who is validating or verifying this process? Who has access to the data? These are not problems specific to Big Data, nor will they go away in its absence. You are conflating serious issues with the appearance of a new domain in which they apply.

Finally, Big Data is not merely the instrument of corporations and big government entities. You have another definition problem: “Big Data” refers to data mining used in current government proposals. To complain “the biggest problem of all is that big data is an instrument of big organizations” is to complain that big organizations exist. Eliminating Big Data will not save you from this unfortunate fact. I suspect you mean that data mining and math are solely the instruments of big organizations, and then you are obviously wrong. “Big Data” outside large organizations is also the tool of scientists, non-profits, and anyone with an internet connection. If you are afraid of powerful data-hoarding cabals, the solution is to create open access to that data, not to hide it and pretend it doesn’t exist. If you are concerned over abuses of privacy, the solution is to enforce privacy controls, not outlaw the possession of information.

Many thanks for your long and thoughtful comment. I appreciate the feedback.

I suspect that you and I are not actually that far apart. Let me make a couple of observations regarding your comment.

First, I didn’t deny that privacy issues are important with big data, of course they are. Privacy protections simply can’t solve all of the problems that big data gives rise to. You acknowledge this at the end where you highlight the need for open access to big data. That’s a different normative issue that requires different solutions. And, I would observe, the TechAmerica Foundation’s report is essentially silent on open access issues.

More importantly, my point about privacy in the original post is that some big data efforts simply require privacy violations. No amount of privacy protection can protect people from the harmful application of big data if the big data application requires privacy violations, which I believe a number of big data applications will. The only option is to ban this kind of big data application.

Your observation about advertising is, of course, exactly correct. It’s the same thing. Now, I’m not a huge fan of advertising in the first place, especially of the deceptive kind. But I would note, importantly, that society differentiates (in my view, quite correctly) between market manipulation via advertising (which is legally allowed and tolerated but subject to tight scrutiny for fraud), political candidate manipulation via advertising (which is legally allowed and tolerated under free speech rules but many believe should be more tightly constrained than it is to prevent those who can dominate the airwaves with their money from overwhelming the voices of those who are too poor to do so), and government manipulation via advertising (which is very narrowly constrained to some minimal public service announcements). The reason we differentiate is out of a view that the government should have limited powers, a view I agree with. Put simply, we’d rather not live in a society in which the government added an American version of the old Soviet Pravda to its already significant powers.

Does my criticism of big data apply equally to all government programs, namely that each must effectively work to structure society in the image desired by big government? Possibly. But of course that doesn’t underscore the criticism; it simply expands it to other programs besides big data. This is another requirement, besides privacy protection and open access, that one might want to add to big data rules: there should be opportunities for reflective analysis, critique, and deliberation of the assumptions underlying big data programs. Cool. I’d be delighted with this.

But I also think big data raises two important issues that are not necessarily raised by (all) other government programs (although they certainly are raised by some other programs). First, by masquerading as science, there is a tendency to restrict deliberation of the underlying assumptions of big data programs to experts, rather than ensuring that such assumptions about how they will structure society are open for broad public deliberation and debate. That is, there is a tendency to talk about big data as if it is merely an instrument to improving efficiency (your own comment has some of this language), rather than as an inherently socio-political enterprise. The latter does not deny the importance of efficiency, but it implies that we must supplement any discussion of improving efficiency with a deliberation of the kinds of social structuring implied by the program.

Second, I believe there is a very real normative issue at stake in the *deliberate* structuring of society, especially where the government is engaged in efforts to keep that structuring activity hidden. In my view, the problem with nudge and its big data extensions is not so much that government activity is political–all government activity is political, as you rightly point out–but that nudge is designed to (a) deliberately structure society; (b) according to the standards of experts (as opposed to citizens or their elected representatives); and (c) to hide its efforts to do so.

Finally, let me say that I think it is naive to believe that open access rules would solve the problem of big data for three reasons. First, big organizations (government and large corporations) would still be the ones who fund the creation of big data sets. They would, therefore, control both the domain of such data sets as well as the specific assumptions that guide which data is collected, about what, and for whom. Open access to such data sets would allow them to be used by others, but would never rectify the problem that it takes enormous resources to construct a big data set and so such data sets would tend to ignore things that those with money don’t want to know about (either deliberately, because they’d harm their interests, or inadvertently, because they don’t care about them). Likewise, open access would mean that other people could critique the assumptions built into specific data sets but couldn’t necessarily do anything to alter those assumptions. In short, what we know and don’t know would still be determined largely by those who could afford to pay for the construction of big data sets.

Second, there’s little evidence to suggest that big organizations would agree to open access rules, especially for the most valuable data sets. Outsiders could press them to make them open, but, in fact, privacy considerations and privileged or proprietary information often make convenient excuses to fail to open up data to public access. Indeed, to the extent that companies look to data sets as a financial resource off which they can make money, they have little to no incentive to open up access to those data sets. The TechAmerica report, for example, was largely written by companies that have enormous financial stakes (SAP, IBM) in providing privileged access to information, just as do the big financial firms (Visa and MasterCard, credit rating agencies, and others) that sell information to advertisers and political parties for their voter initiatives. These datasets will never be open access, yet they are in many ways the most important.

Third, open access doesn’t guarantee equal access to the (human, computational, and expertise) resources necessary to exploit these data sets. Big businesses and big governments can hire thousands of researchers, build enormous computing centers, fashion massive modeling efforts, and hire the very best experts in the world to conduct their analyses and applications of big data. Theoretically, it seems possible that a network of decentralized individuals and small organizations dedicated to crowdsourcing the analysis of open access data sets could provide a powerful counterweight to the power of big organizations, but it’s just a theory. Look at how long the tobacco industry was able to successfully fight regulation of their products, and the success of the coal and oil industries in opposing climate change policy.

In the end, for me, it comes down to this. The ability to define and construct the knowledges that underpin business and policy choices is the central tool of power in modern societies. That ability is already skewed in favor of large business and government organizations (although, as witnessed in any number of recent cases, from Fukushima and Deepwater Horizon to the US financial collapse in 2007-9, their command of this knowledge is, at best, sketchy, and offers, in my view, the evidence that you are looking for in terms of the failure of big data to offer sound policy guidance). Big data seems likely only to skew power further into the hands of the few, instead of the many.

Thank you for your reply. We agree on at least a few things, so I’m going to try to focus on where we disagree. You brought up issues that are for the most part separate, but I believe encompass all of your concerns with big data: privacy, data access, and statistical validity. What I will try to make clear is that the issues of the role of government and the structure of society are entirely separate from these.

About privacy: If you want to claim privacy protections can’t solve all big data problems, why did you immediately list problems that are solved that way? I’m not saying privacy protections solve everything. I’m saying “stop using them as examples of big data problems” because they are by definition solved by privacy protections. Your earlier examples show violations in the collection and maintenance of data. Personal details and of course analysis based on those details should be off-limits. These applications are in no way inevitable or required by other projects. You hint at privacy issues that are unavoidable, so you must mean something for which you haven’t given an example.

You say big data is “an inherently socio-political enterprise”, and express concern over how it is used as evidence. I see this as the most important issue: how to circumvent the socio-political tendencies to manipulate positions of power. Characterising big data or statistical analysis as a tool is the solution here, not the problem. When big data is characterized as a black box that provides answers, it becomes a tool for obfuscation and abuse. When big data is a tool for policy, someone is always responsible for how that tool is used. Analysis of big data projects provides no more support for policy manipulation than anything else. If a government leader is unchallenged in their abuse of big data, why would you expect them to be challenged otherwise?

In the case of big data, this leads back to open access. The only way to be accountable is to make the data available. You are wrong to say no big data proposals promise open access. The TechAmerica report pays token service to education and making data available to the public. As the report is essentially a sales pitch point for investment in data research and analysis, not a legislative document, this is far from silence. The proposals collectively referred to as big data are already subject to open access requirements, they don’t need a think-tank to enforce it. There is already opportunity for analysis and critique of big data programs.

Why does the future of big data require private corporations to own all of it? That there will always be private data held by large organizations is entirely different from the same corporations privately owning all data. You conveniently feel that private financial firm data is the most important data and will never be released. This is difficult to take seriously. Please elaborate or reconsider. In the TechAmerica report, companies with financial incentives support open development. This just doesn’t translate into the same companies locking out open data projects. It’s amazing you were able to make that connection.

The previous issues apply equally to the implementation of any information-based decision policy. The fact that they exist outside big data does not invalidate them, it means there is a history of solutions and work to provide solutions for them. Your criticism of big data as a government initiative to influence people is more than underscored, it is invalid. Big data does far more than guide propaganda, and propaganda efforts won’t hurt much from its absence.

Your only complaints specific to big data are here: the expense of large data prohibits open access to existing work and particular avenues of new research. Based on this, you forecast that only powerful corporations and government will benefit. For one thing, data management becomes more and more cheap with the development of technology like cloud services. For another, even if big data were to remain expensive, there is a long history of expensive research done for the public benefit. It is true that open access is not equal access. It is more-equal access.

To invoke Taylorism and Fordism based only on notions of efficiency is a denial of the importance of efficiency. You give weak examples. Black-box abuse (failure of the Federal Reserve), non sequiturs (what statistics-based government policy is responsible for Deepwater Horizon?) or outliers (Fukushima standards fail against a gigantic tsunami) are none of them failures of sound information-based policy. If you are going to disregard the discipline of evaluating likelihood, you shouldn’t base your conclusions around what “seems likely” to you. Uninformed policy is when someone like you makes guesses about what seems likely. Informed policy, in the worst case, adds a slight bit of rigor to that process. This article isn’t about big data. It’s a parade of excuses for your favorite refrain about big government as the root of evil. The strongest criticism it offers of big data is for nothing more than being a government program. I find it fascinating that while you decry the dangers of big government and big corporations, your immediate solution is to eliminate the government. Until you find a way to abolish all large organizations, you have little justification to remove the single most transparent and publicly accountable one.

Let us not descend into name calling. I am neither uninformed nor simply a parrot of anti-big-government propaganda. Nor do I have any desire to eliminate the government. Far from it. I do worry about big government and big private organizations, however. No one who lived either during the existence of the Soviet Union or through the last four years can be anything except somewhat skeptical of the proposition that big government and big corporations have society’s better interests at heart.

What I do believe is this: the argument that big science and big technology (and big data is a case of both) can be used innocently in the pursuit of public good is naive and deeply flawed. Both are powerful tools for the structuring of social order. They are thus instruments of power, in their own rights, and not simply neutral tools to be used by those who already hold power.

Thus, in concerning ourselves with their relationship to the use and abuse of power, and thus with the problem of accountability, we need to be concerned with several elements:

1. Should we create such instruments of power in the first place? This is the question of the atomic bomb, and it is equally important to ask it of any big science or technology. The mere existence of potential benefit does not justify the creation of powerful instruments. We must be tolerably sure not only that the benefits outweigh the risks but also that the costs will not be too high and that the benefits will not flow to some while the costs are borne by others. We must also ask whether the world that people will create around and with this new instrument is one we really want to live in. That is, we should want to be sure that neither the unanticipated consequences of such projects nor the consequences of failure, should the use of big data turn out to cause policy failures rather than successes, are intolerable. We should ask this both about big data and about each individual big data project that is proposed.

2. If we do create such instruments of power, how will we ensure that their power is not abused? This is the question of accountability, and, unfortunately, in democracy, accountability is always retrospective. You can make it illegal for the President to spy on his political opponents, but that can never prevent someone like Nixon from exercising that power. Nor will you necessarily get to throw them in jail afterwards, I might note. That’s partly why I’m not convinced one should allow such instruments to be created in the first place.

Privacy protections and open access are important potential governance strategies in this effort. All I am trying to say is that, as governing strategies, they may well fall short, and we should plan for that. Their most important weakness is that they may simply not be applied. Google’s backtracking on their privacy protections foreshadows the difficulties of enforcing such policies. So, too, does the general failure of HIPAA to accomplish strong privacy protections for medical data. I see no real reason, given the current state of affairs in the United States, to presuppose that we will ever enforce strong privacy protections on all big data projects. I did not raise the issue of personalized advertising and personalized voter targeting databases by accident. These are big data projects that are already in existence and that seriously violate common-sense notions of privacy. I presume your argument is that we should eliminate such projects precisely because they require privacy violations in order to have value. Great. I agree. But until the government gets serious enough about privacy protections to force those using these databases to destroy them, I am skeptical that we should encourage the development of future big data projects.

As for open access, I certainly agree that open access big data projects would be better than closed big data projects. That said, I think it’s fair to ask two questions. The first is: can we guarantee open access for all big data projects? Again, my proposition is that this is unlikely and, further, that we should be worried about advocating for big data unless we have strong reasons to believe that we can secure open access. I am apparently less sanguine than you in this regard. I cannot help but think that the incentives for powerful organizations to retain closed control over really useful data will be too high, in the end.

The second question is: even if big data projects are open access, will the projects’ overall impact be to help tilt the balance of power away from big organizations and back toward individuals? Thus, the question is not, as you suggest, whether open access projects are more equal than closed access projects. Of course they are. It is whether a world full of open access big data projects is more equal than a world without those same open access big data projects. And, there, I think, the answer is no.

So, in the end, we must balance your concerns about the ability of big science and technology to improve efficiency against two concerns of mine. The first is that big data poses, in my view, serious threats to democracy, against which, so far, we have only relatively weak instruments of governance in privacy protections and open access rules.

The second is the question of whether big science and technology will actually work to improve the public good. Claims to their effectiveness may be overblown. As my colleague Dan Sarewitz has pointed out, scientific uncertainty is almost inevitably more significant and less well characterized than proponents of big science and technology prefer to acknowledge. The successes of big science and technology may be counterbalanced by failures (this is what the examples of Deepwater Horizon and Fukushima are about). Or, the unanticipated effects of big science and technology may ultimately create worlds that we are not excited about inhabiting or that distribute their positive and negative outcomes in ways that are highly unjust.

I’m not aware of calling you names, but maybe that’s just me being “naive”. You may not be a parrot of anti-government propaganda, but your arguments are. Help me straighten this out:
“the biggest problem of all is that big data is an instrument of big organizations”
– What’s wrong with that?
“It must effectively work to structure society in the image desired by big government. This is a reason to oppose it.” (paraphrased)
– Doesn’t that apply to all government initiatives?
“Yes.”

The problem is in implying that government should have no control over our culture. If employees contribute to 401k plans by default, they will tend to act one way. If they do not contribute by default, they will act another. In either case, a decision must be made. You consistently suggest it is better to trust these decisions to the roll of a die than risk an informed system of people. Little or no government influence over culture is not possible. Attempting it effectively concedes all such influence to the big corporations you share my distrust of.

Questioning the creation of an instrument of power is entirely academic when that instrument already exists in the hands of adversaries. In the case of big data, that instrument has been around for almost a century. Eliminating big data and social engineering efforts is as futile as eliminating nuclear proliferation today. I would argue it is more difficult, because you aren’t talking about un-making a chemical reaction. You’re either asking the entire world to un-learn something, or you are merely asking our government to stop while everyone else continues. Let me reiterate: the question of Big Data is not the question of the atomic bomb. Big Data consists of public programs that hold the greatest promise of benefit for the common people. If your main concern is over the proportion of who receives these benefits, stopping the primary avenue for public works is counter-productive.

It seems government itself is a tool you aren’t ready to wield. There would have been some weight to this tactic before globalization. Now global powers exist over which you have no control. Corporations are engines of influence, yet you find it better to dismantle the one organization you have the most influence over.

Can we guarantee open access for all big data projects? If you’re asking if this is legally and technically feasible, then yes. If you’re asking if the socio-political process of our government will make all data available, I think you can answer that better than I can. This reduces your first question to the general question of government transparency.

Accessibility is valuable in allowing the powerless greater use of this technology, but its real value is in accountability. All accountability relies on open access to information. Furthermore, we cannot hope to hold outside organizations accountable without our government. This isn’t a question of punishing transgressions after the fact. Transgressions occur right now. As Chris Chambers says, “We need to put aside our (natural) moralistic inclinations and treat science like a biological system in which misconduct is a naturally occurring disease that requires treatment and prevention.” His advice applies as well to political as to scientific misconduct.

“Is a world full of open access big data projects more equal than a world without those same open access big data projects?” This question, more than anything else, leads me to believe your intuition over proportions is failing you. I asked you for examples of failures of informed data-driven policy. You gave “failures of big science and technology”, which were failures of neither. Something also led you to believe a number of big data applications will require privacy violations, but it was none of these examples. Now you question the net effects of “big science” in general. Do you really believe the fallout of Fukushima balances out the benefits the Japanese people received in aid and technical support during their crisis? How can you possibly equate them?

Further, it is fallacious to blame Fukushima, Deepwater Horizon, or economic decisions on science because there will still be earthquakes, tsunamis, corporations breaking regulations, and economic recessions without science-based policy. Your intuition about the net positive/negative effects of science is equally questionable. It is just as fallacious to blame the effects of big science and technology if you aren’t “excited about inhabiting” the world developing around you.

Let me ask a question: if I said I happily voted for Obama, I support the Affordable Care Act, but wish it had been a single payer model from the outset, and that I hold a PhD in electrical engineering, would you believe me?

Your comments continue to assume that I am either a libertarian or confused about the implications of my views. I’m neither. Go back and read my posts again, assuming from the outset that I’m a big government liberal. Yet, I still believe the things I’m saying about big data. You may have critics on the right, but you also have a critic on the left.

Believing in the value of government does not, in any way, preclude engaging in critical analysis of the tools that government chooses to use. Yes, of course, the state must balance the powers of big corporations. But we must also limit the powers of both kinds of organizations, either by holding them accountable or by limiting the tools we allow them to use.

I agree with you that privacy controls and open access are important in promoting that accountability. I just don’t think they will either (a) realistically be applied to all applications of big data; or (b) suffice to prevent the abuse of power via big data (by either governments or organizations). Given those conclusions, I think we should be skeptical about the further widespread upgrading of the big data capabilities of both government and companies.

Why? The use of big data by both government and corporations further imbalances the power between these institutions and individuals, families, and communities. Just as you think I underestimate the public good to be derived from big science and technology, so I think you overestimate this value.

Regardless of how long science, technology, and big data have been around, we live in a world in which the socio-technological systems that impact our lives are growing in scale, complexity, and, frankly, fragility. Continuing a headlong rush in this direction on the theory that more big science, more big technology, and more big data will only make things better is, I think, dangerous.

P.S. I can’t figure out how you think it’s possible to do either targeted advertising or voter targeting without storing information about individuals’ behaviors as well as their identity, in a disaggregated form, and then making that information available to users of the data who want to target me. This strikes me as the essence of a violation of privacy. They collect personal data about me, link it to my identity, and then give it to somebody else.

I don’t make assumptions about your political affiliations. They are irrelevant to me. I responded to what you stated here about your faith in statistics, statistical inference, and whatever “big science” is. No political group has a monopoly on science criticism. Far more than one liberal is a critic of good science, but this isn’t a bad thing. The question is how those criticisms hold independently of the pedigree of the one voicing them.

I would encourage your engagement in critical analysis of government and its tools, but that criticism must be based on something more than “It enables government to influence the structure of society.” This is not necessarily a bad thing, so it fails as a criticism on its own. As criticism, it also happens to undermine any foundation on which a government might be based.

I agree we should limit the powers of government and big corporations. Maybe I am overly optimistic about the benefits of Big Data programs, but I have more reason to expect help than harm. Restricting Big Data as an attempt at restricting the government does little more than make the government worse at the things it is already doing. You’re not cutting government powers, or stopping propaganda. You’re making all existing government systems dumber. Whatever “Pravda” efforts exist will still be there. You are cutting off the nose to spite the face.

I think the important distinction that needs to be made here is “what is Big Data?” Big Data is not all forms of statistical modeling and inference; it is these methods put to use to inform government policy. If more women’s lives are saved by placing an emphasis on cardiovascular health than on breast cancer, I have an easier time making good decisions in the middle of very emotional issues. This is an example of statistical data that already informs policy. Big Data is an acknowledgement of the value of this kind of information, and an attempt to look for more uses for it. Consequently, the CDC’s information is very accessible to individuals and does a lot to inform individuals, families, and communities.

When you argue that Big Data will further imbalance power, you are saying government policy as it exists today already benefits individuals, families, and communities more than it benefits big government and corporations. If you think that shining a light on the foundation of financial and health policies will do no service to the powerless, then we have identified the fundamental disagreement between you and me. Data and the information derived from it are that light. We’ve gotten too used to politicians who base policy on their philosophy or core beliefs. Big Data is the method for challenging these policies and making someone justify their actions beyond what they merely “thought was right.”

Targeted advertising and voter targeting do require the use of personal information. This is a privacy issue, and a big one. When you mentioned privacy violations earlier, this is what I had in mind. Have targeted campaigns abused privacy rights? Absolutely. Are all targeted campaigns abusive? Not in the least. Many people give their data away to Facebook or Amazon for the convenience it provides them. This is a far cry from shady political campaigns that buy or collect and use this information by nefarious means. If any “Big Data” programs had anything to do with targeted campaigns of any sort, we would have great reason to oppose them. Do they? Please point one out to me.