Big Data: Pitfalls, Methods and Concepts for an Emergent Field

Princeton University - Center for Information Technology Policy; University of North Carolina (UNC) at Chapel Hill

March 7, 2013

Abstract:

Big Data, large-scale aggregate databases of imprints of online and social media activity, has captured scientific and policy attention. However, this emergent field is challenged by inadequate attention to methodological and conceptual issues.

I review key methodological challenges, including: 1) Inadequate attention to the implicit and explicit structural biases of the platform(s) most frequently used to generate datasets (the model organism problem). 2) The common practice of selecting on the dependent variable without corresponding attention to the complications this introduces. 3) Lack of clarity with regard to sampling, universe and representativeness (the denominator problem). 4) Reliance on a single platform: most big data analyses draw on just one, and hence miss the ecology of information flows.

Conceptual issues reviewed in this paper include: 1) More research is needed to interpret aggregated mediated interactions. Clicks, status updates, links, retweets, etc. are complex social interactions. 2) Network methods imported from other fields need to be carefully reconsidered to evaluate their appropriateness for analyzing human social media imprints. 3) Most big datasets contain information only on “node-to-node” interaction. However, “field” effects (events that affect a society or a group wholesale, either through shared experience or through broadcast media) are an important part of human socio-cultural experience. 4) Human reflexivity, the fact that humans will alter their behavior around metrics, needs to be assumed and built into the analysis. 5) Assuming additivity and counting interactions so that each new interaction is seen as (n+1), without regard to semantics or context, can be misleading. 6) The relationship between network structure and other attributes is complex and multi-faceted.
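As a rough illustration of point 5, the following Python sketch contrasts a purely additive tally, in which every retweet counts as n+1, with a tally that keeps interactions separated by context. The data and the `stance` label are invented for the example, and the function names are hypothetical. In real data, such labels would have to come from separate coding of each interaction. This is not a method proposed in the paper.

```python
from collections import Counter

# Hypothetical retweets of a single post. Each carries an assumed "stance"
# label describing why it was shared (invented data, for illustration only).
retweets = [
    {"user": "a", "stance": "endorse"},
    {"user": "b", "stance": "endorse"},
    {"user": "c", "stance": "mock"},
    {"user": "d", "stance": "mock"},
    {"user": "e", "stance": "mock"},
]


def naive_count(interactions):
    """Additive tally: every interaction adds 1, regardless of context."""
    return len(interactions)


def count_by_stance(interactions):
    """Context-aware tally: keep interactions separated by their stance label."""
    return Counter(item["stance"] for item in interactions)


print(naive_count(retweets))  # 5
print(count_by_stance(retweets)["mock"])  # 3
```

The additive count reports five "retweets" as if they were interchangeable, while the labeled tally shows that most of them (under this invented labeling) signal ridicule rather than endorsement.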