Computational Social Science

In November 2019, the Foundation’s trustees decided that RSF would no longer accept unsolicited research proposals under the Computational Social Science (CSS) special initiative. However, RSF remains interested in supporting research that brings these new data and methods to bear on questions of interest in its core programs. Examples of the kinds of CSS-related research questions RSF might support are now included in the RFPs for our programs in Future of Work; Race, Ethnicity and Immigration; and Social, Political and Economic Inequality.

Social science research on many topics is often hampered by the limitations of survey data, including relatively small sample sizes, low response rates and high costs. However, the digital age has increased access to large, comprehensive data sources, such as public and private administrative databases, and new sources of information from online transactions, social-media interactions, and internet searches. New computational methods also allow for the extraction, coding, and analysis of large volumes of text. Advances in analytical methods for exploiting and analyzing data, including machine learning, have accompanied the rise of these data. The emergence of these new data and methods also raises questions about access, privacy and confidentiality.

Examples of research (some recently funded by RSF) that are of interest include, but are not restricted to, the following:

Small Grants Competitions with Big Data

Many investigators have invested significant time and resources in assembling big data sets by linking and harmonizing administrative data from multiple jurisdictions or agencies (federal, state, local), linking administrative to survey data, or deriving new information from online or archival sources. Once assembled, these data can have great value to other investigators beyond their original purpose.

RSF has funded small grants competitions where investigators have made large data sets available to the wider research community. Specifically, RSF issues a call for proposals, developed by the investigators, that offers small grants to graduate students and early career researchers proposing new projects that use these data. The investigators lead the review panel that evaluates the proposals and participate in a conference at RSF (about a year after grants are made) at which funded researchers present their results. Examples of small grants competitions include Raj Chetty and Nathan Hendren’s release of public use statistics derived from IRS administrative tax records (see Equality of Opportunity Project; RSF Request for Proposals) and Sean Reardon’s assembling of educational achievement data for roughly 40 million public school students (see the Stanford Education Data Archive; RSF Request for Proposals).

We welcome inquiries from investigators who have developed similar data sets and are willing to make those data available for wider analyses through an RSF small grants competition. RSF does not provide support to assemble and prepare the data for release or the infrastructure support to house it once released.

Linked Administrative Data

Linking public administrative records from different agencies or jurisdictions can help answer long-standing questions of interest. Chetty, Friedman and Rockoff (2014a; 2014b) linked school district administrative records with federal income tax data to identify which teachers have the largest short-term impact on student achievement and to show the extent to which, in the longer term, students assigned to teachers with higher value-added scores have higher college attendance and higher salaries as adults.
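As a rough illustration of what linking records from two sources involves, the sketch below joins two toy tables on a shared identifier. It is not the procedure used in the studies cited above: the field names (`person_id`, `test_score`, `adult_income`) and the data are hypothetical, and real linkages typically rely on protected identifiers, probabilistic matching, and restricted-access environments.

```python
def link_records(left_rows, right_rows, key="person_id"):
    """Deterministically link two lists of dict records on a shared key.

    Only records present in both sources are returned (an inner join).
    The key name is a hypothetical placeholder.
    """
    # Index the second source by key for constant-time lookup.
    right_by_key = {row[key]: row for row in right_rows}
    linked = []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is not None:
            # Merge the fields from both sources into one record.
            linked.append({**row, **match})
    return linked


# Hypothetical toy data standing in for school and tax records.
school_rows = [
    {"person_id": 1, "test_score": 71},
    {"person_id": 2, "test_score": 88},
    {"person_id": 3, "test_score": 64},
]
tax_rows = [
    {"person_id": 2, "adult_income": 52000},
    {"person_id": 3, "adult_income": 41000},
    {"person_id": 4, "adult_income": 39000},
]

linked = link_records(school_rows, tax_rows)
```

Records that appear in only one source (persons 1 and 4 here) drop out of the linked file, which is one reason match rates and the characteristics of unmatched cases are usually reported alongside linked-data results.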

Human decision-making processes are subject to various biases. Algorithms are increasingly used in decisions about hiring and promotion, policing strategies, bail and sentencing, credit determinations, and the allocation of social services, and their use raises many policy questions. Although algorithms are often perceived to be neutral and fair in their processes, some recent studies have found that they may contribute to outcomes that are biased and harmful, especially for disadvantaged populations (e.g., Sweeney, 2013). Other studies (e.g., Kleinberg et al., 2018), however, suggest that algorithms can contribute to improved decision-making outcomes, including reductions in racial disparities.

Under what conditions are algorithms fair and neutral, and when do they outperform human decision-making? Under what conditions do they incorporate existing social biases in ways that produce disparate impacts for the disadvantaged? If the latter occurs, how do biases get embedded in the algorithms? How do governments, organizations and social scientists evaluate algorithms for biases and establish accountability? Jens Ludwig and colleagues are investigating the tradeoffs between algorithmic fairness and efficiency and testing the extent to which four different methods for promoting algorithmic fairness in machine learning actually work.
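One simple way to begin evaluating an algorithm for disparate impact is to compare its error rates across groups. The sketch below computes the false positive rate for each group from toy records; it is only one of many possible fairness measures and is not drawn from any of the projects mentioned here. The group labels and data are hypothetical.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """Return the false positive rate for each group.

    `records` is an iterable of (group, predicted_positive, actual_positive).
    The rate is the share of truly negative cases flagged as positive.
    Groups with no truly negative cases are omitted.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted, actual in records:
        if not actual:
            negatives[group] += 1
            if predicted:
                false_positives[group] += 1
    return {g: false_positives[g] / n for g, n in negatives.items() if n > 0}


# Hypothetical toy data: (group, algorithm's prediction, true outcome).
records = [
    ("A", True, False), ("A", False, False), ("A", True, True), ("A", False, False),
    ("B", True, False), ("B", True, False), ("B", False, False), ("B", True, True),
]

rates = false_positive_rate_by_group(records)
# Difference between the highest and lowest group rates.
gap = max(rates.values()) - min(rates.values())
```

A nonzero gap does not by itself establish unfairness. Different fairness criteria can conflict with one another, which is part of the tradeoff between fairness and efficiency raised above.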

Qualitative Research on Algorithm Implementation – RSF has an interest not only in understanding how algorithms operate in practice, but also in understanding the decision-making processes that jurisdictions use when incorporating algorithms into their work. What factors lead jurisdictions to supplement or replace human decision-making with algorithms? What criteria do they use to select algorithms and to evaluate their performance?

Private Administrative Data

Proprietary data from sources such as credit reporting agencies, online real estate marketplaces, or retail firms are often extremely useful for addressing social science and policy questions. Normative decision theory implies that a dollar is a dollar no matter its source, but psychological research suggests that financial windfalls or additional expenses have different effects depending on which “mental accounts” they impact. Shapiro and Hastings (2017) analyze retail panel data (500,000 households, 6 billion transactions) to understand “mental accounting,” or how households think about and spend money from different sources.

Machine Learning

Evidence from tax return data suggests no clear trend in intergenerational income mobility for recent cohorts of young adults (Chetty et al., 2014a; 2014b). In contrast, survey data suggest an increasing intergenerational persistence of occupational mobility. To date, no single “big data” source allows the analysis of income and occupational mobility simultaneously. Michael Hout and David Grusky are utilizing a machine-learning approach to code taxpayer occupation on Internal Revenue Service forms consistent with Current Population Survey records that already have respondent occupation reliably coded.

Atalay, Tannenbaum and Sotelo will use machine learning techniques to extract job-related elements, including tasks, skills, and technology requirements, from a dataset of job vacancies drawn from newspaper help wanted ads published between 1940 and 2000 and online job vacancies posted between 2011 and 2017. They will study the extent to which the task content of occupations has changed over time, the impact of technology on tasks within occupations, and how these changes have affected earnings.

Online Surveys and Experiments

Survey response rates for in-person and telephone interviews have declined significantly, and surveys are expensive to administer. Salganik and Levy (2015) highlight the advantages of wiki surveys, whose data collection instruments can capture as much information as a respondent is willing to provide, collect information contributed by respondents that was unanticipated by the researcher, and adapt as more information is obtained.

An extensive literature shows an association between race and economic outcomes, but it is difficult to determine the extent to which these associations are due to racial discrimination or to characteristics correlated with race. Doleac and Stein (2013) use online classified advertisements to examine the effect of race on market outcomes by featuring a photograph of the item for sale and experimentally manipulating the color of the seller’s hand (dark- or light-skinned). They find that black sellers receive fewer and lower offers than white sellers, and that buyer communication with black sellers indicates lower levels of trust.

Text Analysis

Bail (2012) assessed competing predictions about how civil society organizations influence media portrayals of Muslims in the aftermath of 9/11. Using plagiarism detection software, he compared press releases about Muslims produced by civil society organizations to more than 50,000 newspaper articles and television transcripts produced between 2001 and 2008. He found that anti-Muslim fringe organizations were overrepresented in media portrayals and exerted a powerful influence on media discourse, allowing these groups to enter the “mainstream.”
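The general idea behind detecting text reuse between a press release and a news article can be sketched with overlapping word sequences ("shingles"). The example below is a simplified stand-in, not the plagiarism detection software used in the study above. The function names, the three-word window, and the sample sentences are all assumptions made for illustration.

```python
import re


def shingles(text, n=3):
    """Return the set of n-word sequences in a text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def reuse_share(source, target, n=3):
    """Share of the source's n-word shingles that also appear in the target."""
    source_shingles = shingles(source, n)
    if not source_shingles:
        return 0.0
    return len(source_shingles & shingles(target, n)) / len(source_shingles)


# Hypothetical press release and article text.
press_release = "The group released a statement today"
article = "Officials said the group released a statement on Monday"

share = reuse_share(press_release, article)
```

A higher share indicates that more of the press release's wording reappears in the article. Applied across many press releases and articles, scores like this give a rough measure of which organizations' language is picked up by the media.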

Enns and colleagues hypothesize that levels of redistributive and egalitarian policy rhetoric in Congress will decline as campaign contributions from wealthy donors and business interests increase. Using data from the Federal Election Commission since the 1970s, they incorporate automated content analysis and other qualitative analysis software to examine all speeches and content inserted into the Congressional Record by members of Congress during the same period.

Social Media

The large volume of data from social media sites and online interactions presents methodological challenges because the data are unstructured and lack demographic information that is central to social science research. Bail (2015) describes the development and application of “social media survey apps” (SMSAs) using Facebook data to illustrate how such data can be mined to study organizational behavior. McCormick and colleagues (2015) developed and implemented a method for retrieving demographic information from non-text images using Twitter data. Barberá (2016) combines voting registration records and home valuations from Zillow with Twitter data to generate representative public opinion estimates. He uses machine learning methods to estimate key demographics (age, gender, race, income, party affiliation, propensity to vote) of any Twitter user in the U.S.

Funding Considerations

Applicants should specify how the proposed project informs and advances RSF’s computational social science research priorities in one of its core program areas: Behavioral Economics; Future of Work; Race, Ethnicity and Immigration; and Social Inequality. RSF values reproducibility and open science, and where applicable, investigators should explain their data release plan (data, code, codebooks) or any prohibitions on providing such materials.

Further examples of the kinds of questions that are of interest can be found on the Foundation’s website at each of the links above. They include:

What are the psychological consequences of income scarcity and how do they affect individual decision-making and judgment?

What factors influence decision-making processes that involve tradeoffs between costs and benefits that occur at different points in time, or the tendency to over-value immediate rewards at the expense of longer-term benefits?

To what extent have labor market changes affected family formation, transitions to adulthood, or social mobility?

Job quality is related to many different factors including government policies (e.g., minimum-wage laws or parental and sick leave policies) and employer instituted policies (e.g., flex hours, retirement plans). What are the consequences of such policies for employers, workers and families?

How do race-related beliefs evolve in the context of growing population diversity?

What is the impact of immigration policies on the social and political development of immigrants? To what extent have these policies influenced public opinion, inter-group relations or civic participation?