Matt Salganik

February 10, 2017

Apply to participate

The Fragile Families Challenge is now closed. We are no longer accepting applications!

What will happen after I apply?

We will review your application and be in touch by e-mail. This will likely take 2-3 business days. If we invite you to participate, you will be asked to sign a data protection agreement. Ultimately, each participant will be given a zipped folder which consolidates all of the relevant pieces of the larger Fragile Families and Child Wellbeing Study in three .csv files.

Six outcome variables (each variable name links to a blog post about that variable)

Continuous variables: grit, gpa, materialHardship

Binary variables: eviction, layoff, jobTraining

prediction.csv contains 4,242 rows and 7 columns:

challengeID: A unique numeric identifier for each child.

Six outcome variables, as in train.csv. These are filled with the mean value in the training set. This file is provided as a skeleton for your submission; you will submit a file in exactly this form but with your predictions for all 4,242 children included.
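
Before submitting, it can help to confirm that your file still has the shape described above. The following is a minimal sketch using only Python's standard library. The function name check_submission is hypothetical, and the checks cover only what this post states (4,242 rows, 7 columns with the names listed above, unique challengeID values). The column order is not checked, and the submission system may enforce rules that are not captured here.

```python
import csv

# Column names as listed in this post.
EXPECTED_COLUMNS = [
    "challengeID",
    "grit", "gpa", "materialHardship",
    "eviction", "layoff", "jobTraining",
]
EXPECTED_ROWS = 4242


def check_submission(path):
    """Return a list of problems found in a prediction.csv-style file.

    Hypothetical helper: it checks only the shape, the column names,
    and the uniqueness of challengeID. An empty list means no problems
    were found by these checks.
    """
    problems = []
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        rows = [row for row in reader if row]  # ignore blank lines

    if header is None:
        return ["file is empty"]

    # Compare names regardless of order (order is an unknown here).
    if sorted(header) != sorted(EXPECTED_COLUMNS):
        problems.append(f"unexpected columns: {header}")

    if len(rows) != EXPECTED_ROWS:
        problems.append(f"expected {EXPECTED_ROWS} rows, found {len(rows)}")

    if "challengeID" in header:
        idx = header.index("challengeID")
        ids = [row[idx] for row in rows if len(row) > idx]
        if len(set(ids)) != len(ids):
            problems.append("challengeID values are not unique")

    return problems
```

Calling check_submission("prediction.csv") on the unmodified skeleton file should return an empty list if the file matches the description in this post.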

Understanding the background variables

To use the data, it may be useful to know something about what each variable (column) represents. Full documentation is available here, but this blog post distills the key points.

Waves and child ages

The background variables were collected in 5 waves.

Wave 1: Collected in the hospital at the child’s birth.

Wave 2: Collected at approximately child age 1.

Wave 3: Collected at approximately child age 3.

Wave 4: Collected at approximately child age 5.

Wave 5: Collected at approximately child age 9.

Note that wave numbers are not the same as child ages. The variable names and survey documentation are organized by wave number.
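
Because wave numbers and child ages are easy to mix up, a small lookup table can be handy while working with the data. This is only an illustrative sketch: the names WAVE_TO_APPROX_AGE and approx_age_for_wave are hypothetical, the ages are the approximate ones listed above, and representing the birth wave as age 0 is an assumption made here for convenience.

```python
# Approximate child age in years at each wave, as listed in this post.
# Wave 1 was collected at birth; treating that as age 0 is an assumption.
WAVE_TO_APPROX_AGE = {1: 0, 2: 1, 3: 3, 4: 5, 5: 9}


def approx_age_for_wave(wave):
    """Return the approximate child age (years) for a wave number 1-5."""
    if wave not in WAVE_TO_APPROX_AGE:
        raise ValueError(f"unknown wave number: {wave!r}")
    return WAVE_TO_APPROX_AGE[wave]
```

For example, approx_age_for_wave(4) gives 5, since wave 4 was collected at approximately child age 5.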

Variable naming conventions

Predictor variables are identified by a prefix and a question number. Prefixes indicate the survey in which a question was collected. This is useful because the documentation is organized by survey. For instance, the variable m1a4 refers to the mother interview in wave 1, question a4.

A leading c in a variable name indicates that the variable was constructed from other responses. For instance, cm4b_age is constructed from the mother wave 4 interview, and captures the child’s age (baby’s age).
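
The naming convention can be sketched as a small parser. This is a rough illustration rather than an official tool: the function name parse_variable_name and the regular expression are hypothetical, only the two examples given in this post (m1a4 and cm4b_age) are known to follow the pattern, and only the m (mother) prefix is mapped to a label because it is the only one described here. Names that do not fit the pattern return None.

```python
import re

# Only the "m" (mother) prefix is described in this post; any other
# survey letter is returned unchanged rather than guessed at.
KNOWN_SURVEYS = {"m": "mother"}

# Assumed pattern: optional "c" (constructed), one survey letter,
# one wave digit (1-5), then the question number or a descriptive suffix.
_NAME_PATTERN = re.compile(r"^(c?)([a-z])([1-5])(.+)$")


def parse_variable_name(name):
    """Split a variable name into its parts, or return None if it
    does not match the assumed pattern."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        return None
    constructed, survey, wave, suffix = match.groups()
    return {
        "constructed": constructed == "c",
        "survey": KNOWN_SURVEYS.get(survey, survey),
        "wave": int(wave),
        "suffix": suffix,  # question number, e.g. "a4", or e.g. "b_age"
    }
```

With this sketch, parse_variable_name("m1a4") reports the mother survey, wave 1, and suffix a4, while parse_variable_name("cm4b_age") reports a constructed variable from the mother survey in wave 4. A name such as challengeID does not match and returns None.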

Ready to work with the data?

About Matt Salganik

Matthew Salganik is a Professor of Sociology at Princeton University. He is also the author of the forthcoming book Bit by Bit: Social Research in the Digital Age (http://www.bitbybitbook.com). You can learn more about his research at http://www.princeton.edu/~mjs3.