I got an email from my superheroic PhD adviser in June 2006: Would I be interested in relocating to Palo Alto for six months in order to work with Patrick Ball at the Human Rights Data Analysis Group? (She'd gotten a grant and would cover my stipend.) Since I'd spent the last several months in New Haven wrestling ineffectually with giant, brain-melting methodological problems, I said yes immediately.
The plan with my adviser was simple: I'd digitize the ancient, multiply-photocopied pages of data from the United Nations Truth Commission for El Salvador, combine them with two other datasets, match across all the records, and produce reliable ...

Almost a quarter century ago, on November 16, 1989, six Jesuit scholars, their housekeeper and her 15-year-old daughter were massacred inside the University of Central America (UCA) in San Salvador, El Salvador. The chief target of the attack was the rector of the country’s leading university. The murders were carried out by members of the elite Atlacatl Battalion, acting on the direct orders of the highest-ranking members of the Salvadoran military. The United Nations–sponsored Truth Commission for El Salvador found that members of the Salvadoran military's high command “gave...the order to kill Father Ignacio Ellacuría and to leave no witnesses.” ...

<< Previous post, MSE: Stratification and Estimation
Q15. Are there other MSE models one might use with human rights data?
Q16. Is it possible to use MSE to model non-lethal human rights violations?
Q17. I am concerned about using MSE with my data, because the datasets were gathered by opposing organizations. Victims who were reported to an NGO were very unlikely to be reported to state sources, but also very likely to be reported to religious organizations. Won't that cause the overlaps between the NGO list and the state list to be artificially low, and the overlaps between the NGO list and the church list to be artificially high? Does ...

<< Previous post, MSE: The Matching Process
Q10. What is stratification?
Q11. [In depth] How do HRDAG analysts approach stratification, and why is it important?
Q12. How does MSE find the total number of violations?
Q13. [In depth] What are the assumptions of two-system MSE (capture-recapture)? Why are they not necessary with three or more systems?
Q14. What statistical model(s) does HRDAG typically use to calculate MSE estimates?

<< Previous post: Collection, Cleaning, and Canonicalization of Data
Q8. What do you mean by "overlap," and why are overlaps important?
Q9. [In depth] Why is automated matching so important, and what process do you use to match records?
Q8. What do you mean by "overlap," and why are overlaps important?
MSE estimates the total number of violations by comparing the size of the overlap(s) between lists of human rights violations to the sizes of the lists themselves. By "overlap," we mean the set of incidents, such as deaths, that appear on more than one list of human rights violations. Accurately and efficiently identifying overlaps between ...

<< Previous post: MSE: The Basics
Q3. What are the steps in an MSE analysis?
Q4. What does data collection look like in the human rights context? What kind of data do you collect?
Q5. [In depth] Do you include unnamed or anonymous victims in the matching process?
Q6. What do you mean by "cleaning" and "canonicalization"?
Q7. [In depth] What are some of the challenges of canonicalization?

Multiple systems estimation, or MSE, is a family of techniques for statistical inference. MSE uses the overlaps between several incomplete lists of human rights violations to estimate the total number of violations. In this blog post, and four more to follow, I’ll answer both conceptual and practical questions about this important method. (In posts to follow, questions that refer to specific statistical procedures or debates will be marked "In depth.")
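
To make the idea of "using overlaps" concrete, here is a minimal sketch of the simplest possible case: two lists and the classic Lincoln-Petersen capture-recapture formula, where the estimated total is (size of list A × size of list B) / (size of the overlap). This is a textbook illustration only, not necessarily the model used in any real analysis; the function name, list names, and victim identifiers are all hypothetical, and the sketch assumes records have already been matched so that the same person carries the same identifier on both lists.

```python
def lincoln_petersen(list_a, list_b):
    """Two-list estimate of the total: |A| * |B| / |A and B|.

    Assumes records are already matched, so a shared identifier
    means the same victim appears on both lists.
    """
    a, b = set(list_a), set(list_b)
    overlap = len(a & b)  # victims documented on both lists
    if overlap == 0:
        raise ValueError("no overlap between the lists; the estimate is undefined")
    return len(a) * len(b) / overlap


# Made-up identifiers, for illustration only.
ngo_list = {"v01", "v02", "v03", "v04", "v05", "v06"}
state_list = {"v04", "v05", "v06", "v07"}

estimate = lincoln_petersen(ngo_list, state_list)
print(estimate)  # 6 * 4 / 3 = 8.0
```

In this toy example the two lists together document seven distinct victims, while the estimate is eight, so roughly one victim would be expected to appear on neither list. The two-list formula rests on strong assumptions, which is one reason analyses with three or more lists are treated separately.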