Data on Kosovo Killings

The data on killings in Kosovo are in four files. All of the files are comma-delimited ASCII. The fields in each file are described below.

If you use these data on Kosovo killings, please cite them with the following citation, as well as this note:

“These are convenience sample data, and as such they are not a statistically representative sample of events in this conflict. These data do not support conclusions about patterns, trends, or other substantive comparisons (such as over time, space, ethnicity, age, etc.).”

Raw data

The first file is md_pub.csv. It contains 4725 records (see below for why there are more records than victims). Appendix 1 of the report gives a full description of how this file was compiled. We have omitted the victims’ names to protect both the victims’ privacy and the people who gave information to the organizations that collected the data. The file contains records of deaths reported to have occurred between 20 March 1999 and 20 June 1999; reported deaths outside that period were not included in our analysis and so are not included in this file.

Each record represents either one death or part of a death. The partial deaths are those for which the date of death was missing. Quoting from pages 30-31 of the report, “For 204 records with no date information, a hot deck procedure was employed to assign a date at random from a donor record that was geographically closest to the location of the record with the missing date. Three dates were randomly selected from the potential donors, and copies of the original record were created with each of the sampled dates. The new records were each assigned a weight of 0.33.”
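The quoted hot deck step can be sketched in Python as follows. This is an illustration under stated assumptions, not the report’s code: Record and hot_deck are hypothetical names, the (x, y) location with squared Euclidean distance stands in for whatever notion of “geographically closest” the report used, and the three dates are drawn with replacement from all donors tied for nearest.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class Record:
    id: str
    location: tuple[float, float]  # hypothetical (x, y) position of the place of death
    dt_kill: date | None           # None when the date of death is missing
    weight: float = 1.0


def hot_deck(missing: Record, donors: list[Record], k: int = 3,
             rng: random.Random | None = None) -> list[Record]:
    """Replace one undated record with k dated copies, each weighted 0.33."""
    if rng is None:
        rng = random.Random()
    dated = [d for d in donors if d.dt_kill is not None]
    if not dated:
        raise ValueError("no dated donor records")

    def dist2(r: Record) -> float:
        # Squared Euclidean distance: a stand-in for "geographically closest".
        dx = r.location[0] - missing.location[0]
        dy = r.location[1] - missing.location[1]
        return dx * dx + dy * dy

    nearest = min(dist2(d) for d in dated)
    pool = [d for d in dated if dist2(d) == nearest]  # all donors tied for closest
    # Draw k donor dates with replacement (an assumption; the quote does not say).
    picks = [rng.choice(pool) for _ in range(k)]
    # Each copy keeps the original id, so ids in md_pub.csv are not unique.
    return [replace(missing, dt_kill=p.dt_kill, weight=0.33) for p in picks]


if __name__ == "__main__":
    # Hypothetical records, not taken from md_pub.csv.
    donors = [
        Record("a", (0.0, 0.0), date(1999, 4, 1)),
        Record("b", (0.0, 0.0), date(1999, 4, 3)),
        Record("c", (5.0, 5.0), date(1999, 5, 1)),
    ]
    for c in hot_deck(Record("x", (0.1, 0.0), None), donors, rng=random.Random(0)):
        print(c.id, c.dt_kill, c.weight)
```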

Because of these partial records, the total number of victims is the sum of the “weight” field, 4399.67, rather than the number of records.

However, not every undated death is represented by three records with the same id. Continuing to quote from page 31, “Some of the hot-decked dates were outside the date range of interest to this study (20 March-22 June). Those records (and their partial weights) were therefore excluded from the analysis.”
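Both points can be checked directly against the file. A minimal sketch, assuming md_pub.csv has a header row with the field names listed below; summarize is a hypothetical helper, and the sample rows are invented only to show the output shape. Any id with fewer than three partial records lost a copy because its hot-decked date fell outside the study period.

```python
import csv
import io
from collections import Counter


def summarize(rows):
    """Return (weighted victim total, record count, copies per imputed id)."""
    total = 0.0
    n_records = 0
    copies = Counter()
    for row in rows:
        weight = float(row["weight"])
        total += weight
        n_records += 1
        if weight < 1:  # partial record created by the hot deck
            copies[row["id"]] += 1
    return round(total, 2), n_records, copies


# Hypothetical rows; with the real file use
#   with open("md_pub.csv", newline="") as f: summarize(csv.DictReader(f))
sample = "id,weight\n1,1\n2,0.33\n2,0.33\n2,0.33\n3,0.33\n3,0.33\n"
total, n, copies = summarize(csv.DictReader(io.StringIO(sample)))
short = sorted(i for i, c in copies.items() if c < 3)  # ids that lost a copy
print(total, n, short)  # prints 2.65 6 ['3']
```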

md_pub.csv contains the following fields:

id: The id of this record. Ids are not unique: a death with an imputed date appears as up to three records sharing the same id.
age: The age at death of this victim; 0 denotes an infant, and -1 indicates that the age is unknown.
sex: M=male, F=female, U=unknown.
pcode: The geographic code for the village or town in which the death occurred. See the geographic dataset for more information.
mcode: The geographic code for the municipality in which the death occurred. See the geographic dataset for more information.
dt_kill: The date of death.
dtk2: The date of death grouped into two-day periods; each period consists of the listed day and the following day.
aba: 1=this death was reported to the ABA (see pages 18-19 in the report).
exh: 1=this death was identified in an exhumation (see pages 19-20 in the report).
hrw: 1=this death was reported to HRW (see pages 20-21 in the report).
osce: 1=this death was reported to the OSCE (see page 21 in the report).
weight: 1=record with a complete date; 0.33=record with an imputed date.
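A minimal Python sketch of reading these fields and mapping the missing-value codes. It assumes a header row with the field names above, dates stored in the same “11may99” style used for the period labels in the estimate files, and source flags that are either 1 or blank/0; parse_record, two_day_period and FIRST_PERIOD are hypothetical names. The two-day periods are assumed to start on 20 March 1999, which is consistent with both 20mar99 and 11may99 appearing as period labels in the estimate files.

```python
from __future__ import annotations

from datetime import date, datetime, timedelta

SOURCES = ("aba", "exh", "hrw", "osce")
FIRST_PERIOD = date(1999, 3, 20)  # assumed start of the first two-day period


def parse_date(text: str) -> date:
    """Parse a date such as "11may99" (assumed storage format)."""
    return datetime.strptime(text, "%d%b%y").date()


def two_day_period(d: date) -> date:
    """First day of the two-day period that contains d."""
    offset = (d - FIRST_PERIOD).days
    return FIRST_PERIOD + timedelta(days=2 * (offset // 2))


def parse_record(row: dict[str, str]) -> dict:
    """Convert one md_pub.csv row into Python values, mapping missing codes to None."""
    age = int(row["age"])
    return {
        "id": row["id"],
        "age": None if age == -1 else age,  # -1 = unknown; 0 = infant
        "sex": {"M": "male", "F": "female"}.get(row["sex"]),  # "U" -> None
        "dt_kill": parse_date(row["dt_kill"]),
        "sources": [s for s in SOURCES if row.get(s, "").strip() == "1"],
        "weight": float(row["weight"]),
    }


# Hypothetical row, not taken from md_pub.csv.
row = {"id": "17", "age": "-1", "sex": "U", "dt_kill": "12may99",
       "aba": "1", "exh": "", "hrw": "1", "osce": "", "weight": "1"}
rec = parse_record(row)
print(rec["age"], rec["sex"], rec["sources"], two_day_period(rec["dt_kill"]))
```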

Estimates

The remaining three files contain our estimates, using the data in md_pub.csv and following the procedures described in Appendix 2 of our report.
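The estimates start from counts of deaths by the combination of sources that reported them. As a sketch (not the report’s code), the observed cells for each two-day period can be tallied from md_pub.csv as below, assuming a header row and 1/blank source flags; cell_counts is a hypothetical helper, and weighting each record by its weight field is an assumption about how partial records enter the cells.

```python
import csv
import io
from collections import defaultdict

SOURCES = ("aba", "exh", "hrw", "osce")


def cell_counts(rows):
    """Weighted number of records in each observed capture-history cell, per dtk2.

    Inner keys are 0/1 tuples in SOURCES order, e.g. (1, 0, 1, 0) means
    "reported to ABA and HRW only". The all-zero cell (deaths documented by
    no source) never appears in the data; in multiple systems estimation
    that is the cell the log-linear models estimate.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for row in rows:
        pattern = tuple(int(row.get(s, "").strip() == "1") for s in SOURCES)
        counts[row["dtk2"]][pattern] += float(row["weight"])
    return counts


# Hypothetical rows, not taken from md_pub.csv.
sample = (
    "dtk2,aba,exh,hrw,osce,weight\n"
    "11may99,1,0,1,0,1\n"
    "11may99,1,0,1,0,0.33\n"
    "11may99,0,0,0,1,1\n"
)
cells = cell_counts(csv.DictReader(io.StringIO(sample)))
print(dict(cells["11may99"]))
```

Summing a period’s cells gives its reported deaths, which is the value nsum takes when no model could be estimated for that period.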

Over time

dtk2_oth.csv contains data estimated by two-day periods. These data underlie (for example) Figure 2 (page 6), and the regressions over time presented in Figure 19 (page 58), first and third columns. Note: these data have been corrected as described in the 15 November 2002 corrigendum.

dtk2: As above.
modelspec: The model used to estimate the total deaths for this period, in standard log-linear notation (see Appendix 2, section 3.5 and following). This value is empty when no model could be estimated for the period (e.g., 11may99).
nsum: The total estimated deaths for this two-day period. When modelspec is empty, this value is simply the number of reported deaths. The cell counts from which it was estimated can be computed from the raw data.
sd: The estimated standard error of nsum, as described in Appendix 2, page 40, of the report.
lvcnt: The estimated total number of people leaving home during this two-day period. See the description of migration data, and Policy or Panic.
bomb: The number of NATO airstrikes in this period. See the description of other data, and Section 5, pp. 8-13, of the report.
bomblag: The number of reported NATO airstrikes in the previous two-day period (missing for 20mar99).
klaB: The number of reported KLA exchanges of fire with Serb authorities. See the description of other data, and pp. 11-12 of the report.
klaBlag: The value of klaB in the previous two-day period.
klaK: The number of reported Serb casualties caused by interactions with the KLA. See the description of other data, and pp. 11-12 of the report.
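The lag fields follow one pattern: the value of a field in the immediately preceding two-day period. A minimal sketch, assuming rows keyed by dtk2 labels in the “20mar99” style used in these files; add_lags and period_start are hypothetical names, and setting the lag to None when the previous period is absent from the input (rather than reaching further back) is a design choice, not something the report specifies. The generated klaKlag name follows the same convention as bomblag and klaBlag.

```python
from datetime import datetime, timedelta


def period_start(row):
    """Parse a dtk2 label such as "20mar99" into a datetime."""
    return datetime.strptime(row["dtk2"], "%d%b%y")


def add_lags(rows, fields=("bomb", "klaB", "klaK")):
    """Return the rows in date order, each with <field>lag set to the previous
    two-day period's value, or None when that period is absent (e.g. 20mar99)."""
    out, prev = [], None
    for row in sorted(rows, key=period_start):
        lagged = dict(row)
        adjacent = (prev is not None
                    and period_start(row) - period_start(prev) == timedelta(days=2))
        for field in fields:
            lagged[field + "lag"] = prev[field] if adjacent else None
        out.append(lagged)
        prev = row
    return out


# Hypothetical values; 13may99 is deliberately missing to show the gap handling.
rows = [
    {"dtk2": "11may99", "bomb": 5, "klaB": 2, "klaK": 0},
    {"dtk2": "09may99", "bomb": 3, "klaB": 1, "klaK": 1},
    {"dtk2": "15may99", "bomb": 4, "klaB": 0, "klaK": 2},
]
for r in add_lags(rows):
    print(r["dtk2"], r["bomblag"], r["klaBlag"], r["klaKlag"])
```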

The six-day period defined as the listed day and the five following days.

gcode: North, south, east, or west. The classification of municipalities into regions is described in Figure 3, page 7, of the report. Also see the geographic data.
modelspec: As above.
nsum: As above.
sd: As above.
lvcnt: As above.
bomb: As above.
bomblag: As above.
klaB: As above.
klaBlag: As above.
klaK: As above.
klaKlag: As above.

Over region and two-day period

rgdtk2est_oth.csv contains data estimated by region and two-day periods. These data underlie (for example) Figures 4-7 (pages 9-10), and the regressions over time presented in Figure 19 (page 58), second and fourth columns. Note: these data have been corrected as described in the 15 November 2002 corrigendum.