Sometimes it is useful to look at data in
aggregated form, so that broader comparisons can be made. For example,
for targeting purposes it could be useful to map the levels of malnutrition by
district or province in a country. Aggregation is one way to summarize a situation easily,
and it can be useful with a mapping program that assigns values by area (such as
provincial rates of stunting, wasting, or underweight).

Aggregation is simple, but care must be taken to
select the variables that lend themselves to aggregation, and to recode and
prepare any data that is not yet ready for aggregation. Some information will inevitably be
lost in aggregation, because any time data is condensed it loses detail. Always
save the aggregated data as a new file, so that the original, disaggregated file remains
available for other types of analysis where more detail is useful.
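To make the loss of information concrete, here is a minimal Python sketch using two hypothetical districts. The height-for-age z-scores (HAZ) are made-up values chosen only for illustration. Both districts collapse to the same district mean, even though one of them contains a child far below the others. Once the file is aggregated, that difference can no longer be recovered.

```python
from statistics import mean

# Hypothetical HAZ scores for children in two districts (illustrative values only).
district_a = [-1.0, -1.0, -1.0, -1.0]
district_b = [0.5, 0.0, -0.5, -4.0]

mean_a = mean(district_a)
mean_b = mean(district_b)

print(f"District A mean HAZ: {mean_a:.2f}")
print(f"District B mean HAZ: {mean_b:.2f}")
# Both means are -1.00, but only district B contains a child below -3.
```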

The module provides one EXERCISE in AGGREGATING,
using a portion of a data set from the Sri Lankan Community Nutrition Project Baseline Survey,
called the PNIP. The data set has only a few basic variables, enough to introduce the idea of
aggregation and show the details of creating an aggregate file. Sri Lanka is divided
administratively at the provincial level, the district level, the district secretariat level, and
the community level. The data could be collapsed at any of these levels, but this
lesson shows how to collapse at the DISTRICT level (district will be the break
variable).
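The idea of a break variable can be sketched in a few lines of Python. This is not SPSS; it only imitates what the Aggregate procedure does. The records, the codes P1, D1, and so on, and the helper name `aggregate_mean` are all hypothetical. Each distinct value of the break variable becomes one output row, so breaking on province produces fewer and broader rows than breaking on district.

```python
from collections import defaultdict

# Hypothetical child-level records: one dict per child (values are made up).
records = [
    {"province": "P1", "district": "D1", "stunted": 1},
    {"province": "P1", "district": "D1", "stunted": 0},
    {"province": "P1", "district": "D2", "stunted": 0},
    {"province": "P2", "district": "D3", "stunted": 1},
    {"province": "P2", "district": "D3", "stunted": 1},
    {"province": "P2", "district": "D4", "stunted": 0},
]

def aggregate_mean(rows, break_var, var):
    """Collapse rows to one value per level of break_var: the mean of var."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[break_var]].append(row[var])
    return {level: sum(vals) / len(vals) for level, vals in sorted(groups.items())}

by_district = aggregate_mean(records, "district", "stunted")
by_province = aggregate_mean(records, "province", "stunted")
print(by_district)  # {'D1': 0.5, 'D2': 0.0, 'D3': 1.0, 'D4': 0.0}
print(by_province)
```

Changing only the break variable changes the level of the output. The same six records give four district rows but only two province rows.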

Follow these steps to aggregate:

1. Open SPSS

2. Open
the data file named SAsia.sav (create
a CODE BOOK first to see the variable definitions).

3. Use a Code Book to detect any errors in the data.
The goal is to have most of the variables coded in a binary (dichotomous) format, so that
1 = positive response and 0 = negative response. The 0/1 coding allows a MEAN to be
calculated in the aggregation (other options are available, e.g. number of cases or % below
a certain value). When MEAN is chosen as the aggregation function, every case with a valid
(non-missing) value counts toward the denominator, but only the positive responses (1) count
toward the numerator. So if 5 of 50 cases are positive for stunting, the calculation is
5/50 = 0.1.

4. Multiplying this mean by 100 gives the percentage of cases positive for
(affected by) the variable of interest. In the example above, 0.1 × 100 = 10% stunted.

5. Recode any variables that need recategorizing or
cleaning. Most variables will be coded 0/1, but some continuous outcomes, for instance the
WAZ, HAZ, and WHZ scores, can be left as they are so that a mean value is calculated.
Double-check that all variables are properly coded so that either a count or a mean can be
calculated at the district level.

6. Once the data is prepared (this is crucial),
click on Data, Aggregate.

7. Move the variable district into the Break
variable box, and move ALL of the remaining variables into the Aggregate
variable box.

8. Each variable is automatically given the default
function (the Mean), but if a different calculation is desired, click on the
variable of interest and then click Function to choose a different option. Change the
function for fno (family number in the household) and childnum (the number of the
child for whom the interview was conducted) to Number of cases instead of Mean.

9. Select the option labeled Create a new data
file and type in the name c:/file location/SADistrict.sav (replacing file location with
your own folder).

10. Click on OK.
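For readers who want to check the logic outside SPSS, steps 3 to 9 can be sketched in Python. Everything here is an assumption for illustration. The records are made up, and the helper names `recode_stunted` and `aggregate` and the `_n` suffix are hypothetical. The stunting indicator uses the common WHO cut-off of HAZ below -2, which may differ from the exercise's own recodes.

The sketch works as follows:

* Missing values, written as None, are left out of both the means and the counts.
* The 0/1 indicator and HAZ are averaged, imitating the Mean function.
* fno and childnum are counted, imitating the Number of cases function.
* The result is a new list of district rows, and the child-level records are left untouched.

```python
from statistics import mean

# Hypothetical child-level records (made-up values). None marks a missing value.
children = [
    {"district": "D1", "fno": 1, "childnum": 1, "haz": -2.5},
    {"district": "D1", "fno": 1, "childnum": 2, "haz": -1.0},
    {"district": "D1", "fno": 2, "childnum": 1, "haz": None},
    {"district": "D2", "fno": 3, "childnum": 1, "haz": -3.0},
    {"district": "D2", "fno": 4, "childnum": 1, "haz": -2.2},
]

def recode_stunted(haz):
    """Step 5 (assumed WHO cut-off): 1 if HAZ < -2, 0 otherwise, None if missing."""
    if haz is None:
        return None
    return 1 if haz < -2 else 0

def aggregate(rows, break_var, mean_vars, count_vars):
    """Steps 7-8: one output row per break value, with a Mean or a count per variable."""
    levels = sorted({row[break_var] for row in rows})
    result = []
    for level in levels:
        group = [row for row in rows if row[break_var] == level]
        out = {break_var: level}
        for var in mean_vars:
            # Only non-missing values enter the mean (numerator and denominator).
            values = [row[var] for row in group if row[var] is not None]
            out[var] = mean(values) if values else None
        for var in count_vars:
            # Number of cases with a non-missing value in this break group.
            out[var + "_n"] = sum(1 for row in group if row[var] is not None)
        result.append(out)
    return result

# Step 5: add the 0/1 indicator as a new field, keeping the original HAZ.
prepared = [{**row, "stunted": recode_stunted(row["haz"])} for row in children]

# Steps 7-9: break on district; the result is a new list, so `children` is unchanged.
districts = aggregate(prepared, "district", ["stunted", "haz"], ["fno", "childnum"])

for row in districts:
    # Step 4: the mean of a 0/1 variable times 100 is the percentage positive.
    pct = row["stunted"] * 100
    print(row["district"], f"{pct:.0f}% stunted", f"mean HAZ {row['haz']:.2f}", row["fno_n"])
```

In this toy data, the missing HAZ in D1 is excluded from that district's stunting mean, but the child is still counted for fno. This is why the choice of function matters for each variable.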

Label the variables in the new data file
SADistrict.sav. Compare this file with the prepared sample file, SADist.sav.

This is all there is to AGGREGATION. The rules to
remember are:

Make sure the data is clean and properly
coded for aggregation.

Be sure the data lends itself to aggregation. For
example, the exercise data is not very REPRESENTATIVE when aggregated, because villages
were not properly sampled within each district; it is used here only for practice. In
real analyses, aggregate only at a level that will not misrepresent an area. The sampling
strategy is key.

Always create a NEW data file when
aggregating, to avoid overwriting and losing the disaggregated data set.