Using Proc Transpose to widen a time series

I find I’m doing this fairly frequently recently with our reporting: starting with a set of unit record data, I use proc summary to get to a sorted, narrow dataset like this:

Report_Month

Location

Outcome

Num_Cases

01Jan2010

Auckland

A

3

01Jan2010

Auckland

B

5

01Jan2010

Christchurch

A

1

01Jan2010

Christchurch

B

2

01Jan2010

Christchurch

C

5

01Jan2010

Wellington

B

1

01Jan2010

Wellington

C

7

01Feb2010

Auckland

A

5

01Feb2010

Auckland

B

4

01Feb2010

Auckland

C

1

01Feb2010

Christchurch

A

2

01Feb2010

Christchurch

B

2

01Feb2010

Christchurch

C

5

01Feb2010

Wellington

A

2

01Feb2010

Wellington

B

3

01Feb2010

Wellington

C

1

01Mar2010

Auckland

A

6

01Mar2010

Auckland

B

3

01Mar2010

Christchurch

B

1

01Mar2010

Christchurch

C

6

01Mar2010

Wellington

A

1

01Mar2010

Wellington

B

3

01Apr2010

Auckland

A

2

01Apr2010

Wellington

C

5

Whereas the requirement would be to get to a wider set like this:

Report_Month

Location

A

B

C

01Jan2010

Auckland

3

5

0

01Jan2010

Christchurch

1

2

5

01Jan2010

Wellington

0

1

7

01Feb2010

Auckland

5

4

1

01Feb2010

Christchurch

2

2

5

01Feb2010

Wellington

2

3

1

01Mar2010

Auckland

6

3

0

01Mar2010

Christchurch

0

1

6

01Mar2010

Wellington

1

3

0

01Apr2010

Auckland

2

0

0

01Apr2010

Wellington

0

0

5

There are two ways I know of to achieve this transformation in SAS code. One would be to create the array of variables needed in a data step and populate them for each by group (in this case, Report_Month and Location would be the by variables) in turn. This approach has got a lot to recommend it – most importantly to my mind the data step is both flexible and straightforward.

On the other hand…if you’re dealing with a lot of variables the code required can get pretty verbose. Which is why a lazy programmer like me finds proc transpose to be a better solution – in this (admittedly simple) case, very little code is required:

The code is short, and explicit – what’s intended here will be clearer to another programmer than the typical array processing idiom used in the data step.

Here’s a wrinkle, though: the transpose procedure quite rightly determines that in this case, it needs to create 3 new columns in the output dataset wide, to allow for the 3 outcome values A, B and C occurring in narrow. What if there were actually 4 outcomes that had to be counted, as in the below?

Report_Month

Location

A

B

C

D

01Jan2010

Auckland

3

5

0

0

01Jan2010

Christchurch

1

2

5

0

01Jan2010

Wellington

0

1

7

0

01Feb2010

Auckland

5

4

1

0

01Feb2010

Christchurch

2

2

5

0

01Feb2010

Wellington

2

3

1

0

01Mar2010

Auckland

6

3

0

0

01Mar2010

Christchurch

0

1

6

0

01Mar2010

Wellington

1

3

0

0

01Apr2010

Auckland

2

0

0

0

01Apr2010

Wellington

0

0

5

0

The data step is going to be able to cope a lot better with this sort of situation, but let’s just assume that I’ve already written way too much code to change horses now. OK, here’s one way to fix this: