A "Wide" Dataset

A "Tall" Dataset

The first advantage of the tall dataset is that it's ready to go for our statistical procedures (PROC MEANS, PROC FREQ, PROC TABULATE, PROC GCHART, etc.). For example, you can easily calculate:

the average amount overall

the total amount overall

the average amount by quarter

the total amount by quarter

the average amount by employee

the total amount by employee

all of the above

any combination of the above

Just change the variables in the CLASS statement. For example, this step computes the total and average amount for each combination of employee and quarter:

/* Total and average Amount for each Employee_ID and Qtr combination */
proc means data=tall sum mean;
   var Amount;
   class Employee_ID Qtr;
run;
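To get "all of the above" in a single step, one option is the TYPES statement of PROC MEANS, which lists exactly which combinations of class variables to summarize. The sketch below is illustrative only: it assumes a small hypothetical TALL dataset (the employee IDs and amounts are invented), and the output dataset name STATS and the variable names Total and Average are also assumptions chosen for this example.

```sas
/* Hypothetical tall data: one row per employee per quarter.  */
/* The IDs and amounts are invented for illustration only.    */
data tall;
   input Employee_ID Qtr Amount;
   datalines;
101 1 20
101 2 30
102 1 10
102 3 40
;
run;

/* TYPES requests several class-variable combinations at once:
     ()               overall
     Qtr              by quarter
     Employee_ID      by employee
     Employee_ID*Qtr  by employee and quarter                  */
proc means data=tall sum mean;
   var Amount;
   class Employee_ID Qtr;
   types () Qtr Employee_ID Employee_ID*Qtr;
   output out=stats sum=Total mean=Average;
run;
```

In STATS, the automatic variable _TYPE_ identifies which combination each row summarizes; the overall row has _TYPE_=0. Dropping entries from the TYPES statement gives "any combination of the above."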

Statistical analysis and graphics are not so simple with the wide dataset.

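A second advantage of the tall dataset is that it does not waste space storing missing values: an employee with no amount for a quarter simply has no row for that quarter, whereas the wide dataset must still reserve a cell for it.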

A third advantage of the tall dataset is that it imposes no artificial limit, such as four, on the number of measures per employee, or per whatever. What if, instead of calendar quarters, the database stored information about each employee's dependents? Or health care visits? How many dependents, or visits, should we allow for? Four? Ten? Any number you choose is arbitrary. With a tall dataset, if there is another dependent or visit, you simply add another row. There are no missing values and no artificial limit.
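The restructuring from wide to tall can be sketched with a DATA step and an array. This is a minimal sketch, not the author's code. It assumes a hypothetical WIDE dataset with one row per employee and four quarter columns named Qtr1-Qtr4, and all of the data values are invented. The step writes one row per nonmissing amount, so empty cells in the wide layout simply disappear.

```sas
/* Hypothetical wide data: one row per employee, one column per quarter. */
/* A period (.) in DATALINES is read as a missing numeric value.         */
data wide;
   input Employee_ID Qtr1-Qtr4;
   datalines;
101 20 30 . .
102 10 . 40 .
;
run;

/* Restructure to the tall layout: one output row per nonmissing amount. */
data tall(keep=Employee_ID Qtr Amount);
   set wide;
   array q{4} Qtr1-Qtr4;                    /* the fixed limit of four columns */
   do Qtr = 1 to 4;
      Amount = q{Qtr};
      if not missing(Amount) then output;   /* skip empty cells */
   end;
run;
```

The wide input holds eight cells, four of them missing; the tall output holds only the four real amounts. An array-based DATA step is used here because it can drop empty cells as it goes; a dependent or visit recorded later would just be one more row in TALL.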

About the Author

Jim Simon is a principal instructor and course developer for SAS Education. Jim has a bachelor's degree from UCLA and a master's degree from California State University at Northridge. Prior to joining the SAS Irvine office in 1988, Jim was an instructor at Ventura College and a SAS programmer at The Medstat Group in Santa Barbara. Jim's areas of specialization include the DATA step, application development, web enablement, and the SAS macro language. A native of Southern California, Jim enjoys anything in the warm California sun. On weekends, Jim loves jumping in his Corvette, turning up the stereo, and cruising Pacific Coast Highway, top down, south to Laguna Beach or north to his old hometown, Santa Barbara.