Dynamic column creation

In my first dataset I have the all categories of a shop. it is close to 500 categories.

In my second dataset I summarized sales,units by customer and week and category..

Now I have to create a create sales for each customers for each category in column( similar to transpose) here is the challenge if in transaction file some of the categories is not present then it would show if i use proc transpose.

here is the macro that i wrote using macro it is working fine. the transactions dataset has more than 6 million records and the problem is, it is taking 49 hours to run ...

Re: Dynamic column creation

Do these statements model your data and process well enough to optimization?

data cat;

if0thenset sashelp.prdsal3(keep=product);

do product='BED','CHAIR','DESK','RUG','SOFA','TABLE';

output;

end;

stop;

run;

procsummarydata=sashelp.prdsal2 classdata=cat nway;

by COUNTRY STATE COUNTY notsorted;

class product;

var ACTUAL PREDICT;

outputout=summary sum=;

run;

proctransposedata=summary out=tsumm;

by COUNTRY STATE COUNTY product notsorted;

var actual predict;

run;

Proctransposedata=tsumm out=wsumm(drop=_name_) delimiter=_;

by COUNTRY STATE COUNTY notsorted;

var col1;

id product _name_;

run;

procprint;

run;

This technique using a double transpose will also be very slow but we can speed it up by doing the transpose to extra wide using a data step. Similar to yours but not so clunky. Again if this is a reasonable approximation of your process I will post the faster version.

Re: Dynamic column creation

Do these statements model your data and process well enough to optimization?

data cat;

if0thenset sashelp.prdsal3(keep=product);

do product='BED','CHAIR','DESK','RUG','SOFA','TABLE';

output;

end;

stop;

run;

procsummarydata=sashelp.prdsal2 classdata=cat nway;

by COUNTRY STATE COUNTY notsorted;

class product;

var ACTUAL PREDICT;

outputout=summary sum=;

run;

proctransposedata=summary out=tsumm;

by COUNTRY STATE COUNTY product notsorted;

var actual predict;

run;

Proctransposedata=tsumm out=wsumm(drop=_name_) delimiter=_;

by COUNTRY STATE COUNTY notsorted;

var col1;

id product _name_;

run;

procprint;

run;

This technique using a double transpose will also be very slow but we can speed it up by doing the transpose to extra wide using a data step. Similar to yours but not so clunky. Again if this is a reasonable approximation of your process I will post the faster version.

Re: Dynamic column creation

This uses the data from your CAT and the analysis variable names from master data(prdsal3). PROC TRANSPOSE is used to do that to create a data set with one observation. This will be SET to create an array. Also the same data transposed to verticle will be loaded into a hash. This way we can lookup the array index for any varaible name using PRODUCT and analysis variable name. The PROC TRANSPOSE parts will be fast because they process very little data. The data step to SUM each analysis variable into the array by ID should run faster than the double PROC TRANSPOSE but produce the same result. I have included the double transpose and proc compare as a check. I will leave it to you to benchmark.

dm'clear log; clear output;';

procdatasetskill;

quit;

%let data=sashelp.prdsal3;

* Build array using analysis varible names and product(cat);

data cat;

if0thenset &data(keep=product ACTUAL PREDICT);

do product='BED','CHAIR','DESK','RUG','SOFA','TABLE';

output;

end;

stop;

run;

procprint;

run;

proctransposedata=cat out=tcat;

by product;

var ACTUAL PREDICT;

run;

data tcat;

set tcat end=eof;

col1 = _n_;

if eof thencall symputX('DIMx8',col1*8); *for init to zero;

run;

* new variables to hold summary data, ARRAY WCAT;

* there is one variable for each PRODUCT and ANALYSIS variable;

proctransposedata=tcat out=wcat(drop=_name_) delimiter=_;

var col1;

id product _name_;

idlabel _label_;

run;

Procprint;

run;

* data to load into hash, this will index the array by compound name (COMPNAME);

Re: Dynamic column creation

As far as I know PROC TRANSPOSE is the only way to create variables dynamically from data. The data step cannot create a variable after it starts to execute and therefore cannot create a variable from data. I would classify the data step transpose as data driven but not dynamic in the same way as PROC TRANSPOSE.

Re: Dynamic column creation

This part is just to unitialize the array to zero for each level of the BY variables (ID). You could leave it out and get missing instead.

If works by POKEing a string of repeating ZEROS formated RB8. into the array WCAT. ADDR is the memory location of element 1 of WCAT and LENGTH is the total memory length of WCAT (8*number of CATs * number of analysis variables).

It is effectively

do i = 1 to dim(wcat);

wcat=0;

end;

It should be faster.

Like I said you can leave it out and get the same result with missing for any CAT that does not exist for an ID.