sas >> Concatenate string records using proc sql

by WHITLOI1 » Mon, 11 Aug 2003 21:43:32 GMT

Torben,

SQL was designed to work with normal data structures; hence de-normalization
is generally a hard thing to code in SQL. Without some restrictions on the
data the code could get quite hard and would probably involve macro to be at
all general.

Now assuming at most two distinct values of the variable to be concatenated
in any one group and no repeated values in any one group, the following code
works.

In general you could set aside the minimum values, left join the constructed
set with the remaining minimum values, and repeat this operation until set
of minimum values is empty. Such a plan removes the restriction on the
number of records in a group, but you will still run into trouble with
repeat values. With repeat values the can be no SQL solution without
introducing a sequencing variable to distinguish records with repeat values.

As you can see it much harder than the suggested DATA step solutions. One
of the big advantages to SAS is that it can work well when the data has a
good structure, and it also works much easier than most procedural languages
with poorly structured data because it imposes only a rectangular named
variable structure on SAS data and no limits on external data. In short it
is a bad idea to restrict solutions to SQL when SQL is the wrong tool for
the task.

sas >> Concatenate string records using proc sql

sas >> Concatenate string records using proc sql

by HERMANS1 » Wed, 13 Aug 2003 05:03:08 GMT

Just for fun, this query solves the problem you stated:

proc sql;
create table concat as
select t1.var1 as var1,case when t1.var2 < t2.var2
then trim(t1.var2)||t2.var2
else t1.var2
end as var3
from test1 as t1 left join test1 as t2
on t1.var1=t2.var1
group by t1.var1 having length(var3)=max(length(var3))
;
quit;

Unfortunately it does not prove to be very robust. If you add the line '1
bc' to the dataset, for example, it does not give you the same form of
result.

sas >> Concatenate string records using proc sql

by Torben Brandt » Wed, 13 Aug 2003 07:18:33 GMT

As some of you have figured out, I could have more than two rows in each
group. I found another way to solve the problem.

This solution need a column numbering the rows from 1 and up, but that
easy to make.
I still have one problem: Even though I write "else ''" it counts
missing values as a string containing spaces. Does anyone know how I
make that an empty string ''? (Trim leaves the leadning space)

I intend to dynamically use a string in the PROC SQL where clause..I
have dipped my hands into Macro quoting to achieve this and have a few
questions..
given a list of variables defined in %variables= var1 var2 var3;, I
want to execute..
PROC SQL;
select name
from <data>
where name in ('var1','var2','var3');
quit;
run;
To dynamically code the string 'var1','var2',.... I have tried to code
the following macro.
%let count = 3; /* a counter set to 3 because we know there are three
variables at present */
%macro mac1;
%let i = 1;
%let step = %scan(&variables,&i);
%let code =;
%do %while (%length(&step)>0);
%if &i < &count %then %do;
%let code = &code%nrquote(%bquote('))&step%nrquote(%bquote('))%str(,);
%end;
%if &i = &count %then %do;
%let code = &code%nrquote(%bquote('))&step%nrquote(%bquote('));
%end;
%let i = %eval(&i + 1);
%let step = %scan(&variables,&i);
%end;
&code
%mend mac1;
Upon invoking this macro as
PROC SQL
select name
from <data>
where name in (%mac1);
quit;
I get errors..
However, when I look at the log -- using %put &code, instead of &code
-- I see the exact statements that I want, i.e. 'var1','var2',var3'
Thanks,
Rushi

I need to union(concatenate) 9-10 datasets. Each has same number of
variables and all the variables are same.
I am wondering what is the quickest way to union(concatenate) a big
number of datasets?
Proc SQL can use UNION statement like:
Proc SQL;
(select * from a)
union
(select * from b)
union
(select * from c);
quit;
But can I union 9-10 datasets at one time? Would that take long time?
If I use Proc Append? I can only append one dataset at a time, right?
Would it be more time consuming?
Thanks,
Mark