How Data Step in SAS and Proc SQL differ and which one should I use and where ?
Is that even a big question that one should bother about ? Leaving syntax, is there even a difference between two? Probably, you must be thinking about these points, Right ?
1. Mr. A : Simple coder & Mr. B : Complicated coder ( I hate B types )
2. Mr. X : Data step freak & Mr. Y : Proc SQL freak ( I am an X type)
Talking of first classification, for any task Mr. A writes code in pieces, however Mr. B writes it in a single piece ... Will show you guys example some other time. For now let's focus on second classification.
Talking of the second classification, i.e. Mr. X and Mr. Y, these guys do the same things but Mr. X uses Data step for almost everything, unless he is rolling up the data. Mr. Y, on the on the other hand, writes almost everything using Proc SQL.
... per say ...
Mr. X write uses Keep a b in data step while Mr.Y uses select a, b statement in Proc SQL.
so on ...
So what ? Principally there is no issue as such, both are Okay .... it is a matter of coding practice. There is no problem in being Mr. X or Mr. Y, mostly difference comes in terms of efficiency, which is very sometimes, especially in BFSI industry where they deal with large datasets. Don't believe me ... believe your own eyes, check the following examples :
Data check ;
do i = 1 to 10000000;
output;
end;
Run;
Data check_1;
set check ;
if mod(i,2) = 0 then flag = "Even";
else flag = "Odd";
Run;
Proc SQL;
create table check_2 as
select i , (case when mod(i,2) = 0 then "Even" else "odd" end) as Flag
from check ;
Quit;
*_____________________________________________________________________;
Try the above code and check the real and CPU time in log, and also try to time it yourself. Check how much time it took SAS to process the codes. There is definitely a slight difference, Data step has proved itself to be more efficient.
Increase one more 0 in the do loop and now see the difference.This gap widens for sure.
1. If ... else if vs. case when statements
*_____________________________________________________________________;Data check ;
do i = 1 to 10000000;
output;
end;
Run;
Data check_1;
set check ;
if mod(i,2) = 0 then flag = "Even";
else flag = "Odd";
Run;
Proc SQL;
create table check_2 as
select i , (case when mod(i,2) = 0 then "Even" else "odd" end) as Flag
from check ;
Quit;
*_____________________________________________________________________;
Try the above code and check the real and CPU time in log, and also try to time it yourself. Check how much time it took SAS to process the codes. There is definitely a slight difference, Data step has proved itself to be more efficient.
Increase one more 0 in the do loop and now see the difference.This gap widens for sure.
2. Join (Proc SQL) vs. Merge (Data step)
This one is quite interesting, here efficiency as well as the result may vary ...How ? .............. wait don't get impatient ... for this first we need to deep dive a little into the background working algorithm of Join and Merge.
---------------------------------- Click here for the next part---------------------------------------
No comments:
Post a Comment
Do provide us your feedback, it would help us serve your better.