Proc SQL vs. Data Step in SAS - Part 2


Join (Proc SQL) vs. Merge (Data step)

Your wait is over. Let's deep dive a little into the background working algorithm of Join and Merge. This one is quite interesting, here efficiency as well as the result may vary ...

Please go through the first part of this else it would be like start watching a movie at interval.



In Data step merge, data is merged point to point.

Suppose we merge following two datasets, without even a common field, check how it merges:

Data ABC;
input name_1 $ marks_1 ;
cards;
john 10
jack 20
jim  30
;
Run;

Data PQR;
input name_2 $ marks_2 ;                       The result would be :
cards;
walker 100
daniel 200
beam  300
;
Run;

Data final;
Merge ABC PQR;
proc print;
Run;

You can see that it merged the data point to point, based on position of each observation, however if we give matching field, SAS would ask you to first sort data sets based on the matching field and then again would merge data point to point position basis. Now we know how the data step merge works.

Let's see now, how SQL join would behave in the first example.

Data ABC;
input name_1 $ marks_1 ;
cards;
john 10
jack 20                                                       The result would be :
jim  30
;
Run;

Data PQR;
input name_2 $ marks_2 ;              
cards;
walker 100
daniel 200
beam  300
;
Run;

Proc SQL;
select * from ABC , PQR;
Quit;

What's that ... yes.... that's basic difference between Data step and a SQL join.

In SQL join, SAS first makes a Cartesian product (all possible combinations) as it made in above case. it then subsets this data based on the matching field condition given ... which was not present in the above code.

As in Proc SQL join, first SAS makes the Cartesian product and then subsets, suppose your datasets have 1 million (1000000) observations each. It would first make a data having :

1000000 X 1000000 = 1000000000000 observations

and then it would subset the data on the basis of matching field condition given. It is very time consuming and very much more memory consuming, which sometimes becomes a showstopper. This is the point number 1.

Let's see one more case by example.
*_____________________________________________________________________;
Data name_class;
input name : $10. Class $;
cards;
Ram 10
Shyam 12
Sita 10
Geeta 12
;
Run;


Data name_Age;
input name : $10.  Age $;
cards;
Ram 20
Shyam 22
Balram 25
Sita 18
Geeta 18
;
Run;

proc sort data = name_class;  by name; run;
proc sort data = name_age ;  by name; run;
Data Merged;
merge name_class name_age;
by name;
proc print;
Run;

Proc SQL;
create table joined as select * from name_class as a full join name_age as b
on a.name = b.name;
quit;
proc print data = joined;run;
*_____________________________________________________________________;


Could you notice the contrast, the extra element in the second data appeared in the data, but the name was missing. So we have to be extra careful while using Proc SQL join. This is point number two.

Better Join code should be written this way :
*_____________________________________________________________________;
Proc SQL;
create table joined as select coalesce(a.name, b.name) as name, class, age from name_class as a full join name_age as b
on a.name = b.name;
quit;
proc print data = joined;run;
*_____________________________________________________________________;

Enjoy reading our other articles and stay tuned with ...

Kindly do provide your feedback in the 'Comments' Section and share as much as possible.






12 comments:

  1. No big surprise that pretty much every significant tech organization as of now has a functioning AI program set up. machine learning course in pune

    ReplyDelete
  2. Thanks for sharing information on SQL It's helped me a lot.

    Thanks
    Review To Buy Online

    ReplyDelete
    Replies
    1. A big thank you for sharing this post but if anyone looking for best AutoCAD training institute in delhi look here Contact Here-+91-9311002620 Or Visit Website- https://www.htsindia.com/AutoCAD-training-courses

      Delete
  3. Your post is really good thanks for sharing this kind of content i hope you will share these kinds of stuff in near future apart from that if anyone is looking for a training institute for C++ training.Contact Here-+91-9311002620 Or Visit post Website-
    https://www.htsindia.com/Courses/modular-courses/c-plus-plus-training-course

    ReplyDelete
  4. A very big thank you for sharing this post apart from that if anyone looking for best Advanced Excel training institute in delhi so contact here +91-9311002620 visit https://www.htsindia.com/Courses/business-analytics/adv-excel-training-course

    ReplyDelete
  5. Thanks for sharing this amazing post this is the content i really looking for, it's very helpful i hope you will continue your blogging anyway if anyone looking for AutoCAD training institute in delhi contact us +91-9311002620 visit-https://www.htsindia.com/autocad-training-institute

    ReplyDelete
  6. Thank you, it is a nice article, and it has clearly shown the responsibility of freelance developers. I liked how you talked about freelancing platforms for the top developers. I also came across Eiliana.com, the new and emerging platform for freelancers. I hope this also helps you.

    ReplyDelete
  7. Thank you for sharing an amazing & wonderful blog. This content is very useful, informative and valuable in order to enhance knowledge. Keep sharing this type of content with us & keep updating us with new blogs. Apart from this, if anyone who wants to join the SAS Training institute in Delhi, can contact 9311002620 or visit our website-
    Best SAS Training Institute in Delhi | Best SAS Training Institute in Noida (htsindia.com)

    ReplyDelete

Do provide us your feedback, it would help us serve your better.