Hash Object Merging in SAS - Simplified

Hash Merging in SAS


Need to join two or more large tables and it is taking too much time to do so? Not to worry, Hash Object Merging is here to rescue !

Hash Object Merging is one of the most efficient methods of merging datasets in SAS. Though memory intensive, it is much more faster than regular merging with data step or SQL join in terms of CPU time.

Syntax is not that simple, but no harm in learning a new technique.


Why is it faster ? 


A Hash object is a collection of objects loaded in memory, but not on disk !


We don't recommend to remember its complicated code. When ever you need it, just take it from here, modify it and Bingo ! Your merging would be as fast as possible with the given resources.

Data Sales;
Input Year Month Sales;
cards;
2010 1 1013
2010 2 1472
2010 3 1575
2010 4 1717
2010 5 1287
2010 6 1621
2015 1 1720
2015 2 1973
2015 3 1801
2015 4 1400
2015 5 1411
2015 6 1461
;
Run;



Data Footfall;
Input Yr Mon Male_Footfall female_footfall;
cards;
2010 1 133 153
2010 2 170 143
2010 3 125 100
2010 4 182 200
2010 5 177 190
2010 6 181 201
2015 1 112 100
2015 2 159 109
2015 3 142 107
2015 4 148 200
2015 5 139 300
2015 6 176 298
;
Run;

Data Promotion;
Input Year Month Scehme & $20.;
cards;
2010 1 Discount
2010 2 Freebies
2010 3 Discount
2010 4 Freebies
2010 5 Lucky Draw
2010 6 Discount
2015 1 Lucky Draw
2015 2 Freebies
2015 3 Discount
2015 4 Lucky Draw
2015 5 Freebies
2015 6 Discount
;
Run;



/* Here comes the Hash Object Merging */


Data Result_data ;

/*With the below statement, SAS holds the descriptive portion of the data*/
if 0 then set Sales Footfall Promotion;

/*The below statement prohibits the re-load of the hash object in the memory*/
if _N_ = 1 then do;

/*Now we define the first hash object : FF*/
Declare hash FF (dataset: 'Footfall');
FF.definekey('Yr','Mon');
FF.defineData (All : 'yes' );           /*  Shortcut for mentioning all columns at once */
FF.definedone();

/*Now we define the second hash object : Promo */
Declare hash Promo (dataset: 'Promotion');
Promo.definekey('Year','Month');
Promo.defineData('Year','Month','Scheme');
Promo.definedone();

end;

/*Now we call the base dataset*/
Set Sales;
/*Below we define the matching criterion*/
if  FF.find(key:year, key:Month) = 0             /* Need to mention key as column names are different*/
and Promo.find() = 0
then output;
run;


Resut_Data

Well, this was very basic version of this complicated method of merging. It looks complicated, but one you practice it, it won't look so monstrous !


Let's now understand, what is needed to be done in case the number of observations across the datasets are not same. What if we need to do a left or Inner join ?

Let's check more about Hash Object Merging  >>>