Like The Shawshank Redemption's steady position in the IMDb ratings, the following question has held its place in the list of frequently asked SAS interview questions.
What is the difference between Proc Summary and Proc Means?
People have been asked this worthless question since the Stone Age, even though there are several more important things to ask.
Let's first look at these procedures one by one and then delve into the contrast:
Create this data set to use:
Data Check;
Input Name $ Subject : $10. full_marks Marks_secured;
cards;
Rajat Maths 100 52
Vinod Maths 100 64
Rakesh Maths 100 57
Geeta Maths 100 58
Bharat Maths 100 86
Rajesh Maths 100 81
Monal Maths 100 67
Sonali Maths 100 88
Ritesh Maths 100 68
Neha Maths 100 88
Rajat Physics 150 53
Vinod Physics 150 69
Rakesh Physics 150 52
Geeta Physics 150 59
Bharat Physics 150 88
Rajesh Physics 150 59
Monal Physics 150 71
Sonali Physics 150 66
Ritesh Physics 150 68
Neha Physics 150 77
;
Run;
Let's first run Proc Summary on the data:
Proc sort data = check; by subject ;run;
Proc Summary data = check;
by subject ;
var marks_secured ;
output out = summary_of_check ;
Run;
The procedure summarizes the data and writes the result to the data set summary_of_check.
When no statistics are requested in the output statement, it gives N (number of observations), Minimum, Maximum, Mean and Standard Deviation by default.
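Since the output is not reproduced here, a quick way to look at it yourself is to print the data set. This is only a small sketch, assuming the sort and Proc Summary step above have already been run so that summary_of_check exists:

```sas
/* Sketch: inspect the default output data set created by the step above. */
/* Assumes work.summary_of_check already exists.                          */
proc print data = summary_of_check noobs;
run;
```

With no statistic keywords in the output statement, you should see one row per statistic for each subject, labelled by the automatic variable _STAT_ (N, MIN, MAX, MEAN, STD), along with _TYPE_ and _FREQ_.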
To get the output in a more usable or customized format, try this code:
Proc sort data = check; by subject ;run;
Proc Summary data = check;
by subject ;
var marks_secured ;
output out = summary_of_check
sum = a
mean= b
std= c
n = d
nmiss= e
var= f
max= g
min= h
median = i
;Run;
or
Proc sort data = check; by subject ;run;
Proc Summary data = check;
by subject ;
var marks_secured ;
output out = summary_of_check
sum =
mean=
std=
n =
nmiss=
var=
max=
min=
median = /autoname
;Run;
You can list fewer keywords to get only specific statistics (for example, use only min = and max = to get the minimum and maximum).
Both programs give the same result, except that in the second one the column names are generated automatically by the autoname option.
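As an illustration of asking for fewer statistics, here is a minimal sketch that requests only the minimum and maximum. The output data set name min_max_of_check is made up for this example:

```sas
/* Sketch: request only the minimum and maximum per subject. */
proc sort data = check; by subject; run;

proc summary data = check;
  by subject;
  var marks_secured;
  /* min_max_of_check is a hypothetical name chosen for this example */
  output out = min_max_of_check
         min =
         max = / autoname;
run;

proc print data = min_max_of_check noobs;
run;
```

With autoname, the two statistic columns should come out named marks_secured_Min and marks_secured_Max, one row per subject.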
In the code above we used the "BY" statement with a categorical variable to get a group-wise summary. Proc Summary also gives us the option of using a "CLASS" statement in place of "BY".
The usage and output are a little different, but at a broad level they are the same.
You do not need to sort the data if you are using the CLASS statement.
Proc Summary data = check;
Class subject name ;
var marks_secured;
output out = summary_of_check
sum =
mean=
/autoname;
Run;
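To see how the CLASS output differs, it helps to count the output rows by the automatic variable _TYPE_. This is a rough sketch; the data set name class_out is an assumption made for this example so that it does not overwrite summary_of_check:

```sas
/* Sketch: same CLASS step, written to a hypothetical data set class_out */
proc summary data = check;
  class subject name;
  var marks_secured;
  output out = class_out
         sum =
         mean = / autoname;
run;

/* Count how many output rows belong to each _TYPE_ level */
proc freq data = class_out;
  tables _type_ / nocum nopercent;
run;
```

With the data above, _TYPE_ = 0 is the overall total, _TYPE_ = 1 is by name only, _TYPE_ = 2 is by subject only, and _TYPE_ = 3 is the subject and name combination.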
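If we use two categorical variables in the CLASS statement, Proc Summary analyzes the data at all interaction levels, whereas the output with the BY statement is restricted to the level you define. Try this and see the difference (the BY statement should give only the rows for the subject and name combinations, but do try it yourself ... I can be wrong):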
Proc sort data = check; By subject name ; run;
Proc Summary data = check;
By subject name ;
var marks_secured;
output out = summary_of_check
sum =
mean=
/autoname;
Run;
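If you like CLASS because it needs no sorting, but want only the most detailed level as BY gives, the NWAY option is one way to get it. A minimal sketch, with nway_out as a made-up data set name:

```sas
/* Sketch: NWAY keeps only the highest interaction level (subject * name). */
/* No prior sort is needed because CLASS is used instead of BY.            */
proc summary data = check nway;
  class subject name;
  var marks_secured;
  /* nway_out is a hypothetical name chosen for this example */
  output out = nway_out
         sum =
         mean = / autoname;
run;

proc print data = nway_out noobs;
run;
```

The rows should match the BY version, one per subject and name combination. One small difference remains: here _TYPE_ is 3, while in the BY output it is 0, because BY groups are not counted as class levels.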
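Let's now run Proc Means on the data:
I don't have much time to waste, nor do I think I should waste yours ... because time is money! Just replace SUMMARY with MEANS in the code above and you will get the same result. So what is the difference? I personally don't give these differences much importance, but I am covering them because they will help you crack interviews where "super dudes" ask such redundant questions: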
1. Proc Means prints its results to the output window automatically. This printing can be stopped by using the “Noprint” option.
2. The Output out statement is not mandatory in Proc Means (because of point 1). Proc Summary needs either an “Output out” statement or the “Print” option (again, refer to point 1 for the logical deduction).
3. Even the most basic code, as shown below with no VAR statement, gives several statistics for every numeric variable in Proc Means by default. Proc Summary, on the other hand, gives only the number of observations.
Try:
Proc summary data = check print; run;
Proc means data = check ; run;
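Points 1 and 2 can be seen from the other side as well: with the right option, each procedure can be made to behave like the other. A small sketch, where means_out is a data set name assumed only for this example:

```sas
/* Sketch: Proc Means made silent. It writes a data set and prints nothing. */
proc means data = check noprint;
  var marks_secured;
  /* means_out is a hypothetical name chosen for this example */
  output out = means_out
         mean = / autoname;
run;

/* Sketch: Proc Summary made to print, with no output data set. */
proc summary data = check print;
  var marks_secured;
run;
```

Because the VAR statement is given this time, the Proc Summary step prints the usual default statistics for marks_secured rather than just a count of observations.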
Enjoy reading our other articles and stay tuned with ...
Kindly do provide your feedback in the 'Comments' Section and share as much as possible.