Let's learn SAS Functions
Functions are customized bullets used to kill an identified target (to achieve a particular derivation). We can classify all the functions in SAS into three groups :
- Character Functions
- Numeric Functions
- Date Functions
Lets learn these functions group wise :
1. Measure the length of text
LENGTH function returns the length of a character value
LENGTHN function is identical to LENGTH except it returns a length of 0 (instead of a length of 1) for a character missing Value
LENGTHC, returns the storage length of a string
Data name;
input NAME $20.;
cards;
Radhika sharma
Rajaram pandit
SuReSh
AaRtI
;
RUN;
Data length;
set name ;
Len=length(name);
Len_N=lengthn(name);
Len_C=lengthc(name);
Run;
2. Change the case of Text
UPCASE function converts all letters to uppercase.
PROPCASE capitalizes the first letter of every “Word” and converts the remaining letters to lowercase.
LOWCASE function converts all letters to lowercase.
Data length;
set name;
U_name = upcase(name);
L_name = lowcase(name);
P_name = propcase(name);
Run;
3. Compress (Remove) specific parts of Text
COMPRESS: It removes blanks(default action) or whatever you specify from a character value. In this example we are compressing ‘-’.
COMPBL : It converts multiple blanks to a single blank in between string.
Data compress_example;
input NAME $20.;
cards;
RAD-HIKA SHARMA
RAJARAM PAND-IT
SURESH123
AA-RT-I456
;
Run;
/*try this code and you know how this works */
Data Try;
Set compress_example;
NAME1=COMPRESS(NAME,'-');
NAME2=COMPRESS(NAME,'-A');
NAME3=COMPRESS(NAME,'AR-');
NAME2=COMPRESS(NAME,'-A');
NAME3=COMPRESS(NAME,'AR-');
NAME4=COMPBL(NAME);
Run;
Mutation imparts Super power to Compress Function
There are options that make Compress function super cool.
a --- Removes all alphabets ( a for alphabets)
d --- Removes all digits ( d for digits)
p --- Removes all punctuation ( p for punctuation)
However if we add "k" with a, p , d, it reverse the logic, Now SAS starting keeping it instead of removing.
ka --- Keeps all alphabets ( a for alphabets)
kd --- Keeps all digits ( d for digits)
kp --- Keeps all punctuation ( p for punctuation)
Data Try;
Set compress_example;
NAME_1 = COMPRESS(NAME,'','a');
NAME_2 = COMPRESS(NAME,'','p');
NAME_3 = COMPRESS(NAME,'','d');
NAME_4 = COMPRESS(NAME,'','ka');
NAME_5 = COMPRESS(NAME,'','kp');
NAME_6 = COMPRESS(NAME,'','kd');
Run;
Set compress_example;
NAME_1 = COMPRESS(NAME,'','a');
NAME_2 = COMPRESS(NAME,'','p');
NAME_3 = COMPRESS(NAME,'','d');
NAME_4 = COMPRESS(NAME,'','ka');
NAME_5 = COMPRESS(NAME,'','kp');
NAME_6 = COMPRESS(NAME,'','kd');
Run;
4. Adding strings - ||, Cat, Cats, Catx
We can't mathematically add the strings, we can simply concatenate these. We can do the same in multiple ways.
input first $ last $ ;
cards ;
New York
Las Vegas
New Jersey
Los Angeles
Vatican City
;
run;
Set Sample ;
Full_name_1 = first || last;
Full_name_2 = Cat(first , last) ;
Full_name_3 = Cats(first , last) ;
Full_name_4 = Catx("$", first , last) ;
Run;
Check the result and try to contrast among results.
All ||, Cat, Cats, Catx are used to concatenate, but in slightly different manner.
|| and Cat joins keeping maintaining gap in between two strings, however there is a difference; The length of resultant field is 200 be default in case of Cat ( for that matter cats and Catx too), however in case of ||, it is summation of length of initial variable ( which are being concatenated).
Cats concatenates without any gap between strings. It also strips off leading and trailing blanks before joining the strings
Catx is most advanced among these. It demands the delimiter that you want to give in between the strings.
Use these as per your requirement ...
5. Reverse -- for mirror image
Just try ... and you will know about it ...
input name $ ;
cards ;
india
football
cricket
;
run;
Data reverse;
set sample;
rev_name = reverse(name);
RUN;
It simply inverts the order of letters in string.
6. Removing Blanks - Left, Trim and Strip
LEFT function is for removing leading blanks (blank before string ... on left side)
TRIM is for removing trailing blanks (blank after string ... on right side)
STRIP function is for removing leading and trailing blanks both ... I just love this one.
first=' Strip ';
middle=' is ';
last= ' best ' ;
run;
Data try;
set try;
concat_1 = first || middle || last;
Name_1 = Left (concat_1);
Name_2 = trim (concat_1);
Name_3 = Strip (concat_1);
Run;
7. String Substitution - Word by word
We can use tranwrd to replace word with another word ...
Syntax -- Tranwrd(Variable , < Replace it >,< Replace with it >)
data Substitute;
input NAME $20.;
CARDS;
MISS.SIMRAN
MISS.SOUNDARYA
MISS.AARTI
MIS.SANDHYA
MISSS.ROOPAL
MISS.MISSISIPI
;
RUN;
Data Substitute;
set Substitute;
NEW=TRANWRD(NAME,'MISS','MS');
Run;
Please be very careful while using this function, as it is an unbridled horse. Control it.
See, how it changes the name of the last girl ... while we intended to change only salutation part.
8. String Substitution - Letter by letter
We can use this function for letter to letter replacement.
Syntax -- Translate(Variable , < Replace with these >,< Replace these >)
input Answer : $5.;
Answer =translate(Answer,'ABCDE','12345');
Datalines;
14325
AB123
51492
;
Run;
Proc print ;
run;
All the numbers 12345 would be replaced with characters ABCDE respectively.
A very strong utility that I can think of this function is encryption of message. So next time you send a message to your girlfriend, you can encrypt that using this function. In a separate message, you can send a decryption code to her.
9. String Extraction - Substr- on the basis of absolute position
We can use this function for extract a part of sting on the basis of absolute position. Its surrogate function in excel if mid.
Syntax -- Substr(Variable , < start from here>,< these many letters >)
input ID : $10. ;
datalines;
NJ12M99
NY76F4512
TX91M5
;
Run;
Data Extract;
set Extract;
length State $ 2. Gender $1. ;
State = substr(ID,1,2);
Gender = substr(ID,5,1);
Run;
10. String Extraction - Scan - on the basis of delimiter
We can use this function for extract a part of sting based on position with respect to delimiter.
Syntax -- Scan(Variable , < Position Number >,< Delimiter >)
input City $20. ;
cards ;
New,York
Las,Vegas
New,Jersey
Los,Angeles
Vatican,City
;
run;
Data part;
set sample;
part_1 = scan(city, 1, ",");
part_2 = scan(city, 2, ",");
Run;
There are many modifiers too that can be used with the function, but we are not explaining those here.
Important note : even if you don't provide it the third argument i.e. delimiter, it automatically considers following as default delimiters.
blank ! $ % & ( ) * + , - . / ; < ^ |
That's enough in Character Functions, next two articles in this series would cover numeric and date & time functions.
Enjoy reading our other articles and stay tuned with ...
Kindly do provide your feedback in the 'Comments' Section and share as much as possible.
No comments:
Post a Comment
Do provide us your feedback, it would help us serve your better.