SAS Functions - Part 1 - Character Functions

Let's learn SAS Functions

Functions are customized bullets used to kill an identified target (to achieve a particular derivation). We can classify all the functions in SAS into three groups :
  • Character Functions
  • Numeric Functions
  • Date Functions
Lets learn these functions group wise :

Character Functions :

1. Measure the length of text


LENGTH function returns the length of a character value



 LENGTHN function is identical to LENGTH except it returns a length of 0 (instead of a length of 1) for a character missing Value

LENGTHC, returns the storage length of a string

Data name;   
input NAME $20.; 
cards; 
Radhika sharma
Rajaram pandit
SuReSh 
AaRtI
;
RUN;   

Data length;
set name ;
Len=length(name);
Len_N=lengthn(name);
Len_C=lengthc(name);
Run;

2. Change the case of Text

UPCASE function converts all letters to uppercase.

PROPCASE capitalizes the first letter of every “Word” and converts the remaining letters to lowercase.

LOWCASE function converts all letters to lowercase.

Data length;
set  name;
U_name = upcase(name);
L_name = lowcase(name);
P_name  = propcase(name);
Run;

3. Compress (Remove) specific parts of Text 

COMPRESS: It removes blanks(default action) or whatever you specify from a character value. In this example we are compressing ‘-’.

COMPBL : It converts multiple blanks to a single blank in between string.

Data compress_example;   
input NAME $20.;  
cards; 
RAD-HIKA   SHARMA
RAJARAM  PAND-IT
SURESH123 
AA-RT-I456
Run; 

/*try this code and you know how this works */

Data Try;  
Set   compress_example;  
NAME1=COMPRESS(NAME,'-');
NAME2=COMPRESS(NAME,'-A');
NAME3=COMPRESS(NAME,'AR-');
NAME4=COMPBL(NAME); 
Run;               

Mutation imparts Super power to Compress Function
There are options that make Compress function super cool. 

a  --- Removes all alphabets ( a for alphabets)
d  --- Removes all digits ( d for digits)
p  --- Removes all punctuation ( p for punctuation)


However if we add "k" with a, p , d, it reverse the logic, Now SAS starting keeping it instead of removing.

ka  --- Keeps all alphabets ( a for alphabets)
kd  --- Keeps all digits ( d for digits)
kp  --- Keeps all punctuation ( p for punctuation)

Data Try;  
Set   compress_example;  
NAME_1 = COMPRESS(NAME,'','a');
NAME_2 = COMPRESS(NAME,'','p');
NAME_3 = COMPRESS(NAME,'','d');
NAME_4 = COMPRESS(NAME,'','ka');
NAME_5 = COMPRESS(NAME,'','kp');
NAME_6 = COMPRESS(NAME,'','kd');
Run; 

4. Adding strings - ||, Cat, Cats, Catx

We can't mathematically add the strings, we can simply concatenate these. We can do the same in multiple ways.

data sample ;
input first $ last $ ;
cards ;
New York
Las Vegas
New Jersey
Los Angeles
Vatican City
;
run;

Data concat ;
Set Sample ;
Full_name_1 = first || last;
Full_name_2 = Cat(first , last) ;
Full_name_3 = Cats(first , last) ;
Full_name_4 = Catx("$", first , last) ;
Run;

Check the result and try to contrast among results.

All ||, Cat, Cats, Catx are used to concatenate, but in slightly different manner.

|| and Cat joins keeping maintaining gap in between two strings, however there is a difference; The length of resultant field is 200 be default in case of Cat ( for that matter cats and Catx too), however in case of ||, it is summation of length of initial variable ( which are being concatenated).

Cats concatenates without any gap between strings. It also strips off leading and trailing blanks before joining the strings

Catx is most advanced among these. It demands the delimiter that you want to give in between the strings.

Use these as per your requirement ... 


5. Reverse -- for mirror image

Just try ... and you will know about it ... 

data sample ;
input name $ ;
cards ;
india
football
cricket
;
run;

Data reverse;                     

set sample;
rev_name = reverse(name); 
RUN; 

It simply inverts the order of letters in string.


6. Removing Blanks - Left, Trim and Strip

LEFT  function is for removing leading blanks (blank before string ... on left side)

TRIM  is for removing trailing blanks (blank after string ... on right side)

STRIP function is for removing leading and trailing blanks both ... I just love this one.

Data try ;
first='  Strip   ';
middle='  is    ';
last= '  best     '  ;
run;

Data try;

set  try;
concat_1 = first || middle || last;
Name_1 = Left (concat_1);
Name_2 = trim (concat_1);
Name_3 = Strip (concat_1);
Run;

7. String Substitution - Word by word

We can use tranwrd to replace word with another word ...

Syntax  -- Tranwrd(Variable , < Replace it >,< Replace with it >)

data Substitute;         
input  NAME  $20.;
CARDS;     
MISS.SIMRAN 
MISS.SOUNDARYA 
MISS.AARTI
MIS.SANDHYA
MISSS.ROOPAL
MISS.MISSISIPI

RUN;

Data Substitute;

set Substitute;
NEW=TRANWRD(NAME,'MISS','MS');
Run;



Please be very careful while using this function, as it is an unbridled horse. Control it.
See, how it changes the name of the last girl ... while we intended to change only salutation part.

8. String Substitution - Letter by letter

We can use this function for letter to letter replacement.

Syntax  -- Translate(Variable , < Replace with these >,< Replace these >)

data translate;
input Answer : $5.;

Answer =translate(Answer,'ABCDE','12345');


Datalines;

14325
AB123
51492
;
Run;
Proc print ;

run;

All the numbers 12345 would be replaced with characters ABCDE respectively.

A very strong utility that I can think of this function is encryption of message. So next time you send a message to your girlfriend, you can encrypt that using this function. In a separate message, you can send a decryption code to her.

9. String Extraction - Substr- on the basis of absolute position

We can use this function for extract a part of sting on the basis of absolute position. Its surrogate function in excel if mid. 

Syntax  -- Substr(Variable , < start from here>,< these many letters >)

data extract;
input ID : $10. ;
datalines;
NJ12M99
NY76F4512
TX91M5
;
Run;
Data Extract;
set  Extract;
length State $ 2. Gender $1. ;
State = substr(ID,1,2);
Gender = substr(ID,5,1);

Run;

10. String Extraction - Scan - on the basis of delimiter

We can use this function for extract a part of sting based on position with respect to delimiter.

Syntax  -- Scan(Variable , < Position Number >,< Delimiter >)

data sample ;
input City $20.  ;
cards ;
New,York
Las,Vegas
New,Jersey
Los,Angeles
Vatican,City
;
run;

Data part;

set  sample;
part_1 = scan(city, 1, ",");
part_2 = scan(city, 2, ",");

Run;

There are many modifiers too that can be used with the function, but we are not explaining those here.
Important note : even if you don't provide it the third argument i.e. delimiter, it automatically considers following as default delimiters.

blank ! $ % & ( ) * + , - . / ; < ^  |


That's enough in Character Functions, next two articles in this series would cover numeric and date & time functions.

For more advanced Character(Text) functions >> HIT ME 


Enjoy reading our other articles and stay tuned with ...

Kindly do provide your feedback in the 'Comments' Section and share as much as possible.





No comments:

Post a Comment

Do provide us your feedback, it would help us serve your better.