Hidden Gems of SAS - 2
Supercalifragilisticexpialidocious is the longest adjective I could find to describe the hidden gem of SAS I recently got to learn about: PROC SPELL. Learn it and, believe me, if you ever need to work in text mining, it will make your life much easier. The pic is no exaggeration for this SAS procedure!
Let me share one of my past experiences to set the context.
The story is a little boring, but it is worth listening to:
Once upon a time ... in one of my previous organizations, I got to work on a text mining project where we were dealing with the worst possible text ever. The text data was about all possible commodities/things on earth and was full of all possible typos and spelling variations.
Example: for the word "Industry", the data contained correct and incorrect variants such as "Industries", "Industrys", "industri", "indastry", "industree" and so on. We were supposed to correct the spelling of all the words first and bring them to the base word "Industry".
I would not like to steal the credit: there used to be an excellent SAS programmer in the team (I never had a chance to meet her), who wrote a 500+ line algorithm that accomplished this herculean task.
Basically, the algorithm parsed all the text into words and then prepared a huge cartesian product of the word list. Similar words were then identified on the basis of "Spelling Distance" and "Phonetic Similarity". The root word was identified on the basis of the mode, i.e. the most frequently occurring word in a group of similar words was considered the ROOT.
For learning "Spelling Distance", read:
Spelling distance based matching (Spedis, Compged and complev functions)
For learning "Phonetic Similarity", read:
Sound based matching (Soundex)
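The approach described above (cartesian product, spelling distance, phonetic similarity, most frequent word as root) could be sketched roughly as follows. This is not the original 500+ line program, which is not shown here. The data set names, the sample frequencies and the distance threshold of 3 are all assumptions made for illustration.

```sas
/* Hypothetical word list with made-up frequencies */
data work.words;
    input word :$20. freq;
    datalines;
industry 40
industries 12
industrys 3
indastry 2
training 25
trainng 1
;
run;

/* Cartesian product of the list with itself, keeping only pairs that are
   close in spelling (COMPLEV) or that sound alike (SOUNDEX) */
proc sql;
    create table work.pairs as
    select a.word as word,
           b.word as candidate,
           b.freq as candidate_freq,
           complev(a.word, b.word) as lev_distance
    from work.words as a, work.words as b
    where calculated lev_distance <= 3   /* assumed threshold */
       or soundex(a.word) = soundex(b.word);
quit;

/* For each word, treat its most frequent similar word as the root */
proc sort data=work.pairs;
    by word descending candidate_freq;
run;

data work.roots;
    set work.pairs;
    by word;
    if first.word;
    rename candidate = root;
run;
```

The self-join grows with the square of the number of distinct words, which is one reason this kind of program becomes slow on large data.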
Though the logic didn't fail in most of our test cases, there was a flaw in the algorithm: what if the most frequently occurring word is itself misspelled? Hence the idea was not foolproof (no offense to anyone, and I mean it; I really pay my best respects to the person who wrote that code). Also, the code took a long time to execute on large data.
I am not blaming anyone, nor am I saying that the algorithm is useless; in fact, the same would be required even when you use Proc Spell. The only thing I want to emphasize here is that most of us are not aware of this beautiful and powerful procedure, Proc Spell, and the idea I want to convey is that the algorithm referred to above can be improved with its help.
Let's see how Proc Spell works:
First create a "misspelled words.txt" file with the following content:
Industries understand special traininng needs
Industry understand special training needs
Industrys understund special traininng needs
industri understand spesial training needs
indastry usderstand special training needs
industre undarstund special trainng needs
You should not trast anyone blindly
/* Let's now import the file into SAS. */
%let location = G:\AA\SAS gems;
libname AA "&location.";
filename sample "&location.\misspelled words.txt";
/* In the first step, we create a catalog (dictionary) of the words in the file */
Proc SPELL words = sample
Create dict = AA.mycatgalog.Spell;
Run;
/* Now route the procedure output to a file; the path matches the infile statement further down */
Proc Printto print = "&location.\output.txt" new;
Run;
/* Now, with the help of Proc Spell, we identify the misspelled words and seek suggestions to correct them; the output goes to the file initialized above */
/* Assumption: the opening statement is missing from the listing; in = sample is assumed to name the text to check */
Proc SPELL in = sample
dictionary = AA.mycatgalog.Spell
verify suggest;
run;
/* Route the procedure output back to its default destination */
Proc Printto print = print; Run;
/* To understand the output of Proc Spell, open the output file; then let's get the output back into SAS */

Data AA.List_correction;
infile "&location.\output.txt" missover firstobs = 7 ;
input A & $1000. ;
Run;
/* Transform the output file into readily usable form */
Data AA.List_correction;
set AA.List_correction;
retain id 1;
if A = "" then id +1;
Run;
Proc transpose data = AA.List_correction out = aa.transposed;
by id;
var A;
where A ~="";
Run;
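data aa.transposed;
length suggested $1000.;
retain id original_word suggested;
/* Rename Col1 as it is read, so that original_word in the retain statement refers to this column */
set aa.transposed (drop = _name_ rename = (Col1 = original_word));
/* Keep the text after the colon in Col2 as the suggestion(s) */
suggested = scan(Col2,2,":");
Run;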
Data aa.transposed;
retain id original_word suggested;
set aa.transposed;
run;
... and here we are with a list of incorrect words with suggested corrections. For a few words we might not get any suggestion, and for others we might get more than one. Now it is time to build the rest of the algorithm to replace each wrong word with the most appropriate correction. You can build a macro and use the tranwrd function to replace the words.
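As a rough sketch of that last step, the macro below applies one tranwrd call per correction. It assumes you have already reduced the suggestions to a single chosen correction per word. The data sets work.corrections and work.text, the variable line and the macro name apply_corrections are hypothetical names used only for illustration.

```sas
/* Hypothetical table: one chosen correction per misspelled word */
data work.corrections;
    length original_word suggested $40;
    input original_word $ suggested $;
    datalines;
traininng training
understund understand
spesial special
;
run;

/* Hypothetical text to be corrected */
data work.text;
    length line $200;
    infile datalines truncover;
    input line $200.;
    datalines;
Industrys understund special traininng needs
industri understand spesial training needs
;
run;

%macro apply_corrections(text=, corrections=, out=);
    %local i n;
    /* Load the word pairs into macro variables from1..fromN and to1..toN */
    proc sql noprint;
        select strip(original_word), strip(suggested)
            into :from1 - , :to1 -
            from &corrections.;
    quit;
    %let n = &sqlobs.;

    data &out.;
        set &text.;
        %do i = 1 %to &n.;
            /* Pad with blanks so that only whole words are replaced */
            line = strip(tranwrd(" " || strip(line) || " ", " &&from&i. ", " &&to&i. "));
        %end;
    run;
%mend apply_corrections;

%apply_corrections(text=work.text, corrections=work.corrections, out=work.text_fixed)
```

Note that tranwrd is case-sensitive, so you may want to bring the text and the corrections to the same case first. Because the match includes the surrounding blanks, the same misspelled word occurring twice in a row would only be replaced once per call.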
Enjoy reading our other articles and stay tuned with us.
Kindly do provide your feedback in the 'Comments' Section and share as much as possible.