What's being imported in India?
For a long time, many Ask Analytics students have asked us to walk through a real-world case study, and here we are with a very interesting one.
While the whole world is chasing Twitter and Kaggle data, we are using indigenous data: the rich daily import and export statistics that the Government of India publishes on one of its websites. Our objective is to learn the import profile of India.
We will learn several new things over the course of this project.
Step 1: Download the data
Visit https://www.icegate.gov.in/DailyList/DL to land on the Daily List page.
For import stats, select Bill of Entry (the document importers file to declare goods brought in from a foreign land); for export stats, the corresponding document is the Shipping Bill.
In Location, select All to include every type of port (air, sea and others).
Finally, select a particular date.
Once we click the Submit button, we land on the next page.
From there, we download the data.
The download is a zipped file; extracting its flat files (.txt) into a folder gives us 100+ data files.
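Before extracting, it can help to see what the archive contains. Below is a minimal sketch that lists the members of the ZIP file directly from SAS using the ZIP access method of the FILENAME statement (available from SAS 9.4). The archive path and the fileref DLZIP are hypothetical; point them at the file you actually downloaded.

```sas
/* Hypothetical archive path: replace with the ZIP file you downloaded */
filename dlzip zip "G:\SAS Project 1\daily_list.zip";

data zip_members (keep = memname);
  length memname $200;
  fid = dopen("dlzip");              /* open the archive like a directory */
  if fid > 0 then do;
    do i = 1 to dnum(fid);           /* number of members in the archive */
      memname = dread(fid, i);       /* name of the i-th member */
      output;
    end;
    rc = dclose(fid);
  end;
  else put "WARNING: could not open the ZIP archive.";
run;
```

The row count of ZIP_MEMBERS should match the 100+ files you expect to extract.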
Step 2: Import all these files into SAS
There are 100+ files, and I have neither the time nor the patience to copy or type each file name. So what should I do?
Let's learn how to tackle such a situation with a FILENAME statement that uses the PIPE device: the fileref (here called DIRLIST) reads the output of an operating-system command as if it were a file.
I have extracted all the files into the folder "G:\SAS Project 1"; you can choose your own.
/* List the folder through a pipe: the /b (bare) switch prints file names only, one per line, */
/* and *.txt keeps just the data files. The PIPE device requires the XCMD system option.      */
Filename DIRLIST PIPE 'dir "G:\SAS Project 1\*.txt" /b';
Data datalist;
Infile DIRLIST lrecl=200 truncover;
Input line $200.;
Run;
/* The steps above can be used as is; just change the directory address */
/* The file names thus stored need a little cleaning before they are usable */
Data datalist;
Set datalist;
length name $200;
name = strip(line);
If name = "" then delete;
/* remove the file extension (".txt" is the last 4 characters) */
name = substr(name, 1, length(name) - 4);
drop line;
Run;
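Many SAS installations (SAS Studio, server setups) run with XCMD disabled, in which case the PIPE fileref above will not work. A minimal sketch of an alternative builds the same DATALIST table with the DOPEN, DNUM and DREAD directory functions, which need no operating-system command. The fileref PROJDIR is a hypothetical name.

```sas
/* Alternative listing without PIPE: SAS directory functions, so XCMD is not required */
filename projdir "G:\SAS Project 1";

data datalist (keep = name);
  length name $200;
  did = dopen("projdir");                 /* open the folder through its fileref */
  if did > 0 then do;
    do i = 1 to dnum(did);                /* loop over every member of the folder */
      name = dread(did, i);               /* i-th member name */
      /* keep only .txt files, then strip the 4-character extension */
      if lowcase(scan(name, -1, ".")) = "txt" then do;
        name = substr(name, 1, length(name) - 4);
        output;
      end;
    end;
    rc = dclose(did);
  end;
run;
```

Because the output has the same table and variable names, the import steps that follow can run on it unchanged.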
/* A macro that imports one file; it is called once for every file listed in datalist */
%macro Import_all_at_once(file_name = );
/* Read one pipe-delimited file, starting at line 5 (the first 4 lines are skipped) */
Data work.one_file;
infile "G:\SAS Project 1\&file_name..txt" dsd dlm = "|" missover lrecl= 32767 Firstobs = 5;
Input
Country_of_Origin : $200.
Desc : $1000.
Commodity_code : $10.
Quantity : Best32.
Unit : $10.
Value : Best32.
;
/* The file name carries the port code (2nd "_"-separated part) and the date as DDMMYYYY (3rd part) */
Port_cd = scan("&file_name.",2,"_");
date_of_import = input(scan("&file_name.",3,"_"), ddmmyy8.);
format date_of_import date9.;
Run;
/* Stack this file onto the master table, then drop the temporary copy */
proc append base = final_data data = work.one_file force; Run;
proc delete data = work.one_file; run;
%mend Import_all_at_once;
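To see what the code above pulls out of each file name, here is a small sketch run against a hypothetical name, DL_PORT1_15032016, that follows the prefix_port_DDMMYYYY pattern the SCAN calls assume. It reads the date with the DDMMYY8. informat, which interprets the same DDMMYYYY digits.

```sas
/* Hypothetical file name following the pattern the SCAN calls expect */
%let file_name = DL_PORT1_15032016;

data _null_;
  Port_cd = scan("&file_name.", 2, "_");                          /* PORT1 */
  date_of_import = input(scan("&file_name.", 3, "_"), ddmmyy8.);  /* 15MAR2016 once formatted */
  put Port_cd= date_of_import= date9.;
run;
```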
/* Generate one macro call per file; %nrstr delays the macro until the generated code runs */
Data _null_;
Set datalist;
Call execute (cats('%nrstr(%Import_all_at_once)(file_name=', name, ')'));
Run;
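Before analyzing, a quick load check is worthwhile. The sketch below counts rows per port and date in FINAL_DATA and compares the number of port-date groups with the number of files in DATALIST; the two should match only if each file covers a distinct port and date, which is an assumption about the file naming. The table LOAD_CHECK and the macro variables are hypothetical names.

```sas
/* Rows loaded per port and date */
proc sql;
  create table load_check as
  select Port_cd, date_of_import format = date9., count(*) as n_rows
  from final_data
  group by Port_cd, date_of_import
  order by Port_cd, date_of_import;
quit;

/* Compare the number of port-date groups with the number of files listed */
proc sql noprint;
  select count(*) into :n_groups trimmed from load_check;
  select count(*) into :n_files trimmed from datalist;
quit;
%put NOTE: &n_groups. port-date groups loaded from &n_files. files.;
```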
Now that the data is here, let's try to analyze it. Feeling tired and need a break?
All right, we will cover the analysis of the prepared data in another article.
Enjoy reading our other articles and stay tuned.
Kindly provide your feedback in the Comments section and share as much as possible.
A humble appeal: please do like us on Facebook.