SAS Project 1 - Part 1


What's being imported in India?


Ask Analytics students have long asked us to walk through a real-world case study, and here we are with a very interesting one.

While the whole world is after Twitter and Kaggle data, we are using an indigenous data set: the rich daily import and export statistics that the Government of India releases to the public on one of its websites. Our objective is to learn about the import profile of India.

We will learn several new things during the course of the project.



Step 1: Download the data

Visit https://www.icegate.gov.in/DailyList/DL and you will land on the following page.

There are three drop-downs from which you need to select options (as illustrated in the picture above).

For import stats: select Bill of Entry (the document that importers file to declare goods imported from a foreign land).
For export stats: select Shipping (the shipping bill, the document that exporters file to declare goods being exported to a foreign land).

In Location, select All to cover all types of ports (air, sea and others as well).

Lastly, we select a particular date.

Once we click the submit button, we land on the next page.

Finally, we download the data.

We get a zipped file from which we can extract all the flat files (.txt) into a folder. This gives us 100+ files with data.


Step 2: Import all these files into SAS

There are 100+ files, and I have neither the time nor the patience to copy or type the names of all these files. So what should I do?

Let's learn how to tackle such a situation using the PIPE device type of the FILENAME statement, with a fileref that we name DIRLIST.

I have extracted all the files into the folder "G:\SAS Project 1"; you can choose your own.

Filename DIRLIST PIPE 'dir "G:\SAS Project 1" ';

Data datalist ;   
Infile DIRLIST lrecl=200 truncover;                          
Input line $200.; 
Run;

/* Above steps can be used as such, you just need to change the directory address */

/* The information about data files thus stored needs some cleaning for making it readily usable  */

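Data datalist;
Set datalist;
/* The file name is read from column 40 onward of each line of the "dir" listing */
Name = substr(line,40,100);
/* Keep only the .txt data files; this also drops the header and summary lines of the listing */
If upcase(scan(name,-1,".")) ne "TXT" then delete;
/* Remove the file extension (the last 4 characters, ".txt") */
name = substr(name,1,length(name)-4);
Run;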
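
As a variation on the listing step above, the Windows dir command accepts a /b switch that prints bare file names only, so no column positions are needed. This is a minimal sketch, not a replacement for the code above. It assumes the same folder and that every data file ends in .txt. The fileref BARELIST and the data set name datalist_bare are names chosen here for illustration only.

```sas
/* Bare listing: one file name per line, restricted to .txt files */
Filename BARELIST PIPE 'dir /b "G:\SAS Project 1\*.txt"';

Data datalist_bare;
Infile BARELIST lrecl=200 truncover;
Length name $200;
Input name $200.;
/* Skip any empty lines in the listing */
If name = "" then delete;
/* Drop the ".txt" extension (the last 4 characters) */
name = substr(name,1,length(name)-4);
Run;
```

The result is kept in a separate data set so that the datalist data set used in the rest of the article stays untouched.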

/*I would use this info dataset for importing all the files  */

%Macro Import_all_at_once (file_name);
Data &file_name.;
infile "G:\SAS Project 1\&file_name..txt" dsd dlm = "|" missover lrecl= 32767 Firstobs = 5;
Input 
Country_of_Origin : $200.
Desc : $1000.
Commodity_code : $10.
Quantity : Best32.
Unit : $10.
Value : Best32.
;
Run;

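/* Derive the port code and the import date from the file name */
data &file_name.;
set  &file_name.;
Port_cd = scan("&file_name.",2,"_");
date = scan("&file_name.",3,"_");
/* The date part of the file name is read as DDMMYYYY */
date_of_import = mdy(input(substr(date,3,2),2.),input(substr(date,1,2),2.),input(substr(date,5,4),4.));
drop date;
format date_of_import date9.;
run;

/* Stack this file onto the combined data set */
proc append  base =  final_data  data = &file_name. force; Run;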

proc delete data = &file_name.; run;

%Mend;
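
Before looping over every file, it can help to try the macro on a single file and read the log. This is a sketch only: BE_INDEL4_01012016 is a hypothetical file name, made up to follow the prefix_port_DDMMYYYY pattern that the macro appears to expect, so replace it with the name of one of your own files.

```sas
/* Show the SAS code generated by the macro in the log */
options mprint;

/* Hypothetical file name; use a real one from your folder */
%Import_all_at_once(BE_INDEL4_01012016);

/* Look at the first few imported rows */
proc print data = final_data (obs = 10);
run;

/* Remove the test rows so that the full run starts from an empty base */
proc delete data = final_data;
run;

options nomprint;
```

Deleting final_data after the test matters because proc append keeps adding rows to it, so the test file would otherwise be loaded twice.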


Data _null_;
Set datalist (obs = max);
Call execute ('%Import_all_at_once(file_name = ' || name || ');');
Run;

The same code can be used to collate data for any number of days as well. So, the FIRST LESSON is well learned.

Now that the data is here, let's try to analyze it. Feeling tired? Need a break?

All right, we will write another article covering the analysis of the data prepared.
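
One way to check that several days were collated correctly is to count the rows per import date. A minimal sketch, assuming final_data was built by the code above; the column names n_ports and n_rows are chosen here for illustration.

```sas
/* Number of rows loaded for each day */
proc freq data = final_data;
tables date_of_import / nocum;
run;

/* Number of distinct ports and rows for each day */
proc sql;
select date_of_import format = date9.,
       count(distinct Port_cd) as n_ports,
       count(*) as n_rows
from final_data
group by date_of_import;
quit;
```

A day that is missing from the output, or that shows far fewer ports than the others, suggests that its files were not extracted into the folder.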


Enjoy reading our other articles and stay tuned with us.

Kindly do provide your feedback in the 'Comments' Section and share as much as possible.


