How to import and export data in R


R Tutorial 2.0


Here comes the second article in our R series.

This article covers how to import an external data file into R and how to create an external data file from an R dataset.

So, are you ready?

Please create sample .csv and .txt files and save them in a folder on your PC. You can paste the following data into your sample files.

Gender,Name,Claim_Amount
Male,Naveen,10000
Male,Mahesh,15000
Male,Rajiv,7500
Female,Neeta,18000
Female,Geeta,12000
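If you would rather create the sample files from R itself, here is a minimal sketch. It writes the data above to Sample.csv and Sample.txt in R's temporary folder (tempdir()); that folder is only an assumption for illustration, so swap in your own path.

```r
# The sample data from above, one line per record
sample_lines = c(
  "Gender,Name,Claim_Amount",
  "Male,Naveen,10000",
  "Male,Mahesh,15000",
  "Male,Rajiv,7500",
  "Female,Neeta,18000",
  "Female,Geeta,12000"
)

# Assumed location: R's temporary folder; replace it with your own folder
folder = tempdir()
csv_path = file.path(folder, "Sample.csv")
txt_path = file.path(folder, "Sample.txt")

# Write the same comma-separated lines to both files
writeLines(sample_lines, csv_path)
writeLines(sample_lines, txt_path)

# Check that both files now exist
file.exists(c(csv_path, txt_path))
```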

Let's first learn how to import data:


There are multiple ways to import data; choose the method that suits your situation.

First, we will import a .csv file:

Data_1 =  read.csv(file.choose(), header = TRUE)

The above command opens a window where you can browse to and choose the file to import. Select the file and you are done. The header = TRUE option tells R that the first row of the data file contains the column names.

file.choose() needs manual intervention, so for automation you can specify the location and name of the file instead:

Data_2 =  read.csv("G:/KMS/Learning R/Working Folder/Raw Data/Sample.csv", header = TRUE)


The results are the same.
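As a side note, read.csv() is essentially read.table() with header = TRUE and sep = "," already set (plus a few other defaults, such as no comment character), which is why the .txt imports below look so similar. A small sketch, assuming a throwaway Sample.csv written to tempdir() rather than your own folder:

```r
# Write a throwaway copy of part of the sample data (assumed location: tempdir())
path = file.path(tempdir(), "Sample.csv")
writeLines(c("Gender,Name,Claim_Amount",
             "Male,Naveen,10000",
             "Female,Neeta,18000"), path)

# The same file read two ways
a = read.csv(path, header = TRUE)
b = read.table(path, header = TRUE, sep = ",")

identical(a, b)   # TRUE for simple data like this
str(a)
```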

Let's now import a .txt file:

Data_3 =  read.table(file.choose(), header = TRUE,sep = ",")

or

Data_4 =  read.table("G:/KMS/Learning R/Working Folder/Raw Data/Sample.txt", header = TRUE, sep = ",")

sep = "," tells R that "," is the delimiter. If the file uses a different delimiter, such as a tab, "~", "|" or anything else, you can specify it with this option.
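For instance, the same kind of records with "|" as the delimiter can be read by changing only sep. The sketch below uses the text = argument of read.table(), which reads from a string instead of a file, just to keep the example self-contained; with a real file you would pass its path as before.

```r
# Pipe-delimited version of part of the sample data, held in a string
pipe_data = "Gender|Name|Claim_Amount
Male|Naveen|10000
Female|Neeta|18000"

# Only sep changes compared with the comma-delimited import
Data_pipe = read.table(text = pipe_data, header = TRUE, sep = "|")
Data_pipe
```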


For a tab-delimited file, use sep = "\t" (this assumes your Sample.txt has been saved with tabs rather than commas):

Data_5 =  read.table(file.choose(), header = TRUE, sep = "\t")

or

Data_6 =  read.table("G:/KMS/Learning R/Working Folder/Raw Data/Sample.txt", header = TRUE, sep = "\t")
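If you don't have a tab-delimited file handy, one way to get one for testing is to write it from R and read it back. A minimal sketch; the file name Sample_tab.txt and the tempdir() folder are assumptions for illustration:

```r
# Build a small data frame and write it out tab-delimited
df = data.frame(Gender = c("Male", "Female"),
                Name = c("Naveen", "Neeta"),
                Claim_Amount = c(10000, 18000))
tab_path = file.path(tempdir(), "Sample_tab.txt")
write.table(df, tab_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back with sep = "\t"
Data_tab = read.table(tab_path, header = TRUE, sep = "\t")
Data_tab
```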


By now you may have noticed that file paths in R are written with "/" instead of the usual Windows "\". This is because "\" is an escape character inside R strings, so when you copy a location from the Windows address bar you have to change it manually. You can use either "\\" or "/".
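Two small helpers can save the manual editing. The sketch below uses the folder from the examples above: gsub() with fixed = TRUE swaps every backslash for a forward slash, and file.path() builds a path from its parts using "/". Note that inside an R string "\\" is a single backslash character.

```r
# A Windows-style path as typed in R (each "\\" is one backslash)
win_path = "G:\\KMS\\Learning R\\Working Folder"
nchar("\\")   # 1: the escaped backslash is a single character

# Replace every backslash with a forward slash
r_path = gsub("\\", "/", win_path, fixed = TRUE)
r_path   # "G:/KMS/Learning R/Working Folder"

# Or build the path from its parts; file.path() joins them with "/"
file.path("G:", "KMS", "Learning R", "Working Folder")
```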
  

Let's now import some more complex data:


Suppose we have a data file, Complex Data.txt, which is "~"-delimited, has no header and contains the following data:

1~AGKE~08/03/1999~$10.49
2~SBKE~12/18/2002~$11.00
3~SEEK~10/23/1995~$5.00
4~AGBN~08/03/1999~$11.49
5~KLPD~12/18/2002~$13.00
6~LOTR~10/23/1995~$15.00

Now let's see how such data can be imported and transformed into a usable format.

Data_7 = read.table("G:/KMS/Learning R/Working Folder/Raw Data/Complex Data.txt", header = FALSE, sep = "~")

Data_7


The data is imported with the default column names V1 to V4, since the file has no header.


Let's now give the columns proper names and proper formats, since right now V3 and V4 are stored as text (character columns, or factors in R versions before 4.0).




# Use attach() so the columns of Data_7 (V1 to V4) can be referred to directly
attach(Data_7)

# First, create 4 individual vectors from the columns and transform them
id = V1
name = as.character(V2)
datevar = as.Date(as.character(V3), "%m/%d/%Y")
cost = as.numeric(substr(V4, 2, 100))   # drop the leading "$" before converting

# Now combine the vectors into one table and print it
# (cbind() returns a character matrix here, not a data frame)
Data_7_transformed = cbind(id, name, datevar, cost)
Data_7_transformed


What are those values in datevar? They look like numbers instead of dates!

Yes, that's how R stores dates internally: the number is the count of days since 1 January 1970, which R treats as day 0. Because cbind() drops the Date class when it builds the matrix, you see the underlying number instead of the date.

To understand this better, run the following; x comes out as 0, because 1 January 1970 is day 0:
x = as.numeric(as.Date("01/01/1970", "%m/%d/%Y"))
x
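Going the other way, the sketch below converts the first date from the sample data to its day count and back. as.Date() on a plain number needs an origin in older R versions, so it is given explicitly here.

```r
# Convert a date from the sample data to R's internal day count and back
d = as.Date("08/03/1999", format = "%m/%d/%Y")
n = as.numeric(d)   # days since 1970-01-01
n                   # 10806
as.Date(n, origin = "1970-01-01")   # back to "1999-08-03"
as.numeric(as.Date("1970-01-01"))   # 0, the base date
```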


If you want the dates to be shown in date form, you can format them:

# Create the 4 vectors again, this time formatting the date back into text
id = V1
name = as.character(V2)
datevar = format(as.Date(as.character(V3), "%m/%d/%Y"), format = "%m/%d/%Y")
cost = as.numeric(substr(V4, 2, 100))   # drop the leading "$" before converting

# Now combine the vectors into one table and print it
Data_7_new = cbind(id, name, datevar, cost)
Data_7_new


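Although column V3 of Data_7 and datevar of Data_7_new look the same, they are different: datevar has first been parsed as a real date and then formatted back into text, so any value that is not a valid date in that format shows up as NA. Note that format() returns a character string, so datevar itself is not of class Date; keep the unformatted Date vector if you need to sort dates or do date arithmetic.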
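If you want the dates to stay real dates in the final table, one option (not the approach used above) is to build a data frame rather than a cbind() matrix, since data.frame() lets each column keep its own class. The sketch below reads the "~" data from a string via text = so it runs without the file, and uses col.names and stringsAsFactors = FALSE, both standard read.table() arguments, to name the columns and keep text as character at import time. The names raw and Data_7_df are made up for illustration.

```r
# The "~"-delimited sample data, held in a string for a self-contained example
complex_data = "1~AGKE~08/03/1999~$10.49
2~SBKE~12/18/2002~$11.00
3~SEEK~10/23/1995~$5.00
4~AGBN~08/03/1999~$11.49
5~KLPD~12/18/2002~$13.00
6~LOTR~10/23/1995~$15.00"

# Name the columns at import time and keep text columns as character
raw = read.table(text = complex_data, header = FALSE, sep = "~",
                 col.names = c("id", "name", "date_text", "cost_text"),
                 stringsAsFactors = FALSE)

# Build a data frame so each column keeps its own class
Data_7_df = data.frame(
  id = raw$id,
  name = raw$name,
  datevar = as.Date(raw$date_text, "%m/%d/%Y"),       # class Date
  cost = as.numeric(substr(raw$cost_text, 2, 100)),   # drop the "$"
  stringsAsFactors = FALSE
)

str(Data_7_df)
Data_7_df[order(Data_7_df$datevar), ]   # sorts chronologically
```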





That's enough to get started; over time, we will learn about importing in more depth. Let's now move on to exporting, which is much easier and quicker.




Exporting means creating an external file from R data.

To export data to a pipe (|) delimited .txt file, we can use the following code:


write.table(Data_7_new, "G:/KMS/Learning R/Working Folder/Raw Data/file_name.txt", quote = TRUE, sep = "|", row.names = FALSE)

or, to export in .csv format:


write.csv(Data_7_new, "G:/KMS/Learning R/Working Folder/Raw Data/file_name.csv", row.names = FALSE)


These functions have other options too, but we have used only the important ones.

The quote option controls whether character values (and column names) are enclosed in double quotes (text-qualified) or written without quotes.

row.names = FALSE stops R from writing the row IDs (1, 2, 3, ...) as an extra first column.
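To see what these two options actually change, the sketch below writes a tiny data frame to temporary files and prints the raw lines. The data frame and file names are made up for illustration.

```r
# A tiny data frame to export
df = data.frame(id = 1:2, name = c("AGKE", "SBKE"), cost = c(10.49, 11))

f_quoted = file.path(tempdir(), "quoted.txt")
f_plain  = file.path(tempdir(), "plain.txt")
f_rows   = file.path(tempdir(), "with_rows.txt")

# quote = TRUE: character values and column names are wrapped in double quotes
write.table(df, f_quoted, quote = TRUE, sep = "|", row.names = FALSE)
# quote = FALSE: everything is written as-is
write.table(df, f_plain, quote = FALSE, sep = "|", row.names = FALSE)
# Default row.names = TRUE: each data line starts with the row ID
write.table(df, f_rows, quote = FALSE, sep = "|")

readLines(f_quoted)   # "\"id\"|\"name\"|\"cost\"" "1|\"AGKE\"|10.49" "2|\"SBKE\"|11"
readLines(f_plain)    # "id|name|cost" "1|AGKE|10.49" "2|SBKE|11"
readLines(f_rows)     # "id|name|cost" "1|1|AGKE|10.49" "2|2|SBKE|11"
```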

-------------------------------------------------------------------------------------------------------------------
Please be very careful when working with dates in R. For example, when defining a format, '%m/%d/%Y' and '%m/%d/%y' are different.

'%m/%d/%Y' reads and writes the year in 4-digit format (e.g. 2008, 1998).

'%m/%d/%y' reads and writes the year in 2-digit format (e.g. 08, 98).
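A short sketch of the difference. On input, %y maps 69 to 99 onto 19xx and 00 to 68 onto 20xx (as documented in R's ?strptime), and it reads only two digits, so using it on a 4-digit year silently gives the wrong date.

```r
d = as.Date("08/03/1999", format = "%m/%d/%Y")

format(d, "%m/%d/%Y")   # "08/03/1999"
format(d, "%m/%d/%y")   # "08/03/99"

# Reading a 2-digit year: 69-99 become 19xx, 00-68 become 20xx
as.Date("08/03/99", format = "%m/%d/%y")   # "1999-08-03"
as.Date("08/03/08", format = "%m/%d/%y")   # "2008-08-03"

# Pitfall: %y reads only "19" from "1999" and ignores the rest
as.Date("08/03/1999", format = "%m/%d/%y") # "2019-08-03"
```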

We will cover this in more detail in a future post.

-------------------------------------------------------------------------------------------------------------------

Enjoy reading our other articles and stay tuned with ...


Kindly provide your feedback in the 'Comments' section and share this article as widely as possible.
