How to modify data with if-else conditions


R Tutorial 4.0


Knowing how to apply conditional logic is an essential part of learning any programming language. This article covers how to use if-else conditions in R to perform logical operations. We will see how they are used to create new variables and to impute missing values.




Let's use the built-in R dataset cars to learn if-else based logical operations:

# Make a copy of the built-in R dataset cars, which has only two columns: speed and dist

cars_data = cars

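# We will work on the copy. Create a new variable pickup: speed divided by stopping distance

cars_data$pickup = cars_data$speed / cars_data$dist

# Alternatively, attach the data frame so its columns can be referenced by name.
# attach() takes a snapshot of the columns, so re-attach after adding pickup;
# the if-else examples below refer to pickup directly.

attach(cars_data)
cars_data$pickup = speed / dist
detach(cars_data)
attach(cars_data)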
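A caveat worth knowing: attach() places a copy of the data frame's columns on the search path at the moment it is called, so a column added to the data frame afterwards is not visible by name until the data frame is re-attached. The minimal sketch below uses a small made-up data frame, toy (hypothetical data, not from the article), to show this, and also shows with(), which evaluates an expression using the data frame's columns and avoids attach() altogether.

```r
# Hypothetical two-row data frame, for illustration only
toy <- data.frame(speed = c(4, 10), dist = c(2, 20))

attach(toy)
toy$pickup <- speed / dist      # speed and dist are found via the attached copy
before <- exists("pickup")      # FALSE: the attached copy predates the pickup column
detach(toy)

attach(toy)                     # re-attach to expose the new column by name
after <- exists("pickup")       # TRUE
detach(toy)

# with() evaluates the expression inside toy, so no attach() is needed
toy$pickup2 <- with(toy, speed / dist)
```

Using with() (or explicit toy$column references) keeps the search path clean and avoids stale copies.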

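Now let's learn the basic form of a logical if-else operation in R.


Syntax:   ifelse(condition, value_if_true, value_if_false)

ifelse() is vectorized: it checks the condition for every element and returns a vector of the same length, taking each value from the second argument where the condition is TRUE and from the third where it is FALSE.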
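A minimal sketch of this element-by-element behaviour, using a made-up vector x (illustrative only). It also shows that an NA in the condition produces NA in the output rather than either branch:

```r
x <- c(-2, 0, 3, NA, 5)

# One label per element; NA > 0 evaluates to NA, so that position stays NA
label <- ifelse(x > 0, "positive", "not positive")
print(label)
# label is c("not positive", "not positive", "positive", NA, "positive")
```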


# Simplest ifelse: creates a column with two possible values

cars_data$category = ifelse(pickup > 1, "High", "Low")

# Simple nested ifelse: creates a column with three possible values (overwrites category)

cars_data$category = ifelse(pickup > 1, "High", ifelse(pickup > 0.5, "Medium", "Low"))
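For banding a single numeric variable, base R's cut() is a common alternative to nested ifelse() calls (a swapped-in technique, not the one used above). The sketch below recomputes pickup from the built-in cars data and checks that cut() with breaks at 0.5 and 1 gives the same labels as the nested ifelse(); right = TRUE makes each interval closed on the right, which matches the strict > comparisons.

```r
cars_data <- cars
cars_data$pickup <- cars_data$speed / cars_data$dist

# Nested ifelse, written with explicit column references instead of attach()
nested <- ifelse(cars_data$pickup > 1, "High",
                 ifelse(cars_data$pickup > 0.5, "Medium", "Low"))

# cut(): intervals (-Inf, 0.5], (0.5, 1] and (1, Inf) labelled Low, Medium, High
banded <- cut(cars_data$pickup,
              breaks = c(-Inf, 0.5, 1, Inf),
              labels = c("Low", "Medium", "High"),
              right = TRUE)

print(table(banded))
```

cut() returns a factor, so use as.character() if a plain character column is needed.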

A more complex if-else-if operation using another built-in R dataset: iris


# Let's first create Data_1 as a copy of iris 

Data_1 = iris
attach(Data_1)

# Excel-style nested IFs: creates a column with six categories, C-1 to C-6
# ("Other" is a fallback for rows that match none of the conditions)


Data_1$category = ifelse(Species == "setosa"     & Sepal.Length <  5, "C-1",
                  ifelse(Species == "setosa"     & Sepal.Length >= 5, "C-2",
                  ifelse(Species == "versicolor" & Sepal.Length <  6, "C-3",
                  ifelse(Species == "versicolor" & Sepal.Length >= 6, "C-4",
                  ifelse(Species == "virginica"  & Sepal.Length <  7, "C-5",
                  ifelse(Species == "virginica"  & Sepal.Length >= 7, "C-6", "Other"))))))
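Long chains of nested ifelse() calls are hard to maintain. One alternative, sketched below under the assumption that each species has a single Sepal.Length cut-off (5, 6 and 7, as in the code above), is to store the thresholds in a named vector and compute the category number arithmetically. The names thresholds, cutoff, code and category2 are illustrative, not part of the original tutorial.

```r
Data_1 <- iris

# Species-specific Sepal.Length cut-offs, taken from the nested ifelse above
thresholds <- c(setosa = 5, versicolor = 6, virginica = 7)

# Look up by species name; as.character() avoids indexing by the factor's integer codes
cutoff <- thresholds[as.character(Data_1$Species)]

# Two categories per species: C-1/C-2 (setosa), C-3/C-4 (versicolor), C-5/C-6 (virginica)
code <- 2 * (as.integer(Data_1$Species) - 1) + 1 + (Data_1$Sepal.Length >= cutoff)
Data_1$category2 <- paste0("C-", code)

print(table(Data_1$Species, Data_1$category2))
```

Adding a species or changing a cut-off then means editing one entry in thresholds rather than two ifelse() branches.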


  


How to use ifelse() for missing value imputation

# Let's first define a vector
xyz = c(11,12,NA,13,15)

# Identify the position of the missing value
which(is.na(xyz))

# summary() works even when missing values are present; it reports the number of NA's separately
summary(xyz)

# mean(), however, returns NA when missing values are present
mean(xyz)

# To ignore the missing values, set na.rm = TRUE (returns 12.75)
mean(xyz, na.rm = TRUE)
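Before imputing, it often helps to count how many values are missing. A minimal sketch: sum(is.na()) counts NAs in a vector (each TRUE counts as 1), and colSums(is.na()) counts them per column of a data frame, because is.na() applied to a data frame returns a logical matrix. The data frame df_na is made up for illustration.

```r
xyz <- c(11, 12, NA, 13, 15)
n_missing <- sum(is.na(xyz))          # number of NAs in the vector

# Hypothetical data frame with gaps in two of its three columns
df_na <- data.frame(a = c(1, NA, 3), b = c(NA, NA, "z"), c = 1:3)
missing_per_column <- colSums(is.na(df_na))
print(missing_per_column)
```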

# Impute the missing value with a hard-coded value (0)
abc = ifelse(is.na(xyz), 0, xyz)

# Impute the missing value with the mean of the non-missing values (12.75)
pqr = ifelse(is.na(xyz), mean(xyz, na.rm = TRUE), xyz)
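The same ifelse() pattern applies to a data frame column. The sketch below wraps it in a small helper, impute_mean() (a hypothetical name, not a base R function), and applies it to the Ozone column of the built-in airquality dataset, which contains missing values. Replacing NAs with the mean of the observed values leaves that mean unchanged, which the last line checks.

```r
# Hypothetical helper: replace NAs in a numeric vector with the mean of the observed values
impute_mean <- function(x) {
  ifelse(is.na(x), mean(x, na.rm = TRUE), x)
}

aq <- airquality
aq$Ozone_imputed <- impute_mean(aq$Ozone)

sum(is.na(aq$Ozone))          # missing values before imputation
sum(is.na(aq$Ozone_imputed))  # 0 after imputation
all.equal(mean(aq$Ozone_imputed), mean(aq$Ozone, na.rm = TRUE))
```

Mean imputation keeps the column mean but shrinks its spread, so it is best treated as a quick baseline.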



Enjoy reading our other articles, and stay tuned.

Kindly provide your feedback in the 'Comments' section and share this article as widely as possible.
