Ask Analytics: Text Mining in R

Sentiment Analysis in R - Second Method

In the previous article, we covered the basics of Sentiment Analysis and the most rudimentary method for performing it. We also discussed the drawbacks of that basic method. In this article, we will learn a more evolved method of Sentiment Analysis, with which we will try to overcome a few of the drawbacks of the previous method.
We advise going through the previous article on Sentiment Analysis first in order to understand this article better.

Text Mining in R - Part 5

We are using the same file to understand this method too. Please download it if you haven't done so yet:

:: Please download the file ::

# Let's first read the csv file.

rm(list = ls())
setwd("G:\\AA\\Text Mining")
data_1 = read.csv("movies_reviews.csv", stringsAsFactors = FALSE)  # keep the reviews as character, not factor

# Let's do the first round of cleaning using the NAMO function explained in the previous blog; here we have made minor changes for experimental purposes only.

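NAMO = function(x)
{
  y = tolower(x)                     # lower-case everything
  y = gsub("@\\w+", "", y)           # remove @mentions
  y = gsub("[[:punct:]]", " ", y)    # replace punctuation with a space
  y = gsub("http", "", y)            # remove the literal string "http"
  y = gsub("www", "", y)             # remove the literal string "www"
  y = gsub("\\d+", "", y)            # remove digits
  y = gsub("[^\x20-\x7E]", "", y)    # remove characters outside printable ASCII
  y = gsub("^\\s+|\\s+$", "", y)     # trim leading and trailing whitespace
  return(y)
}
data_1$clean = NAMO(data_1$Review)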
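To see what this first round of cleaning actually does, it can help to trace the main steps on a single string. The snippet below is only a sketch: the sample review is made up (it is not from movies_reviews.csv), and it repeats a subset of the NAMO steps one at a time so that each intermediate result can be printed.

```r
# Hypothetical sample review, used only for illustration
sample_review = "  @john LOVED it!!! 10/10, see http://example.com  "

step1 = tolower(sample_review)             # lower-case
step2 = gsub("@\\w+", "", step1)           # drop the @mention
step3 = gsub("[[:punct:]]", " ", step2)    # punctuation becomes a space
step4 = gsub("http", "", step3)            # drop the literal "http"
step5 = gsub("\\d+", "", step4)            # drop digits
step6 = gsub("^\\s+|\\s+$", "", step5)     # trim both ends

print(step6)
```

Note that runs of spaces remain inside the string (they are squeezed later by stripWhitespace), and that the rest of the URL survives as ordinary words, because only the literal "http" is removed.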

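# Here comes the second round of cleaning, in which we mainly remove common English stop words
if(!require(tm)) install.packages("tm")
library(tm)
textCorpus = Corpus(VectorSource(data_1$clean))
textCorpus = tm_map(textCorpus, removeWords, stopwords("english"))
textCorpus = tm_map(textCorpus, stripWhitespace)
# Pull the text back out of the corpus using tm only
# (as.data.frame() on a corpus relies on a method from qdap, which is not loaded yet at this point)
x = data.frame(text = sapply(textCorpus, as.character), stringsAsFactors = FALSE)
clean_review = x$text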

# The cleaned reviews are merged back into the data
data_2 = cbind(data_1[,c(1,2)], clean_review)
rm(data_1, x, clean_review, textCorpus)
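One thing worth checking before moving on: the English stop-word list in tm contains words such as "not", "no" and "very", which a negation- and amplification-aware scorer depends on. The sketch below shows how to inspect this, using a made-up review; the workaround at the end (keeping those words) is our assumption and is not part of the original script.

```r
library(tm)

sw = stopwords("english")

# Words that a negation/amplification-aware scorer relies on
context_words = c("not", "no", "very")
print(context_words %in% sw)

# Hypothetical cleaned review (not from movies_reviews.csv)
review = "the movie was not very good"

# Default stop-word removal, then squeeze repeated spaces
default_clean = gsub("\\s+", " ", removeWords(review, sw))
print(default_clean)

# Possible workaround (assumption): keep the context words
sw_keep = setdiff(sw, context_words)
kept_clean = gsub("\\s+", " ", removeWords(review, sw_keep))
print(kept_clean)
```

If the scores look surprisingly positive for reviews that contain negations, swapping stopwords("english") for a reduced list like sw_keep in the tm_map call is one option to experiment with.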

# Now let's use the powerful qdap package. In the last method we used it only to fetch the lists of positive and negative words, but here we use it fully.

if(!require(qdap)) install.packages("qdap")
library(qdap)
senti_score = polarity(data_2$clean_review,
                       polarity.frame = qdapDictionaries::key.pol,
                       constrain = FALSE,
                       negators = qdapDictionaries::negation.words,
                       amplifiers = qdapDictionaries::amplification.words,
                       deamplifiers = qdapDictionaries::deamplification.words)

score = senti_score$all$polarity  # the "polarity" column (column 3) holds the score of each review
data_2 = cbind(data_2, score)

Basically, the package uses its built-in dictionaries to score the reviews on a sentiment scale. It not only finds the positive and negative keywords, but also checks for negation, amplification and de-amplification keywords (listed below) to arrive at a final sentiment score. Hence this method overcomes most of the demerits of the previous one.

Remember, by default it checks 4 words before and 2 words after every positive and negative word (the n.before and n.after arguments of polarity()). You can change this scope, but the default is good enough, so it is better to leave it.
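To make the idea of a context window concrete, here is a toy sketch in base R. It is not qdap's actual implementation: the four word lists are tiny made-up dictionaries, the function name toy_polarity is hypothetical, and the scoring rule is simplified (a negator in the window flips the sign, an amplifier adds a fixed weight, and the total is divided by the square root of the word count).

```r
# Toy dictionaries (assumptions; qdap's real lists are much larger)
pos_words  = c("good", "great")
neg_words  = c("bad", "boring")
negators   = c("not", "never")
amplifiers = c("very", "really")

toy_polarity = function(sentence, n_before = 4, n_after = 2, amp_weight = 0.8) {
  words = strsplit(tolower(sentence), "\\s+")[[1]]
  total = 0
  for (i in seq_along(words)) {
    base = if (words[i] %in% pos_words) 1 else if (words[i] %in% neg_words) -1 else 0
    if (base == 0) next
    # Window: up to n_before words before and n_after words after the polar word
    idx = setdiff(max(1, i - n_before):min(length(words), i + n_after), i)
    context = words[idx]
    flip  = (-1)^sum(context %in% negators)              # odd number of negators flips the sign
    boost = 1 + amp_weight * sum(context %in% amplifiers) # each amplifier adds weight
    total = total + base * flip * boost
  }
  total / sqrt(length(words))                             # scale by sentence length
}

toy_scores = sapply(c("good", "very good", "not good"), toy_polarity)
print(toy_scores)
```

With these toy lists, "very good" scores higher than "good", and "not good" comes out negative, which is the behaviour the basic keyword-counting method could not capture.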

For a better understanding, try using the above function on: "good", "very good" and "not good".
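A minimal way to run that experiment is sketched below. It assumes qdap is installed and relies on the default dictionaries of polarity(); the variable names phrases and demo are ours.

```r
library(qdap)

phrases = c("good", "very good", "not good")

# Score each phrase with the default dictionaries and context window
demo = polarity(phrases)

# Show each phrase next to its polarity score
print(demo$all[, c("text.var", "polarity")])
```

Compare the three scores: the amplifier and the negator should each move the score away from that of the bare word "good".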

# Run the following to check the words:

qdapDictionaries::negation.words