Ask Analytics: Text Mining in R

Sentiment Analysis in R - Coolest Method So Far

So far we have covered the basics, a rudimentary method, an evolved method and a cool way to visualize Sentiment Analysis. Let's now explore one of the most evolved methods that I have come across while learning Text Analytics. It took me a lot of time to learn it, but it won't take much of yours ... because Ask Analytics has made it easy!

Ask Analytics recommends stepwise learning, so please go through the following articles before you read this one.

Related articles:

Text Mining in R - Part 1

Text Mining in R - Part 2

Text Mining in R - Part 3

Text Mining in R - Part 4

Text Mining in R - Part 5

Text Mining in R - Part 6

For the earlier methods we used a .csv file; in this exercise, let us try scoring Twitter data.

We have already covered how to scrape and prepare data from Twitter in one of the previous articles @ Ask Analytics.

# Let's first scrape the Twitter data for a particular #tag

Consumer_key<- "hw1U_______________XoF7dHti"
Consumer_secret <- "tlnGbRKVIkjW___________________oJFmWmgcmPwokruaQ"
access_token <- "3154348417-VsEmRQgp_______________6G3dPFJ3Q3uO"
access_token_secret <- "G258EyQ6______________________95XH2RzoxsC"

# Sorry, but I cannot share my credentials; you need to get your own. To learn how to get these, please follow our first blog on text mining in R.
# The twitteR package has to be loaded before the OAuth setup call can run:

if (!require(twitteR)) install.packages("twitteR")
library(twitteR)
setup_twitter_oauth(Consumer_key,Consumer_secret,access_token,access_token_secret)

# We have started R's engine; it is now time for action. We have chosen today's trending #tag for the analysis.
roots = searchTwitter("#TanmayRoasted", n=150,lang="en", since = "2016-05-30" )

# Let's take a look at the first 6 tweets
head(roots)

# we just want the text part from the tweets for analysis
tweets <- sapply(roots, function(x) x$getText())

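# Let's define the NAMO function for text cleaning
NAMO = function(x)
{
y = gsub("[^\x20-\x7E]", "", x)    # drop non-ASCII characters such as emoji
y = tolower(y)
y = gsub("(http|www)[^ ]*", "", y) # drop whole URLs before punctuation is stripped
y = gsub("@\\w+", "", y)           # drop @mentions
y = gsub("[[:punct:]]", " ", y)    # replace punctuation with spaces
y = gsub("\\d+", "", y)            # drop digits
y = gsub("^\\s+|\\s+$", "", y)     # trim leading and trailing whitespace
return(y)
}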
# Let's clean the text now using NAMO function
clean_tweets = NAMO(tweets)

# check the cleaned tweets text now
head(clean_tweets)
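
To get a feel for what this kind of cleaning does, here is a minimal self-contained sketch on two made-up tweets. The helper name clean_demo and the sample strings are assumptions for illustration only; the steps follow the same gsub() idea as NAMO, with whole URLs removed before punctuation is stripped and repeated spaces collapsed at the end.

```r
# Hypothetical sample tweets, made up for illustration
sample_tweets <- c(
  "@someone This roast was AMAZING!!! 10/10 https://t.co/abc123",
  "  Worst video ever... #TanmayRoasted  "
)

# clean_demo is a hypothetical helper, not part of any package
clean_demo <- function(x) {
  y <- gsub("[^\x20-\x7E]", "", x)     # drop non-ASCII characters
  y <- tolower(y)                      # lower-case everything
  y <- gsub("(http|www)[^ ]*", "", y)  # drop whole URLs
  y <- gsub("@\\w+", "", y)            # drop @mentions
  y <- gsub("[[:punct:]]", " ", y)     # punctuation becomes a space
  y <- gsub("\\d+", "", y)             # drop digits
  y <- gsub("\\s+", " ", y)            # collapse runs of whitespace
  y <- gsub("^\\s+|\\s+$", "", y)      # trim both ends
  y
}

cleaned <- clean_demo(sample_tweets)
print(cleaned)
```

Running it on a handful of your own tweets before scoring is a quick way to check that mentions, links and numbers are really gone.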

# Now we need to install the set of packages required for the third type of Sentiment Analysis. Here I am installing a few supporting packages first, the main package being {sentiment}. Please watch the warnings in your console; you might need to install more packages if it asks for them.

if(!require(devtools)) install.packages("devtools")
if(!require(Rstem)) install.packages("Rstem")
if(!require(slam)) install.packages("slam")
require(devtools)
require(Rstem)
require(slam)
# Now installing the main package (uncomment the install_url line below the first time you run this)
#install_url("http://cran.r-project.org/src/contrib/Archive/sentiment/sentiment_0.2.tar.gz")
require(sentiment)

# Now the following two commands will do all the magic!
emotion = classify_emotion(clean_tweets, algorithm="bayes", prior=1.0)
polarity = classify_polarity(clean_tweets, algorithm="bayes")

#Voila! It's done.

Let's look at the results, and then we can tweak them as per our requirement.

So as a result of all the above code, you get two data sets: emotion and polarity.

emotion

As you can see here, the Bayes technique scores each string for various emotions (Anger, Disgust, Fear, Joy, Sadness and Surprise) and then gives the best fit on the basis of the highest score in the row.
It is sometimes indecisive, e.g. in Line 1. But you can then use IF ELSE logic to set it to "Disgust", as the Disgust score is the highest in the row.
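
The IF ELSE idea can be sketched as below. This is only an illustration under assumptions: emotion_demo is a made-up character matrix, and its column names (ANGER ... SURPRISE, BEST_FIT) are assumed to match what classify_emotion returns, so please check colnames(emotion) on your own result first. Also keep in mind that a missing best fit may simply mean that no emotion words were found in the tweet, in which case a label such as "unknown" could be a safer fallback than the top score.

```r
# Hypothetical matrix shaped like the emotion result; scores are made up
# and stored as text, because the whole matrix is of type character
emotion_demo <- matrix(
  c("1.47", "3.09", "2.07", "1.03", "1.73", "2.79", NA,
    "1.47", "3.09", "2.07", "7.34", "1.73", "2.79", "joy"),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("ANGER", "DISGUST", "FEAR", "JOY",
                          "SADNESS", "SURPRISE", "BEST_FIT"))
)

score_cols <- c("ANGER", "DISGUST", "FEAR", "JOY", "SADNESS", "SURPRISE")

# Pull out the six score columns and convert them from text to numbers
scores <- emotion_demo[, score_cols, drop = FALSE]
mode(scores) <- "numeric"

# Emotion with the highest score in each row
top_emotion <- tolower(score_cols[max.col(scores, ties.method = "first")])

# Keep the existing best fit where there is one, else fall back to the top score
best_fit <- emotion_demo[, "BEST_FIT"]
best_fit_filled <- ifelse(is.na(best_fit), top_emotion, best_fit)
print(best_fit_filled)
```

On your real data you would replace emotion_demo with the emotion object created above.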

Hope the concept is clear now. The second result is polarity, which you already know from our previous blogs.

polarity

Here too, you can choose your own Positive/Negative ratio cut-off to better decide the sentiment.
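
A custom cut-off could look like the sketch below. Again, this rests on assumptions: polarity_demo is a made-up matrix, the column names (POS, NEG, POS/NEG, BEST_FIT) are assumed to match the classify_polarity result, and the cut-off of 1.5 is an arbitrary value chosen only for the example.

```r
# Hypothetical matrix shaped like the polarity result; values are made up
polarity_demo <- matrix(
  c("17.92", "8.78",  "2.04", "positive",
    "9.47",  "8.78",  "1.08", "positive",
    "8.78",  "17.92", "0.49", "negative"),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("POS", "NEG", "POS/NEG", "BEST_FIT"))
)

# The ratio is stored as text, so convert it to a number first
ratio <- as.numeric(polarity_demo[, "POS/NEG"])

# Hypothetical cut-off: ratios close to 1 are treated as neutral
cutoff <- 1.5
label <- ifelse(ratio >= cutoff, "positive",
         ifelse(ratio <= 1 / cutoff, "negative", "neutral"))

print(table(label))
```

A wider cut-off pushes more borderline tweets into "neutral", so it is worth trying a few values and reading a sample of tweets from each group.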

And that is IT.

Now here is some GYAN I would like to share.

1. Now that you know all the techniques, you can easily learn more yourself; but remember, the principle mostly remains the same.

2. There might be a few sarcastic texts, and these would add to your error. There is not much you can do about it.

3. Don't be too judgmental while doing Twitter analytics: especially in India, not many people use Twitter, and the population using Twitter is not always a truly representative sample of India.

4. People often use slang, abbreviations and acronyms in tweets, and these carry their own emotions, but they are hard to detect here.

All right then ...
