Ask Analytics: Awesome way to visualize sentiment scores

Learn histograms and a cool new plot in R

Recently, in our Sentiment Analysis series, we calculated sentiment scores; now let's learn how to visualize these scores creatively. It's just a matter of time before your boss falls in love with you!
Learn data art here at Ask Analytics!

So far, we have illustrated sentiment score calculation with two methods, in the following articles:

Text Mining in R - Part 5

Text Mining in R - Part 6

Anyway, let's quickly calculate the scores once again, and then plot them using a histogram and a fresh concept: the dot chart.

We use the second method here (sentiment analysis with the {qdap} package), without elaborating much.
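Since we are not elaborating on the method, here is a rough base-R sketch of the idea behind a lexicon-based polarity score such as qdap's: count the positive and negative words in a review and divide the difference by the square root of the word count. It deliberately ignores negators, amplifiers and de-amplifiers, which qdap's polarity() does take into account, and the tiny word lists below are hypothetical stand-ins, not qdap's key.pol dictionary.

```r
# Simplified lexicon-based polarity: (positive hits - negative hits) / sqrt(word count)
# Hypothetical mini-lexicons; qdap uses qdapDictionaries::key.pol instead
positive_words = c("good", "great", "love", "awesome")
negative_words = c("bad", "boring", "awful")

simple_polarity = function(text) {
  # Split a cleaned, lower-case review into words and drop empty tokens
  words = strsplit(text, "\\s+")[[1]]
  words = words[words != ""]
  if (length(words) == 0) return(0)
  n_pos = sum(words %in% positive_words)
  n_neg = sum(words %in% negative_words)
  (n_pos - n_neg) / sqrt(length(words))
}

reviews = c("great movie love it", "boring plot awful acting", "long movie")
sapply(reviews, simple_polarity, USE.NAMES = FALSE)  # returns c(1, -1, 0)
```

Dividing by the square root of the word count, rather than the raw count, dampens (but does not remove) the effect of review length on the score.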

Download the following file and save it in a folder; don't forget to change the working directory in the setwd command.

:: Please download the file ::

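# Start from a clean workspace
rm(list = ls())
# Change this to the folder where you saved movies_reviews.csv
setwd("G:\\AA\\Text Mining")
# stringsAsFactors = FALSE keeps the text columns as character
data_1 = read.csv("movies_reviews.csv", stringsAsFactors = FALSE)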

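# First round of cleaning: lower-case the text and strip noise
NAMO = function(x)
{
  y = tolower(x)
  y = gsub("@\\w+", "", y)           # remove @mentions
  y = gsub("[[:punct:]]", " ", y)    # replace punctuation with spaces
  y = gsub("http", "", y)            # remove leftover URL fragments
  y = gsub("www", "", y)
  y = gsub("\\d+", "", y)            # remove digits
  y = gsub("[^\x20-\x7E]", "", y)    # drop non-printable and non-ASCII characters
  y = gsub("^\\s+|\\s+$", "", y)     # trim leading and trailing spaces
  return(y)
}
data_1$clean = NAMO(data_1$Review)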

# Second round of cleaning: remove common English stopwords with {tm}
if (!require(tm)) install.packages("tm")
library(tm)
textCorpus = VCorpus(VectorSource(data_1$clean))
textCorpus = tm_map(textCorpus, removeWords, stopwords("english"))
textCorpus = tm_map(textCorpus, stripWhitespace)
# Pull the cleaned text back out of the corpus as a character vector
clean_review = unlist(lapply(textCorpus, content), use.names = FALSE)

# Merge the cleaned reviews back with the first two columns of the data
data_2 = data.frame(data_1[, c(1, 2)], clean_review, stringsAsFactors = FALSE)
rm(data_1, clean_review, textCorpus)

# Score each cleaned review with qdap's polarity() and its default dictionaries
if (!require(qdap)) install.packages("qdap")
library(qdap)
senti_score = polarity(data_2$clean_review,
                       polarity.frame = qdapDictionaries::key.pol,
                       constrain = FALSE,
                       negators = qdapDictionaries::negation.words,
                       amplifiers = qdapDictionaries::amplification.words,
                       deamplifiers = qdapDictionaries::deamplification.words)

# senti_score$all has one row per review; its polarity column holds the score
data_2$score = senti_score$all$polarity
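One side effect of the second cleaning round is worth checking: tm's English stopword list also contains negations such as "not" and "no", so they are already gone by the time polarity() could use them as negators. A base-R sketch with a hypothetical mini stopword list (not tm's actual list) shows the effect on one made-up review:

```r
# Hypothetical mini stopword list; tm's stopwords("english") is much longer
stop_list = c("the", "was", "not", "at", "all")

drop_stopwords = function(text, stop_list) {
  # Split on whitespace, drop stopwords, re-join with single spaces
  words = strsplit(text, "\\s+")[[1]]
  paste(words[!words %in% stop_list], collapse = " ")
}

review = "the plot was not bad at all"
drop_stopwords(review, stop_list)                  # "plot bad"
drop_stopwords(review, setdiff(stop_list, "not"))  # "plot not bad"
```

With tm, the analogous change would be to pass setdiff(stopwords("english"), qdapDictionaries::negation.words) to removeWords; whether that changes the conclusions on this data set has not been tested here.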

All right, now we have sentiment scores for all the reviews. Let's see how people actually feel about the two movies: The Terminal and The Lost World.

First, we plot a histogram of the scores for each movie:

# Two plots stacked on one screen (2 rows, 1 column)
layout(matrix(c(1, 2)))

# Subset the scores for the first movie
The_Terminal = data_2$score[data_2$Movie == "The Terminal"]

# Histogram: roughly 5 bins, purple bars, an x-axis label and a title
hist(The_Terminal, breaks = 5, col = "purple", xlab = "Sentiment Score",
     main = "Sentiment Analysis of movie : The Terminal")

# Same for the second movie, in grey
The_lost_world = data_2$score[data_2$Movie == "The Lost World"]
hist(The_lost_world, breaks = 5, col = "grey", xlab = "Sentiment Score",
     main = "Sentiment Analysis of movie : The Lost World")
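With breaks = 5 passed to each call separately, R treats the number only as a suggestion and picks "pretty" bin edges per movie, so the two panels can end up with different bins and x-axis ranges. A sketch that computes one set of breakpoints over all scores and reuses it for both movies; the toy data frame stands in for data_2 and its numbers are made up:

```r
# Made-up scores standing in for data_2 (same column names as in the article)
toy = data.frame(
  Movie = rep(c("The Terminal", "The Lost World"), each = 3),
  score = c(0.5, 0.25, 0, -0.5, -0.25, 0)
)

# One set of breakpoints covering every score, shared by both panels
common_breaks = pretty(range(toy$score), n = 5)

h_terminal = hist(toy$score[toy$Movie == "The Terminal"],
                  breaks = common_breaks, plot = FALSE)
h_lost = hist(toy$score[toy$Movie == "The Lost World"],
              breaks = common_breaks, plot = FALSE)

# Two stacked panels with identical bins and x-axis range
layout(matrix(c(1, 2)))
plot(h_terminal, col = "purple", xlab = "Sentiment Score",
     main = "Sentiment Analysis of movie : The Terminal")
plot(h_lost, col = "grey", xlab = "Sentiment Score",
     main = "Sentiment Analysis of movie : The Lost World")
```

In the article's code, this amounts to computing the breakpoints from data_2$score once and passing them as breaks to both hist() calls.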


You can interpret these histograms your own way; e.g., The Terminal received a more positive response than The Lost World.
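To back the visual reading with numbers, a short sketch that computes the mean score and the share of positive reviews per movie with tapply(); the scores are made up and stand in for data_2.

```r
# Made-up scores standing in for data_2
toy = data.frame(
  Movie = rep(c("The Terminal", "The Lost World"), each = 3),
  score = c(0.5, 0.25, 0, -0.5, -0.25, 0)
)

# Mean polarity per movie
mean_score = tapply(toy$score, toy$Movie, mean)
# Share of reviews with a strictly positive score
share_positive = tapply(toy$score > 0, toy$Movie, mean)

round(cbind(mean_score, share_positive), 2)
```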

Disclaimer: This is just a representation of data based on a random selection of movie reviews from random websites. We do not promote any particular movie, nor do we judge the success or failure of any movie. Our motive is purely academic.

Let's now check it using a dot chart:

A histogram is fine in such scenarios, but it is a little old school. Let's try one of the next-generation charts now:

# Sort the data by score
x = data_2[order(data_2$score), ]
# Movie is the grouping variable, so make it a factor
x$Movie = as.factor(x$Movie)

# One colour per movie: black for the first level, red for the second
x$color = c("black", "red")[as.integer(x$Movie)]

# One plot per screen again
layout(matrix(1))
# Finally, draw the dot chart (cex scales text and symbols; 0.7 works well here)
dotchart(x$score, labels = NULL, cex = 0.7, groups = x$Movie,
         main = "Make your boss fall in love with you!",
         xlab = "Sentiment Score", gcolor = "blue", color = x$color)
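For readers without the CSV, a self-contained sketch of the same kind of dot chart on made-up scores. It sorts by score first, as the article does, and looks up each point's colour from the factor's integer codes; pch = 19 (filled dots) is an extra styling choice, not part of the original chart.

```r
# Made-up scores standing in for data_2
toy = data.frame(
  Movie = factor(rep(c("The Terminal", "The Lost World"), each = 4)),
  score = c(0.8, 0.3, 0.1, -0.2, 0.4, -0.1, -0.3, -0.6)
)

# Sort by score, as in the article
toy = toy[order(toy$score), ]

# One colour per movie, looked up from the factor's integer codes
# (levels are alphabetical: "The Lost World" -> black, "The Terminal" -> red)
movie_colours = c("black", "red")
toy$color = movie_colours[as.integer(toy$Movie)]

dotchart(toy$score, groups = toy$Movie, color = toy$color,
         gcolor = "blue", cex = 0.7, pch = 19,
         main = "Make your boss fall in love with you!",
         xlab = "Sentiment Score")
```

Looking colours up by factor code keeps the colour vector aligned with the rows, which matters because dotchart() reorders points by group internally and reorders the color vector along with them.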