
### Learn Histogram and a new cool plot in R

Recently, in our Sentiment Analysis series, we calculated Sentiment Scores; now let's learn how to visualize these scores creatively. It's just a matter of time before your boss falls in love with you!
Learn data art here @ Ask Analytics!

We have illustrated Sentiment Score calculation by two methods so far in earlier articles.

Anyway, let's quickly calculate the scores once again, and then plot them using a Histogram and a totally fresh concept: the Dot Chart.

We use the second method here (Sentiment Analysis with the {qdap} package), without elaborating much.

Download the data file (movies_reviews.csv in the code below) and keep it in a folder; don't forget to change the working directory in the setwd command.

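rm(list = ls())
setwd("G:\\AA\\Text Mining")
data_1 = read.csv("movies_reviews.csv")

# First round of cleaning : normalize case and strip noise from the raw text
NAMO = function(x)
{
y = tolower(x)                      # lower-case everything
y = gsub("@\\w+", "", y)            # remove @mentions
y = gsub("[[:punct:]]", " ", y)     # replace punctuation with a space
y = gsub("http", "", y)             # remove leftover URL fragments
y = gsub("www", "", y)
y = gsub("\\d+", "", y)             # remove digits
y = gsub("[^\x20-\x7E]", "", y)     # remove characters outside printable ASCII
y = gsub("^\\s+|\\s+$", "", y)      # trim leading and trailing whitespace
return(y)
}
data_1$clean = NAMO(data_1$Review)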
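
To see what this first cleaning round does, here is a small self-contained sketch that applies the same gsub steps to a made-up review. The function name clean_text and the sample string are assumptions for illustration only; they are not part of the original data.

```r
# Hypothetical helper mirroring the cleaning steps of NAMO
clean_text = function(x) {
  y = tolower(x)                       # lower-case everything
  y = gsub("@\\w+", "", y)             # drop @mentions
  y = gsub("[[:punct:]]", " ", y)      # punctuation becomes a space
  y = gsub("http", "", y)              # drop leftover URL fragments
  y = gsub("www", "", y)
  y = gsub("\\d+", "", y)              # drop digits
  y = gsub("[^\x20-\x7E]", "", y)      # drop characters outside printable ASCII
  y = gsub("^\\s+|\\s+$", "", y)       # trim leading and trailing whitespace
  y
}

# Made-up review text, for illustration only
sample_review = "  LOVED it!! @john see http://www.example.com 10/10  "
cleaned = clean_text(sample_review)
print(cleaned)
```

Note that runs of spaces inside the text are left in place at this stage; in the script they are collapsed afterwards by stripWhitespace.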

# Here comes the second round of cleaning, in which we mainly remove redundant English words (stop words)
if(!require(tm)) install.packages("tm")
require(tm)
textCorpus = Corpus(VectorSource(data_1$clean))
textCorpus = tm_map(textCorpus, removeWords, stopwords("english"))
textCorpus = tm_map(textCorpus, stripWhitespace)

# as.data.frame() on a Corpus relies on the method supplied by qdap, so qdap is loaded before this step
if(!require(qdap)) install.packages("qdap")
require(qdap)
x = as.data.frame(textCorpus)
clean_review = x$text

# The cleaned reviews are merged back into the data
data_2 = cbind(data_1[,c(1,2)], clean_review)
rm(data_1, x, clean_review, textCorpus)

# Calculate the polarity (sentiment) score of each cleaned review
senti_score = polarity(data_2$clean_review,
polarity.frame = qdapDictionaries::key.pol, constrain = FALSE,
negators = qdapDictionaries::negation.words,
amplifiers = qdapDictionaries::amplification.words,
deamplifiers = qdapDictionaries::deamplification.words)

# The third column of senti_score$all holds the polarity score of each review
score = senti_score$all[,3]
data_2 = cbind(data_2, score)
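
To build intuition for what a polarity score is, the sketch below scores text with a tiny hand-made word list in base R. It is a deliberate simplification and not qdap's implementation: it ignores negators, amplifiers and de-amplifiers, and keeps only the idea (as we understand it from qdap's documentation) of summing word polarities and dividing by the square root of the word count. The function toy_polarity, the word lists and the sample reviews are all hypothetical.

```r
# Hypothetical, simplified polarity score: (positive hits - negative hits) / sqrt(word count)
toy_polarity = function(text, pos_words, neg_words) {
  words = strsplit(tolower(text), "\\s+")[[1]]
  words = words[words != ""]
  if (length(words) == 0) return(0)
  raw = sum(words %in% pos_words) - sum(words %in% neg_words)
  raw / sqrt(length(words))
}

# Made-up dictionaries and reviews, for illustration only
pos_words = c("good", "great", "loved")
neg_words = c("bad", "boring", "awful")
reviews = c("great story loved cast",
            "boring plot bad acting",
            "watched it yesterday")

scores = sapply(reviews, toy_polarity,
                pos_words = pos_words, neg_words = neg_words,
                USE.NAMES = FALSE)
print(scores)
```

A positive value means more positive than negative words were found, a negative value the opposite, and zero means no dictionary words (or an even balance).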

All right, now we have Sentiment Scores for all the reviews; let's see how people actually feel about the movies The Terminal and The Lost World.

### First we plot a Histogram of the scores :

# First define how many plots are required together on one screen; here we need 2 :
layout(matrix(c(1,2)))

# Subset the data
The_Terminal = data_2$score[data_2$Movie == "The Terminal"]

# Plot histogram : about 5 bins (a single number in breaks is only a suggestion), purple color, x-axis label and title
h = hist(The_Terminal, breaks = 5, col = "purple", xlab = "Sentiment Score",
main = "Sentiment Analysis of movie : The Terminal")

# Same plot for the second movie, in grey
The_lost_world = data_2$score[data_2$Movie == "The Lost World"]
h = hist(The_lost_world, breaks = 5, col = "grey", xlab = "Sentiment Score",
main = "Sentiment Analysis of movie : The Lost World")

# and we are ...
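
One caveat with breaks = 5: hist treats a single number as a suggestion, so the two histograms can end up with different bins, which makes them harder to compare. A minimal sketch of forcing both plots onto the same bins, using simulated scores (movie_a and movie_b are made-up stand-ins for the two score vectors):

```r
set.seed(1)
# Simulated scores, for illustration only
movie_a = rnorm(50, mean = 0.3, sd = 0.4)
movie_b = rnorm(50, mean = -0.1, sd = 0.4)

# Round break points that cover the combined range of both vectors
common_breaks = pretty(range(c(movie_a, movie_b)), n = 5)

# plot = FALSE returns the bin counts without drawing; drop it to draw the plots
h_a = hist(movie_a, breaks = common_breaks, plot = FALSE)
h_b = hist(movie_b, breaks = common_breaks, plot = FALSE)

print(h_a$breaks)
print(rbind(movie_a = h_a$counts, movie_b = h_b$counts))
```

With shared break points, the bars of the two histograms line up on the same x axis, so a shift to the right or left is easier to spot.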

You can then interpret these Histograms in your own way, e.g. the movie The Terminal has received a more positive response than The Lost World.
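
A reading like this can be backed with a quick numeric summary per movie. The sketch below uses a small made-up data frame (toy, with the same Movie and score column names as data_2) rather than the real reviews, so the numbers mean nothing beyond illustration:

```r
# Made-up scores, for illustration only
toy = data.frame(
  Movie = c("The Terminal", "The Terminal", "The Terminal",
            "The Lost World", "The Lost World", "The Lost World"),
  score = c(0.50, 0.25, -0.25, 0.25, -0.50, -0.25)
)

# Average score and share of positive reviews, per movie
mean_score = tapply(toy$score, toy$Movie, mean)
share_positive = tapply(toy$score > 0, toy$Movie, mean)

print(round(mean_score, 2))
print(round(share_positive, 2))
```

On the real data, the same two tapply calls on data_2 would give one mean and one positive share per movie to set beside the plots.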

Disclaimer :  This is just a representation of data based on a random selection of movies' reviews from random websites. We do not promote any particular movie, also we do not judge the success / failure of any movie. Our motive is purely academic.

### Let's now check it using Dot Chart :

A Histogram, as such, is fine in such scenarios, but it is a little old school. Let's now try one of the next-generation charts :

# Let us first sort the data by score
x = data_2[order(data_2$score),]
# Movie, being the class variable here, is defined as a factor
x$Movie = as.factor(x$Movie)

# Assign a color to each of the movies
x$color[x$Movie == levels(x$Movie)[1]] = "Black"
x$color[x$Movie == levels(x$Movie)[2]] = "Red"

# One plot per screen now
layout(matrix(c(1)))
# Finally we draw the dot chart (here cex scales the text and plotting symbols; 0.7 works well for this data)
dotchart(x$score, labels = NULL, cex = 0.7, groups = x$Movie,
main = "Make your boss fall in love with you!",
xlab = "Sentiment Score", gcolor = "blue", color = x$color)

# and we are done!
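
The per-movie coloring can also be written as a lookup on the factor, which scales to more than two movies. A minimal sketch on made-up data (the toy data frame and the palette_by_movie vector are assumptions for illustration):

```r
# Made-up scores, for illustration only
toy = data.frame(
  Movie = c("The Terminal", "The Lost World", "The Terminal", "The Lost World"),
  score = c(0.4, -0.2, 0.1, 0.3)
)
toy = toy[order(toy$score), ]
toy$Movie = as.factor(toy$Movie)

# One color per factor level, in the order given by levels(toy$Movie)
palette_by_movie = c("black", "red")
toy$color = palette_by_movie[as.integer(toy$Movie)]

# groups splits the chart into one block per movie; color is applied point by point
dotchart(toy$score, groups = toy$Movie, color = toy$color,
         gcolor = "blue", cex = 0.7, xlab = "Sentiment Score")
```

Adding a third movie would then only need a third entry in palette_by_movie.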

Soon we are going to write about the next evolution in Sentiment Analysis; till then ...

Enjoy reading our other articles and stay tuned with us.

Kindly do provide your feedback in the 'Comments' Section and share as much as possible.
