Text Mining in R - Part 1


Twitter data scraping


Analysis of Twitter data is quite a hotcake these days. Plenty of material is available on how to analyze Twitter data, but most of it doesn't tell you how to extract the data from Twitter in the first place. At Ask Analytics, we are going to discuss how to extract data from Twitter, and then we will learn how to analyze it in subsequent articles.

We promise to keep it as simple as possible, as we always do.
Harvesting data from the web is also known as scraping, and to scrape Twitter data you first need a basic Twitter account.

Why is it good to learn web scraping?

Such public data is a good resource for finding out what people think about almost anything, be it an FMCG product or a politician. You can use this free data for analysis, and if you are able to extract good insights, you can publish them; nobody can blame you for data theft.

Let's now learn it

Now keep your R session open and open a web browser alongside it.

Go to the link https://dev.twitter.com/apps and you will land on the Application Management page, where you need to sign in with your Twitter account credentials.

Twitter will ask you for certain details, which you simply fill in. It might ask for some other details too, which vary from case to case.

Give the app a name that has not already been used (I used "xxxxxxxxyyyyyyyyy") and then a description. You also need to give a website; if you have one, use it, otherwise any website will do (I used https://mail.google.com/ here for the demo).


Then you agree to the Terms and Conditions, without wasting your time reading them, and finally submit to land on the next page.

There are mainly two ways of linking R and Twitter; here we are following the easy path.

Later, on public demand, we can explain the complex path too.


Now select the Keys and Access Tokens tab.



It is time to take the Token Action, as shown in the picture above, so that your access token is generated.



So finally you have these 4 strings. Keep them safe, as they can be used time and again; you need not follow this process again.

Consumer Key (API Key)       = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
Consumer Secret (API Secret) = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
Access Token                 = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
Access Token Secret          = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

Do remember that there is a hyphen in the Access Token; it should not be removed.


Now come back to your RStudio session.

Let's store all 4 keys in 4 variables.


Consumer_key = "6NY7fDvV________QDT6WtrDK2p"
Consumer_secret = "6R06rlKb5LEy3y__________BzXvgXXHV8V6oZC"
access_token = "3154348417-u0a_____________QJIjQHeMErdJVI"
access_token_secret = "0fZ5WxRDNfH47________AtIkhAC0NQQaSVWx"

You need to use your own keys; I have masked mine here.
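Keys typed directly into a script are easy to share by accident. As one possible alternative, here is a minimal sketch that reads the four strings from environment variables. The environment variable names and the helper read_twitter_keys() are hypothetical, made up for this illustration; they are not part of twitteR.

```r
# Hypothetical helper: read the four Twitter keys from environment variables.
# The environment variable names below are assumptions made for this sketch.
read_twitter_keys <- function() {
  var_names <- c(
    consumer_key        = "TWITTER_CONSUMER_KEY",
    consumer_secret     = "TWITTER_CONSUMER_SECRET",
    access_token        = "TWITTER_ACCESS_TOKEN",
    access_token_secret = "TWITTER_ACCESS_TOKEN_SECRET"
  )
  # unset = NA lets us tell "not set" apart from a real value
  keys <- Sys.getenv(var_names, unset = NA)
  names(keys) <- names(var_names)

  # Treat both unset and empty variables as missing
  missing <- is.na(keys) | keys == ""
  if (any(missing)) {
    stop("Missing environment variables: ",
         paste(var_names[missing], collapse = ", "))
  }
  keys
}
```

With the variables set (for example in your .Renviron file), keys <- read_twitter_keys() gives a named character vector, and keys[["consumer_key"]] and the others can be used in place of the hard-coded strings.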


Now we will install the package twitteR, which is the package used to connect Twitter and R.

install.packages('twitteR')
library(twitteR)

# Now we use the variables listed above for direct authentication
setup_twitter_oauth(Consumer_key,Consumer_secret,access_token,access_token_secret)

# A prompt may now appear in the console asking you to choose an option; type 1 in the console itself and hit the Enter key. If there is no such prompt, there is no need to worry.
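The prompt, when it appears, comes from the httr package that twitteR uses underneath, and it asks whether to cache the OAuth credentials in a local file. As far as we know, it can be answered in advance by setting the httr_oauth_cache option before authenticating. Treat this as a sketch and check the httr documentation for your installed version.

```r
# Answer the caching question ahead of time.
# Assumption: the installed httr version reads the "httr_oauth_cache" option.
# TRUE should correspond to answering yes (cache in a local file),
# FALSE to answering no (do not cache).
options(httr_oauth_cache = TRUE)

# setup_twitter_oauth(...) would then be called exactly as above.
```

Setting the option is mostly useful when the script runs unattended, where nobody is around to type 1 in the console.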





Voila! It is done.

Now it is time to extract tweets from Twitter. Are you excited? I was very excited when I did it for the first time.

Let's see what people have been saying with #Kejriwal since Apr 1, 2016.

Kejriwal_tweets = searchTwitter("#Kejriwal", n = 150, lang = "en", since = "2016-04-01")

# To learn about the other arguments of searchTwitter, use

?searchTwitter
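The since argument, and its companion until, take dates as "YYYY-MM-DD" strings. Keep in mind that Twitter's standard search only reaches back about a week, so a date far in the past may return nothing. Here is a small sketch for building a recent date window; the helper name recent_window() is hypothetical and not part of twitteR.

```r
# Hypothetical helper: build "YYYY-MM-DD" strings covering the last `days` days.
recent_window <- function(days = 7, today = Sys.Date()) {
  list(
    since = format(today - days, "%Y-%m-%d"),
    until = format(today, "%Y-%m-%d")
  )
}

# Fixed date used here so the result is reproducible
w <- recent_window(days = 7, today = as.Date("2016-04-08"))
w$since  # "2016-04-01"
w$until  # "2016-04-08"
```

The two strings can then be passed on as since = w$since and until = w$until in the searchTwitter call.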

# You can read these tweets (right now they are in list form)
head(Kejriwal_tweets)

# The data scraped from Twitter contains a lot of things along with the text; to get only the text part, use

tweets <- sapply(Kejriwal_tweets, function(x) x$getText())
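To see what the sapply() line is doing without calling Twitter, here is a small sketch with stand-in objects. The two mock tweets are made up for illustration; the real elements returned by searchTwitter carry many more fields, but they expose getText() in the same way.

```r
# Stand-in for a tweet object: a list carrying a getText() function.
# (Made-up data; real objects come from searchTwitter.)
make_mock_tweet <- function(text) {
  list(getText = function() text)
}

mock_tweets <- list(
  make_mock_tweet("First example tweet #demo"),
  make_mock_tweet("Second example tweet #demo")
)

# Same pattern as above: call getText() on every element of the list.
mock_text <- sapply(mock_tweets, function(x) x$getText())

is.character(mock_text)  # TRUE
length(mock_text)        # 2
```

The result is a plain character vector with one element per tweet. If you want the other fields as well, twitteR also provides twListToDF(), which turns the whole list into a data frame.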


Now that we have the data in R, we can do text mining on it and get some useful insights out of it.

We will cover the text mining part in a subsequent article pretty soon. Till then ...


Enjoy reading our other articles and stay tuned with us.

Kindly do provide your feedback in the 'Comments' Section and share as much as possible.