Python: A-Z of data handling with PySpark

 


Data Wrangling with PySpark


I was listening to a seminar on PySpark, where the speaker said that the documentation for PySpark on the web is not as well organized as it is for Pandas. My analyst alter ego shouted, "Challenge accepted!"

In one of our earlier blogs, we covered the basics of Python and Pandas in great detail. We have tried to keep the PySpark blog along the same lines, so that one can enjoy learning by analogy, which is what I prefer myself.

Python: A-Z of data handling with Pandas


Data Wrangling with Pandas


Data wrangling is the practice of converting data from a "raw" form into a user-ready form for descriptive analytics, and of providing a feed for other branches of analytics such as predictive analytics.

Pandas is a data-centric package (library) in the Python ecosystem for importing, manipulating, managing and analyzing data. This library was originally built on NumPy, the fundamental library for scientific computation in Python.

Ensemble Technique - Random Forest in R

Machine Learning Techniques - I


Machine Learning is a buzzword these days in the world of data science and analytics. R and Python have become popular because these tools are full of advanced machine learning techniques.

@ Ask Analytics we have covered many basic machine learning techniques so far; now we are starting with advanced techniques!

The concepts of Bagging and Boosting

Ensemble Learning Techniques


In one of the previous posts we covered Random Forest, one of the most popular ensemble learning techniques. We covered the concepts of bagging and boosting there, but not in much depth. Let's now understand the concepts of ensemble learning in greater detail.
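To make the idea of bagging (bootstrap aggregating) concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not the code from the post: the toy data set, the one-dimensional threshold "stump" used as the base learner, and the function names fit_stump, bagged_stumps and predict are all hypothetical. Each stump is trained on a bootstrap sample (rows drawn with replacement) and the final prediction is a majority vote. Boosting is not shown here.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a 1-D threshold rule: predict 1 when x > threshold, else 0.

    Tries every observed x as a threshold and keeps the one with the
    fewest training errors.
    """
    best_threshold, best_errors = None, None
    for threshold, _ in points:
        errors = sum((x > threshold) != bool(label) for x, label in points)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagged_stumps(points, n_models=25, seed=0):
    """Train n_models stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_models):
        # Bootstrap: draw len(points) rows with replacement.
        sample = [rng.choice(points) for _ in points]
        thresholds.append(fit_stump(sample))
    return thresholds


def predict(thresholds, x):
    """Aggregate the stumps by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (feature, label) pairs, label 1 for large x.
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
model = bagged_stumps(data)
print(predict(model, 0), predict(model, 10))
```

An odd number of models is used so that the vote can never tie. Random Forest follows the same pattern with full decision trees as base learners, plus random feature selection at each split.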

Decision Tree With Party Package

Decision Science


I still remember those golden olden days, when we did not have many options for anything, be it biscuits or decision trees. In India, people generally bought Parle-G biscuits and drew decision trees with SAS or SPSS.

These days the market is full of assortment, which is both bad and good. Bad because people often get confused by too many choices, good because you don't have to compromise for lack of options. Data science has evolved a lot with this assortment. R itself provides many packages to do the same work. Let's learn the beautiful party package for building decision trees and enjoy the power of assortment! Read it and you will understand the significance of its name.

Decision Tree in R with {tree} Package

Decision Science


We covered the decision tree in R using the {rpart} package in one of our previous articles. In R there are many packages that can be used for making a decision tree, of which {tree} and {party} are my favorites. I will cover both packages one by one @ Ask Analytics.

Let's first learn the usage of {tree}!

Descriptive Statistics With Proc Univariate


Feel your data!


Before going into battle, a warrior had better know what he is fighting against, and so should a data analyst! It is advisable to know and feel the data before carrying out analysis on it. It is best practice to examine the data first using Proc Univariate in SAS.

Spellbinding Proc Spell

Hidden Gems of SAS - 2

Supercalifragilisticexpialidocious is the longest adjective I could find when I learned about this hidden gem of SAS: PROC SPELL. Learn it and, believe me, if you ever need to work in text mining, it will make your life so much easier.

The pic is no exaggeration for this SAS procedure!


Time Series Forecasting - Part 5

ARIMA using SAS


We have covered the basics of time series and also the basic methods of forecasting. It is time to learn the most important and most widely used method for time series forecasting: ARIMA.

It is not possible to cover ARIMA in a single stretch, as it is full of complications, so we plan to write about it in a series of articles.

Useful VTABLE in SAS

Hidden Gems of SAS - 1

SAS is such powerful software that it sometimes surprises me a lot. It has happened quite a few times that I have written long code or a macro for some task, only to learn later that the task could be completed quite easily using one of the hidden gems. At Ask Analytics, we will try to unearth many such gems sooner or later!


Text Mining in R - Part 7

Sentiment Analysis in R - Coolest Method So Far


So far we have discussed all the basics, a rudimentary method, an evolved method and a cool way to visualize Sentiment Analysis. Let's now explore one of the most evolved methods that I have found while learning Text Analytics. It took me a lot of time to learn it, but it won't take much of yours ... coz Ask Analytics has made it easy!

Awesome way to visualize sentiment score

Learn Histogram and a new cool plot in R


Recently, in the Sentiment Analysis series, we calculated sentiment scores; now let's learn how to visualize these scores creatively. It's now just a matter of time before your boss falls in love with you!
Learn data art here @ Ask Analytics!

Time Series Forecasting - Part 4


Triple Exponential Smoothing - Excel : SAS : R


In the previous article of this series, we explained the Double Exponential Smoothing method, also called Holt's method, for time series forecasting. Now we are taking up Triple Exponential Smoothing (TES), also called the Holt-Winters method.
Once we are done with this article, we are all set to learn ARIMA.
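As a quick refresher on the Double Exponential Smoothing (Holt's method) mentioned above, here is a minimal sketch in plain Python. The series covers Excel, SAS and R, so Python is used here purely for illustration. The function name holt_forecast, the sample data and the initialization (first value as the level, first difference as the trend) are assumptions; other initializations are common too. Triple Exponential Smoothing adds a third, seasonal component on top of the level and trend shown here.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Forecast `horizon` steps ahead with Holt's linear trend method.

    alpha smooths the level, beta smooths the trend; both lie in (0, 1).
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")

    # Assumed initialization: first value and first difference.
    level = series[0]
    trend = series[1] - series[0]

    for value in series[1:]:
        previous_level = level
        # Level: blend the new observation with the previous one-step forecast.
        level = alpha * value + (1 - alpha) * (level + trend)
        # Trend: blend the latest change in level with the previous trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend

    # h-step-ahead forecast: last level plus h times the last trend.
    return [level + step * trend for step in range(1, horizon + 1)]


# Hypothetical, perfectly linear series: the forecast continues the line.
sales = [2.0, 4.0, 6.0, 8.0, 10.0]
forecast = holt_forecast(sales, alpha=0.5, beta=0.5, horizon=2)
print(forecast)
```

On real data, alpha and beta are usually tuned to minimize a forecast error measure on the historical series.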

Text Mining in R - Part 6

Sentiment Analysis in R - Second Method


In the previous article, we elaborated on the basics of Sentiment Analysis and the most rudimentary method for performing it. We also discussed the drawbacks of that basic method. In this article, we will learn one of the more evolved methods of Sentiment Analysis, with which we will try to overcome a few of those drawbacks.