Think of a time-series and the first term that comes to our statistical mind is ARIMA. Right ?

###

###

While performing Linear regression on the sales data, we can keep the original outliers intact, initially. Later, if any of the x variable is not explaining the outliers, we can treat the observation the way we did above. We need to take there two variables : Seasonality Index and Trend variable also as x variable.

###

Kindly do provide your feedback in the 'Comments' Section and share as much as possible.

We would cover ARIMA soon on our blog, but its applicability is limited to forecasting. Also, it is a uni-variate practice that doesn't consider external factors.

More often we need to study the effect of external factors on the a time-series such as sales, revenue etc. In such cases, we can use regression analysis while at the same time considering the key elements of time-series :

**Seasonality and Trend**. Let's learn do we calculate these variables.
I believe you have gone through our blog covering Linear Regression or you know about it.

###
**First let's understand the terms Seasonality and trend :**

**Seasonality:**

Think of Ice-creams .... Please come back to blog ... so, which period of year, do you think, sales of ice-creams is highest in year. Of course, the summer season and it is lowest in the winter months. This is a classical example of seasonality. Definition wise seasonality is any predictable change or pattern in a time series that recurs or repeats over a one-year period (generally).

**Trend:**

In a time-series, trend is a continuous increase or decrease in data.

**We can understand these terms better with the help of following pics:**

It is not at all necessary that a time-series consists of both seasonality and trend together. Sometimes, one of the components or sometimes even both are missing.

Let's now understand how to capture these components (Seasonality and Trend) of time-series during a regression analysis.

**Also, we would understand the importance and practical application of outlier treatment along with.**

###
**Please download the excel file : Sales of XYZ Soft Drinks**

If you simply plot the sales of XYZ soft drink as such ( consisting of outliers), your graph looks like:

Looking at the graph, or just applying color scale conditional formatting, you can easily identify the outliers at Jul 2011 and Jul 2012. There are multiple ways to treat such outlier, we here are using treatment with average of lag and lead method (will discuss the other methods to treat outliers in a separate article).

Post outlier treatment, sales is not looking like >

It is having bumps at equal interval and also is increasing swiftly. definitely it is a case of time series with seasonality and trend.

**Now let's us first learn how to calculate "Seasonality".**

Please click to enlarge |

Make a pivot on the data provided, as illustrated in the file "

**XYZ Sales.xlsx**", Tab "**Seasonality**".
We put Months in the row labels, and year in the column labels. For the sake of understanding, we try calculation both with "

**Sales with Outlier**"and "**Sales (Post treatment)**".
First average for each month across years in calculated. Now month wise averages are divided with their averages so as to get an index called

**Seasonality Index.**
Due to presence of outliers in first pivot, the seasonality Index gets skewed and is not the correct representative. You can check the same plotting seasonality index with sames (treated one). I hope, now you know, why is it so important to treat the outliers. We use vlookup function to merge seasonality with sales data.

**Let's calculate "Trend" now :**

Please click to enlarge |

We have made a replica of sales data(only with post outlier treatment) in Tab "

**Trend**".

Let's first plot it. Do a right click the the chart series and add a trend line. There would be multiple options suggested by Excel, please choose "

**Linear**" and in the bottom of option window, check the "Display Equation on Chart" and "Display R-Squared value on chart" .... and click "**Close**". Here is you trend line along with it equation and the fitment.
You can use the equation

**y = 12.427x + 866.71**,**to get the trend variable. The above method has been described in order to make you understand the trend variable visually.**
The direct formula based method to calculate the trend is :

Populate a counter 1,2,3,4 ... and then use the function "

**Slope**" and "**Intercept**" to calculate trend variable. You can check it in the "Trend" tab of XYZ Sales file.

While performing Linear regression on the sales data, we can keep the original outliers intact, initially. Later, if any of the x variable is not explaining the outliers, we can treat the observation the way we did above. We need to take there two variables : Seasonality Index and Trend variable also as x variable.

**Believe me, these variable would help explaining Y variable to a larger extent in such cases, especially seasonality index. Based on my experience, trend variable is generally muli-collinear with other variables and might get removed during modeling exercise.**###
**How can we calculate the seasonality index and trend in SAS **

**Enjoy reading our other articles and stay tuned with ...**

Kindly do provide your feedback in the 'Comments' Section and share as much as possible.

Awesome post. Thank you so much.

ReplyDeleteecommerce website development company in chennai