### Python Tutorial 6.0

Before stepping into the combat arena of statistical techniques, it is better to learn all the basic moves!

Please download the .csv file, which you can use to practice the code.

import pandas as p

Aggregation=p.read_csv('C:\\data.csv')

Aggregation

**Sample output:**

**Only the first 20 observations are shown.**
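If you do not have the file at hand, a small stand-in table works just as well for the exercises. The sketch below assumes the data has three columns, Product, Region and Sales (the names used in the questions that follow). The rows are made up for illustration and are not the contents of the original file.

```python
import pandas as pd  # the tutorial imports pandas as p; pd is the more common alias

# Hypothetical stand-in data: invented rows, not the original data.csv
Aggregation = pd.DataFrame({
    "Product": ["A", "A", "B", "B", "C", "A"],
    "Region": ["APAC", "EMEA", "APAC", "APAC", "EMEA", "APAC"],
    "Sales": [100, 150, 200, 50, 300, 25],
})

# head(20) returns at most the first 20 rows, handy for a quick look
preview = Aggregation.head(20)
print(preview)
```

With a real file, only the line that builds the DataFrame changes; everything after it stays the same.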

**Let's play our favorite game, "Question and Answer",** to learn the concepts.

**Q1. How many rows are there in the dataset?**

Ans. Aggregation['Product'].count() # any column gives the same result, provided it has no missing values, because count() skips them

Output:
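Because count() only counts non-missing values, it can undercount rows when a column has gaps. The sketch below uses a tiny made-up table (not the tutorial's file) to compare it with len() and shape, which count rows regardless of missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Sales value
df = pd.DataFrame({
    "Product": ["A", "B", "C", "D"],
    "Sales": [100, np.nan, 300, 50],
})

rows_by_len = len(df)                  # number of rows, missing values included
rows_by_shape = df.shape[0]            # shape is (rows, columns)
non_null_sales = df["Sales"].count()   # skips the NaN in Sales

print(rows_by_len, rows_by_shape, non_null_sales)
```

If the goal is simply "how many rows", len() or shape is the safer choice; count() is the right tool when you want to know how many values are actually present.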

**Q2. How much is the total sales?**

Ans. Aggregation['Sales'].sum()

Output:

**Q3. What is the total sales of the 'APAC' Region?**

Ans. Aggregation['Sales'][(Aggregation.Region=='APAC')].sum()

Output:

**Q4. What is the total sales of Product 'A' in the 'APAC' Region?**

Ans. Aggregation['Sales'][(Aggregation.Product=='A')&(Aggregation.Region=='APAC')].sum()

Output:
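The filters in Q3 and Q4 are boolean masks: each comparison produces a True/False value per row, and & combines them row by row. The sketch below spells this out on made-up rows (assumed columns Product, Region, Sales) and uses .loc, which selects rows and the column in one step.

```python
import pandas as pd

# Hypothetical stand-in data, not the original data.csv
Aggregation = pd.DataFrame({
    "Product": ["A", "A", "B", "B", "C", "A"],
    "Region": ["APAC", "EMEA", "APAC", "APAC", "EMEA", "APAC"],
    "Sales": [100, 150, 200, 50, 300, 25],
})

# Each comparison yields a boolean Series with one entry per row
is_a = Aggregation["Product"] == "A"
is_apac = Aggregation["Region"] == "APAC"

# & keeps only the rows where both conditions hold
total_a_apac = Aggregation.loc[is_a & is_apac, "Sales"].sum()

# Same result, written the way the tutorial does it
same_total = Aggregation["Sales"][(Aggregation.Product == "A") & (Aggregation.Region == "APAC")].sum()

print(total_a_apac, same_total)
```

The parentheses around each condition in the one-line form matter: in Python, & binds more tightly than ==, so leaving them out changes the meaning of the expression.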

Apart from count and sum, there are many other functions you can use to get basic statistics about the data, e.g. **mean, median, min, max, mode** and **std**.
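As a quick illustration of these functions, the sketch below computes them on a made-up Sales column (the values are invented). Several statistics can be requested at once with agg(); mode() is called separately because it returns a Series, since more than one value can tie for most frequent.

```python
import pandas as pd

# Hypothetical sales figures
sales = pd.Series([100, 150, 200, 100, 300, 50], name="Sales")

# Several statistics in one call; the result is a Series indexed by function name
stats = sales.agg(["mean", "median", "min", "max", "std"])

# mode() returns all most-frequent values, so convert it to a list
modes = sales.mode().tolist()

print(stats)
print(modes)
```

Note that std() computes the sample standard deviation by default (it divides by n - 1).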

**Q5. How many unique products are there?**

Ans. Aggregation['Product'].nunique()

**Let's learn how to aggregate the data group wise**

**Q6. How to get a list of all distinct products?**

Ans. Aggregation.groupby(['Product']).groups.keys()

**Output:**
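Grouping is one way to list the distinct products; unique() is a more direct alternative that skips the grouping step. The sketch below shows both on made-up rows (the Product values are invented), together with nunique() from Q5.

```python
import pandas as pd

# Hypothetical stand-in data
Aggregation = pd.DataFrame({
    "Product": ["B", "A", "B", "C", "A"],
    "Sales": [200, 100, 50, 300, 25],
})

# unique() returns the distinct values in order of first appearance
distinct = Aggregation["Product"].unique().tolist()

# The groupby route from Q6, converted to a plain list
from_groups = list(Aggregation.groupby(["Product"]).groups.keys())

# nunique() gives just the count
n_products = Aggregation["Product"].nunique()

print(distinct, from_groups, n_products)
```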

**Q7. Get sales by product**

Ans. Aggregation.groupby('Product')['Sales'].sum()

**Output:**

**Q8. Get sales by product for the 'APAC' Region only**

Ans. Aggregation[Aggregation['Region']=='APAC'].groupby('Product')['Sales'].sum()

**Output:**
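Q7 and Q8 follow the same pattern: optionally filter the rows, split them into groups, then apply a function to each group. The sketch below runs both on made-up rows (assumed columns Product, Region, Sales) so the two results can be compared side by side.

```python
import pandas as pd

# Hypothetical stand-in data, not the original data.csv
Aggregation = pd.DataFrame({
    "Product": ["A", "A", "B", "B", "C", "A"],
    "Region": ["APAC", "EMEA", "APAC", "APAC", "EMEA", "APAC"],
    "Sales": [100, 150, 200, 50, 300, 25],
})

# Q7: total sales per product across all regions
sales_by_product = Aggregation.groupby("Product")["Sales"].sum()

# Q8: filter first, then group the remaining rows
apac_only = Aggregation[Aggregation["Region"] == "APAC"]
apac_by_product = apac_only.groupby("Product")["Sales"].sum()

print(sales_by_product)
print(apac_by_product)
```

In this made-up data, product C has no APAC rows, so it does not appear in the second result at all rather than showing up with a total of zero.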

**Let me just add one more dimension to it**

Let's now aggregate the data using multiple variables with multiple measures.

**Q9. Create a table of total sales by Product and Region**

Ans. Aggregation.groupby(['Product','Region'])['Sales'].sum()

**Output:**

**Q10. Create a table of total sales and frequency by Product and Region**

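Ans. Aggregation.groupby(['Product','Region']).agg(total_sales=('Sales','sum'),Frequency=('Sales','count')) # named aggregation; the older nested-dictionary form, agg({'Sales':{'total_sales':'sum','Frequency':'count'}}), raises an error in pandas 1.0 and later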

**Output:**
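For a runnable version of Q10, the sketch below uses pandas' named aggregation syntax (available since pandas 0.25), where each keyword becomes an output column built from a (column, function) pair. The rows are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in data, not the original data.csv
Aggregation = pd.DataFrame({
    "Product": ["A", "A", "B", "B", "C", "A"],
    "Region": ["APAC", "EMEA", "APAC", "APAC", "EMEA", "APAC"],
    "Sales": [100, 150, 200, 50, 300, 25],
})

# Each keyword names an output column: (source column, aggregation function)
summary = Aggregation.groupby(["Product", "Region"]).agg(
    total_sales=("Sales", "sum"),
    Frequency=("Sales", "count"),
)

print(summary)

# Individual cells can be read with a (Product, Region) tuple as the row label
a_apac_total = summary.loc[("A", "APAC"), "total_sales"]
a_apac_freq = summary.loc[("A", "APAC"), "Frequency"]
```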

Every operation in the examples above returns either a pandas Series or a pandas DataFrame.

Roughly speaking, selecting a single column gives a Series, while selecting a list of columns (even a list with one column) gives a DataFrame.

A small change in syntax switches between the two. Let's see how.

**Q9 from the examples above:**

a=Aggregation.groupby(['Product','Region'])['Sales'].sum() # produces a Series

type(a)

**Output:**

**Add one more pair of brackets around the aggregated variable:**

a=Aggregation.groupby(['Product','Region'])[['Sales']].sum() # produces a DataFrame

type(a)

**Output:**
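To check the difference yourself, the sketch below runs both forms on made-up rows (assumed columns Product, Region, Sales) and inspects the type of each result. It also shows to_frame(), which converts an existing Series into a one-column DataFrame.

```python
import pandas as pd

# Hypothetical stand-in data, not the original data.csv
Aggregation = pd.DataFrame({
    "Product": ["A", "A", "B", "B", "C", "A"],
    "Region": ["APAC", "EMEA", "APAC", "APAC", "EMEA", "APAC"],
    "Sales": [100, 150, 200, 50, 300, 25],
})

grouped = Aggregation.groupby(["Product", "Region"])

as_series = grouped["Sales"].sum()    # single brackets: a Series
as_frame = grouped[["Sales"]].sum()   # double brackets: a DataFrame

print(type(as_series))
print(type(as_frame))

# An existing Series can also be converted after the fact
converted = as_series.to_frame()
```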

One more concept I would like to cover is **indexing**.

**If you look at the output of Q9, you will notice that the Product column appears blank wherever the product repeats. That is how pandas displays an index: the output is indexed by Product and Region, and a repeated index label is not printed again.**

Since we may want to use this DataFrame for further data processing, it helps to have Product and Region as regular columns with every entry filled in.

**To avoid this index (and the blank entries), pass as_index=False to the groupby operation.**

**a=Aggregation.groupby(['Region','Product'],as_index=False)[['Sales']].sum()**

a

Output:

Output:
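The sketch below contrasts the indexed and the flat result on made-up rows (assumed columns Product, Region, Sales). It also shows reset_index(), which turns an existing index back into regular columns and, in this sketch, produces the same table as as_index=False.

```python
import pandas as pd

# Hypothetical stand-in data, not the original data.csv
Aggregation = pd.DataFrame({
    "Product": ["A", "A", "B", "B", "C", "A"],
    "Region": ["APAC", "EMEA", "APAC", "APAC", "EMEA", "APAC"],
    "Sales": [100, 150, 200, 50, 300, 25],
})

# Default behaviour: Region and Product become a two-level index
indexed = Aggregation.groupby(["Region", "Product"])[["Sales"]].sum()

# as_index=False keeps Region and Product as ordinary columns
flat = Aggregation.groupby(["Region", "Product"], as_index=False)[["Sales"]].sum()

# reset_index() flattens a result that was already computed with an index
flat_again = indexed.reset_index()

print(indexed)
print(flat)
```

The flat form is usually easier to merge, filter or export, while the indexed form is convenient for looking up a single (Region, Product) pair.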

**and that's it for now! We have learned aggregation ...**

Enjoy reading our other articles and stay tuned with us.

Kindly do provide your feedback in the 'Comments' Section and share as much as possible.
