Python Tutorial 1.0
It took me a long time to write this article, and I apologize for that. I was occupied with routine tasks, but I will now try to blog more regularly.
In the previous article on Python, we covered a few basics of Python. In this article, we will learn how to explore and subset data in Python.
# Let's first create a dataset with 5 columns and 8 observations
import pandas as pd
Namelist = ["A","B","C","D","E","F","G","H"]
Agelist = [22,23,24,20,19,24,22,23]
Genderlist = ["M","F","M","F","M","F","M","F"]
Earninglist = [800,700,500,1000,1100,800,700,600]
Expenselist = [100,110,120,110,130,90,100,80]
Data_1 = {'Name': Namelist,
          'Age': Agelist,
          'Gender': Genderlist,
          'Earning': Earninglist,
          'Expense': Expenselist}
Data_2 = pd.DataFrame(Data_1, columns=['Name','Age','Gender','Earning','Expense'])

Now I want to check how many rows and columns the data has.
Code: Data_2.shape
Here is the answer in console :
(8, 5)
It means 8 rows and 5 columns.
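Since shape is an ordinary tuple, it can be unpacked into two variables. Here is a small sketch of that; the names df, n_rows and n_cols are my own choices for illustration and not part of the code above.

```python
import pandas as pd

# Same toy data as in the article: 8 observations, 5 columns
df = pd.DataFrame({
    'Name': list("ABCDEFGH"),
    'Age': [22, 23, 24, 20, 19, 24, 22, 23],
    'Gender': ["M", "F", "M", "F", "M", "F", "M", "F"],
    'Earning': [800, 700, 500, 1000, 1100, 800, 700, 600],
    'Expense': [100, 110, 120, 110, 130, 90, 100, 80],
})

# shape is a (rows, columns) tuple, so it unpacks directly
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# len() on a DataFrame gives the number of rows only
print(len(df))
```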

I want to check a sample of the data.
Code :
Data_2.head() # will show the top 5 observations of the data
or
Data_2.tail() # will show the bottom 5 observations of the data
You can also specify the number of observations.
For example : Data_2.head(3) will show the top 3 observations. Asking for more rows than exist, as in Data_2.head(10), simply returns all 8 observations.

I just want the column names of the data.
Code : Data_2.columns
Here is the answer in console :
Out[1]: Index(['Name', 'Age', 'Gender', 'Earning', 'Expense'], dtype='object')
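To tie the inspection calls together, here is a rough sketch of head, tail and columns on the same data. The variable names (df, top, bottom, col_names) are hypothetical and only used for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': list("ABCDEFGH"),
    'Age': [22, 23, 24, 20, 19, 24, 22, 23],
    'Gender': ["M", "F", "M", "F", "M", "F", "M", "F"],
    'Earning': [800, 700, 500, 1000, 1100, 800, 700, 600],
    'Expense': [100, 110, 120, 110, 130, 90, 100, 80],
})

# Asking for 10 rows from an 8-row frame returns all 8 rows, no error
top = df.head(10)

# Last 3 observations
bottom = df.tail(3)

# columns is an Index object; tolist() turns it into a plain Python list
col_names = df.columns.tolist()

print(len(top))
print(bottom)
print(col_names)
```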

Remember the indexing rule : in Python, indexing starts from 0. So, the first row or column is referred to by 0.
We can select a range of data using either labels or integer positions:
loc : label based selection (the end of a slice is included)
iloc : integer position based selection (the end of a slice is excluded)
Code : Data_2.iloc[1:4,:]
Will show the 2nd, 3rd and 4th observations and all columns
Code : Data_2.iloc[:,[0,2]]
Will show all rows of the 1st and 3rd columns
Code : Data_2.iloc[[1,2,3], 0:3]
Will show the 2nd, 3rd and 4th observations of the 1st to 3rd columns.
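The difference between loc and iloc is easiest to see side by side. The sketch below assumes the default row labels 0 to 7 that pandas assigns when no index is given; df, by_position and by_label are names I made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': list("ABCDEFGH"),
    'Age': [22, 23, 24, 20, 19, 24, 22, 23],
    'Gender': ["M", "F", "M", "F", "M", "F", "M", "F"],
    'Earning': [800, 700, 500, 1000, 1100, 800, 700, 600],
    'Expense': [100, 110, 120, 110, 130, 90, 100, 80],
})

# iloc works on positions: rows 1, 2, 3 and columns 0, 1, 2 (end excluded)
by_position = df.iloc[1:4, 0:3]

# loc works on labels: rows labelled 1 to 3 and columns 'Name' to 'Gender'
# (end included)
by_label = df.loc[1:3, 'Name':'Gender']

print(by_position)
print(by_position.equals(by_label))
```

Both selections pick the same block of data here only because the row labels happen to match the row positions.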
Keeping variables :
Suppose you want to keep all rows of the 1st and 3rd variables ('Name' and 'Gender')
Data_3 = Data_2.iloc[: ,[0,2]]
We can also specify the names of the columns
# We need to use loc to specify label based indexing
Data_2.loc[:,['Name','Gender']]
The simplest way is to pass a list of column names in square brackets, for example 'Age' and 'Gender' :
Data_2[['Age','Gender']]
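As a quick check that the three ways of keeping columns agree, here is a minimal sketch. The names df, keep_by_position, keep_by_label and keep_by_brackets are assumptions for this example only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': list("ABCDEFGH"),
    'Age': [22, 23, 24, 20, 19, 24, 22, 23],
    'Gender': ["M", "F", "M", "F", "M", "F", "M", "F"],
    'Earning': [800, 700, 500, 1000, 1100, 800, 700, 600],
    'Expense': [100, 110, 120, 110, 130, 90, 100, 80],
})

keep_by_position = df.iloc[:, [0, 2]]            # 1st and 3rd columns by position
keep_by_label = df.loc[:, ['Name', 'Gender']]    # same columns by label
keep_by_brackets = df[['Name', 'Gender']]        # shorthand with a list of names

print(keep_by_position.equals(keep_by_label))
print(keep_by_label.equals(keep_by_brackets))

# The original frame still has all 5 columns
print(df.shape)
```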

Dropping variables :
Suppose you want to drop the 1st and 3rd variables ('Name' and 'Gender')
Data_4 = Data_2.drop(['Name','Gender'],axis=1)
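drop also accepts a columns= keyword, which does the same as axis=1 and may read a little more clearly. The sketch below shows it, along with dropping by position; df, dropped and dropped_by_position are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': list("ABCDEFGH"),
    'Age': [22, 23, 24, 20, 19, 24, 22, 23],
    'Gender': ["M", "F", "M", "F", "M", "F", "M", "F"],
    'Earning': [800, 700, 500, 1000, 1100, 800, 700, 600],
    'Expense': [100, 110, 120, 110, 130, 90, 100, 80],
})

# columns=[...] is equivalent to passing the labels with axis=1
dropped = df.drop(columns=['Name', 'Gender'])

# To drop by position, look the labels up from df.columns first
dropped_by_position = df.drop(columns=df.columns[[0, 2]])

print(list(dropped.columns))

# drop returns a new frame; df itself still has 'Name' and 'Gender'
print('Name' in df.columns)
```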
Adding variable :
Suppose you want to add a new variable or column
Data_2['Saving'] = [700, 590, 380, 890, 970, 710, 600, 520]
Creating a new variable using existing ones :
Data_2['New']=Data_2['Earning']-Data_2['Expense']
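Column arithmetic works row by row, so a derived column can replace a hand-typed list. The sketch below also uses assign, which returns a new frame instead of changing the existing one. The column name 'SavingRate' and the variable names are my own, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': list("ABCDEFGH"),
    'Earning': [800, 700, 500, 1000, 1100, 800, 700, 600],
    'Expense': [100, 110, 120, 110, 130, 90, 100, 80],
})

# Element-wise subtraction: one result per row
df['Saving'] = df['Earning'] - df['Expense']
print(df['Saving'].tolist())

# assign returns a copy with the extra column; df is left untouched
with_rate = df.assign(SavingRate=df['Saving'] / df['Earning'])
print(with_rate.head())
```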

& is used for AND logical operations.
| is used for OR logical operations.
Wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as == and >=.
Let's try to answer a few questions :
1. Select members whose earning is 500.
Data_2[(Data_2.Earning==500)]
2. Select members whose earning is at least 1000 and age >= 20.
Data_2[(Data_2.Earning>=1000)&(Data_2.Age>=20)]
3. Select data for members A, B and C.
Data_2[Data_2['Name'].isin(['A','B','C'])]
4. Select data for all the members except A, B and C.
Data_2[~Data_2['Name'].isin(['A','B','C'])]
5. Select data where gender is not equal to 'M'.
Data_2[~Data_2['Gender'].isin(['M'])]
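Each filter above is a boolean mask, one True or False per row, placed inside the square brackets. Here is a small sketch of AND, OR and a plain != comparison as an alternative to ~isin for a single value. The names df, high_earners, not_male and either are assumptions for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': list("ABCDEFGH"),
    'Age': [22, 23, 24, 20, 19, 24, 22, 23],
    'Gender': ["M", "F", "M", "F", "M", "F", "M", "F"],
    'Earning': [800, 700, 500, 1000, 1100, 800, 700, 600],
    'Expense': [100, 110, 120, 110, 130, 90, 100, 80],
})

# AND: both conditions must hold; each one sits in its own parentheses
high_earners = df[(df['Earning'] >= 1000) & (df['Age'] >= 20)]

# For a single value, != gives the same rows as ~isin([...])
not_male = df[df['Gender'] != 'M']

# OR: either condition is enough
either = df[(df['Earning'] == 500) | (df['Age'] == 19)]

print(high_earners)
print(not_male['Name'].tolist())
print(either['Name'].tolist())
```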
I am finishing the article here. In the next article, we will cover "How to modify data with IF-Else conditions in Python". Till then ...
Enjoy reading our other articles and stay tuned.
Kindly provide your feedback in the 'Comments' section and share as much as possible.