How to decide the root node variable in a decision tree - Continued


Let's now understand the term:

2. Entropy 

You need not delve much into the mathematics behind it; I just want you to know the formula for entropy: Entropy = -Σ p_i * log2(p_i), where p_i is the proportion of records in class i within the category. All these calculations take place automatically behind the scenes. For example purposes only, let me calculate the entropy for Chennai:

So, the entropy for Chennai is 0.4690.
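The figure above can be reproduced with a few lines of Python. This is only a sketch: the class split for Chennai is not shown in the text, so the 90% / 10% split used here is an assumption, chosen because it yields the quoted 0.4690 under the standard base-2 entropy formula. The function name `entropy` is also just an illustrative choice.

```python
import math

def entropy(proportions):
    """Base-2 entropy of a list of class proportions that sum to 1."""
    # Skip zero proportions: p * log2(p) is treated as 0 when p is 0.
    return -sum(p * math.log2(p) for p in proportions if p > 0)

# Assumed split for Chennai (not given in the post): 90% one class, 10% the other.
chennai_entropy = entropy([0.9, 0.1])
print(round(chennai_entropy, 4))
```

With that assumed split, the result rounds to the same value as the one quoted for Chennai.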

Entropy can be interpreted as the degree of randomness, uncertainty or disturbance in the data.

Let's now logically deduce the concept:

When the data is more skewed
=> it is biased towards one of the categories
=> which means the entropy/uncertainty/disturbance is lower
=> and hence the prediction outcome is better.
In a nutshell, we can say: the more the skewness, the better the predictability.
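To see this relationship in numbers, here is a small sketch that computes the entropy of a two-class split as it becomes more and more skewed. The splits themselves are made up purely for illustration and are not taken from the example data.

```python
import math

def entropy(proportions):
    """Base-2 entropy of a list of class proportions that sum to 1."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

# Illustrative two-class splits, from perfectly balanced to heavily skewed.
splits = [0.5, 0.7, 0.9, 0.99]
entropies = [entropy([p, 1 - p]) for p in splits]

for p, h in zip(splits, entropies):
    print(f"{p:.2f} / {1 - p:.2f} split -> entropy {h:.4f}")
```

The balanced 50/50 split gives the highest entropy (1.0), and the value keeps falling as one class starts to dominate.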

Now, calculate the entropy for all the cities.

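We have learnt how to calculate category-wise entropy. Now, the next question would be:

"How would we calculate the entropy at the variable level?"

The solution is: take the weighted average of the entropies of all classes/categories, where the weight of each category is its share of the total records.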
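A minimal sketch of that weighted average is given below. The category names and counts are hypothetical and are not the figures from the example table; each tuple simply holds the number of records of each outcome class within one category, and the helper name `entropy_from_counts` is my own.

```python
import math

def entropy_from_counts(counts):
    """Base-2 entropy of one category, given its per-class record counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Hypothetical per-class counts for each category of a variable (not from the post).
counts_by_category = {
    "category_a": (9, 1),
    "category_b": (5, 5),
    "category_c": (2, 8),
}

total_rows = sum(sum(counts) for counts in counts_by_category.values())

# Weight each category's entropy by its share of all records, then add them up.
variable_entropy = sum(
    (sum(counts) / total_rows) * entropy_from_counts(counts)
    for counts in counts_by_category.values()
)
print(round(variable_entropy, 4))
```

Because every category here has the same number of records, the weights are equal; with uneven category sizes, the larger categories would pull the variable-level entropy towards their own value.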
