Ask Analytics: How to decide root node variable in decision tree


3. Information Gain

Information gain is the key value for deciding the root node of a decision tree; in other words, it quantifies the importance of each variable.

To calculate the information gain of any variable, we first need the entropy of the dependent variable (calculated in the previous step).

Using entropy, we calculate the information gain:

Information gain = Entropy of sample (dependent variable) - Average entropy of the dependent variable within the groups formed by the independent variable
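Written out in symbols, the same formula might look like the sketch below. It assumes that the "average entropy" is the usual weighted average over the categories of the independent variable and that logarithms are taken to base 2; the symbols Y (dependent variable), X (independent variable), Y_v (the rows where X takes the value v) and p_i (the share of class i) are notation introduced here, not taken from the article.

```latex
H(Y) = -\sum_{i} p_i \log_2 p_i
\qquad
IG(Y, X) = H(Y) - \sum_{v \in \mathrm{values}(X)} \frac{|Y_v|}{|Y|} \, H(Y_v)
```

Each category of X contributes its own entropy, weighted by the share of rows that fall in that category.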

The calculations for the example above were carried out with City as the independent variable.

Information gain can be interpreted as the ability to reduce uncertainty (entropy) and hence increase predictability. Here, the City variable reduces the uncertainty in the prediction outcome by only a small amount, 0.06.

"The larger the information gain, the better the prediction outcome"

Thus, whichever statistical tool you use, it calculates the information gain of all the independent variables and decides the hierarchy on that basis: the variable with the largest information gain becomes the root node.
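The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is a made-up toy sample (not the article's table), the helper names `entropy`, `information_gain` and `choose_root` are hypothetical, and real tools may use other criteria or add refinements.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy of the labels minus the weighted average entropy per feature value."""
    groups = defaultdict(list)
    for value, label in zip(feature, labels):
        groups[value].append(label)
    avg = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - avg

def choose_root(features, labels):
    """Return the variable with the largest information gain, plus all gains."""
    gains = {name: information_gain(col, labels) for name, col in features.items()}
    return max(gains, key=gains.get), gains

# Hypothetical toy data, invented for illustration only
features = {
    "City":   ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Gender": ["M", "F", "F", "M", "M", "M", "F", "F"],
}
outcome = ["yes", "yes", "yes", "no", "yes", "no", "no", "no"]

root, gains = choose_root(features, outcome)
print(root, {name: round(g, 3) for name, g in gains.items()})
```

In this toy sample City separates the outcomes better than Gender, so City would be picked as the root node.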

Please take a snapshot of the summary below with your "mind-cam" ...

The more skewed the outcome is within each group of a variable => the lower the average entropy => the higher the information gain => the better the prediction outcome

Hope you are clear about the concept. We will give more examples in the future.
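The first link in this chain can be checked with a small sketch, assuming a two-class outcome and base-2 logarithms. The function name `binary_entropy` is a hypothetical helper, not part of any particular tool.

```python
from math import log2

def binary_entropy(p):
    """Entropy in bits of a two-class outcome where one class has share p."""
    if p in (0.0, 1.0):
        return 0.0  # a pure group carries no uncertainty
    return -(p * log2(p) + (1 - p) * log2(1 - p))

# Share of the majority class, from perfectly balanced to fully skewed
for p in (0.5, 0.6, 0.75, 0.9, 1.0):
    print(f"majority share {p:.2f} -> entropy {binary_entropy(p):.3f}")
```

Entropy is 1 bit for a 50/50 group and falls to 0 as the group becomes pure, which is why a variable that splits the sample into skewed groups earns a higher information gain.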

Enjoy reading our other articles and stay tuned with ...

Kindly do provide your feedback in the 'Comments' Section and share as much as possible.
