How to decide the root node variable in a decision tree

Untangle the puzzles of Decision Science

While I was learning "How to make a Decision Tree in SAS", I was curious: "On what basis does SAS recommend the next variable in the hierarchical structure?"

The question made me delve deeper to uncover the behind-the-scenes chemistry … It took me several months to understand the complex mathematics (as I had to take multiple naps in between).

Do you have the same question in mind? No problem, I will walk you through it with a "No sleeping" guarantee!

First, let us understand a few terms:

1. Skewness

Let’s understand this term with a basic example.

Suppose India is going to hold a General Election and only two parties are contesting (we won't name any parties, as we don't support any).



Various survey companies and news channels are gathering voting data through a sample survey (called an opinion poll).

The companies took a sample of 1000 people, asked them about their voting preferences, and published their results based on this survey.



“When data is not equally distributed among all categories, we say the
distribution of the data is skewed.”
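
To make the definition concrete, here is a minimal Python sketch. The vote counts are made up purely for illustration (they are not the figures from any survey), and the helper names and the 5% tolerance are my own assumptions, not anything SAS does. It simply computes the share of each category and checks how far the shares are from an equal split.

```python
from collections import Counter

# Hypothetical poll responses (made-up numbers, not real survey data):
# 1000 respondents choosing between two unnamed parties.
votes = ["Party A"] * 700 + ["Party B"] * 300


def category_shares(labels):
    """Return the fraction of observations that falls in each category."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def is_skewed(labels, tolerance=0.05):
    """Call the data skewed when any category's share differs from an
    equal split by more than `tolerance` (an arbitrary, assumed cutoff)."""
    shares = category_shares(labels)
    equal_share = 1 / len(shares)
    return any(abs(share - equal_share) > tolerance for share in shares.values())


shares = category_shares(votes)
print(shares)            # share of the sample for each party
print(is_skewed(votes))  # True when the split is far from equal
```

With a 700/300 split the shares are far from 50/50, so the data counts as skewed; a 500/500 split would not.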

Let's assume that the sample was taken from four cities: Delhi, Mumbai, Chennai and Bengaluru.


Also,


I hope you now have a crystal-clear understanding of the skewness of a data distribution.


Let's move ahead then >>>