A decision tree works much like an if statement in a programming language: each node tests a condition, and the result decides which branch to follow.
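
To make the comparison concrete, here is a small hand-written tree expressed as nested if statements. This is only an illustrative sketch: the function name, the weather features, and the threshold of 75 are all made up for this example.

```python
def classify_weather(outlook: str, humidity: int, windy: bool) -> str:
    """A hand-written decision tree: every internal node is an if test."""
    if outlook == "sunny":
        # Second-level test, only reached on the "sunny" branch.
        return "stay in" if humidity > 75 else "play"
    if outlook == "rainy":
        # A different second-level test on the "rainy" branch.
        return "stay in" if windy else "play"
    # Any other outlook (e.g. "overcast") is a leaf with a fixed label.
    return "play"


print(classify_weather("sunny", 80, False))
print(classify_weather("rainy", 60, False))
```

The difference in machine learning is that nobody writes these conditions by hand; they are learned from the training data.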

In the AI/ML world, the problem is usually stated like this:

Given a training set with features [(f1, f2, …), …] and known categories/labels [c1, …], how can we learn from this training data and build a decision tree so that, for any new data point, we can predict which category/label it belongs to?

In plain English:

We can try splitting the dataset on each feature in turn to see which one works best at the first/top level, then recursively split each resulting subset in the same way.

But how should we choose which feature to split on (that is, which one to use as the decision condition)?

Information theory (entropy) helps here: after a good split/decision, the data in each subset should be more ordered (its labels less mixed), so the entropy decreases. The feature whose split lowers the entropy the most is the one to choose as the decision point.
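
As a rough illustration of that idea, the sketch below computes the Shannon entropy of a label list and the information gain of splitting on one feature. The helper names (`entropy`, `information_gain`) and the tiny weather dataset are assumptions made up for this example, not part of any library.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy before the split minus the size-weighted entropy after it."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    after = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - after


# Hypothetical toy data: the label depends only on "outlook".
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["play", "play", "stay", "stay"]

gain_outlook = information_gain(rows, labels, "outlook")
gain_windy = information_gain(rows, labels, "windy")
print(gain_outlook, gain_windy)
```

Splitting on "outlook" leaves each subset with a single label, so it removes all of the entropy, while splitting on "windy" leaves both subsets as mixed as before and gains nothing. That is why "outlook" would be chosen as the decision point here.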

Quote from

  1. Calculate the entropy of every attribute using the data set S
  2. Split the set S into subsets using the attribute for which entropy is minimum (or, equivalently, information gain is maximum)
  3. Make a decision tree node containing that attribute
  4. Recurse on subsets using remaining attributes.
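
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions (categorical features only, no pruning, no handling of feature values that were never seen in training); the names `split_entropy`, `build_tree`, and `predict`, and the toy dataset, are made up for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def split_entropy(rows, labels, feature):
    """Size-weighted entropy of the subsets produced by splitting on feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    return sum(len(g) / len(labels) * entropy(g) for g in groups.values())


def build_tree(rows, labels, features):
    # Leaf: all labels agree, or no attributes are left to split on.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    # Steps 1-2: pick the attribute with minimum entropy after the split.
    best = min(features, key=lambda f: split_entropy(rows, labels, f))
    remaining = [f for f in features if f != best]
    # Step 3: make a node that records the chosen attribute.
    node = {"feature": best, "children": {}}
    # Step 4: recurse on each subset using the remaining attributes.
    for value in {row[best] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in pairs]
        sub_labels = [l for _, l in pairs]
        node["children"][value] = build_tree(sub_rows, sub_labels, remaining)
    return node


def predict(tree, row):
    # Walk down the tree until a leaf (a plain label) is reached.
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["feature"]]]
    return tree


# Hypothetical toy data: staying in only happens when it is rainy AND windy.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
    {"outlook": "sunny", "windy": True},
]
labels = ["play", "play", "play", "stay", "play"]

tree = build_tree(rows, labels, ["windy", "outlook"])
print(tree["feature"])
print(predict(tree, {"outlook": "rainy", "windy": True}))
```

On this data the tree splits on "outlook" first, even though "windy" is listed first, because that split leaves less entropy behind; only the "rainy" branch needs a second split on "windy".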



