A decision tree works just like an if statement in a programming language: each node tests a condition and follows the matching branch.
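To make the analogy concrete, here is a small hand-written tree expressed as nested if statements. The function name, the features (outlook, humidity, windy) and the answers are made up for illustration only; they are not learned from any data.

```python
def classify_weather(outlook: str, humidity: str, windy: bool) -> str:
    """A hand-written decision tree: every `if` is one decision node."""
    if outlook == "sunny":
        # Second-level decision under the "sunny" branch.
        if humidity == "high":
            return "stay in"
        return "play"
    if outlook == "rainy":
        # Second-level decision under the "rainy" branch.
        if windy:
            return "stay in"
        return "play"
    # Any other outlook (for example "overcast") is a leaf with a fixed answer.
    return "play"


print(classify_weather("sunny", "high", False))  # prints "stay in"
```

Learning a decision tree means producing this kind of nested condition automatically from labelled examples instead of writing it by hand.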
In the AI/ML world, the problem usually looks like this:
Given a training set with features [(f1, f2, …), …] and known categories/labels [c1, …], how can we learn from this training data and design a decision tree, so that for any new data point we can predict which category/label it belongs to?
In plain English:
We can try splitting the dataset on each feature in turn to see which one works best at the first (top) level, then recursively split each resulting subset in the same way.
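As a rough sketch of what "splitting on a feature" means, the snippet below groups the training rows by the value they have for one feature. The helper name split_by_feature and the tiny dataset are assumptions made up for this example.

```python
from collections import defaultdict


def split_by_feature(rows, labels, feature_index):
    """Group (row, label) pairs by the value the row has at feature_index."""
    subsets = defaultdict(list)
    for row, label in zip(rows, labels):
        subsets[row[feature_index]].append((row, label))
    return dict(subsets)


# Hypothetical training set: each row is (outlook, humidity).
rows = [("sunny", "high"), ("sunny", "normal"), ("rainy", "high"), ("overcast", "normal")]
labels = ["no", "yes", "no", "yes"]

# Try every feature as the top-level split and look at the resulting subsets.
for feature_index in range(2):
    print(feature_index, split_by_feature(rows, labels, feature_index))
```

In this toy data, splitting on humidity (index 1) gives subsets whose labels all agree, while splitting on outlook (index 0) leaves the "sunny" subset mixed.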
But how should we choose which feature to split on (that is, which one to use as the decision condition)?
Information theory (entropy) helps here: after a good split, each subset should be more ordered (its labels are less mixed), so the entropy decreases. The feature whose split lowers the entropy the most is the one to use as the decision point.
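A minimal sketch of that measurement, assuming the usual Shannon entropy H = sum of p * log2(1/p) over the label frequencies p, and information gain defined as the entropy before the split minus the size-weighted entropy of the subsets after it. The function names and the toy data are illustrative choices, not part of any library.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())


def information_gain(rows, labels, feature_index):
    """Entropy before the split minus the weighted entropy after it."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    after = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - after


# Hypothetical data: each row is (outlook, humidity).
rows = [("sunny", "high"), ("sunny", "normal"), ("rainy", "high"), ("rainy", "normal")]
labels = ["no", "no", "yes", "yes"]

print(information_gain(rows, labels, 0))  # 1.0: outlook separates the labels perfectly
print(information_gain(rows, labels, 1))  # 0.0: humidity tells us nothing here
```

The feature with the larger gain (outlook in this toy data) would be chosen as the decision point.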
Quoted from https://en.wikipedia.org/wiki/ID3_algorithm:
- Calculate the entropy of every attribute using the data set
- Split the set into subsets using the attribute for which entropy is minimum (or, equivalently, information gain is maximum)
- Make a decision tree node containing that attribute
- Recurse on subsets using remaining attributes.
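The four steps above can be sketched in a few lines of Python. This is only an illustration under some assumptions, not a reference ID3 implementation: rows are dicts mapping feature name to value, a leaf is a plain label, ties between features go to the first one listed, and when no features remain the majority label is used. All names (build_tree, predict, split_entropy) are made up for this sketch.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())


def split_entropy(rows, labels, feature):
    """Size-weighted entropy of the subsets produced by splitting on feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    return sum(len(g) / len(labels) * entropy(g) for g in groups.values())


def build_tree(rows, labels, features):
    # Leaf: every label agrees, or there is nothing left to split on.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    # Steps 1 and 2: pick the feature with minimum entropy after the split.
    best = min(features, key=lambda f: split_entropy(rows, labels, f))
    remaining = [f for f in features if f != best]
    partitions = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = partitions.setdefault(row[best], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    # Steps 3 and 4: make a node for that feature and recurse on each subset.
    children = {value: build_tree(r, l, remaining) for value, (r, l) in partitions.items()}
    return {"feature": best, "children": children}


def predict(tree, row, default=None):
    """Walk the tree; return default for a feature value never seen in training."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], default)
    return tree


# Hypothetical training data.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "overcast", "windy": "yes"},
]
labels = ["play", "play", "play", "stay in", "play"]

tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree)
print(predict(tree, {"outlook": "rainy", "windy": "yes"}))  # prints "stay in"
```

On this toy data the root splits on outlook, and only the "rainy" branch needs a second split on windy.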
References:
https://en.wikipedia.org/wiki/Decision_tree_learning
https://en.wikipedia.org/wiki/ID3_algorithm