LOOP:
-A <- best attribute
-Assign A as decision attribute for Node
-For each value of A, create a descendant of Node
-Sort training examples to the leaves
-If examples perfectly classified, stop; else iterate over leaves
Gain(S, A) = Entropy(S) - Σ_v (|S_v| / |S|) Entropy(S_v)
Entropy(S) = -Σ_v P(v) log P(v)
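The loop and the gain/entropy formulas above can be sketched in Python. This is a minimal illustration, not a full ID3 implementation; function names (`entropy`, `gain`, `id3`) and the dict-based tree representation are my own choices.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_v P(v) * log2 P(v) over label values v."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, labels, attr):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    n = len(labels)
    subsets = {}
    for x, y in zip(examples, labels):
        subsets.setdefault(x[attr], []).append(y)
    return entropy(labels) - sum(len(sv) / n * entropy(sv)
                                 for sv in subsets.values())

def id3(examples, labels, attrs):
    if len(set(labels)) == 1:          # everything classified correctly -> stop
        return labels[0]
    if not attrs:                      # no more attributes -> majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(examples, labels, a))  # best attribute
    node = {}
    for v in {x[best] for x in examples}:  # one descendant per value of A
        sub = [(x, y) for x, y in zip(examples, labels) if x[best] == v]
        xs, ys = zip(*sub)                 # sort examples to this leaf
        node[v] = id3(list(xs), list(ys), attrs - {best})
    return (best, node)
```

On a toy dataset where attribute `a` perfectly separates the labels, `gain` is 1.0 and the tree is a single split on `a`.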
ID3: Bias
INDUCTIVE BIAS
Restriction bias: the hypothesis set H itself (only decision trees considered)
Preference bias:
-good splits at top
-correct over incorrect
-shorter trees
Decision trees: other considerations
- continuous attributes?
e.g. age, weight, distance
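One standard way to handle a continuous attribute like age is to turn it into a boolean test (value >= t), trying midpoints between consecutive sorted values and keeping the threshold with the highest gain. A minimal sketch; `best_threshold` is my own name for this helper.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Pick the threshold t for the test (value >= t) with the highest
    information gain, scanning midpoints of consecutive sorted values."""
    n = len(labels)
    base = entropy(labels)
    vs = sorted(set(values))
    best_t, best_gain = None, -1.0
    for a, b in zip(vs, vs[1:]):
        t = (a + b) / 2
        left = [y for v, y in zip(values, labels) if v < t]
        right = [y for v, y in zip(values, labels) if v >= t]
        g = base - (len(left) / n) * entropy(left) \
                 - (len(right) / n) * entropy(right)
        if g > best_gain:
            best_t, best_gain = t, g
    return best_t, best_gain
```

For ages [20, 25, 40, 45] with labels -, -, +, +, the split at 32.5 separates the classes perfectly (gain 1.0).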
When do we stop?
-> everything classified correctly!
-> no more attributes!
-> don't overfit (e.g. prune the tree)
Regression
splitting criterion? variance reduction instead of information gain
leaf output: average of targets, or a local linear fit