Classification trees are a supervised machine learning algorithm.
The broader family of decision trees is used for both classification and regression problems.
Below are the key terms associated with classification trees:
Root Node - It represents the entire population or sample, which
is then divided into two or more homogeneous sets.
Splitting - The process of dividing a node into two or more
sub-nodes.
Decision Node - When a sub-node is split into further sub-nodes,
it is called a decision node.
Leaf/Terminal Node - Nodes that are not split any further
are called leaf or terminal nodes.
Pruning - Removing the sub-nodes of a decision node is called
pruning. It can be thought of as the opposite of splitting.
Branch/Sub-Tree - A subsection of the entire tree is called a branch or
sub-tree.
Parent and Child Node - A node that is divided into sub-nodes is
called the parent node of those sub-nodes, whereas the sub-nodes are the children of the parent node.
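The terms above can be made concrete with a small sketch. This is only an illustration, not how any particular library stores a tree: the `Node` class, the `count_leaves` and `prune` helpers, and the node labels are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a tree (hypothetical structure, for illustration only)."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # A leaf/terminal node has no sub-nodes.
        return not self.children


def count_leaves(node: Node) -> int:
    """Count the leaf/terminal nodes in the sub-tree rooted at `node`."""
    if node.is_leaf():
        return 1
    return sum(count_leaves(child) for child in node.children)


def prune(node: Node) -> None:
    """Pruning: remove the sub-nodes of a decision node, making it a leaf."""
    node.children = []


# Root node: the whole sample. It is the parent of two child nodes.
root = Node("all samples", [
    # Decision node: a sub-node that is split again into sub-nodes.
    Node("age < 30", [Node("leaf A"), Node("leaf B")]),
    # Leaf node: not split any further.
    Node("leaf C"),
])
```

Here `root.children[0]` together with its two leaves is a branch (sub-tree). Calling `prune(root.children[0])` undoes that split and turns the decision node back into a leaf.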
Advantages
- Easy to understand
- Useful in data exploration
- Less data cleaning required
- Data type is not a constraint
Challenges
- Over-fitting
- Less suited to continuous variables, since splitting them into ranges loses information
Techniques for Division/Splitting
- Gini Index
- Chi-square
- Information Gain
- Reduction of variance
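As a rough sketch of how three of these criteria score a candidate split, the snippet below uses plain Python. The function names, the `weighted` helper, and the toy "yes"/"no" labels are assumptions made for this example. The Gini measure is written in its common impurity form (1 minus the sum of squared class proportions), which may differ from the convention used elsewhere, and chi-square is left out for brevity.

```python
from collections import Counter
from math import log2


def gini_impurity(labels):
    """Gini impurity: 1 - sum(p_k^2). Zero means the node is pure."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    return -sum((count / n) * log2(count / n)
                for count in Counter(labels).values())


def variance(values):
    """Population variance of a numeric target (used for regression)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def weighted(metric, children):
    """Average of `metric` over the child nodes, weighted by node size."""
    total = sum(len(child) for child in children)
    return sum(len(child) / total * metric(child) for child in children)


def information_gain(parent, children):
    """Drop in entropy from the parent node to its children."""
    return entropy(parent) - weighted(entropy, children)


def variance_reduction(parent, children):
    """Drop in variance from the parent node to its children."""
    return variance(parent) - weighted(variance, children)


# Toy data: a balanced parent node and one candidate two-way split.
parent = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
```

With these definitions, a split is preferred when it gives a lower `weighted(gini_impurity, split)`, a higher `information_gain(parent, split)`, or, for a numeric target, a higher `variance_reduction`.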