Tuesday, 30 June 2020

Classification trees


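A classification tree is a supervised machine learning algorithm. It belongs to the decision tree family, which is used for both classification and regression problems: a classification tree predicts a categorical target, while a regression tree predicts a continuous one.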

Below are the key terms associated with classification trees:

Root Node - It represents the entire population or sample, which is then divided into two or more homogeneous sets

Splitting - The process of dividing a node into two or more sub-nodes

Decision Node - A sub-node that splits into further sub-nodes is called a decision node

Leaf/Terminal Node - Nodes that cannot be split further are called leaf or terminal nodes

Pruning - Removing the sub-nodes of a decision node is called pruning; it is the opposite of splitting

Branch/Sub-Tree - A subsection of the entire tree is called a branch or sub-tree

Parent and Child Node - A node that is divided into sub-nodes is called the parent node of those sub-nodes, whereas the sub-nodes are the children of the parent node
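
To make the terms above concrete, here is a minimal sketch of a tree as a data structure. The Node class, its method names, and the example labels ("age < 30", "student") are all made up for illustration and do not come from any particular library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of a tree. Every name here is illustrative."""
    label: str
    children: List["Node"] = field(default_factory=list)
    # The parent link is excluded from repr/compare to avoid cycles.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def split(self, *labels: str) -> List["Node"]:
        """Splitting: divide this node into two or more sub-nodes."""
        self.children = [Node(lbl, parent=self) for lbl in labels]
        return self.children

    def prune(self) -> None:
        """Pruning: remove this node's sub-nodes, so it becomes a leaf."""
        self.children = []

    def kind(self) -> str:
        """Name the node's role using the terms defined above."""
        if self.parent is None:
            return "root"
        return "decision" if self.children else "leaf"


root = Node("all samples")                         # root node
young, old = root.split("age < 30", "age >= 30")   # root is the parent
young.split("student", "not student")              # young becomes a decision node

kinds_before = (root.kind(), young.kind(), old.kind())
young.prune()                                      # young goes back to being a leaf
kinds_after = (root.kind(), young.kind(), old.kind())
```

Here young and everything below it form a branch (sub-tree) of the whole tree, and pruning young undoes the split that made it a decision node.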





Advantages
  • Easy to understand
  • Useful in data exploration
  • Less data cleaning required
  • Data type is not a constraint
Challenges
  • Over-fitting
  • Less suited to continuous variables: grouping them into discrete ranges loses information
Techniques for Division/Splitting
  • Gini Index
  • Chi-square
  • Information Gain
  • Reduction in variance
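
Three of the splitting criteria listed above can be sketched in a few lines of plain Python. This is only a rough illustration: the function names and the toy data are assumptions made up for this example, chi-square is left out for brevity, and real libraries implement these measures with more care.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy in bits; information gain is the drop in entropy after a split."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def variance(values):
    """Population variance, used when the target is continuous."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def split_gain(impurity, parent, children):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(ch) / n * impurity(ch) for ch in children)
    return impurity(parent) - weighted


# Toy classification split: 10 samples divided into two sub-nodes.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4

gini_decrease = split_gain(gini, parent, [left, right])
info_gain = split_gain(entropy, parent, [left, right])

# Toy regression split: a numeric target divided into two sub-nodes.
var_reduction = split_gain(
    variance, [1.0, 2.0, 9.0, 10.0], [[1.0, 2.0], [9.0, 10.0]]
)
```

For this toy data the Gini impurity drops from 0.5 to 0.32 (a decrease of about 0.18) and the variance drops from 16.25 to 0.25. In each case a larger value means the split produces more homogeneous sub-nodes, so a tree-growing procedure would prefer the candidate split with the highest value.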
