Random forest is a supervised learning algorithm that works for both classification and regression problems. It is an ensemble method built from multiple decision trees: ensemble models combine the results of several individual models, for classification typically by majority vote, as sketched below.
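As a minimal illustration of the ensemble idea, the snippet below (plain Python, no libraries required; the model predictions are hypothetical) combines the class predictions of a few models by majority vote, which is how a random forest aggregates the votes of its trees for classification:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine class predictions from several models by taking the
    most common label, the way a random forest combines the votes
    of its individual trees for classification."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical predictions from three separate models for one sample
print(majority_vote(["spam", "spam", "ham"]))  # -> "spam"
```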
Applications
- Credit card fraud detection
- Consumer finance surveys
- Identification of disease in patients using classification
- Identifying customer churn
How does Random Forest work?
- Randomly select n features out of the total N features, where n << N
- For node d, calculate the best split point among the n selected features
- Split the node into two daughter nodes using the best split
- Repeat the first three steps until the desired number of nodes has been reached
- Build the forest by repeating steps 1 to 4 D times, where D is the number of trees to be constructed (see the code sketch after this list)
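The sketch below shows this procedure in practice using scikit-learn's RandomForestClassifier (an assumed dependency, not part of the original post). The hyperparameters map onto the steps above: n_estimators plays the role of D, and max_features plays the role of n, the number of features considered at each split:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Example dataset; any labeled classification data would work here
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

forest = RandomForestClassifier(
    n_estimators=100,     # D: build 100 trees
    max_features="sqrt",  # n: consider sqrt(N) features per split
    random_state=42,
)
forest.fit(X_train, y_train)
print("Test accuracy:", forest.score(X_test, y_test))
```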
Advantages
- Reduces overfitting compared to a single decision tree, which helps improve accuracy
- Works on both classification and regression problems (see the regression sketch after this list)
- Works for both continuous and categorical data
- Can automatically handle missing values in the data
- No need to normalize the data
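As a small sketch of two of these points (again assuming scikit-learn, with synthetic data generated for illustration), the snippet below applies the same algorithm to a regression problem and deliberately leaves the features unscaled, since tree splits are unaffected by feature scale:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression data; one feature is put on a much larger
# scale on purpose and left unnormalized
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X[:, 0] *= 1000

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
reg = RandomForestRegressor(n_estimators=100, random_state=0)
reg.fit(X_train, y_train)
print("R^2 on test data:", reg.score(X_test, y_test))
```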
Disadvantages
- Requires high computational power, since multiple trees are built during training
- Training time is high compared to a single decision tree