I am reviewing a codebase that uses a Random Forest (RF) Regressor, and I’ve noticed that bootstrapping is applied before creating each RF model. However, RF inherently uses bootstrapping to train each Decision Tree (DT).
Does it make sense, and is it useful, to apply additional bootstrapping before creating a Random Forest?
Here’s an illustration of the process:
- From 75 training observations, 100 bootstrap batches are created.
- Each bootstrap batch is used to train an RF, which itself consists of 100 DTs.
- At the end, the predictions from each DT are aggregated to form the RF prediction.
- Finally, the predictions from all RFs are aggregated to obtain the final result.
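To make the question concrete, here is a minimal sketch of the process described above, using scikit-learn. The variable names and the reduced counts are mine, not from the actual codebase; note that `RandomForestRegressor` already bootstraps internally for each tree when `bootstrap=True` (the default), which is exactly the redundancy I am asking about:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.utils import resample

# Toy stand-in for the 75 training observations
X, y = make_regression(n_samples=75, n_features=5, noise=10.0, random_state=0)

n_outer = 10  # the codebase uses 100 outer batches; reduced here for speed
preds = []
for i in range(n_outer):
    # Outer bootstrap: resample the training observations with replacement
    X_bs, y_bs = resample(X, y, replace=True, random_state=i)
    # Inner bootstrap: each RF draws its own bootstrap sample per tree
    rf = RandomForestRegressor(n_estimators=100, bootstrap=True, random_state=i)
    rf.fit(X_bs, y_bs)
    preds.append(rf.predict(X))

# Final prediction: average the per-RF predictions over all outer batches
final_pred = np.mean(preds, axis=0)
print(final_pred.shape)  # one prediction per original observation
```

So each training row can be resampled twice: once by the outer `resample` call and again by the RF's internal per-tree bootstrap.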