I am trying to implement AdaBoost. The dataset I have chosen is Covertype from the mlpack package. The problem is in calculating the initial weights as 1/n_elem, where n_elem (406709) is the number of elements in the dataset: it is so large that the result of the division is close to 0. Also, the classes are very unbalanced:
label 0 appears 148378 times.
label 1 appears 198219 times.
label 2 appears 25086 times.
label 3 appears 1935 times.
label 4 appears 6656 times.
label 5 appears 12181 times.
label 6 appears 14254 times.
When I normalize the weights, a segmentation fault occurs.
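For reference, here is a minimal sketch of the weight setup described above, written with plain `std::vector<double>` rather than mlpack or Armadillo types. The names `initialWeights` and `normalizeWeights` are hypothetical and not part of any library. Note that 1/406709 is roughly 2.46e-6, which a `double` represents without trouble, so the small value alone may not be what triggers the crash; an out-of-range index or a mismatched vector size is another possibility worth checking.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Uniform AdaBoost starting weights: every sample gets 1/n.
// (Hypothetical helper, not an mlpack function.)
std::vector<double> initialWeights(std::size_t n)
{
    assert(n > 0 && "dataset must not be empty");
    return std::vector<double>(n, 1.0 / static_cast<double>(n));
}

// Rescale the weights in place so that they sum to 1.
// Returns false and leaves the weights untouched if the sum is not positive,
// which avoids dividing by zero.
bool normalizeWeights(std::vector<double>& w)
{
    const double sum = std::accumulate(w.begin(), w.end(), 0.0);
    if (!(sum > 0.0))
        return false;
    for (double& x : w)
        x /= sum;
    return true;
}
```

Calling `initialWeights(406709)` and then `normalizeWeights` on the result should leave the weights summing to 1 up to floating-point rounding.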
The only idea I have come up with is to partition the dataset, but I don't know whether to keep this class distribution or balance it. Could this work? Are there any other possibilities?
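If the class distribution is to be kept, one way to draw the partition is per class (stratified sampling). The following is only a sketch under assumptions: the labels are held in a `std::vector<int>`, `fraction` lies in (0, 1], and `stratifiedSample` is a hypothetical name, not an mlpack function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <vector>

// Return indices of a subsample that keeps roughly `fraction` of every class,
// so the class proportions of the full dataset are approximately preserved.
std::vector<std::size_t> stratifiedSample(const std::vector<int>& labels,
                                          double fraction,
                                          unsigned seed = 42)
{
    assert(fraction > 0.0 && fraction <= 1.0);

    // Group sample indices by class label.
    std::map<int, std::vector<std::size_t>> byClass;
    for (std::size_t i = 0; i < labels.size(); ++i)
        byClass[labels[i]].push_back(i);

    std::mt19937 rng(seed);
    std::vector<std::size_t> picked;
    for (auto& entry : byClass)
    {
        std::vector<std::size_t>& idx = entry.second;
        std::shuffle(idx.begin(), idx.end(), rng);

        // Keep at least one sample per class so rare classes do not vanish.
        std::size_t keep = static_cast<std::size_t>(fraction * idx.size());
        keep = std::max<std::size_t>(keep, 1);
        keep = std::min(keep, idx.size());
        picked.insert(picked.end(), idx.begin(), idx.begin() + keep);
    }
    return picked;
}
```

The returned indices can then be used to select the matching columns and labels from the full dataset. Balancing the classes instead would mean choosing a fixed count per class rather than a fixed fraction.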