Tag Archive for r, tidymodels, r-parsnip

What does the sample_size hyperparameter actually represent in XGBoost?

I realise that the values for sample_size in boost_tree represent the proportion (for XGBoost only; an absolute number otherwise) of observations to be subsampled per iteration (tree). What I don’t understand is how the proportion can be so high (e.g. 0.49) when sampling is supposed to be random without replacement. Surely a proportion of 0.49 of the dataset per tree would mean repetition across 1000 trees? I don’t know if this is due to my lack of understanding of how subsamples are selected, but any help would be greatly appreciated!
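
To make the question concrete, here is a small simulation of one possible reading: "without replacement" restricts a single tree's draw, while each tree draws afresh from the full dataset. This is only an illustrative sketch in plain Python, not XGBoost's or parsnip's actual sampling code. The function name `subsample_per_tree` and the dataset size of 100 rows are assumptions made up for the example.

```python
import random


def subsample_per_tree(n_rows, proportion, n_trees, seed=0):
    """Draw one subsample per tree, without replacement within each draw.

    Hypothetical helper for illustration; not part of any library.
    """
    rng = random.Random(seed)
    k = round(n_rows * proportion)  # number of rows exposed to each tree
    # Every tree gets a fresh, independent draw from the full dataset.
    return [rng.sample(range(n_rows), k) for _ in range(n_trees)]


samples = subsample_per_tree(n_rows=100, proportion=0.49, n_trees=1000)

# Within a single tree, no row index appears twice.
no_repeats_within_tree = all(len(set(s)) == len(s) for s in samples)

# Across trees, the same row is picked again and again.
uses_of_row_0 = sum(0 in s for s in samples)

print("No repeats within any tree:", no_repeats_within_tree)
print("Trees that used row 0:", uses_of_row_0)
```

Under this reading, repetition across 1000 trees is expected and does not conflict with sampling without replacement, since that constraint applies only inside each per-tree draw.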