I have some data in a column called variable
that I can simulate as follows:
import pandas as pd
import numpy as np
from scipy.stats import bootstrap
random_values = np.random.uniform(low=0, high=10, size=10000)
df = pd.DataFrame({"variable": random_values})
I want to split my data into 5 bins defined by the edges bins = [0, 2, 4, 6, 8, 10]
and calculate error bars for each bin with some bootstrapping method, e.g. the 95% confidence interval. The cumbersome part is calculating the error bars. I could use scipy.stats.bootstrap
and then call
bootstrap((one_of_the_bins,), my_statistic, confidence_level=0.95, method='percentile')
but that requires me to split my data into chunks according to the bins and loop over the chunks. Is there a more convenient way to do this? Is there some functionality built into pandas for it? Or can I pass my full data and the bins to scipy.stats
so that scipy does the calculation for all the bins at once? Thank you for any advice!
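For concreteness, here is a minimal sketch of the per-bin loop described above. It rests on a few assumptions that are not part of the question: np.mean stands in for my_statistic, pd.cut with groupby does the splitting, the helper name bootstrap_ci is hypothetical, and n_resamples=999 is chosen only to keep the example light.

```python
import numpy as np
import pandas as pd
from scipy.stats import bootstrap

# Simulated data, as in the question
random_values = np.random.uniform(low=0, high=10, size=10000)
df = pd.DataFrame({"variable": random_values})
bins = [0, 2, 4, 6, 8, 10]


def bootstrap_ci(values, statistic=np.mean, confidence_level=0.95):
    """Percentile bootstrap CI for one bin (hypothetical helper).

    np.mean is an assumed stand-in for my_statistic.
    """
    res = bootstrap(
        (values,),  # data must be a sequence of samples, hence the tuple
        statistic,
        confidence_level=confidence_level,
        method="percentile",
        n_resamples=999,  # assumption: fewer resamples than the default
    )
    ci = res.confidence_interval
    return {"stat": statistic(values), "ci_low": ci.low, "ci_high": ci.high}


# Assign each row to a bin; include_lowest=True keeps values equal to 0
binned = pd.cut(df["variable"], bins=bins, include_lowest=True)

# Loop over the chunks via groupby and collect one record per bin
labels, records = [], []
for interval, chunk in df.groupby(binned, observed=True)["variable"]:
    labels.append(interval)
    records.append(bootstrap_ci(chunk.to_numpy()))

summary = pd.DataFrame(records, index=pd.Index(labels, name="bin"))
print(summary)
```

The loop is still there, but groupby hides the manual splitting, and summary ends up with one row per bin holding the statistic and the lower and upper ends of its confidence interval.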