I have a number of dataframes, and I need to take a sample from each one. There is some overlap between the dataframes, so the rows sampled from one dataframe have to be excluded from the following ones to avoid any 'double' samples.
My code is as follows:
df_list = [df1, df2, df3, df4, df5]
samplesizes = [8, 2, 4, 4, 2]
sample = []
for df, samplesize in zip(df_list, samplesizes):
    if sample:  # can't drop in the first loop
        df = df.drop(sample)  # I want to drop the taken samples from the current df
    if max_pop_size < len(df):
        samplesize = max_pop_size  # can't take a sample larger than population
    sample.append(df.sample(samplesize, random_state=1000))
I get stuck on dropping the already-sampled rows after the first iteration. I've tried several things, but none seem to work.
Any help would be much appreciated!
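
For reference, here is a minimal sketch of one way the loop could work. It is an illustration under stated assumptions, not a definitive fix. It assumes that overlapping rows share the same index label across the dataframes. The function name `sample_without_overlap` and the demo frames are hypothetical. One likely cause of the failure above is that `sample` is a list of DataFrames, while `DataFrame.drop` expects index labels; `drop` also raises a `KeyError` for labels that are not present unless `errors="ignore"` is passed. The sketch avoids both issues by tracking the sampled index labels in a set and filtering with `Index.isin`.

```python
import pandas as pd


def sample_without_overlap(df_list, samplesizes, random_state=1000):
    """Sample from each dataframe, skipping index labels already sampled.

    Assumption: rows that overlap between dataframes carry the same index label.
    """
    taken = set()   # index labels sampled so far
    samples = []    # one sampled DataFrame per input dataframe
    for df, samplesize in zip(df_list, samplesizes):
        # Keep only rows whose index label has not been sampled yet.
        remaining = df[~df.index.isin(taken)]
        # Can't take a sample larger than the remaining population.
        n = min(samplesize, len(remaining))
        picked = remaining.sample(n, random_state=random_state)
        samples.append(picked)
        taken.update(picked.index)
    return samples


# Hypothetical demo data: index labels 3 and 4 appear in both frames.
df1 = pd.DataFrame({"x": range(5)}, index=[0, 1, 2, 3, 4])
df2 = pd.DataFrame({"x": range(3, 8)}, index=[3, 4, 5, 6, 7])

samples = sample_without_overlap([df1, df2], [3, 10])
all_samples = pd.concat(samples)
```

If the overlap is defined by column values rather than index labels, the same idea should apply with a key column in place of the index, for example by filtering on `df["key"].isin(taken)` (where `key` is a hypothetical column name).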