In R, I have a dataset `dir` with the variables `Continent`, `Level`, `Format` and `pop_stratum`. As the tables below show, the categories and their intersections are very unevenly populated:
```
> table(dir$Continent, dir$Level)

                National Regional Semi-autonomous
  Africa              31        0              10
  Asia                22        5               0
  Europe             860     5973              23
  North America        1     5287              34
  Oceania            269       55             210
  South America      156       19              72

> table(dir$Continent, dir$Format)

                Citizen initiative Referendum
  Africa                         0         41
  Asia                          14         13
  Europe                      1269       5587
  North America               1897       3425
  Oceania                       40        494
  South America                  9        238

> table(dir$pop_stratum)

  S1   S2   S3   S4   S5   S6   S7   S8   S9  S10  S11
6607 1372  641  720  538  403  400  235  163  150  140
 S12  S13  S14  S15  S16  S17  S18  S19  S20  S21  S22
 105   42   18   30   39   28   28   18   78   76   87
 S23  S24  S25  S26  S27  S28  S29  S30  S31  S32  S33
 130  119   65   10   69  103   99   65   81   20   31
 S34  S35  S36  S37  S38  S39  S40  S41  S42  S43  S44
  50   15   43   28   19   34   29    0   15    0    1
 S45  S46  S47  S48  S49  S50  S51  S52  S53  S54  S55
   0    0    0    1    0    7    0    0    1    0    0
 S56  S57  S58  S59  S60
   1   56    7    2    8
```
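Summing the rows of the two cross-tabs gives each continent's total. Combined with the target of 128 cases per continent described below, a short calculation shows what sample size is actually attainable and what the `Level` and `Format` percentages translate to in cases. This sketch only reuses the counts printed above; the object names are illustrative.

```r
# Continent totals = row sums of the Continent x Level table above
continent_n <- c(Africa = 31 + 0 + 10, Asia = 22 + 5 + 0,
                 Europe = 860 + 5973 + 23, "North America" = 1 + 5287 + 34,
                 Oceania = 269 + 55 + 210, "South America" = 156 + 19 + 72)

# "128 cases per continent where possible"
continent_target <- pmin(continent_n, 128)
n_total <- sum(continent_target)

# Implied margin targets at that sample size
level_target  <- n_total * c(National = 0.30, Regional = 0.45,
                             "Semi-autonomous" = 0.25)
format_target <- n_total * c("Citizen initiative" = 1/3, Referendum = 2/3)

continent_target
level_target
format_target
```

Africa (41) and Asia (27) cannot reach 128, so the largest sample satisfying the continent rule has 41 + 27 + 4 × 128 = 580 cases. The `Level` shares then correspond to 174 National, 261 Regional and 145 Semi-autonomous cases, and the `Format` shares to about 193 Citizen initiatives and 387 Referendums. The margins also interact: Africa has no Regional and no Citizen-initiative cases, so every sampled African case must be National or Semi-autonomous and a Referendum.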
I want to draw a sample from `dir` that is stratified on all four variables at once:

- `Continent`: 128 cases per category where possible (Africa and Asia have only 41 and 27 cases in total);
- `Level`: 25% Semi-autonomous, 30% National and 45% Regional;
- `Format`: 1/3 Citizen initiative and 2/3 Referendum;
- `pop_stratum`: as equal a number of cases per category as possible.

Can someone show me how to do this?
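One possible approach, sketched in base R under several assumptions: turn each requirement into a target count per category, use raking (iterative proportional fitting) to spread the sample over the `Continent` × `Level` × `Format` × `pop_stratum` cells so that all four margins are matched as closely as the data allow, cap every cell at the number of cases it actually contains, and then draw a simple random sample inside each cell. Raking is my choice of technique here, not something the question prescribes. The helpers `equal_capped()`, `rake_allocation()` and `draw_stratified()` are hypothetical, and the `dir` built at the top is random stand-in data, not the real dataset. Because the targets conflict in places and cell counts are rounded to integers, the result will generally only approximate the requested margins.

```r
set.seed(1)

# --- Hypothetical stand-in for `dir` (random data, NOT the real dataset) ---
n <- 5000
dir <- data.frame(
  Continent   = factor(sample(c("Africa", "Asia", "Europe"), n, TRUE,
                              prob = c(0.01, 0.05, 0.94))),
  Level       = factor(sample(c("National", "Regional", "Semi-autonomous"),
                              n, TRUE, prob = c(0.2, 0.7, 0.1))),
  Format      = factor(sample(c("Citizen initiative", "Referendum"), n, TRUE)),
  pop_stratum = factor(sample(paste0("S", 1:5), n, TRUE,
                              prob = c(0.6, 0.2, 0.1, 0.06, 0.04)))
)

# Hypothetical helper: split n as evenly as possible across categories
# ("water-filling"), never giving a category more cases than it has.
equal_capped <- function(avail, n) {
  avail <- setNames(as.numeric(avail), names(avail))
  alloc <- setNames(numeric(length(avail)), names(avail))
  remaining <- n
  open <- avail > 0
  while (remaining > 1e-9 && any(open)) {
    add <- pmin(remaining / sum(open), avail[open] - alloc[open])
    alloc[open] <- alloc[open] + add
    remaining <- remaining - sum(add)
    open <- alloc < avail - 1e-9          # categories not yet exhausted
  }
  alloc
}

# Hypothetical helper: rake the cell allocation towards each margin in turn,
# capping every cell at its available count after each adjustment.
rake_allocation <- function(avail, targets, max_iter = 500, tol = 1e-8) {
  avail <- unclass(avail)
  dims  <- names(dimnames(avail))
  alloc <- avail * sum(targets[[1]]) / sum(avail)   # proportional start
  for (it in seq_len(max_iter)) {
    old <- alloc
    for (d in names(targets)) {
      margin <- apply(alloc, d, sum)
      ratio  <- ifelse(margin > 0, targets[[d]][names(margin)] / margin, 0)
      alloc  <- sweep(alloc, match(d, dims), ratio, `*`)
      alloc  <- pmin(alloc, avail)                    # cap at availability
    }
    if (max(abs(alloc - old)) < tol) break
  }
  alloc
}

# Hypothetical helper: draw round(alloc) cases at random from every cell.
draw_stratified <- function(data, vars, alloc) {
  cells <- as.data.frame(as.table(round(alloc)), stringsAsFactors = FALSE)
  cells <- cells[cells$Freq > 0, , drop = FALSE]
  key_data <- do.call(paste, c(data[vars], sep = "|"))
  key_cell <- do.call(paste, c(cells[vars], sep = "|"))
  idx <- unlist(lapply(seq_len(nrow(cells)), function(i) {
    rows <- which(key_data == key_cell[i])
    rows[sample.int(length(rows), cells$Freq[i])]  # safe when length(rows) == 1
  }))
  data[idx, , drop = FALSE]
}

vars  <- c("Continent", "Level", "Format", "pop_stratum")
avail <- table(dir[vars])

cont_target <- pmin(apply(avail, "Continent", sum), 128)  # 128 where possible
n_total     <- sum(cont_target)
targets <- list(
  Continent   = cont_target,
  Level       = n_total * c(National = 0.30, Regional = 0.45,
                            "Semi-autonomous" = 0.25),
  Format      = n_total * c("Citizen initiative" = 1/3, Referendum = 2/3),
  pop_stratum = equal_capped(apply(avail, "pop_stratum", sum), n_total)
)

alloc <- rake_allocation(avail, targets)
smp   <- draw_stratified(dir, vars, alloc)

# Compare the achieved margins with the targets
table(smp$Continent)
prop.table(table(smp$Level))
prop.table(table(smp$Format))
table(smp$pop_stratum)
```

With the real data, drop the simulated `dir` and run the rest unchanged; the category names in `targets` must match the factor levels exactly. Raking adjusts one margin at a time, so the margin fitted last (`pop_stratum` here) tends to be matched most closely; reordering `targets` changes which requirement gets priority when they cannot all be met.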