I tried to apply my newly acquired pandas skills to a typical ANOVA calculation on a large dataset with 5 partially nested indices. The corresponding dataframe “dSchema” looks like this:
X q c s r i
1 3.0 0 0 0 0 0
2 3.0 0 0 0 0 1
3 3.0 0 0 0 0 2
4 3.0 0 0 0 0 3
5 3.0 0 0 0 0 4
... ... .. .. .. .. ..
8496 1.0 8 4 9 1 0
8497 0.0 8 4 9 1 1
8498 0.0 8 4 9 1 2
8499 0.0 8 4 9 1 3
8500 0.0 8 4 9 1 4
[8500 rows x 6 columns]
For the actual calculation I came up with the following column:
filter
0 ['0', '0', '0']
1 ['0', '0', '1']
2 ['0', '0', '2']
3 ['0', '0', '3']
4 ['0', '0', '4']
... ...
8495 ['8', '9', '0']
8496 ['8', '9', '1']
8497 ['8', '9', '2']
8498 ['8', '9', '3']
8499 ['8', '9', '4']
[8500 rows x 1 columns]
The one column, ‘filter’, contains strings of lists.
But when I tried to add the column ‘filter’ to the original dataframe, something strange happened:
X q c s r i filter
1 3.0 0 0 0 0 0 ['0', '0', '1']
2 3.0 0 0 0 0 1 ['0', '0', '2']
3 3.0 0 0 0 0 2 ['0', '0', '3']
4 3.0 0 0 0 0 3 ['0', '0', '4']
5 3.0 0 0 0 0 4 ['0', '0', '0']
... ... .. .. .. .. .. ...
8496 1.0 8 4 9 1 0 ['8', '9', '1']
8497 0.0 8 4 9 1 1 ['8', '9', '2']
8498 0.0 8 4 9 1 2 ['8', '9', '3']
8499 0.0 8 4 9 1 3 ['8', '9', '4']
8500 0.0 8 4 9 1 4 <NA>
[8500 rows x 7 columns]
The column ‘filter’ appears shifted up by one position relative to the original dataframe. I was unable to correct this with the .shift() function when attaching ‘filter’ to ‘schema’. Can anybody explain this to me or, even better, help me fix it?
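To make this easier to reproduce, here is a minimal sketch with toy data. The names `schema`, `filt`, `by_label` and `by_position` are made up for this example, and I am assuming that the only relevant difference between the two frames is what the printouts show: the data frame's index labels start at 1, while the labels of ‘filter’ start at 0. In this toy version, plain column assignment matches rows by index label and gives the same shifted pattern, while assigning the underlying values attaches them by position. I have not confirmed that this is what happens with my real data.

```python
import pandas as pd

# Toy stand-in for the real data: index labels start at 1, as in the printout.
schema = pd.DataFrame(
    {"X": [3.0, 3.0, 3.0, 3.0, 3.0], "i": [0, 1, 2, 3, 4]},
    index=range(1, 6),
)

# Toy stand-in for the 'filter' frame: default index labels starting at 0.
filt = pd.DataFrame({"filter": [["0", "0", str(k)] for k in range(5)]})

# Plain column assignment matches rows by index label, not by position,
# so label 1 receives the entry stored under label 1 in 'filt'
# and label 5 has no counterpart.
by_label = schema.copy()
by_label["filter"] = filt["filter"]

# Assigning the raw values drops the labels of 'filt', so the entries
# are attached in row order instead.
by_position = schema.copy()
by_position["filter"] = filt["filter"].to_numpy()

print(by_label)
print(by_position)
```

If the labels really are the cause, giving both frames the same index before assigning (for example with reset_index(drop=True) on the data frame) should presumably work as well.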