I want to filter out rows containing empty lists in the columns no_te_ins and te_ins.
Here is part of the input df te_filter:
chr family start end P1-6 P1-12 P1-22 P1-25 P1-26 P1-28 P1-88 P1-89 P1-90 P1-92 P1-93 P1-95 no_te_ins te_ins
16 NC_064017.1 uc|28 133586.0 133683.0 0.411 0.386 0.322 0.416 0.319 0.364 0.184 0.407 0.328 0.254 0.663 0.437 [P1-88, P1-92] []
493 NC_064017.1 Stowaway|1 2182670.0 2182719.0 0.000 0.000 0.000 0.000 0.682 0.000 0.000 0.000 0.000 0.000 0.000 0.791 [] []
494 NC_064017.1 uc|31 2187699.0 2188215.0 0.978 0.986 1.000 1.000 0.936 0.968 1.000 0.962 0.853 0.922 1.000 0.891 [] [P1-12, P1-22, P1-25]
495 NC_064017.1 uc|9 2194130.0 2194325.0 0.981 1.000 0.000 0.224 0.868 0.895 0.850 0.784 0.932 0.000 0.265 0.893 [P1-25, P1-93] [P1-12, P1-26, P1-28]
Here is the relevant part of the code:
te_filter['te_ins'] = te_filter['te_ins'].astype(str)
te_filter['no_te_ins'] = te_filter['no_te_ins'].astype(str)
te_filter_final = te_filter[te_filter['no_te_ins'] != '[]']
print(te_filter_final.to_string)
te_filter_final_2 = te_filter[te_filter['te_ins'] != '[]']
print(te_filter_final_2.to_string)
It works well for the no_te_ins column, and the output for te_filter_final is:
chr family start end P1-6 P1-12 P1-22 P1-25 P1-26 P1-28 P1-88 P1-89 P1-90 P1-92 P1-93 P1-95 no_te_ins te_ins
16 NC_064017.1 uc|28 133586.0 133683.0 0.411 0.386 0.322 0.416 0.319 0.364 0.184 0.407 0.328 0.254 0.663 0.437 ['P1-88', 'P1-92'] ['']
495 NC_064017.1 uc|9 2194130.0 2194325.0 0.981 1.000 0.000 0.224 0.868 0.895 0.850 0.784 0.932 0.000 0.265 0.893 ['P1-25', 'P1-93'] ['P1-12', 'P1-26', 'P1-28']
but for te_ins it doesn't remove the desired rows (there is no effect), and the output for te_filter_final_2 is the same as the input (only the values in the te_ins column are changed from [] to ['']):
chr family start end P1-6 P1-12 P1-22 P1-25 P1-26 P1-28 P1-88 P1-89 P1-90 P1-92 P1-93 P1-95 no_te_ins te_ins
16 NC_064017.1 uc|28 133586.0 133683.0 0.411 0.386 0.322 0.416 0.319 0.364 0.184 0.407 0.328 0.254 0.663 0.437 ['P1-88', 'P1-92'] ['']
493 NC_064017.1 Stowaway|1 2182670.0 2182719.0 0.000 0.000 0.000 0.000 0.682 0.000 0.000 0.000 0.000 0.000 0.000 0.791 [] ['']
494 NC_064017.1 uc|31 2187699.0 2188215.0 0.978 0.986 1.000 1.000 0.936 0.968 1.000 0.962 0.853 0.922 1.000 0.891 [] ['P1-12', 'P1-22', 'P1-25']
495 NC_064017.1 uc|9 2194130.0 2194325.0 0.981 1.000 0.000 0.224 0.868 0.895 0.850 0.784 0.932 0.000 0.265 0.893 ['P1-25', 'P1-93'] ['P1-12', 'P1-26', 'P1-28']
Any idea what's going on? Is there something wrong with the data type, and how can I fix it?
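To narrow this down, here is a minimal sketch that reproduces what the printed output seems to show. It assumes the cells hold real Python lists and that the "empty" te_ins cells actually contain [''] (a list with one empty string) rather than []. The small frame and the helper has_items are hypothetical and only stand in for the real te_filter. If that assumption holds, astype(str) yields "['']", which never equals '[]', so the comparison would keep every row.

```python
import pandas as pd

# Hypothetical miniature of te_filter; only the two list columns matter here.
# Assumption: "empty" te_ins cells hold [''] (one empty string), as the
# printed output suggests, while empty no_te_ins cells hold a true [].
te_filter = pd.DataFrame(
    {
        "no_te_ins": [["P1-88", "P1-92"], [], [], ["P1-25", "P1-93"]],
        "te_ins": [[""], [""], ["P1-12", "P1-22", "P1-25"],
                   ["P1-12", "P1-26", "P1-28"]],
    },
    index=[16, 493, 494, 495],
)

# Inspect the raw cell contents before converting anything to str.
print(te_filter["te_ins"].map(repr).tolist())


def has_items(cell):
    """Return True if the list holds at least one non-empty string."""
    return any(item != "" for item in cell)


# Boolean masks that treat both [] and [''] as empty.
keep_no_te = te_filter["no_te_ins"].map(has_items)
keep_te = te_filter["te_ins"].map(has_items)

te_filter_final = te_filter[keep_no_te]
te_filter_final_2 = te_filter[keep_te]
# Rows where neither column is empty.
te_filter_both = te_filter[keep_no_te & keep_te]
```

With this approach the columns stay as lists, so no astype(str) conversion is needed. If the real data stores the lists as strings instead, the check would have to be adapted.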