I want to remove the duplicates from this CSV file. The numbering of the repeated columns (.1, .2, ...) is not present in the original file; it only appears in the output of my tests with pandas df.loc and df.drop_duplicates:
DISCRIMINATION,VALUE(M),VALUE(M).1,VALUE(M).2,VALUE(M).3,VALUE(M).4,VALUE(M).5,VALUE(M).6,NOTES,NOTES.1,municipalities
Cash and Banks,Cash and Banks,Cash and Banks,"R$ 3.533.376,49","R$ 3.533.376,49","R$ 3.533.376,49","R$ 3.533.376,49","R$ 3.533.376,49","R$ 3.533.376,49",1.0,AMELIA RODRIGUES
(+) Financial Assets,(+) Financial Assets,(+) Financial Assets,(+) Financial Assets,(+) Financial Assets,(+) Financial Assets,(+) Financial Assets,"R$ 0,00","R$ 0,00",2.0,AMELIA RODRIGUES
I tried:
import pandas as pd

# Read the CSV
df = pd.read_csv(filepath, sep=",")
print(f"Reading file: {filepath}")
# Keep only the first column for each repeated column name
df = df.loc[:, ~df.columns.duplicated(keep='first')]
and this:
df.drop_duplicates(subset=None, inplace=True)
# Write the results to a different file
df.to_csv(file_name_output, index=False)
But neither of these works!
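A possible explanation, offered as a sketch rather than a confirmed fix: pd.read_csv renames repeated header names to VALUE(M).1, VALUE(M).2 and so on, so df.columns.duplicated() no longer finds any repeated names, and df.drop_duplicates() compares whole rows, not cells within a row. Since the repetition here is inside each row, one option is to collapse runs of identical adjacent cells before the data reaches pandas. The example below uses the standard-library csv module instead of pandas; collapse_repeats and the sample data are made up for illustration, and it assumes that two adjacent cells with the same text are always unwanted copies.

```python
import csv
import io
from itertools import groupby


def collapse_repeats(row):
    """Keep one cell from each run of identical adjacent cells."""
    return [cell for cell, _ in groupby(row)]


# Made-up sample shaped like the file in the question: repeated header names
# and repeated cells within each row. The real file would be opened instead.
sample = (
    'DISCRIMINATION,VALUE(M),VALUE(M),VALUE(M),NOTES,municipalities\n'
    'Cash and Banks,Cash and Banks,"R$ 3.533.376,49","R$ 3.533.376,49",1.0,AMELIA RODRIGUES\n'
    '(+) Financial Assets,(+) Financial Assets,(+) Financial Assets,"R$ 0,00",2.0,AMELIA RODRIGUES\n'
)

# Read raw rows (header included) and collapse the repeats in each one
reader = csv.reader(io.StringIO(sample))
cleaned = [collapse_repeats(row) for row in reader]

# Write the cleaned rows back out as CSV text
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(cleaned)
print(out.getvalue())
```

For the real file, the io.StringIO objects would be replaced by open(filepath, newline="") and an output file opened the same way; the cleaned file can then be loaded with pd.read_csv as usual. A legitimate value that happens to equal its neighbour would also be collapsed, so this only fits if that never occurs in the data.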
Both attempts return the following (I already tried turning the header and index on and off):
,DISCRIMINAÇÃO,VALOR(M),VALOR(M).1,VALOR(M).2,VALOR(M).3,VALOR(M).4,VALOR(M).5,VALOR(M).6,NOTAS,NOTAS.1,municipios
0,Caixa e Bancos,Caixa e Bancos,Caixa e Bancos,"R$ 3.533.376,49","R$ 3.533.376,49","R$ 3.533.376,49","R$ 3.533.376,49","R$ 3.533.376,49","R$ 3.533.376,49",1.0,AMELIA RODRIGUES
1,(+) Haveres Financeiros,(+) Haveres Financeiros,(+) Haveres Financeiros,(+) Haveres Financeiros,(+) Haveres Financeiros,(+) Haveres Financeiros,(+) Haveres Financeiros,"R$ 0,00","R$ 0,00",2.0,AMELIA RODRIGUES
2,(=) Disponibilidade Financeira,(=) Disponibilidade Financeira,(=) Disponibilidade Financeira,"R$ 3.533.376,49","R$ 3.533.376,49","R$ 3.533.376,49","R$ 3.533.376,49","R$ 3.533.376,49","R$ 3.533.376,49",3.0,AMELIA RODRIGUES
3,(-) Consignações e Retenções,(-) Consignações e Retenções,(-) Consignações e Retenções,"R$ 4.324.290,19","R$ 4.324.290,19","R$ 4.324.290,19","R$ 4.324.290,19","R$ 4.324.290,19","R$
10