I am faced with reading a .csv file with some columns like this:
Data1 [-]; Data2 [%]
9,46;94,2%
9,45;94,1%
9,42;93,8%
I want to read the Data1 [%] column as a pandas DataFrame with values [94.2, 93.4, 96.4].
I am able to remove the percentage sign after reading the CSV by using this answer; however, this prevents me from using read_csv(decimal=",") to turn the commas into decimal points.
Is there a way that I can convert these numbers into floats as I am reading the .csv file, or do I have to manipulate the DataFrame (removing percentage sign and converting to float values) after reading the .csv?
EDIT: here is the code I am currently using for reference:
.csv file:
Data1 [g/m3];Data2 [%]
9,46;94,2%
9,45;94,1%
9,42;93,8%
Python file:
import pandas as pd
df = pd.read_csv(f, encoding="latin-1", sep=";", decimal=",")
for col in df:
    if str(df[col][0])[-1] == "%":
        df[col] = df[col].str.rstrip('%').str.replace(',', '.').astype('float')
There is no built-in way to handle the % in your input. The most straightforward approach is to post-process the DataFrame after reading the data:
df = pd.read_csv('file.csv', sep=';')
df['Data1 [%]'] = pd.to_numeric(df['Data1 [%]'].str.replace(',', '.')
                                                .str.rstrip('%'))
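If the percentage columns are not known in advance, the same post-processing can be applied to every column that looks like a percentage. The following is only a minimal sketch, not a pandas feature: the helper name percent_columns_to_float is hypothetical, the sample data is copied from the question, and it assumes a column should be converted only when every one of its values ends with %.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the example is self-contained.
raw = (
    "Data1 [g/m3];Data2 [%]\n"
    "9,46;94,2%\n"
    "9,45;94,1%\n"
    "9,42;93,8%\n"
)


def percent_columns_to_float(df):
    """Return a copy where columns whose values all end in '%' become floats.

    Hypothetical helper for illustration only.
    """
    out = df.copy()
    for col in out.columns:
        # Columns already parsed as numbers (via decimal=',') need no work.
        if pd.api.types.is_numeric_dtype(out[col]):
            continue
        text = out[col].astype(str).str.strip()
        if text.str.endswith('%').all():
            cleaned = text.str.rstrip('%').str.replace(',', '.', regex=False)
            out[col] = pd.to_numeric(cleaned)
    return out


# decimal=',' handles the plain numeric column; the helper handles the rest.
df = percent_columns_to_float(
    pd.read_csv(io.StringIO(raw), sep=';', decimal=',')
)
```

Checking all values rather than only the first row avoids converting a column that merely happens to start with a percentage.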
Now, you could be sneaky and take advantage of a custom separator to get rid of the %:
df = pd.read_csv('file.csv', sep='%?;', decimal=',', engine='python')
NB: this won't work for the last column, because its trailing % is not followed by a separator.
You could also use converters during the import if you know the column names in advance:
def pct2num(s):
    # Strip the trailing '%' and switch the decimal comma to a point.
    return float(s.rstrip('%').replace(',', '.'))

df = pd.read_csv('percent.csv', sep=';', decimal=',', engine='python',
                 converters={'Data1 [%]': pct2num}
                )
Output:
   Data1 [%]  Data2 [-]
0       94.2        199
1       93.4        123
2       96.4        132
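When the column names are not known in advance, one possible variation is to read just the header first and build the converters mapping from it. This is a sketch under assumptions, not something read_csv does for you: it assumes percentage columns can be recognized by a % in their header (as in the question's sample data, which is inlined here), and it maps empty cells to NaN.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the example is self-contained.
raw = (
    "Data1 [g/m3];Data2 [%]\n"
    "9,46;94,2%\n"
    "9,45;94,1%\n"
    "9,42;93,8%\n"
)


def pct2num(s):
    # Strip the trailing '%' and switch the decimal comma to a point.
    s = s.strip().rstrip('%').replace(',', '.')
    return float(s) if s else float('nan')


# nrows=0 parses only the header row, which is enough to get the names.
header = pd.read_csv(io.StringIO(raw), sep=';', nrows=0).columns

# Assumption: a '%' in the column name marks a percentage column.
converters = {name: pct2num for name in header if '%' in name}

df = pd.read_csv(io.StringIO(raw), sep=';', decimal=',',
                 converters=converters)
```

With a real file, the two io.StringIO(raw) arguments would be replaced by the file path; the file is then opened twice, but the first pass reads only the header line.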