I have a function that reads a CSV into a DataFrame. The CSV is quite large, and its columns hold a mix of strings (categories) and numbers:
import pandas as pd
df_temp=pd.read_csv('somefile.csv',dtype="string")
df_temp.iloc[:,1:]=df_temp.iloc[:,1:].apply(pd.to_numeric,errors='coerce').copy()
I need to keep the first column as is. Note that this doesn't work with "category" either:
df_temp.iloc[:,1:]=df_temp.iloc[:,1:].astype("category").copy()
When I run this in a Jupyter notebook, it works fine.
When I run it inside the function, it doesn't do anything: df_temp remains "object" or "string".
However, this works as a workaround:
df_temp=pd.read_csv('somefile.csv',dtype="category")
df_temp.iloc[:,1:]=df_temp.iloc[:,1:].apply(pd.to_numeric,errors='coerce').copy()
I get some warnings, because some columns mix strings and numbers. I want all of the strings to become NaN in this case, hence to_numeric with errors='coerce'. I'm fine with the warnings.
My issue is that the first code snippet doesn't do anything, and I have no idea why.
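For comparison, here is a minimal sketch of a variant that assigns each column by label rather than through iloc. It rests on an unverified assumption: that in recent pandas versions an iloc slice assignment tries to write the new values into the existing string columns, while a label assignment replaces the column and so takes on the new dtype. The function name load_with_numeric_columns and the inline sample data are hypothetical stand-ins for my real function and somefile.csv.

```python
import io

import pandas as pd


def load_with_numeric_columns(source):
    """Read a CSV as strings, then convert every column except the first to numbers."""
    df = pd.read_csv(source, dtype="string")
    # Assigning by column label replaces the whole column, so the numeric dtype
    # returned by pd.to_numeric is kept. Values that cannot be parsed become missing.
    for col in df.columns[1:]:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    return df


# Hypothetical inline data standing in for somefile.csv.
sample_csv = io.StringIO("name,a,b\nx,1,2.5\ny,oops,4\n")
df_temp = load_with_numeric_columns(sample_csv)
print(df_temp.dtypes)
```

The exact numeric dtype (for example float64 versus the nullable Float64) may differ between pandas versions, so the sketch only prints the dtypes rather than asserting a particular one.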