Consider the following short dataframe example:
import pandas as pd
import sweetviz as sv

df = pd.DataFrame({'column1': [2, 4, 8, 0],
                   'column2': [2, 0, 0, 0],
                   'column3': ["test", 2, 1, 8]})
df.dtypes shows the following data types for the columns:
column1 int64
column2 int64
column3 object
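A quick pandas-only check (a sketch, not part of Sweetviz) shows why column3 ends up as object: it holds one Python str next to three ints, so pandas falls back to the generic object dtype. This assumes pandas keeps each value's own Python type inside an object column, which is its usual behaviour for mixed lists.

```python
import pandas as pd

df = pd.DataFrame({'column1': [2, 4, 8, 0],
                   'column2': [2, 0, 0, 0],
                   'column3': ["test", 2, 1, 8]})

# column3 mixes a str with ints, so pandas stores it with the generic
# object dtype and each value keeps its own Python type.
value_types = df['column3'].map(lambda v: type(v).__name__)
print(df['column3'].dtype)        # object
print(value_types.value_counts())
```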
Now I would like to run Sweetviz over this dataset to get a report on the columns and their data:
report = sv.analyze(df)
report.show_notebook()
The problem is that Sweetviz seems to detect that column3 holds mostly numbers, even though its dtype is object. Instead of generating the report, it gives the following suggestion:
Convert series [column3] to a numerical value (if makes sense):
One way to do this is:
df['column3'] = pd.to_numeric(df['column3'], errors='coerce')
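For reference, here is what the suggested conversion does to column3 on its own (plain pandas, no Sweetviz involved): with errors='coerce', every value that cannot be parsed as a number becomes NaN, so the string "test" is silently lost and the column becomes float64.

```python
import pandas as pd

df = pd.DataFrame({'column3': ["test", 2, 1, 8]})

# errors='coerce' replaces unparseable values with NaN instead of raising,
# and the NaN forces the result to a float dtype.
converted = pd.to_numeric(df['column3'], errors='coerce')
print(converted.tolist())   # [nan, 2.0, 1.0, 8.0]
print(converted.dtype)      # float64
```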
Unfortunately, converting the column isn’t an option: I want the report to also highlight misused columns in my data, so I want column3 treated as object even though only a small fraction of its values are not numbers.
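Independently of Sweetviz, the share of numeric-looking values in an object column can be measured with pandas alone, which is one way to surface such "mostly numeric" columns. This is a minimal sketch: the helpers numeric_share and flag_mixed_object_columns are hypothetical names, not Sweetviz API, and the 0.5 threshold is an arbitrary assumption.

```python
import pandas as pd


def numeric_share(series: pd.Series) -> float:
    """Hypothetical helper: fraction of non-null values that pd.to_numeric can parse."""
    non_null = series.dropna()
    if non_null.empty:
        return 0.0
    parsed = pd.to_numeric(non_null, errors='coerce')
    return float(parsed.notna().mean())


def flag_mixed_object_columns(df: pd.DataFrame, threshold: float = 0.5) -> dict:
    """Hypothetical helper: object columns that are mostly, but not entirely, numeric."""
    flagged = {}
    for col in df.select_dtypes(include='object').columns:
        share = numeric_share(df[col])
        # Fully numeric columns (share == 1.0) are not flagged as misused.
        if threshold <= share < 1.0:
            flagged[col] = share
    return flagged


df = pd.DataFrame({'column1': [2, 4, 8, 0],
                   'column2': [2, 0, 0, 0],
                   'column3': ["test", 2, 1, 8]})
print(flag_mixed_object_columns(df))  # {'column3': 0.75}
```

The original column is never modified here; the numeric parse is only used to compute the share.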
I have played around with the parameters that Sweetviz's FeatureConfig allows:
feature_config = sv.FeatureConfig(force_text=['column3'])
report = sv.analyze(df, feat_cfg=feature_config)
report.show_notebook()
With this config, I would expect Sweetviz to treat column3 as text and bypass its own type detection.
Unfortunately, I get the same suggestion: convert the column to numeric, which would turn the string values into NaN.
I also tried the other FeatureConfig parameters for column3: skip, force_cat and force_num.
force_cat and force_num don’t help at all; they lead to the same result.
skip leaves column3 out of the report, which is also not a solution.
Is there any way to force Sweetviz to leave the object-typed column3 as it is and analyze it?