Suppose I have a list of lists of numbers that happen to be encoded as strings.
import pandas as pd
pylist = [['1', '43'], ['2', '42'], ['3', '41'], ['4', '40'], ['5', '39']]
Now I want a dataframe where these numbers are integers.
I can see from the pandas documentation that I can force a data type via dtype, but when I run the following:
pyframe_1 = pd.DataFrame(pylist,dtype=int)
I get the following warning:
FutureWarning: Could not cast to int32, falling back to object. This behavior is deprecated. In a future version, when a dtype is passed to 'DataFrame', either all columns will be cast to that dtype, or a TypeError will be raised.
and by inspection via dtypes:
pytypes_1 = pyframe_1.dtypes.to_list() # dtype[object_] of numpy module
my columns are np.object types.
But I can cast my columns to integer in two ways:
The first is column by column:
pyframe_2 = pd.DataFrame(pylist)
pyframe_2[0] = pyframe_2[0].astype(int)
pyframe_2[1] = pyframe_2[1].astype(int)
The second is on the entire dataframe in a one-liner:
pyframe_3 = pd.DataFrame(pylist).astype(int)
Both give me a dataframe of integer columns from a list of lists of strings.
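As a quick sanity check, here is a minimal sketch that builds the frame both ways and confirms the result is the same. The names by_column and whole_frame are only illustrative, and I use is_integer_dtype rather than comparing against a fixed width because int may map to int32 or int64 depending on the platform.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

pylist = [['1', '43'], ['2', '42'], ['3', '41'], ['4', '40'], ['5', '39']]

# Approach 1: build the frame of strings, then cast one column at a time.
by_column = pd.DataFrame(pylist)
for col in by_column.columns:
    by_column[col] = by_column[col].astype(int)

# Approach 2: build the frame of strings, then cast the whole frame at once.
whole_frame = pd.DataFrame(pylist).astype(int)

# Check that every column ended up with an integer dtype in both frames.
all_int_by_column = all(is_integer_dtype(t) for t in by_column.dtypes)
all_int_whole = all(is_integer_dtype(t) for t in whole_frame.dtypes)

# Check that both approaches produced identical frames.
frames_match = by_column.equals(whole_frame)

print(all_int_by_column, all_int_whole, frames_match)
```

So the conversion itself is not the problem; only passing dtype to the constructor behaves differently.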
My question is: why does the first case, where I explicitly pass dtype when creating the dataframe, raise a warning (or error) and leave the types unconverted? Why even have it as an option in the first place?