My code:
import pandas as pd

df = pd.read_xml(
    path_or_buffer=PATH,
    xpath="//Data",
    compression="gzip",
)
I’m using the pandas read_xml() function to read data in xml.gz format, with pandas version 1.3.2. When I read the data, pandas parses it incorrectly.
The data looks like the samples below. Both colA and colB should be strings.
1st data file:
<Data>
<colA>abc</colA>
<colB>168E3</colB>
</Data>
<Data>
<colA>def</colA>
</Data>
2nd data file:
<Data>
<colA>ghi</colA>
<colB>23456</colB>
</Data>
<Data>
<colA>jkl</colA>
</Data>
When I use the read_xml() function, the result looks like this:
1st dataframe:
colA: abc, def
colB: 168000.0, None
2nd dataframe:
colA: ghi, jkl
colB: 23456.0, None
I want to read the data as strings, but read_xml() has no dtype argument in pandas 1.3.2. I want to know:
- How can I read the data with a designated data type?
- When a column has missing data, pandas assigns the float type to that column. How can I avoid this, or is there a setting to configure the data type of a column with missing values when the data is read?
Please note that I can only use this pandas version and can’t update it. Thank you.
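
For reference, one possible workaround that sidesteps the type inference in read_xml() is to parse the file with the standard library and keep every value as text. This is only a sketch under assumptions: the helper name read_rows_as_str, the wrapping <Root> element, and the in-memory gzip sample are hypothetical and stand in for the real file, which is not shown in full here.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def read_rows_as_str(source, row_tag="Data", columns=("colA", "colB")):
    """Return one dict per <row_tag> element, keeping each value as str or None.

    `source` may be a path to an .xml.gz file or a binary file-like object.
    """
    with gzip.open(source, "rb") as fh:
        root = ET.parse(fh).getroot()
    rows = []
    for elem in root.iter(row_tag):
        # findtext returns the child's text unchanged, or None if the child is absent
        rows.append({col: elem.findtext(col) for col in columns})
    return rows


# Hypothetical sample mirroring the 1st data file, wrapped in an assumed <Root> element
sample = (
    b"<Root>"
    b"<Data><colA>abc</colA><colB>168E3</colB></Data>"
    b"<Data><colA>def</colA></Data>"
    b"</Root>"
)
rows = read_rows_as_str(io.BytesIO(gzip.compress(sample)))
print(rows)
```

The resulting list of dicts can then be passed to pd.DataFrame(rows). Because the values are already Python strings, pandas should keep them in object columns instead of turning 168E3 into 168000.0.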