I have an XML document from the IRS and I'm trying to grab the values of specific tags that interest me. For example, in the following XML data, I'm interested in the value of CYTotalExpensesAmt, which is 12345:
<returndata>
<irs990>
<CYTotalExpensesAmt>12345</CYTotalExpensesAmt>
</irs990>
<returndata>
When I run the following, it prints what looks like a memory location:
x = root.iter('CYTotalExpensesAmt')
print(x)
But when I try to grab the value, the 12345, with the following code:
print(x.text)
or
for e in root.iter('CYTotalExpensesAmt'):
    print(e.text)
I either get an error or nothing is returned at all. Any ideas on what I can do differently to access the value of tags whose names I know but whose position in the document I don't?
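Your first expression returns an iterator, not an element, so it has no .text attribute you can read directly. Convert it to a list (or loop over it), e.g.:
(Note: the XML must be valid, as Gordon pointed out in the comments; your sample's last tag should be </returndata>.)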
import xml.etree.ElementTree as ET
# last tag closed properly (</returndata>), as Gordon pointed out
xml_ = """<returndata>
<irs990>
<CYTotalExpensesAmt>12345</CYTotalExpensesAmt>
</irs990>
</returndata>"""
root = ET.fromstring(xml_)
# x is an iterator: convert it to a list, take the first element and print its text
x = root.iter('CYTotalExpensesAmt')
print(list(x)[0].text)
for elem in root.iter('CYTotalExpensesAmt'):
    print(elem.text)
Output:
12345
12345
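If you only need the first match, building a list is not strictly necessary. The sketch below uses only the standard-library ElementTree and shows two alternatives: findtext() with a './/' path (search all descendants, return the first match's text) and next() on the iterator. It also illustrates one likely reason for getting nothing back from a real IRS file: e-file documents usually declare a default namespace on their root element, and then a bare tag name matches nothing. The Return/ReturnData/IRS990 wrapper and the namespace URI in the second sample are assumptions for illustration; copy the actual URI from the xmlns attribute of your own file's root element.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, with the closing </returndata> tag fixed.
xml_ = """<returndata>
<irs990>
<CYTotalExpensesAmt>12345</CYTotalExpensesAmt>
</irs990>
</returndata>"""
root = ET.fromstring(xml_)

# findtext() with a ".//" path searches all descendants and returns the text
# of the first matching element, or None if nothing matches.
value = root.findtext('.//CYTotalExpensesAmt')
print(value)

# next() pulls only the first element from the iterator without building a list;
# the second argument is returned instead of raising StopIteration when empty.
first = next(root.iter('CYTotalExpensesAmt'), None)
if first is not None:
    print(first.text)

# Hypothetical namespaced document: the wrapper tags and the namespace URI are
# assumptions; use the xmlns value from your own file's root element.
ns_xml = """<Return xmlns="http://www.irs.gov/efile">
<ReturnData>
<IRS990>
<CYTotalExpensesAmt>12345</CYTotalExpensesAmt>
</IRS990>
</ReturnData>
</Return>"""
ns_root = ET.fromstring(ns_xml)
ns = {'efile': 'http://www.irs.gov/efile'}

# With a default namespace in effect, the bare tag name matches nothing.
plain = ns_root.findtext('.//CYTotalExpensesAmt')
print(plain)  # None

# Mapping a prefix to the namespace URI makes the lookup succeed.
prefixed = ns_root.findtext('.//efile:CYTotalExpensesAmt', namespaces=ns)
print(prefixed)  # 12345
```

The namespaces dictionary only needs to map a prefix of your choosing to the URI; the prefix itself never has to appear in the XML file.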