
Tag Archive for pythonlxml

How to retrieve all the text (including the tags/child elements) from an element using lxml

xml_content = '<root><para>Brother set had private his letters observe outward resolve. Shutters ye marriage to throwing we as. <child1>Effect in if agreed he wished wanted admire expect</child1>. Or shortly visitor is comfort <child2>placing to cheered do</child2>. Few hills tears are weeks saw. Partiality insensible celebrated is in. Am <child3>offended as wandered</child3>thoughts greatest an friendly. Evening covered in he exposed fertile to. Horses seeing at played plenty nature to expect we. Young say led stood hills own thing get</para></root>'

Find elements in an XML file with the lxml find() method

I have XML files that are more than 1 million lines long. I’m able to parse them without issue using BeautifulSoup, but parsing with bs4 can take a minute or more. I’m trying to switch to lxml in the hope of speeding things up dramatically, but I can’t get the find() method to work at all.
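The question does not say why find() fails, so the following is only a sketch of two common causes, not a diagnosis: a bare tag name passed to `find()` matches direct children only, and elements in an XML namespace need a prefix mapping. The document, tag names, and namespace URI below are made up for illustration.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback so the sketch runs without lxml; find() takes the same arguments.
    import xml.etree.ElementTree as etree

# Hypothetical document: one plain <item> and one namespaced <x:item>.
doc = etree.fromstring(
    '<root xmlns:x="http://example.com/ns">'
    '<section><item id="1">first</item><x:item id="2">second</x:item></section>'
    "</root>"
)

# A bare tag name only looks at direct children of the element.
direct = doc.find("item")

# ".//" searches all descendants and returns the first match.
nested = doc.find(".//item")

# Namespaced elements need a prefix-to-URI mapping (the prefix is arbitrary).
ns = {"x": "http://example.com/ns"}
namespaced = doc.find(".//x:item", namespaces=ns)

print(direct, nested.text, namespaced.get("id"))
```

`find()` returns `None` rather than raising when nothing matches, so checking the result before using it helps tell a wrong path apart from a parsing problem.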