There are multiple HTML files inside the iXBRL folder.
I would like to extract a specific table from each file by matching keywords, stack the extracted tables together, and export the result to an Excel file.
Some files may not contain a matching table.
Example file links:
https://mops.twse.com.tw/server-java/t164sb01?step=3&SYEAR=2023&file_name=tifrs-fr1-m1-ci-cr-6224-2023Q4.html
https://mops.twse.com.tw/server-java/t164sb01?step=3&SYEAR=2023&file_name=tifrs-fr1-m1-ci-cr-6111-2023Q4.html
Environment:

```
python          3.12.3
beautifulsoup4  4.12.3
et-xmlfile      1.1.0
html5lib        1.1
lxml            5.2.2
numpy           1.26.4
openpyxl        3.1.2
pandas          2.2.2
pip             24.0
python-dateutil 2.9.0.post0
pytz            2024.1
six             1.16.0
soupsieve       2.5
tzdata          2024.1
webencodings    0.5.1
```
```python
from io import StringIO
import os
import pandas as pd
from bs4 import BeautifulSoup

path = r"C:\Users\WDAGUtilityAccount\Desktop\iXBRL"
data = pd.DataFrame()
for filename in os.listdir(path):
    with open(os.path.join(path, filename), encoding='utf-8') as f:
        soup = BeautifulSoup(f, 'html.parser')
        table = soup.select_one('table:-soup-contains("母子公司")')
        df = pd.read_html(StringIO(str(table)))
        df.insert(0, '檔名', filename)
        data = pd.concat([data, df])
data.to_excel(r"C:\母子公司間業務關係及重要交易往來情形.xlsx")
```
The code is not working properly.
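Below is a minimal sketch of the whole folder-to-table pipeline. It assumes every report is a UTF-8 `.html`/`.htm` file and that the first `<table>` containing the keyword is the wanted one. Compared with the code above, it skips files where `select_one` returns `None`, takes element `[0]` of the list returned by `pd.read_html`, and collects the frames in a list so `pd.concat` runs only once. `collect_tables` is a hypothetical helper name.

```python
import os
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup


def collect_tables(folder: str, keyword: str) -> pd.DataFrame:
    """Stack the first table matching `keyword` from every HTML file in `folder`."""
    frames = []
    for filename in sorted(os.listdir(folder)):
        if not filename.lower().endswith((".html", ".htm")):
            continue  # skip non-HTML files such as a previously written .xlsx
        with open(os.path.join(folder, filename), encoding="utf-8") as f:
            soup = BeautifulSoup(f, "html.parser")
        table = soup.select_one(f'table:-soup-contains("{keyword}")')
        if table is None:
            continue  # this report has no matching table, so skip it
        df = pd.read_html(StringIO(str(table)))[0]  # read_html returns a list
        df.insert(0, "檔名", filename)  # record which file the rows came from
        frames.append(df)
    # Concatenate once at the end; return an empty frame if nothing matched
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

For example, `collect_tables(r"C:\Users\WDAGUtilityAccount\Desktop\iXBRL", "母子公司").to_excel("母子公司間業務關係及重要交易往來情形.xlsx", index=False)` would write the stacked rows to one sheet. `to_excel` relies on openpyxl, which is already in the environment list. The output file name and `index=False` are illustrative choices.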