My aim is to web-scrape the table at https://data.eastmoney.com/executive/list.html and save it to an Excel file. Note that the table spans 2945 pages, and I want to put all of them into one Excel sheet.
The easiest approach seemed to be to find out where the data is pulled from, so I opened the browser's developer tools (F12) on the page and saw that the data comes from datacenter-web.eastmoney.com.
So I used that API to get the data. Here's my code:
import pandas as pd
import requests

# Ask the data-center API directly instead of scraping the rendered page
url = 'https://datacenter-web.eastmoney.com/api/data/v1/get'
params = {
    'reportName': 'RPT_EXECUTIVE_HOLD_DETAILS',
    'columns': 'ALL',
    'sortColumns': 'CHANGE_DATE',
}
data = requests.get(url, params=params).json()['result']['data']
df = pd.DataFrame(data)
df.to_excel('G:/ExecutiveHoldings/all_by_date.xlsx', index=True)
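For reference, this is how I checked what else the API returns besides the rows themselves. I assume the result object also carries some paging metadata (a total count or page count), but I have not confirmed the exact field names, so this just prints whatever is there:

import requests

# Print the response structure to look for paging metadata.
# 'result' and 'data' are the only keys I have confirmed so far.
url = ('https://datacenter-web.eastmoney.com/api/data/v1/get'
       '?reportName=RPT_EXECUTIVE_HOLD_DETAILS&columns=ALL&sortColumns=CHANGE_DATE')
resp = requests.get(url).json()
print(resp.keys())            # top-level keys of the response
print(resp['result'].keys())  # keys alongside 'data', hopefully including paging info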
I want all the columns, with the rows sorted by CHANGE_DATE. The request does return data, but only 500 rows (index 0 to 499), while the full table goes far beyond that. Is there an easy way to fetch all 2945 pages and put them into one Excel sheet?
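What I have in mind is something like the loop below. This is only a sketch: pageNumber and pageSize are my guesses at the paging parameter names and are not confirmed against the API, and I don't know whether this is the sensible way to do it.

import pandas as pd
import requests

# Sketch only: 'pageNumber' and 'pageSize' are assumed parameter names,
# not confirmed against the API.
url = 'https://datacenter-web.eastmoney.com/api/data/v1/get'
frames = []
for page in range(1, 2946):  # the site shows 2945 pages in total
    params = {
        'reportName': 'RPT_EXECUTIVE_HOLD_DETAILS',
        'columns': 'ALL',
        'sortColumns': 'CHANGE_DATE',
        'pageNumber': page,  # assumed name
        'pageSize': 500,     # assumed name; 500 matches the rows I currently get back
    }
    rows = requests.get(url, params=params).json()['result']['data']
    frames.append(pd.DataFrame(rows))

pd.concat(frames, ignore_index=True).to_excel(
    'G:/ExecutiveHoldings/all_by_date.xlsx', index=True)

Is this the right direction, or is there a simpler way to pull everything in one go?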