When I run my app queries in the BigQuery console, they return nearly a million rows in 3-4 seconds. The same queries in my Streamlit Python app (local, Community Cloud, and GCP) all take about 180 seconds to complete, including writing the results into a DataFrame. Instrumentation shows that most of the time is spent in RowIterator. Is there a different way to do this? The environments have plenty of RAM and CPU.
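For context, this is roughly how I'm capturing the profile (pyinstrument; the condensed call tree below is its output):

from pyinstrument import Profiler

profiler = Profiler()
profiler.start()

# ... the query + DataFrame code shown further down ...

profiler.stop()
profiler.print()  # prints the call tree below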
├─ 163.313 RowIterator._items_iter google/api_core/page_iterator.py:206
│ [62 frames hidden] google, requests, urllib3, http, sock...
│ 88.930 _SSLSocket.read <built-in>
├─ 8.598 QueryJob.__iter__ google/cloud/bigquery/job/query.py:2107
│ [27 frames hidden] google, requests, urllib3, http, sock...
├─ 3.988 Row.__getitem__ google/cloud/bigquery/table.py:1501
│ [2 frames hidden] google
├─ 3.569 Client.query google/cloud/bigquery/client.py:3362
│ [25 frames hidden] google, requests, urllib3, http, sock...
└─ 2.815 [self] users.py
Here's the query code the profile covers:

import pandas as pd
import streamlit as st

# BigQuery client is created once and cached in session state
bq_client = st.session_state.bq_client

# start_date is set earlier in the app
sql_query = f"""
    SELECT *
    FROM `dataexploration-193817.user_data.all_users_progress`
    WHERE
        first_open BETWEEN PARSE_DATE('%Y/%m/%d','{start_date}') AND CURRENT_DATE()
"""

# Iterating the QueryJob pulls rows through RowIterator page by page,
# converting each Row to a dict before building the DataFrame
rows_raw = bq_client.query(sql_query)
rows = [dict(row) for row in rows_raw]
df_user_list = pd.DataFrame(rows)
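Is something like the following the better path? A sketch of what I think the bulk-download alternative looks like, reusing sql_query from above and assuming the optional google-cloud-bigquery-storage and db-dtypes packages are installed (I haven't verified this in my environments):

import streamlit as st

bq_client = st.session_state.bq_client

# result() waits for the query job, then to_dataframe() downloads the rows
# in bulk (via the BigQuery Storage API when that client is available),
# skipping the per-row dict conversion entirely
df_user_list = (
    bq_client.query(sql_query)
    .result()
    .to_dataframe(create_bqstorage_client=True)
)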