I am creating a UI application with Qt in Python. It performs operations on pandas DataFrames in a separate threading.Thread to keep the UI responsive; no individual pandas instruction takes noticeable time. However, there is still a lag of around a second in the MainThread.
I have heard of the global interpreter lock (GIL), but as far as I understand, it should be released after a few milliseconds. If the separate thread has already been executing for a whole second, why does it keep executing more Python instructions instead of yielding to the MainThread?
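For reference, the "few milliseconds" I mean is CPython's switch interval, which can be inspected as below. This is only a small sketch; the variable name is made up for illustration, and I am assuming the interval has not been changed elsewhere in the process.

```python
import sys

# Interval after which the interpreter asks the running thread to give up the GIL.
# CPython documents the default as 0.005 s (5 ms); it can be changed with
# sys.setswitchinterval().
switch_interval = sys.getswitchinterval()
print(f"GIL switch interval: {switch_interval * 1000:.1f} ms")
```

As far as I understand, this is only a request to the running thread, so it does not by itself guarantee when another thread actually gets to run.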
I found out that if I sprinkle time.sleep(0.00001) throughout the separate thread, the lag is gone. But this can't be the solution.
If I use some other heavy computation without pandas (I have only tried operations implemented entirely in Python), there is no delay in the MainThread.
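The pure-Python comparison looks roughly like the sketch below. The names pure_python_work and max_main_thread_gap are hypothetical helpers written for this post, not code from my application, and the loop is just a stand-in for "heavy computation implemented completely in Python".

```python
import threading
import time


def pure_python_work(iterations=2_000_000):
    """Hypothetical CPU-bound stand-in that stays entirely in Python bytecode."""
    total = 0
    for i in range(iterations):
        total += i * i
    return total


def max_main_thread_gap(worker, interval=0.001):
    """Run worker in a thread; return the longest gap between main-thread wake-ups."""
    thread = threading.Thread(target=worker, daemon=True)
    longest = 0.0
    thread.start()
    last = time.perf_counter()
    while thread.is_alive():
        time.sleep(interval)
        now = time.perf_counter()
        longest = max(longest, now - last)
        last = now
    return longest


gap = max_main_thread_gap(pure_python_work)
print(f"Longest main-thread gap (pure Python worker): {gap:.4f} s")
```

Passing a pandas-based worker to the same helper instead should make the two cases directly comparable.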
I reproduced the problem in the following script. The highest deltaTime for me is around 1 second, whereas with the commented-out time.sleep enabled it is around 0.1 s.
import threading
import time
from datetime import datetime

import pandas as pd

longest_so_far = 0
last_timestamp = datetime.now()
first_run = True


def print_delta_time(interval=0.001):
    global longest_so_far
    global last_timestamp
    global first_run
    while True:
        time.sleep(interval)
        current_time = datetime.now()
        if first_run:
            first_run = False
        else:
            delta_time = (current_time - last_timestamp).total_seconds()
            if delta_time > longest_so_far:
                print("New longest deltaTime:", delta_time)
                longest_so_far = delta_time
        last_timestamp = current_time


def perform_pandas_operations():
    time.sleep(1)
    print("Starting pandas operations")
    df = pd.DataFrame({
        'A': [f"{i}" for i in range(1000)],
    })
    for _ in range(20000):
        df['A'].str.contains(r'123$').sum()
        # time.sleep(0.0000001)
    print("Finished pandas operations")


pandas_thread = threading.Thread(target=perform_pandas_operations)
pandas_thread.daemon = True  # Ensure the thread exits when the main program exits
pandas_thread.start()
print_delta_time()