I am using VS Code to run a Python script that processes some data and saves it to a .csv file. OS: Ubuntu 22.04.4 LTS, running in VMware Workstation 16 Player (Host: Win 10). Unfortunately, on some saves it deadlocks my virtual machine. When I go to the details window in Windows Task Manager, it says the process is in a deadlock and shows me two threads of the VM waiting for each other.
csvDF = csvDF[csvDF.index < i + 1]
cutAt = path.find("temperatures_matched")
path = path[:cutAt] + path2TestCSV[i]
#csvDF.to_csv(path, index = False)
np.savetxt(path, csvDF, fmt = '%s', delimiter= ',', header="minTemp,maxTemp,mean,matchedThermalArray")
#time.sleep(0.5)
#csvDF = pd.DataFrame(array, columns = csvColumns)
I already tried adding a delay after saving and switching from pandas' to_csv() to NumPy's savetxt(), unfortunately without success.
There could be many reasons why your code stalls, for example resource exhaustion or CPU throttling inside the VM. What I would advise is writing in smaller chunks if you are manipulating large files, with either to_csv or savetxt. To make the write non-blocking and reduce I/O pressure on the VM, you could also hand the save off to a ThreadPoolExecutor.
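As a minimal sketch of the chunked approach: to_csv accepts a chunksize parameter that makes pandas write the file in batches of rows rather than buffering everything in one burst (the DataFrame and file name below are made up for illustration, not taken from your script):

```python
import pandas as pd

# Throwaway example frame; in your case this would be csvDF
df = pd.DataFrame({
    'minTemp': range(10000),
    'maxTemp': range(10000),
})

# chunksize=1000 writes the file 1000 rows at a time, keeping each
# individual I/O operation small instead of one large write
df.to_csv('out.csv', index=False, chunksize=1000)
```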
import pandas as pd
import os
import tempfile
import shutil
import logging
from concurrent.futures import ThreadPoolExecutor

# Set up basic logging
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')

def save_csv(df, path):
    try:
        # Write the CSV to a temporary file first
        with tempfile.NamedTemporaryFile(delete=False, mode='w+', suffix='.csv') as temp_file:
            df.to_csv(temp_file.name, index=False)  # Write data to the temporary file
            temp_path = temp_file.name  # Store the path of the temp file
        # Move the temporary file to the desired path
        shutil.move(temp_path, path)
        logging.info(f"Successfully written to {path}")
    except Exception as e:
        logging.error(f"Failed to write CSV at {path}: {e}")
        # Recovery or retry logic can go here if necessary

# Example DataFrame creation
def create_example_data():
    data = {
        'minTemp': [1, 2, 3],
        'maxTemp': [4, 5, 6],
        'mean': [2.5, 3.5, 4.5],
        'matchedThermalArray': ['a', 'b', 'c']
    }
    return pd.DataFrame(data)

def main():
    csvDF = create_example_data()
    path2TestCSV = ["file1_temperatures_matched.csv", "file2_temperatures_matched.csv", "file3_temperatures_matched.csv"]
    with ThreadPoolExecutor(max_workers=2) as executor:
        for i, path in enumerate(path2TestCSV):
            # Processing path to cut off an example part
            cutAt = path.find("temperatures_matched")
            final_path = path[:cutAt] + "processed_" + path2TestCSV[i]
            # Ensure the directory exists before attempting to write
            # (guard against an empty dirname when the path is a bare filename)
            out_dir = os.path.dirname(final_path)
            if out_dir:
                os.makedirs(out_dir, exist_ok=True)
            # Write the CSV in a separate thread
            executor.submit(save_csv, csvDF[csvDF.index < i + 1], final_path)

if __name__ == "__main__":
    main()
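One caveat with the pattern above: shutil.move from the system temp directory may cross filesystems, in which case it degrades to a copy and is no longer atomic. A variant (a sketch, not part of the original answer) creates the temp file next to the target and uses os.replace, which is an atomic rename on POSIX when both paths are on the same filesystem:

```python
import os
import tempfile
import pandas as pd

def save_csv_atomic(df, path):
    # Create the temp file in the target's own directory so the final
    # rename stays on one filesystem and is therefore atomic on POSIX
    out_dir = os.path.dirname(path) or '.'
    fd, temp_path = tempfile.mkstemp(suffix='.csv', dir=out_dir)
    try:
        with os.fdopen(fd, 'w') as f:
            df.to_csv(f, index=False)
        os.replace(temp_path, path)  # atomically swap into place
    except Exception:
        os.remove(temp_path)  # clean up the partial file on failure
        raise

save_csv_atomic(pd.DataFrame({'a': [1, 2]}), 'result.csv')
```

With this, a crash mid-write leaves only a stray temp file behind; the destination is either the old complete file or the new complete file, never a truncated one.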