A simplified case follows:
Python 3.11.8 (main, Feb 26 2024, 21:39:34) [GCC 11.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.20.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import joblib
In [2]: joblib.__version__
Out[2]: '1.4.0'
In [3]: class Test:
   ...:     def __init__(self):
   ...:         self.values = {f'key{i}': [i] * 100 for i in range(10)}
   ...:     def f(self):
   ...:         import time
   ...:         time.sleep(60)
   ...:         return 1
   ...:
In [4]: obj = Test()
# "lsof -u eastsun | wc -l" got 273 before next line
In [5]: values = joblib.Parallel(n_jobs=32)(joblib.delayed(obj.f)() for i in range(100))
# "lsof -u eastsun | wc -l" got 3529 while previous line is running
My real use case is a task with a series of arguments, which I assign as attributes of the task object.
I have also tried packing the attributes into a pandas DataFrame instead of a dict, something like:
In [2]: class Test:
   ...:     def __init__(self):
   ...:         import pandas as pd
   ...:         self.values = pd.DataFrame([[i]*10 for i in range(1000)])
   ...:     def f(self):
   ...:         import time
   ...:         time.sleep(60)
   ...:         return 1
But this is even worse: the open-file count grows to 6800+!
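One mitigation I am experimenting with (a sketch, not a confirmed fix) is the threading backend: threads share the parent's file descriptors, so no per-worker pipes or semaphores are created. It only helps when the task releases the GIL, which time.sleep does:

import joblib

obj = Test()
# backend="threading" runs the tasks in threads of the current process
# instead of 32 spawned worker processes; suitable for I/O-bound or
# GIL-releasing work, but CPU-bound tasks would serialize.
values = joblib.Parallel(n_jobs=32, backend="threading")(
    joblib.delayed(obj.f)() for i in range(100)
)

I also noticed that Parallel has a max_nbytes argument controlling when large arguments are memmapped to temporary files, but I have not verified whether that is related to the descriptor growth here.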
How can I avoid opening so many files? Thanks.
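P.S. Raising the process's descriptor limit hides the symptom, so I mention it only as a stopgap (POSIX-only standard library):

import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
# Raise the soft limit to the hard limit for this process.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))

I would still prefer to understand why so many files are opened in the first place.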