I’m running parallel code to analyse very large volumes of data. I’ve noticed that when it finishes, I’m usually left with a very large amount of allocated memory that can’t be freed without restarting the kernel. This has occasionally led to OOM errors during subsequent analysis stages.
The simplest way to reproduce it is to run:
import numpy as np
import psutil
from joblib import Parallel, delayed

print(f'Initial memory usage: {psutil.Process().memory_info().rss / (2 ** 20)} MB')

def proc():
    return np.ones(50000)

res = Parallel(n_jobs=6, verbose=1)(delayed(proc)() for d in range(20000))
print(f'After running parallel jobs: {psutil.Process().memory_info().rss / (2 ** 20)} MB')

for i in range(len(res)):
    res[i] = None
del res

print(f'After deleting data: {psutil.Process().memory_info().rss / (2 ** 20)} MB')
A typical output is:
Initial memory usage: 57.28125 MB
[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done 100 tasks | elapsed: 0.3s
[Parallel(n_jobs=6)]: Done 4468 tasks | elapsed: 3.7s
[Parallel(n_jobs=6)]: Done 12468 tasks | elapsed: 9.8s
[Parallel(n_jobs=6)]: Done 20000 out of 20000 | elapsed: 15.8s finished
After running parallel jobs: 7806.484375 MB
After deleting data: 6518.30859375 MB
The amount of memory retained after deleting the data varies between executions of the same code, but it never returns to the initial usage. It scales with the size of the data returned by each task (in this case, 50000 x 8 bytes).
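For context, a rough back-of-the-envelope calculation (my own estimate, assuming the default float64 dtype of np.ones) suggests the returned arrays account for most of the observed peak:

import numpy as np

n_tasks = 20000                            # number of delayed calls
n_elements = 50000                         # elements per returned array
itemsize = np.dtype(np.float64).itemsize   # 8 bytes, the default dtype of np.ones

total_mib = n_tasks * n_elements * itemsize / (2 ** 20)
print(f'Total size of returned arrays: {total_mib:.0f} MiB')  # ~7629 MiB

That is close to the ~7750 MB increase reported above, and the ~6460 MB still held after deleting res is a large fraction of it.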
I’m using Ubuntu 22.04.4 LTS. This is my environment:
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
blas 1.0 mkl
ca-certificates 2024.3.11 h06a4308_0
intel-openmp 2023.1.0 hdb19cb5_46306
joblib 1.4.0 py39h06a4308_0
ld_impl_linux-64 2.38 h1181459_1
libffi 3.4.4 h6a678d5_0
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
mkl 2023.1.0 h213fc3f_46344
mkl-service 2.4.0 py39h5eee18b_1
mkl_fft 1.3.8 py39h5eee18b_0
mkl_random 1.2.4 py39hdb19cb5_0
ncurses 6.4 h6a678d5_0
numpy 1.26.4 py39h5f9d8c6_0
numpy-base 1.26.4 py39hb5e798b_0
openssl 3.0.13 h7f8727e_0
pip 23.3.1 py39h06a4308_0
psutil 5.9.0 py39h5eee18b_0
python 3.9.19 h955ad1f_0
readline 8.2 h5eee18b_0
setuptools 68.2.2 py39h06a4308_0
sqlite 3.41.2 h5eee18b_0
tbb 2021.8.0 hdb19cb5_0
tk 8.6.12 h1ccaba5_0
tzdata 2024a h04d1e81_0
wheel 0.41.2 py39h06a4308_0
xz 5.4.6 h5eee18b_0
zlib 1.2.13 h5eee18b_0
I’ve tried the following, with very similar results:
- Using multiprocessing.Pool.map instead of joblib (see the sketch after this list)
- Running it on Windows 11 instead of Ubuntu
- Using %xdel instead of del to delete the data
- Running gc.collect() at the end, and within the proc function
- Waiting between deleting the variable and checking memory usage
- Trying different joblib versions (I know there used to be a leak problem in earlier versions)
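For reference, the multiprocessing.Pool.map variant looked roughly like this (a sketch of what I described above, not the exact code I ran):

import multiprocessing as mp

import numpy as np
import psutil

def proc(_):
    # Same payload as the joblib version: one 50000-element float64 array per task
    return np.ones(50000)

if __name__ == '__main__':
    print(f'Initial memory usage: {psutil.Process().memory_info().rss / (2 ** 20)} MB')
    with mp.Pool(processes=6) as pool:
        res = pool.map(proc, range(20000))
    print(f'After running parallel jobs: {psutil.Process().memory_info().rss / (2 ** 20)} MB')
    del res
    print(f'After deleting data: {psutil.Process().memory_info().rss / (2 ** 20)} MB')

The memory retained after the del was in the same range as with joblib.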