I am facing a memory leak when training models in physics simulators on different tasks. The strange thing is that some tasks leak memory and some don't, so I am trying to locate where those leaks happen.
I ran the following debug script on both the leaking and the non-leaking tasks:
import gc
import inspect

import torch


def find_names(obj):
    # Walk up the call stack and touch f_locals on each frame so the
    # frames' local-variable dicts are refreshed before inspecting referrers.
    frame = inspect.currentframe()
    for frame in iter(lambda: frame.f_back, None):
        frame.f_locals
    # Collect the dictionary keys (variable names) that refer to obj.
    obj_names = []
    for referrer in gc.get_referrers(obj):
        if isinstance(referrer, dict):
            for k, v in referrer.items():
                if v is obj:
                    obj_names.append(k)
    return obj_names


# Dump every live tensor the garbage collector knows about, skipping nn modules/parameters.
for obj in gc.get_objects():
    try:
        if torch.is_tensor(obj) or (hasattr(obj, 'data') and torch.is_tensor(obj.data)):
            if "nn" not in str(type(obj)):
                print(type(obj), obj.size(), "name:", find_names(obj))
    except Exception:
        pass
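To make the two tasks easier to compare, a small helper like this could tally live tensors by type and shape after each training step (just a sketch reusing the same gc scan; count_live_tensors is only a name I made up):

from collections import Counter

def count_live_tensors():
    # Tally live tensors by (type, shape) so two runs / iterations can be diffed.
    counts = Counter()
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj):
                counts[(type(obj).__name__, tuple(obj.size()))] += 1
        except Exception:
            pass
    return counts

Diffing the counters from two consecutive iterations should show which shapes keep growing.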
In the leaking task I get many lines like this:
<class 'torch.Tensor'> torch.Size([512]) name: ['obj']
So my question is: is there any way to locate where those tensors are created and why they never get freed?
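One idea I am considering is to enable tracemalloc and compare snapshots taken a few training steps apart, roughly like below. I am not sure it will catch the tensor storage itself (PyTorch allocates that through its own allocator), but it should at least point to the Python-level call sites that keep creating tensor objects:

import tracemalloc

tracemalloc.start(25)  # keep up to 25 frames of traceback per allocation

snapshot_before = tracemalloc.take_snapshot()
# ... run a few training steps on the leaking task here ...
snapshot_after = tracemalloc.take_snapshot()

# Show the allocation sites whose memory grew the most between the snapshots.
for stat in snapshot_after.compare_to(snapshot_before, 'traceback')[:10]:
    print(stat)
    for line in stat.traceback.format():
        print(line)

If there is a more direct way to get a creation traceback for each leaked tensor, that would help as well.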