I am trying to run inference with an object detection model. I have multiple cameras with the same use case, so I use multithreading to handle them.
I have a Model() object, which initializes a model based on the parameters passed to it.
It contains the following function to handle pre-processing of the raw images for model inference:
def preprocess(self, image):
    try:
        torch.cuda.synchronize()
        self.im = letterbox(self.im0, (640, 640), stride=self.stride)[0]  # padded resize
        self.im = self.im.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
        self.im = np.ascontiguousarray(self.im)  # contiguous
        self.im = torch.from_numpy(self.im).to(self.DEVICE).float()  # uint8 to fp16/32
        self.im /= 255  # 0 - 255 to 0.0 - 1.0
        if len(self.im.shape) == 3:
            self.im = self.im[None]  # expand for batch dimension
        return self.im
    except Exception as e:
        torch.cuda.synchronize()
        self.logger.print_function_error(f"ModelInference - Error in preprocess(); {e}")
        self.logger.print_function_error(f"ModelInference - traceback {traceback.format_exc()}")
        raise  # Re-raise exception to handle it later if needed
    return None
and the predictor:
def predict(self, image):
    try:
        im = self.preprocess(image)
        self.pred = self.model(im)
        self.pred = non_max_suppression(self.pred, self.conf_thres, self.iou_thres, self.classes, self.agnostic_nms, max_det=self.max_det)
    except Exception as e:
        self.logger.print_function_error(f"ModelInference - Error in predictObj(); {e}")
        self.logger.print_function_error(f"ModelInference - traceback {traceback.format_exc()}")
        return None  # Return None or appropriate value in case of error
    return self.detectionsBoxV3()
When I create this object in the main thread and run it, I do not have any issues.
When I create multiple Model() objects and run them in different threads, it initially works seamlessly. After some time I get this error:
2024-09-20 16:43:13,534 - ERROR - ModelInference - Error in predictObj(); cuDNN error: CUDNN_STATUS_MAPPING_ERROR
2024-09-20 16:43:13,535 - ERROR - ModelInference - traceback Traceback (most recent call last):
  File "/home/test/Project/Module/Side/ModelInference.py", line 214, in predict
    self.pred = self.model(im)
  File "/home/test/env/python_files/auto/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/test/Project/Module/models/yolo.py", line 209, in forward
    return self._forward_once(x, profile, visualize) # single-scale inference, train
  File "/home/test/Project/Module/models/yolo.py", line 121, in _forward_once
    x = m(x) # run
  File "/home/test/env/python_files/auto/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/test/Project/Module/models/common.py", line 167, in forward
    return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))
  File "/home/test/env/python_files/auto/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/test/env/python_files/auto/lib/python3.8/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/test/env/python_files/auto/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/test/Project/Module/models/common.py", line 120, in forward
    return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))
  File "/home/test/env/python_files/auto/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/test/Project/Module/models/common.py", line 59, in forward_fuse
    return self.act(self.conv(x))
  File "/home/test/env/python_files/auto/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/test/env/python_files/auto/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/test/env/python_files/auto/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
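For context, the threaded setup described above (one Model() instance per camera, each running in its own thread) can be sketched roughly as follows. This is a simplified illustration and not the actual project code: `DummyModel`, `camera_worker`, and `run_cameras` are hypothetical names, the frames are placeholder lists, and the stand-in `predict()` does no GPU work.

```python
import queue
import threading


class DummyModel:
    """Hypothetical stand-in for Model(); the real predict() runs on the GPU."""

    def __init__(self, name):
        self.name = name

    def predict(self, image):
        # Placeholder for preprocess + forward pass + NMS.
        return f"{self.name}: {len(image)} values"


def camera_worker(model, frames, results):
    """Process frames for one camera until a None sentinel arrives."""
    while True:
        frame = frames.get()
        if frame is None:
            break
        results.put(model.predict(frame))


def run_cameras(num_cameras=3, frames_per_camera=2):
    """Start one thread per camera, each owning its own model instance."""
    results = queue.Queue()
    threads, inputs = [], []
    for i in range(num_cameras):
        frames = queue.Queue()
        worker = threading.Thread(
            target=camera_worker,
            args=(DummyModel(f"cam{i}"), frames, results),
        )
        worker.start()
        threads.append(worker)
        inputs.append(frames)

    for frames in inputs:
        for _ in range(frames_per_camera):
            frames.put([0] * 10)  # placeholder frame
        frames.put(None)  # sentinel: tell the worker to stop

    for worker in threads:
        worker.join()

    # All workers have finished, so the queue size is stable here.
    return sorted(results.get() for _ in range(results.qsize()))


if __name__ == "__main__":
    print(run_cameras())
```

In the real program every worker's `predict()` call reaches the same GPU concurrently, which is the situation in which the error appears.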
Even after my thread handling stops and restarts the thread on its own, I keep getting the error below until I restart the program:
2024-09-21 10:16:21,356 - ERROR - ModelInference - Error in predictObj(); CUDA error: misaligned address
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
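The message above suggests running with CUDA_LAUNCH_BLOCKING=1 so that the reported stack trace points at the kernel that actually failed. A minimal sketch of enabling it from inside the program, assuming the variable has to be in place before CUDA is initialized (so it is set before importing torch); the helper name `enable_blocking_cuda_launches` is hypothetical:

```python
import os


def enable_blocking_cuda_launches():
    """Set CUDA_LAUNCH_BLOCKING=1 for this process.

    Assumption: this must run before CUDA is initialized, so call it
    at the very top of the entry script.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return os.environ["CUDA_LAUNCH_BLOCKING"]


enable_blocking_cuda_launches()
# import torch  # import torch and create the Model() objects only after this point
```

Setting the variable in the shell that launches the program has the same effect. Kernel launches become synchronous, so inference will be slower while debugging.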