I’m currently working on a project where I need to detect and mask overhead cables in images using Facebook’s DETR (DEtection TRansformer) model. I prepared a dataset in CVAT (Computer Vision Annotation Tool) by annotating images of overhead cables, and exported it in COCO 1.0 format. Despite following tutorials and examples, the model does not detect anything and does not produce the expected output.
**Dataset Details:**
- The dataset contains images of overhead cables annotated in CVAT.
- The export is in COCO 1.0 format and includes annotations for 100 images.
- The sample annotation data below includes the image dimensions, file names, and segmentation details for the overhead cables.
```
{"licenses":[{"name":"","id":0,"url":""}],"info":{"contributor":"","date_created":"","description":"","url":"","version":"","year":""},"categories":[{"id":1,"name":"over_head_cable","supercategory":""}],"images":[{"id":1,"width":3472,"height":4624,"file_name":"IMG_20240504_094743692_BURST000_COVER.jpg","license":0,"flickr_url":"","coco_url":"","date_captured":0}......"annotations":[{"id":1,"image_id":1,"category_id":1,"segmentation":{"counts":[2506,59,185,341,4034,69,179,343,4030,75,174,347,4025,81,170,349,4021,87,166,351,4018,91,162,355,4014,95,159,357,4011,99,156,359,4008,103,153,361,4005,127,13,77,3,17,20,363,700,17,3,27,13,17,3,17,3,47,23,27,23,17,3,17,3,17,3025,135,1,109,11,369,691,59,1,99,11,39,11,69,3018,251,1,379,681,169,1,49,1,79,3011,636,675,305,3007,640,669,311,3003,644,663,317,2998,648,659,321,2995,651,655,325,2992,654,651,329}]}}]
```
This is sample data; the actual file contains entries for 100+ images.
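The `segmentation` field in the sample is an object with a `counts` list rather than a list of polygons; in the COCO format this is the uncompressed run-length encoding (RLE). The following is a minimal numpy sketch of decoding it into a binary mask, assuming the standard COCO convention (runs alternate background/foreground starting with background, and pixels are traversed column by column). `decode_uncompressed_rle` is a hypothetical helper, and it takes the height and width explicitly (for example from the matching `images` entry) because the sample does not show a `size` key. With pycocotools installed, `pycocotools.mask.frPyObjects` followed by `pycocotools.mask.decode` should do the same for a dict that has both `counts` and `size`.

```python
import numpy as np

def decode_uncompressed_rle(counts, height, width):
    """Decode a COCO uncompressed RLE ``counts`` list into a (height, width) uint8 mask.

    Assumes the standard COCO convention: runs alternate background/foreground,
    the first run is background, and pixels are ordered column-major (Fortran order).
    """
    total = height * width
    if sum(counts) != total:
        raise ValueError(f"RLE covers {sum(counts)} pixels, expected {total}")
    flat = np.zeros(total, dtype=np.uint8)
    pos = 0
    for i, run in enumerate(counts):
        if i % 2 == 1:  # odd-indexed runs are foreground pixels
            flat[pos:pos + run] = 1
        pos += run
    # COCO RLE walks down each column before moving to the next one
    return flat.reshape((height, width), order="F")

# Toy example: 1 background pixel, 2 foreground pixels, 3 background pixels
mask = decode_uncompressed_rle([1, 2, 3], height=2, width=3)
print(mask)
# [[0 1 0]
#  [1 0 0]]
```

For the real file, `counts` would come from an entry in `coco_data["annotations"]` and the height and width from the `images` entry with the same id.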
**Problem Statement:**
- After loading the COCO JSON file and running the DETR model on images, I’m not getting the desired output.
- The output image is identical to the input, with no masks or highlighted areas for overhead cables.
- Despite setting up the model and preprocessing the images as described below, DETR does not appear to detect any overhead cables.
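In the code below, the JSON file is used only to build the `CLASSES` list; the annotations themselves are never passed to the model. Separately from the model, a quick sanity check on the export is to group annotations by `image_id` and list images that have none. This is a minimal standard-library sketch; `index_annotations` is a hypothetical helper, and the toy `coco` dict stands in for the loaded `coco_data`.

```python
from collections import defaultdict

def index_annotations(coco):
    """Group COCO annotations by image id and list file names of images with no annotations."""
    by_image = defaultdict(list)
    for ann in coco.get("annotations", []):
        by_image[ann["image_id"]].append(ann)
    unannotated = [img["file_name"] for img in coco["images"] if img["id"] not in by_image]
    return dict(by_image), unannotated

# Toy stand-in for the loaded JSON: two images, only the first one annotated
coco = {
    "categories": [{"id": 1, "name": "over_head_cable"}],
    "images": [
        {"id": 1, "file_name": "a.jpg", "width": 4, "height": 3},
        {"id": 2, "file_name": "b.jpg", "width": 4, "height": 3},
    ],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1}],
}
by_image, missing = index_annotations(coco)
print(len(by_image[1]), missing)  # 1 ['b.jpg']
```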
**Overview of my work:**
- Imported the necessary libraries: PyTorch (torch), torchvision, PIL, json, os, and matplotlib.pyplot.
- Loaded the COCO JSON file containing the dataset annotations.
- Created the class names and the CLASSES list from the categories in the JSON file.
- Defined functions for converting bounding box coordinates and rescaling boxes, and set up the image transformations.
- Loaded the pretrained DETR model from Facebook’s model repository via torch.hub.
- Preprocessed the images, ran them through the model, and extracted bounding boxes and labels from the output.
- Attempted to draw bounding boxes around the detected objects, but did not get the expected results.
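Two box conventions meet in this workflow: COCO annotations store boxes as `[x_min, y_min, width, height]` in pixels, while DETR predicts (and its training pipeline uses as targets) boxes as `(center_x, center_y, width, height)` normalized to `[0, 1]`. The sketch below shows both directions in plain Python; the function names are hypothetical, and `detr_to_xyxy_pixels` computes for a single box what `rescale_boxes` in the code does for a batch.

```python
def coco_to_detr_box(bbox, img_w, img_h):
    """COCO [x_min, y_min, w, h] in pixels -> normalized [cx, cy, w, h] in [0, 1]."""
    x, y, w, h = bbox
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]

def detr_to_xyxy_pixels(box, img_w, img_h):
    """Normalized [cx, cy, w, h] -> pixel [x_min, y_min, x_max, y_max]."""
    cx, cy, w, h = box
    return [(cx - w / 2) * img_w, (cy - h / 2) * img_h,
            (cx + w / 2) * img_w, (cy + h / 2) * img_h]

# A 200x100 box at (100, 50) in a 400x200 image sits exactly in the center
detr_box = coco_to_detr_box([100, 50, 200, 100], img_w=400, img_h=200)
print(detr_box)                                # [0.5, 0.5, 0.5, 0.5]
print(detr_to_xyxy_pixels(detr_box, 400, 200))  # [100.0, 50.0, 300.0, 150.0]
```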
```python
import torch
import torchvision
from torchvision import transforms as T
from torchvision.utils import draw_bounding_boxes
from PIL import Image
import json
import os
import matplotlib.pyplot as plt

# Path to your COCO JSON file
json_file = '/content/instances_default.json'

# Load the JSON data
with open(json_file, 'r') as f:
    coco_data = json.load(f)

# Extract the class names
class_names = [cat['name'] for cat in coco_data['categories']]
# Create the CLASSES list with 'N/A' and your class names
CLASSES = ['N/A'] + class_names

# Repository and entry point used by torch.hub.load
MODEL_REPO = "facebookresearch/detr:main"  # the model repo
MODEL_NAME = "detr_resnet50"  # the model

# Colab shell command: download a test image
!wget -O wire.jpg https://i.ibb.co/qxTBn8k/IMG-20240504-101029464-BURST004.jpg
im = Image.open("wire.jpg").convert("RGB")
im  # display the image in the notebook

def box_cxcywh_to_xyxy(x):
    """
    Convert boxes from DETR's (center_x, center_y, width, height) format
    to (x_min, y_min, x_max, y_max) corner coordinates.
    """
    x_c, y_c, w, h = x.unbind(1)
    b = [(x_c - 0.5 * w), (y_c - 0.5 * h),
         (x_c + 0.5 * w), (y_c + 0.5 * h)]
    return torch.stack(b, dim=1)

def rescale_boxes(out_bbox, size):
    """
    Rescale the normalized boxes to the original image width and height so they can be overlaid.
    """
    img_w, img_h = size
    b = box_cxcywh_to_xyxy(out_bbox)
    b = b * torch.tensor([img_w, img_h, img_w, img_h], dtype=torch.float32)
    return b

transforms = T.Compose([
    T.Resize(800),  # resize the shorter side to 800 px, keeping the aspect ratio
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])  # ImageNet normalization
])
img = transforms(im).unsqueeze(0)

model = torch.hub.load(MODEL_REPO, MODEL_NAME, pretrained=True)
model = model.eval()
with torch.no_grad():  # inference only, no gradients needed
    out = model(img)

# For this COCO-pretrained model, pred_logits has 92 columns (91 COCO class ids + "no object");
# softmax over classes, then drop the last ("no object") column
probs = out["pred_logits"].softmax(-1)[0, :, :-1]
keep = probs.max(-1).values > 0.7  # keep only predictions with probability above 0.7
bboxes_scaled = rescale_boxes(out['pred_boxes'][0, keep], im.size)
# Note: argmax returns COCO class ids (0-90), not indices into the 2-entry CLASSES list
labels = [CLASSES[i] for i in probs[keep].argmax(1)]

# draw_bounding_boxes expects a uint8 tensor of shape (3, H, W)
tensor_img = T.PILToTensor()(im)
bb = draw_bounding_boxes(tensor_img, boxes=bboxes_scaled, width=6)  # overlay the boxes on the image

# Draw the figure
fig = plt.figure(figsize=(14, 8))
plt.imshow(bb.permute(1, 2, 0))  # convert CHW to HWC so matplotlib can display it
```
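The post-processing step can be reproduced with plain numpy to see what actually reaches `draw_bounding_boxes`. This is a minimal sketch on toy logits; `filter_predictions` and the numbers are hypothetical. It illustrates two things worth checking in the code above: the `argmax` yields ids of the 91 COCO classes the hub model was pretrained on (COCO has no cable category), not indices into the two-entry `CLASSES` list; and when no query clears the 0.7 threshold, `keep` selects nothing, so no boxes are drawn and the displayed image should look exactly like the input.

```python
import numpy as np

def filter_predictions(logits, threshold=0.7):
    """Mirror the DETR post-processing for one image.

    logits: array of shape (num_queries, num_classes + 1); the last column is
    the "no object" class. Returns (kept query indices, class ids, scores).
    """
    # Numerically stable softmax over the class dimension
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    probs = probs[:, :-1]  # drop the "no object" column, like [0, :, :-1]
    scores = probs.max(axis=-1)
    keep = np.nonzero(scores > threshold)[0]
    return keep, probs[keep].argmax(axis=-1), scores[keep]

# Toy logits: 2 queries x 92 columns (91 COCO class ids + "no object")
logits = np.zeros((2, 92))
logits[0, -1] = 10.0  # query 0: confident "no object"
logits[1, 3] = 10.0   # query 1: confident COCO class id 3
keep, class_ids, scores = filter_predictions(logits)
print(keep, class_ids)  # [1] [3]
```

With the original code, a class id like 3 would raise an `IndexError` on `CLASSES[i]`, and an all-"no object" output gives an empty `keep`.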
**Expected Outcome:**
- I expect the DETR model to accurately detect overhead cables in the images and produce output images with the detected cables masked.
- Instead of bounding boxes, the output images need to contain masks for the detected cables.

I’m looking for guidance on how to modify the code so that DETR detects the cables and produces masked output images. Any insight into why DETR might not be detecting anything, or suggestions for alternative approaches to achieve this outcome, would be greatly appreciated.
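Whatever model ends up producing the masks, rendering a masked output image is a simple alpha blend of a binary mask onto the RGB image. This is a minimal numpy sketch; `overlay_mask` is a hypothetical helper, and the mask could be, for example, a ground-truth mask decoded from an annotation's RLE `counts` (useful to verify the annotations visually) or a predicted mask resized to the image size.

```python
import numpy as np

def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.5):
    """Alpha-blend a binary (H, W) mask onto an (H, W, 3) uint8 RGB image."""
    if image.shape[:2] != mask.shape:
        raise ValueError("mask and image must have the same height and width")
    out = image.astype(np.float32)
    m = mask.astype(bool)
    # Blend only the masked pixels towards the highlight color
    out[m] = (1 - alpha) * out[m] + alpha * np.asarray(color, dtype=np.float32)
    return out.round().astype(np.uint8)

# Toy example: a gray 2x2 image with only the top-left pixel masked
img = np.full((2, 2, 3), 100, dtype=np.uint8)
mask = np.array([[1, 0], [0, 0]])
result = overlay_mask(img, mask, color=(200, 0, 0), alpha=0.5)
print(result[0, 0], result[1, 1])  # [150  50  50] [100 100 100]
```

For a PIL image, `np.asarray(im)` gives the `(H, W, 3)` array, and the result can be shown with `plt.imshow(result)`.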
**Environment:**
- Google Colab
- Python 3.x
- PyTorch
- torchvision
Again, any help or suggestions to resolve this issue would be highly appreciated. Thank you in advance for your assistance!