I’m trying to tokenize some images to train a HuggingFace model. As part of this, I downloaded the images and stored them locally. The path to each image is stored in a dictionary of lists, in which the key “image_path” holds a list of paths (strings) such as “.mypathto.jpeg”. The next step is to go through these paths one at a time and process each image.
When I loop over the dictionary and its lists, nothing seems to happen. The only error I get comes from my try/except block, which prints error found : image_data. Here’s the dict:
    self.dataset = {
        "image_path": [],
        "label_path": [],
        "image_data": [],
        "label_data": []
    }
Here’s my code:
    def images_to_tensor(self):
        """
        Convert images data to pytorch tensor by converting them to RGB values and then tensors using model processor.
        Parameters :
            None
        Returns:
            None
        """
        try:
            for key, value_list in self.dataset.items():
                for value in value_list:
                    if key == "image_path":
                        with open(value, "rb") as file:
                            image_rgb = Image.open(file).convert("RGB")
                            image_tensor = self.processor(images=image_rgb, return_tensors="pt").pixel_values.squeeze(0)
                            self.dataset["image_data"].append(image_tensor)
                            print(f"dataset length image_data : {len(self.dataset["image_data"])}")
        except Exception as e:
            print(f"error found : {e}")
        return None
I also tried with a list of dictionaries but got the same problem.
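To narrow the problem down, here is a minimal sketch of the same idea that reads only the “image_path” list instead of looping over every key of the dict, and that records `repr(e)` so the exception type stays visible (printing a `KeyError` with `str()` shows only the key, which may be why the message above is just image_data). The names `paths_to_data`, `load` and `fake_load` are hypothetical and not part of my class; `load` stands in for the `Image.open(...)` plus `self.processor(...)` step, so the sketch runs without PIL or a model.

```python
from typing import Any, Callable


def paths_to_data(dataset: dict, load: Callable[[str], Any]) -> list:
    """Fill dataset["image_data"] from dataset["image_path"].

    Returns a list of (path, error) pairs for the paths that failed.
    """
    failures = []
    # Iterate over the path list only, not over every key of the dict.
    for path in list(dataset["image_path"]):
        try:
            dataset["image_data"].append(load(path))
        except Exception as e:
            # repr(e) keeps the exception type, e.g. KeyError('image_data'),
            # whereas str(e) for a KeyError shows only the quoted key.
            failures.append((path, repr(e)))
    return failures


def fake_load(path: str) -> str:
    # Hypothetical stand-in for opening the image and running the processor.
    if path.endswith(".jpeg"):
        return path.upper()
    raise ValueError(f"unsupported file: {path}")


dataset = {
    "image_path": ["a.jpeg", "b.txt"],
    "label_path": [],
    "image_data": [],
    "label_data": [],
}
failures = paths_to_data(dataset, fake_load)
print(len(dataset["image_data"]), failures)
```

With a per-path try/except like this, one bad file does not stop the whole loop, and the failure list shows which path raised which exception.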