I am very new to DS and ML, and a beginner in computer vision and image classification.
I found the code snippet below in online blog posts and got basic image classification training working on Google Colab.
Can someone please explain in simple terms what exactly happens under the hood when we apply RandomResizedCrop, RandomHorizontalFlip, ToTensor, and Normalize?
From the paper (https://openreview.net/pdf?id=YicbFdNTTy), my understanding is that the image is broken down into patches, normalized, and fed into the neural network.
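As a rough illustration of the patch step, here is a minimal pure-Python sketch. It is not the model's actual code: `split_into_patches` is a hypothetical helper written for this post, and the 224 and 16 values are taken from the checkpoint name `vit-base-patch16-224` used below. As far as I can tell, in this pipeline the normalization is done by the transforms before the model sees the image, while the patch splitting happens inside the model.

```python
def split_into_patches(image_hwc, patch_size):
    """Split an image (nested lists, height x width x channels) into
    non-overlapping square patches, each flattened to one list of values.
    Hypothetical helper for illustration only."""
    height, width = len(image_hwc), len(image_hwc[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Collect every channel value inside this patch, row by row.
            patch = [
                value
                for row in image_hwc[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for value in pixel
            ]
            patches.append(patch)
    return patches


# Sizes assumed from the checkpoint name: 224x224 input, 16x16 patches, RGB.
image_size, patch_size, channels = 224, 16, 3
image = [[[0] * channels for _ in range(image_size)] for _ in range(image_size)]

patches = split_into_patches(image, patch_size)
num_patches = len(patches)       # (224 / 16) ** 2 patches
patch_length = len(patches[0])   # 16 * 16 * 3 values per patch
print(num_patches, patch_length)
```

Each flattened patch is then treated like one "token" by the transformer, much as a word would be in a text model.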
from torchvision.transforms import (
    CenterCrop,
    Compose,
    Normalize,
    RandomHorizontalFlip,
    RandomResizedCrop,
    Resize,
    ToTensor,
)
from transformers import ViTImageProcessor

# Load the preprocessing settings that match the pretrained checkpoint.
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
image_mean = processor.image_mean
image_std = processor.image_std
size = processor.size["height"]

# Per-channel normalization: (value - mean) / std.
normalize = Normalize(mean=image_mean, std=image_std)

# Training: random crop and random flip for augmentation, then convert and normalize.
_train_transforms = Compose(
    [
        RandomResizedCrop(size),
        RandomHorizontalFlip(),
        ToTensor(),
        normalize,
    ]
)

# Validation: deterministic resize and center crop, then convert and normalize.
_val_transforms = Compose(
    [
        Resize(size),
        CenterCrop(size),
        ToTensor(),
        normalize,
    ]
)


def train_transforms(examples):
    # Apply the training pipeline to every image in the batch.
    examples['pixel_values'] = [_train_transforms(image.convert("RGB")) for image in examples['image']]
    return examples


def val_transforms(examples):
    # Apply the validation pipeline to every image in the batch.
    examples['pixel_values'] = [_val_transforms(image.convert("RGB")) for image in examples['image']]
    return examples


# `food` is the dataset loaded earlier (not shown in this snippet).
food['train'].set_transform(train_transforms)
food['test'].set_transform(val_transforms)
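To make the question concrete, here is a minimal pure-Python sketch of the arithmetic I believe ToTensor and Normalize perform, plus a horizontal flip, on a tiny 2x2 RGB image. The functions `to_tensor`, `normalize_tensor`, and `horizontal_flip` are hypothetical stand-ins written for this post, not the torchvision implementations, and the mean and std of 0.5 per channel are an assumption about what `processor.image_mean` and `processor.image_std` return for this checkpoint. RandomResizedCrop is left out because it only picks a random region and resizes it to `size`.

```python
def horizontal_flip(image_hwc):
    """Mirror the image left to right (RandomHorizontalFlip does this
    only with some probability; here it is applied unconditionally)."""
    return [list(reversed(row)) for row in image_hwc]


def to_tensor(image_hwc):
    """Mimic ToTensor: height x width x channels integers in [0, 255]
    become channels x height x width floats in [0, 1]."""
    height, width = len(image_hwc), len(image_hwc[0])
    channels = len(image_hwc[0][0])
    return [
        [[image_hwc[r][c][ch] / 255.0 for c in range(width)] for r in range(height)]
        for ch in range(channels)
    ]


def normalize_tensor(tensor_chw, mean, std):
    """Mimic Normalize: subtract the channel mean, divide by the channel std."""
    return [
        [[(value - mean[ch]) / std[ch] for value in row] for row in channel]
        for ch, channel in enumerate(tensor_chw)
    ]


# A tiny 2x2 RGB image: black, white / red, blue.
image = [
    [[0, 0, 0], [255, 255, 255]],
    [[255, 0, 0], [0, 0, 255]],
]

flipped = horizontal_flip(image)
tensor = to_tensor(flipped)
# Assumed values; the real ones come from processor.image_mean / image_std.
normalized = normalize_tensor(tensor, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])

print(tensor[0])      # red channel after flip, scaled to [0, 1]
print(normalized[0])  # red channel after normalization, now in [-1, 1]
```

With these assumed statistics, a pixel value of 0 maps to -1.0 and 255 maps to 1.0, so the network always sees inputs in the same range it was pretrained on.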
I appreciate your help!!!