I have a training data set that includes the original data plus some data augmented from the original. I also have a testing data set that consists only of original data. Now I want to implement k-fold cross-validation, so I combined both data sets, since k-fold will be the one to split the data into training and testing. My question is: is that okay? I am especially worried about the case where an original sample ends up in training and its augmented versions end up in testing (since the splitting is random). How do I augment the data set while using k-fold cross-validation? Thank you so much!
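To make the concern concrete, here is a small illustrative sketch. It assumes each sample carries a hypothetical `source_id` naming the original it came from (an original's `source_id` is its own). After pooling originals and augmented copies and splitting them randomly, it reports, for each fold, the fraction of test items whose source also appears in the training folds, i.e. near-duplicates of training data.

```python
import random

def leaked_fraction(samples, k, seed=0):
    """Shuffle the pooled samples, deal them into k folds, and for each fold
    return the fraction of test items whose source_id also occurs in training."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    fractions = []
    for i, test in enumerate(folds):
        # Every source_id that appears anywhere in the other k-1 folds.
        train_sources = {s["source_id"] for j, f in enumerate(folds) if j != i for s in f}
        leaked = sum(1 for s in test if s["source_id"] in train_sources)
        fractions.append(leaked / len(test))
    return fractions

# Hypothetical data: 20 originals, each with 2 augmented copies sharing its source_id.
samples = []
for src in range(20):
    samples.append({"id": f"orig{src}", "source_id": src})
    for a in range(2):
        samples.append({"id": f"aug{src}_{a}", "source_id": src})

fractions = leaked_fraction(samples, k=4)
print(fractions)
```

A nonzero fraction means the test fold contains variants of images the model was trained on, which tends to make the cross-validation score optimistic.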
Here is what I am planning to do based on what I have understood from this discussion:
1.) Remove all augmented data from the data set so that all of the data are original.
2.) Perform k-fold cross-validation, i.e. split the original data into k folds.
3.) For each fold, perform data augmentation on the training data only. For example, if k=4, augment the 3 folds used for training and leave the held-out fold untouched.
4.) Train the model on the augmented training data.
5.) Test the model on the held-out fold, which contains only original data.
6.) Discard the augmented data, then go back to step 2 with the next fold.
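The plan above can be sketched roughly as follows. This is a toy illustration, not a real pipeline: `augment`, `train_model` and `evaluate` are hypothetical stand-ins (jittered copies of 1-D points, a threshold classifier, and accuracy), to be replaced with your actual augmentation, model and metric. The key point it shows is that augmentation happens inside the loop, on the training folds only.

```python
import random
from statistics import mean

def make_folds(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def augment(samples, copies=2, noise=0.1, seed=0):
    """Hypothetical augmentation: add jittered copies of each (x, label) sample."""
    rng = random.Random(seed)
    out = list(samples)
    for x, y in samples:
        for _ in range(copies):
            out.append((x + rng.gauss(0, noise), y))
    return out

def train_model(samples):
    """Toy stand-in model: threshold halfway between the two class means of x."""
    m0 = mean(x for x, y in samples if y == 0)
    m1 = mean(x for x, y in samples if y == 1)
    return (m0 + m1) / 2

def evaluate(threshold, samples):
    """Accuracy of predicting label 1 whenever x > threshold."""
    correct = sum((x > threshold) == (y == 1) for x, y in samples)
    return correct / len(samples)

def cross_validate(originals, k=4):
    """Step 1 is assumed done: `originals` holds only un-augmented data."""
    scores = []
    for i, test_idx in enumerate(make_folds(len(originals), k)):  # step 2
        held_out = set(test_idx)
        train = [s for j, s in enumerate(originals) if j not in held_out]
        test = [originals[j] for j in test_idx]
        train_aug = augment(train, seed=i)     # step 3: augment training folds only
        model = train_model(train_aug)         # step 4
        scores.append(evaluate(model, test))   # step 5: test fold is original-only
        # Step 6: train_aug is rebuilt from scratch on the next iteration.
    return scores

# Hypothetical data: two well-separated classes of 20 points each.
rng = random.Random(42)
originals = ([(rng.gauss(0.0, 0.5), 0) for _ in range(20)]
             + [(rng.gauss(3.0, 0.5), 1) for _ in range(20)])
scores = cross_validate(originals, k=4)
print(scores, mean(scores))
```

Because the augmented copies are created from the training folds of the current iteration, no variant of a held-out sample can ever reach the model during training.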
However, I am not sure if this is correct.