Can TFRecordDataset start reading elements from a random spot in the file?
I have an application with 200GB of data, which I can’t load into RAM. Given the data and the problem, it would be extremely beneficial to keep the TFRecord files “sorted” in a particular way, so that they are NOT random/shuffled. A key benefit of sorted shards is that they would let me quickly change my training/test split to perform KFold analysis. However, training on this data is extremely sensitive to improperly shuffled training data, and on top of that, there are likely benefits to “ensuring” that each batch of training data includes some elements from each shard (similar to class balance). I don’t think my application needs to randomly select elements from each shard for the training data to be sufficiently random, but I don’t want the batches to look similar between epochs. I think the way to balance these demands is to start reading each shard at a different point in each epoch. Is this possible?
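To make the idea concrete, here is a rough pure-Python sketch of the access pattern I have in mind. Plain lists stand in for the TFRecord shards, and the helper names (`rotated`, `epoch_batches`) are made up for illustration. This is not TensorFlow code, and it assumes all shards are the same size.

```python
import random


def rotated(shard, offset):
    """Yield every element of `shard` exactly once, starting at `offset` and wrapping around."""
    n = len(shard)
    for i in range(n):
        yield shard[(offset + i) % n]


def epoch_batches(shards, rng):
    """Yield batches holding one element per shard; each shard starts at a random offset."""
    offsets = [rng.randrange(len(s)) for s in shards]
    readers = [rotated(s, o) for s, o in zip(shards, offsets)]
    # zip stops at the shortest reader; equal-sized shards are assumed here.
    for batch in zip(*readers):
        yield list(batch)


# Three hypothetical "sorted" shards of five records each.
shards = [[f"s{k}_r{i}" for i in range(5)] for k in range(3)]
rng = random.Random(0)

epoch1 = list(epoch_batches(shards, rng))  # new random offsets are drawn per call,
epoch2 = list(epoch_batches(shards, rng))  # so each epoch starts each shard elsewhere
```

Within a shard the records still come out in their sorted order (just rotated), every record is seen once per epoch, and every batch contains one record from each shard. What I am asking is whether something equivalent can be done efficiently with TFRecordDataset.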