I have a python list of torch_geometric.data.Data objects (each one representing a graph). There is no easy way for me to access original raw files for this data: I just have the list. I need to turn this list of Data objects into a torch_geometric.data.InMemoryDataset or torch_geometric.data.Dataset object in order to integrate it with a larger code base which I did not write. How do I do this?
To be clear, I know that one can use a list of Data objects to make a torch_geometric.data.DataLoader object. But, I specifically need a Dataset object, not a DataLoader object, as the larger code base does some additional processing steps on Dataset objects before turning them into loaders.
I don’t understand why PyG makes this so difficult. Should there not be a very easy way to do this?
I tried using a simple CustomDataset class
class CustomDataset(InMemoryDataset):
def __init__(self, data):
super().__init__()
self.data = data
def __len__(self):
return len(self.data)
def __getitem__(self, idx):
sample = self.data[idx]
return sample
and it gave me a KeyIndex error when trying to get the Data object at index 0. I also tried a version of the above code where the super class is Dataset as opposed to InMemoryDataset but I couldn’t figure out how to make the collate method work.
td12345 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.