I would like to train a simple neural network to forecast electricity prices in a certain region. However, I only have a ‘limited amount’ of data available: 3 consecutive years of historical price, electricity generation and load for that region at 1-hour intervals (importing other data is not allowed). The literature commonly recommends splitting the data 60-20-20, 70-15-15 or 80-10-10 into train, validation and test sets. A split of “2 years, 0.5 years, and 0.5 years” for the train, validation and test sets respectively would mean optimizing the hyperparameters on the period January to June and testing the model on July to December. Intuitively this doesn’t feel like a good approach to me, because the validation and test sets would cover different seasons. Is this intuition correct and, if so, what alternative approach could be taken?
I’ve searched for quite a while and found a method called ‘rolling cross-validation’, but I’m not sure it’s the right approach, as I’m quite new to the topic of neural networks.
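To make clear what I mean by rolling cross-validation, here is a minimal sketch of how I understand it, in plain Python. The function name `rolling_origin_splits` and all of its parameters are my own hypothetical choices, not taken from any library. I also assume that the data starts in January, that leap days can be ignored, and that the last 6 months stay held out as the test set. Each fold trains on everything before its validation window, and the four quarterly validation windows together cover a full year, so every season is validated once.

```python
def rolling_origin_splits(n_samples, n_folds, val_size, min_train_size):
    """Yield (train, validation) index ranges for expanding-window CV.

    Hypothetical helper: each fold trains on all samples before its
    validation window, so no future data leaks into training.
    """
    needed = min_train_size + n_folds * val_size
    if needed > n_samples:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        train_end = min_train_size + fold * val_size
        yield range(0, train_end), range(train_end, train_end + val_size)


HOURS_PER_YEAR = 365 * 24  # assumption: leap days ignored

# Assumed setup: 3 years of hourly data, with the final 6 months kept
# aside as the test set, leaving 2.5 years for cross-validation.
n_cv = int(2.5 * HOURS_PER_YEAR)
quarter = HOURS_PER_YEAR // 4

splits = list(
    rolling_origin_splits(
        n_cv,
        n_folds=4,
        val_size=quarter,
        min_train_size=int(1.5 * HOURS_PER_YEAR),
    )
)

for train_idx, val_idx in splits:
    print(f"train [0, {train_idx.stop})  validate [{val_idx.start}, {val_idx.stop})")
```

With these assumed sizes, the first fold trains on the first 1.5 years and validates on the following quarter, and each later fold extends the training set by one quarter. Is something like this what is meant, and would it address the seasonality mismatch?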