What is a good approach to splitting 3 years of hourly data into train, validation and test sets for an electricity price forecasting neural network?
I would like to train a simple neural network to forecast electricity prices in a certain region. However, I only have a limited amount of data available: 3 consecutive years of historical prices, electricity generation and load for that region at 1-hour intervals (importing other data is not allowed). The literature recommends a 60-20-20, 70-15-15 or 80-10-10 split into train, validation and test sets. A chronological split of 2 years, 0.5 years and 0.5 years for the train, validation and test sets respectively would mean that hyperparameters are optimized on January to June and the model is tested on July to December. Intuitively this doesn't feel like a good approach to me, because the validation and test sets would cover different seasons. Is this intuition correct and, if so, what alternative approach could be taken?
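
To make the concern concrete, here is a minimal sketch of the 2 / 0.5 / 0.5 year split described above. The calendar years 2021 to 2023 are an assumption for illustration only (my actual dates may differ), and the variable names are hypothetical. It only builds the hourly timestamps and shows which months end up in each set; it does not touch the price, generation or load values.

```python
from datetime import datetime, timedelta

# Assumed period: three full calendar years of hourly timestamps (illustrative dates).
start = datetime(2021, 1, 1)
end = datetime(2024, 1, 1)  # exclusive upper bound
n_hours = (end - start) // timedelta(hours=1)
hours = [start + timedelta(hours=i) for i in range(n_hours)]

# Chronological boundaries: 2 years train, first half of year 3 for validation,
# second half of year 3 for testing.
val_start = datetime(2023, 1, 1)
test_start = datetime(2023, 7, 1)

train = [t for t in hours if t < val_start]
val = [t for t in hours if val_start <= t < test_start]
test = [t for t in hours if t >= test_start]

def months_covered(timestamps):
    """Return the sorted list of calendar months present in a set of timestamps."""
    return sorted({t.month for t in timestamps})

for name, part in [("train", train), ("validation", val), ("test", test)]:
    share = len(part) / len(hours)
    print(f"{name}: {len(part)} hours ({share:.1%}), months {months_covered(part)}")
```

Under these assumptions the validation set only contains months 1 to 6 and the test set only months 7 to 12, so the two sets share no calendar month, which is the seasonality mismatch I am worried about.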