I want to use Nixtla's StatsForecast to make predictions one step at a time, so that whenever a new observation is recorded, a new prediction is generated from it.
Here is my naive implementation:
import pandas as pd
import numpy as np
import os
from statsmodels.tsa.arima_process import arma_generate_sample
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
os.environ['NIXTLA_ID_AS_COL'] = '1'
n = 100
## Generate synthetic AR(3) data.
ar3 = arma_generate_sample([1, 0.5, 0.5, 0.5], [1], n)
df = pd.DataFrame(data=ar3, columns=['y']).assign(unique_id=0)
df['ds'] = pd.date_range('01-01-2020', periods=ar3.size, freq='D')
## Instantiate a StatsForecast object
sf = StatsForecast(
    models=[AutoARIMA(season_length=1)],
    freq='D',
)
res = []
## Generate sequential predictions
for k in range(n//2, n):
    ## In this cycle we pretend we have k observations
    train = df.iloc[:k]
    ## Generate a prediction from k observations
    pred = sf.forecast(df=train, h=1).loc[0, 'AutoARIMA']
    ## Collect the prediction alongside the true next value
    time = train.ds.max() + pd.Timedelta('1D')
    truth = df[df.ds == time].y.values[0]
    res.append(dict(pred=pred, time=time, truth=truth))
res = pd.DataFrame(res).set_index('time')
res.plot()
Problem: in my implementation, the StatsForecast object re-optimizes the order of the ARIMA model on every cycle, which is time-consuming.
I would like to be able to:
- Train AutoARIMA once, freeze the underlying ARIMA model (order and coefficients), and use it to make a prediction every time a new observation is collected (i.e. every cycle).
- Find the optimal ARIMA order once, refit only the model coefficients every time a new observation is collected, then make a prediction.
I can get the order via
from statsforecast.arima import arima_string
arima_string(sf.fit(train).fitted_[0,0].model_)
I could then pass that order to a statsmodels.tsa.arima.model.ARIMA object, but this seems convoluted and cumbersome, and I will likely make mistakes translating the order from Nixtla to statsmodels. There has to be a better way.
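For reference, the translation step itself would look roughly like the sketch below. The helper name `arma_to_statsmodels` is hypothetical, and the layout `(p, q, P, Q, m, d, D)` for the `arma` entry of the fitted model is an assumption rather than something confirmed by the documentation. A model with no seasonal terms is mapped to `(0, 0, 0, 0)`, which is the statsmodels default for `seasonal_order`.

```python
def arma_to_statsmodels(arma):
    """Map an `arma` tuple to keyword arguments for statsmodels' ARIMA.

    ASSUMPTION: `arma` is ordered (p, q, P, Q, m, d, D).
    """
    p, q, P, Q, m, d, D = (int(v) for v in arma)
    if P == D == Q == 0:
        # No seasonal component: use the statsmodels default.
        seasonal_order = (0, 0, 0, 0)
    else:
        seasonal_order = (P, D, Q, m)
    return {"order": (p, d, q), "seasonal_order": seasonal_order}


# Example: a non-seasonal ARIMA(3, 0, 0) with season length 1.
kwargs = arma_to_statsmodels((3, 0, 0, 0, 1, 0, 0))
print(kwargs)  # {'order': (3, 0, 0), 'seasonal_order': (0, 0, 0, 0)}
```

The result could be unpacked as `ARIMA(endog, **kwargs)`. The helper covers only the orders. Whether a constant or drift term was selected would still have to be carried over separately, which is part of what makes this route error-prone.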