I’m working on an ML project. I’m trying to impute a few missing values in a time series for Afghanistan. To be more specific, the series represents the percentage of people who have access to electricity.
This is the code I’m using:
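library(fable)
library(tsibble)
library(dplyr)  # provides filter() and mutate()
# Convert to a tsibble indexed by year, with one series per country
data3 <- data2 %>% as_tsibble(index = Year, key = Country)
afg <- data3 %>% filter(Country == "Afghanistan")
# Fit a linear trend and use it to fill in the missing values
df_impute <- afg %>%
  model(TSLM(V1 ~ trend())) %>%
  interpolate(afg)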
The code runs without errors, but in some cases I get absurd values for a percentage, for example negative ones:
# A tsibble: 6 x 3 [1Y]
# Key: Country [1]
Country Year V1
<fct> <int> <dbl>
1 Afghanistan 1960 -195.
2 Afghanistan 1961 -190.
3 Afghanistan 1962 -185.
4 Afghanistan 1963 -180.
5 Afghanistan 1964 -175.
6 Afghanistan 1965 -170.
or values above 100, like the one for 2022:
# A tsibble: 6 x 3 [1Y]
# Key: Country [1]
Country Year V1
<fct> <int> <dbl>
1 Afghanistan 2017 97.7
2 Afghanistan 2018 93.4
3 Afghanistan 2019 97.7
4 Afghanistan 2020 97.7
5 Afghanistan 2021 97.7
6 Afghanistan 2022 113.
The only solution I found is this one:
df_impute <- afg %>%
  model(TSLM(V1 ~ trend())) %>%
  interpolate(afg) %>%
  # clamp the imputed values to the valid range [0, 100]
  mutate(V1 = pmax(pmin(V1, 100), 0))
but I don’t think it is the right solution, because clamping the values after the fact feels like the wrong way to impute.
Is there a way to specify the valid range (0 to 100) in the model itself?
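One direction that might work, sketched here under assumptions rather than as a confirmed fix: fit the trend on a scaled logit scale, so that back-transformed values stay strictly between 0 and 100. The helper names scaled_logit, inv_scaled_logit and bounded, and the toy data in toy, are made up for illustration. new_transformation() comes from fabletools, and I am assuming that interpolate() back-transforms the filled values the same way forecasts are back-transformed. Observed values of exactly 0 or 100 would map to infinite values on this scale and would need to be nudged inward first.

```r
library(fable)
library(tsibble)
library(dplyr)

# Scaled logit: maps the open interval (lower, upper) onto the real line
scaled_logit <- function(x, lower = 0, upper = 1) {
  qlogis((x - lower) / (upper - lower))
}
# Inverse: maps any real number back into (lower, upper)
inv_scaled_logit <- function(x, lower = 0, upper = 1) {
  lower + (upper - lower) * plogis(x)
}
# Pair the two functions so the model can back-transform its output
bounded <- fabletools::new_transformation(scaled_logit, inv_scaled_logit)

# Toy data (hypothetical), with gaps at both ends and in the middle
toy <- tsibble(
  Year = 2000:2010,
  Country = "Afghanistan",
  V1 = c(NA, NA, 20, 25, NA, 40, 55, NA, 80, 95, NA),
  index = Year, key = Country
)

# Linear trend on the transformed scale, then fill in the missing values
fit <- toy %>% model(TSLM(bounded(V1, 0, 100) ~ trend()))
df_bounded <- fit %>% interpolate(toy)
```

The trade-off is that the trend is no longer linear in percentage points: it is linear on the logit scale, so the imputed values flatten out as they approach 0 or 100 instead of crossing those limits.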