I have the following dataset which has the number of A, B and C both in 2000 and 2005. I now need to inpolate the dataframe subject to the constraint that the sum of A, B and C must be consistent with the Total amount, which is available for every year. Since I am dealing with population data, the values have to be integers.
data <- data.frame( Year = c(2000, 2001, 2002, 2003, 2004, 2005),
Total = c(50, 52, 53, 57, 60, 61),
A = c(12, NA, NA, NA, NA, 17),
B = c(22, NA, NA, NA, NA, 24),
C = c(16, NA, NA, NA, NA, 20) )
So far, I tried using na.approx()
from the zoo
package. But I was unable to make the “Total”-constraint work.
Thank you very much for your help.
2
You could interpolate each one individually, as you did using zoo::na.approx
, and then use iterative proportional fitting (library ipfr) followed by optimized rounding using the roundfixS
function from the sfsmisc package.
Interpolation
mtx <- matrix(data = c(zoo::na.approx(df$A)[2:5],
zoo::na.approx(df$B)[2:5],
zoo::na.approx(df$C)[2:5]), nrow = 4, ncol = 3)
addmargins(mtx)
Sum
13 22.4 16.8 52.2
14 22.8 17.6 54.4
15 23.2 18.4 56.6
16 23.6 19.2 58.8
Sum 58 92.0 72.0 222.0
Note the row sums don’t agree with your constraint.
Iterative proportional fitting
library(ipfr)
row_targets <- df$Total[2:5]
column_targets <- colSums(mtx)
X <- ipu_matrix(mtx, row_targets, column_targets)
addmargins(X)
Sum
12.94506 22.32052 16.73442 52
13.63439 22.21969 17.14592 53
15.10020 23.37088 18.52891 57
16.32035 24.08890 19.59075 60
Sum 58.00000 92.00000 72.00000 222
Row totals now agree with your constraints, but the numbers inside are not integers.
Optimized rounding
library(sfsmisc)
X_round <- roundfixS(X)
addmargins(X_round)
Sum
13 22 17 52
14 22 17 53
15 23 19 57
16 24 20 60
Sum 58 91 73 222
2
You could first linearly interpolate columns using approx
, then use proportions(., margin=1)
, multiply with the totals and round
to integers.
> (proportions(sapply(lapply(data[, 3:5], approx, xout=seq_len(nrow(data))),
+ `[[`, "y"), margin=1)*data$Total) |>
+ round() |>
+ `mode<-`('integer') ## optional
A B C
[1,] 12 22 16
[2,] 13 22 17
[3,] 14 22 17
[4,] 15 23 19
[5,] 16 24 20
[6,] 17 24 20
3