I have a panel with regions and years. My main variable of interest, index, is missing for some regions in some years. I am trying to impute those missing values.
To do so, I’ve calculated the percentile of index (perc) and then its mean for each region across years (mean_perc). For example:
list region_id year index perc mean_perc, nol
+---------------------------------------------------+
| region~d year index perc mean_p~c |
|---------------------------------------------------|
1. | 1 1990 -.0879496 .6528497 .2710086 |
2. | 1 1991 -.4667637 .0351759 .2710086 |
3. | 1 1992 -.1709576 .125 .2710086 |
4. | 1 1993 . . .2710086 |
5. | 2 1990 -.462625 .1398964 .3006104 |
|---------------------------------------------------|
6. | 2 1991 -.0563047 .3869347 .3006104 |
7. | 2 1992 .1408911 .375 .3006104 |
8. | 2 1993 . . .3006104 |
9. | 3 1990 -.3460146 .2746114 .3954145 |
10. | 3 1991 -.0690994 .3718593 .3954145 |
|---------------------------------------------------|
11. | 3 1992 .3073938 .5397727 .3954145 |
12. | 3 1993 . . .3954145 |
13. | 4 1990 -.6537067 .0259067 .125898 |
14. | 4 1991 -.1824378 .2211055 .125898 |
15. | 4 1992 -.1649489 .1306818 .125898 |
|---------------------------------------------------|
16. | 4 1993 . . .125898 |
17. | 5 1990 -.5772001 .0518135 .1571086 |
18. | 5 1991 -.0987434 .3115578 .1571086 |
19. | 5 1992 -.1815233 .1079545 .1571086 |
20. | 5 1993 . . .1571086 |
|---------------------------------------------------|
21. | 6 1990 -.5690967 .0673575 .207459 |
22. | 6 1991 -.1751851 .2311558 .207459 |
23. | 6 1992 .0894242 .3238636 .207459 |
24. | 6 1993 . . .207459 |
25. | 7 1990 -.6956265 .015544 .139356 |
|---------------------------------------------------|
26. | 7 1991 -.4338617 .0502513 .139356 |
27. | 7 1992 .1256393 .3522727 .139356 |
28. | 7 1993 . . .139356 |
29. | 8 1990 -.6792535 .0207254 .2363321 |
30. | 8 1991 -.026226 .4723618 .2363321 |
|---------------------------------------------------|
31. | 8 1992 -.058143 .2159091 .2363321 |
32. | 8 1993 . . .2363321 |
+---------------------------------------------------+
. sum index perc mean_perc
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
index | 24 -.2288466 .2823488 -.6956265 .3073938
perc | 24 .2291484 .1788557 .015544 .6528497
mean_perc | 32 .2291484 .0872065 .125898 .3954145
I want to create a variable (x) that contains the quantile value of index corresponding to the mean percentile (mean_perc) for each region within each year. That is, x should contain the value of index that corresponds to the percentile in mean_perc. For instance, if mean_perc = 0.5, x should indicate what value of index is at the median; if mean_perc = 0.25, x should indicate what value of index would represent the 25th percentile.
I know that in R, this can be achieved using this command:
data <- within(data, imputed <- quantile(index, c(mean_perc), na.rm = TRUE))
but I’m trying to find a way to achieve this in Stata.
Thanks!
I have tried using the xtile command and the pctile command, but I haven’t been able to figure it out.
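For reference, here is a minimal sketch of one way this lookup might be done with _pctile. It rests on some assumptions: the quantiles are taken over the pooled nonmissing values of index (as in the R call above), and Stata's default quantile definition is acceptable. R's quantile() uses a different default definition, so results may differ slightly. The names g and x are introduced here only for illustration.

```stata
* Sketch only: assumes index and mean_perc exist as in the listing above,
* and that quantiles are taken over all nonmissing values of index.

* One group per distinct value of mean_perc (avoids float-equality issues)
egen long g = group(mean_perc)
summarize g, meanonly
local G = r(max)

generate double x = .
forvalues i = 1/`G' {
    * mean_perc is constant within g, so its mean is the value itself
    summarize mean_perc if g == `i', meanonly
    * _pctile expects percentages (0-100), hence the factor of 100
    _pctile index, percentiles(`=100*r(mean)')
    replace x = r(r1) if g == `i'
}
drop g
```

The loop runs once per distinct mean_perc value rather than once per observation. If the quantiles should instead be computed within each year, the _pctile call would need an if year == ... qualifier and an outer loop over years.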