I have a panel dataset at the district level from Germany covering three years (2013, 2017, 2021). I want to lag one of my variables, but the lag function from R only returns the same variable from the same period, which is not lagged.
I am confused. Why does the lag function not work as it should, and how can I fix it?
This is the code that I used and the output that it generated:
library(tidyverse)

data %>%
  select(Kennziffer, Jahr, Kreis, per_foreign_noeduc) %>%
  arrange(Kennziffer, Jahr) %>%
  group_by(Kennziffer) %>%
  mutate(lagged_per_foreign_noeduc = lag(per_foreign_noeduc, n = 1, default = NA))
# A tibble: 1,200 × 5
# Groups: Kennziffer [400]
Kennziffer Jahr Kreis per_foreign_noeduc lagged_per_foreign_noeduc
<dbl> <dbl> <chr> <dbl> <dbl>
1 2 2013 Hamburg 1.78 1.78
2 2 2017 Hamburg 2.45 2.45
3 2 2021 Hamburg 3.19 3.19
4 11 2013 Berlin 1.44 1.44
5 11 2017 Berlin 2.30 2.30
6 11 2021 Berlin 2.88 2.88
7 1001 2013 Flensburg, kreisfreie Stadt 0.820 0.820
8 1001 2017 Flensburg, kreisfreie Stadt 1.46 1.46
9 1001 2021 Flensburg, kreisfreie Stadt 2.25 2.25
10 1002 2013 Kiel, Landeshauptstadt, kreisfreie Stadt 0.761 0.761
# ℹ 1,190 more rows
# ℹ Use `print(n = ...)` to see more rows
Please provide reproducible data, as discussed at the top of the [r] tag page. We have attempted to recreate your data in the Note at the end, and with it dplyr::lag works as expected.
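One common way to produce such reproducible data is dput(), which prints R code that recreates an object. A minimal sketch follows; `df` is a hypothetical stand-in for your own data frame (with your data you would run something like dput(head(data, 10)) and paste the printed code into the question).

```r
# Hypothetical stand-in for your own data frame.
df <- data.frame(
  Kennziffer = c(2L, 2L, 11L),
  Jahr = c(2013L, 2017L, 2013L),
  per_foreign_noeduc = c(1.78, 2.45, 1.44)
)

# Prints R code that rebuilds the first rows; paste that output into the question.
dput(head(df, 10))
```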
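Note that base R's lag (stats::lag) works differently: it is designed for ts and other time series classes, and on a plain numeric vector it returns the same values with only the time attributes shifted, which is what the output in the question shows. dplyr's lag, in contrast, shifts the values of a vector such as a column of a data.frame. To be sure you are using the dplyr version, for example if another package loaded later has masked it, you can call dplyr::lag explicitly, although normally that is not needed.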
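A minimal sketch of the difference, assuming dplyr is installed; the vector `x` is just the first three Hamburg values, used for illustration. The last line is one way to check which lag() an unqualified call resolves to in the current session.

```r
library(dplyr)

x <- c(1.78, 2.45, 3.19)

# stats::lag() is meant for time series: on a plain vector it only shifts the
# time base (the "tsp" attribute) and leaves the values unchanged.
base_lag <- stats::lag(x, 1)
as.numeric(base_lag)   # 1.78 2.45 3.19
tsp(base_lag)          # 0 2 1

# dplyr::lag() shifts the values themselves and pads the start with NA.
dplyr::lag(x, n = 1)   # NA 1.78 2.45

# Name of the package whose lag() an unqualified call finds first.
environmentName(environment(lag))
```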
library(dplyr)
data %>%
  group_by(Kennziffer) %>%
  mutate(lagged_per_foreign_noeduc = lag(per_foreign_noeduc, n = 1, default = NA))
giving
# A tibble: 10 × 5
# Groups: Kennziffer [4]
Kennziffer Jahr Kreis per_foreign_noeduc lagged_per_foreign_n…¹
<int> <int> <chr> <dbl> <dbl>
1 2 2013 Hamburg 1.78 NA
2 2 2017 Hamburg 2.45 1.78
3 2 2021 Hamburg 3.19 2.45
4 11 2013 Berlin 1.44 NA
5 11 2017 Berlin 2.3 1.44
6 11 2021 Berlin 2.88 2.3
7 1001 2013 Flensburg, kreisf… 0.82 NA
8 1001 2017 Flensburg, kreisf… 1.46 0.82
9 1001 2021 Flensburg, kreisf… 2.25 1.46
10 1002 2013 Kiel, Landeshaupt… 0.761 NA
# ℹ abbreviated name: ¹lagged_per_foreign_noeduc
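If you want the call to be robust both to masking and to row order, a sketch under the assumption that dplyr is installed could namespace the call and pass order_by = Jahr so that the lag follows the year within each district even if the rows are not sorted. The data frame `panel` is a hypothetical subset mirroring the data in the Note below, with the Hamburg rows deliberately shuffled.

```r
library(dplyr)

# Hypothetical subset of the Note data; Hamburg rows are intentionally unsorted.
panel <- data.frame(
  Kennziffer = c(2L, 2L, 2L, 11L, 11L, 11L),
  Jahr = c(2021L, 2013L, 2017L, 2013L, 2017L, 2021L),
  per_foreign_noeduc = c(3.19, 1.78, 2.45, 1.44, 2.3, 2.88)
)

panel_lagged <- panel %>%
  group_by(Kennziffer) %>%
  # dplyr:: avoids picking up a masked lag(); order_by makes the shift follow Jahr.
  mutate(lagged_per_foreign_noeduc =
           dplyr::lag(per_foreign_noeduc, n = 1, order_by = Jahr)) %>%
  ungroup() %>%
  arrange(Kennziffer, Jahr)

panel_lagged
```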
Note
data <- data.frame(
  Kennziffer = rep(c(2L, 11L, 1001L, 1002L), c(3L, 3L, 3L, 1L)),
  Jahr = c(2013L, 2017L, 2021L, 2013L, 2017L, 2021L, 2013L, 2017L, 2021L, 2013L),
  Kreis = rep(c("Hamburg", "Berlin", "Flensburg, kreisfreie Stadt",
                "Kiel, Landeshauptstadt, kreisfreie Stadt"), c(3L, 3L, 3L, 1L)),
  per_foreign_noeduc = c(1.78, 2.45, 3.19, 1.44, 2.3, 2.88, 0.82, 1.46, 2.25, 0.761)
)