I’m having trouble grasping the purpose of `.groups = "drop"` in dplyr’s `summarise` function. I’m trying to run the following code to display the top 20 stations along with their latitude and longitude:
summary <- trips_2023 %>%
  filter(member_casual == "member") %>%
  group_by(start_station_name, start_lat, start_lng) %>%
  summarise(count = n()) %>%
  arrange(desc(count)) %>%
  mutate(type = "start",
         member = "member") %>%
  slice(1:20)
However, the code returns a table with many more rows than the requested 20.
If I add `.groups = "drop"` inside the `summarise` call, the code works, but honestly, I don’t understand why.
summary <- trips_2023 %>%
  filter(member_casual == "member") %>%
  group_by(start_station_name, start_lat, start_lng) %>%
  summarise(count = n(), .groups = "drop") %>%
  arrange(desc(count)) %>%
  mutate(type = "start",
         member = "member") %>%
  slice(1:20)
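To make this reproducible without my data, here is a minimal sketch of the same difference. The `trips` tibble, its column names, and its values are made up for illustration, and I use `slice(1)` instead of `slice(1:20)` to keep it small. As far as I can tell, the first result is still grouped after `summarise`, so `slice` picks rows within each group rather than from the whole table:

```r
library(dplyr)

# Hypothetical toy data: three stations, some recorded with slightly
# different longitudes (not my real trips_2023 data)
trips <- tibble(
  station = c("A", "A", "A", "B", "B", "C"),
  lat     = c(1, 1, 1, 2, 2, 3),
  lng     = c(10, 10, 11, 20, 21, 30)
)

# Without .groups: the result keeps some grouping after summarise()
kept <- trips %>%
  group_by(station, lat, lng) %>%
  summarise(count = n()) %>%
  arrange(desc(count)) %>%
  slice(1)

# With .groups = "drop": the result is an ordinary ungrouped tibble
dropped <- trips %>%
  group_by(station, lat, lng) %>%
  summarise(count = n(), .groups = "drop") %>%
  arrange(desc(count)) %>%
  slice(1)

group_vars(kept)     # "station" "lat"
group_vars(dropped)  # character(0)
nrow(kept)           # 3: one row per remaining group
nrow(dropped)        # 1: one row overall
```

So in the toy case I get one row per station/latitude pair instead of a single top row, which looks like the same behaviour I see with my real data.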
`.groups = "drop"` supposedly instructs dplyr to remove the grouping information after the grouping and summarising operations have been performed.
However, that definition is not clear to me. I have also read the official documentation, but I found it hard to follow.
Can someone help me understand better with a practical example?