I’m doing a full merge of 3 data sets
I have 3 data sets to merge. I've used a full merge because I want the rows to be combined with each other by country, rather than binding rows, which just stacks them.
<code>str(country)
'data.frame': 228 obs. of 4 variables:
$ location : Factor w/ 135 levels "Afghanistan",..: 1 2 3 4 5 6 7 8 9 10 ...
$ lockdown_date: Date, format: "2020-03-24" "2020-03-08" "2020-03-24" "2020-03-16" ...
$ Type : chr "Full" "Full" "Full" "Full" ...
$ Reference : chr "https://www.thestatesman.com/world/afghan-govt-imposes-lockdown-coronavirus-cases-increase-15-1502870945.html" "https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_Albania" "https://www.garda.com/crisis24/news-alerts/325896/algeria-government-implements-lockdown-and-curfew-in-blida-an"| __truncated__ "https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_Andorra"
</code>
<code>str(covid)
'data.frame': 1575 obs. of 8 variables:
$ location : chr "Australia" "Australia" "Australia" "Australia" ...
$ date : chr "2019-12-31" "2020-01-01" "2020-01-02" "2020-01-03" ...
$ total_cases : int 0 0 0 0 0 0 0 0 0 0 ...
$ new_cases : int 0 0 0 0 0 0 0 0 0 0 ...
$ total_deaths : num 0 0 0 0 0 0 0 0 0 0 ...
$ new_deaths : num 0 0 0 0 0 0 0 0 0 0 ...
$ gdp_per_capita: num 44649 44649 44649 44649 44649 ...
$ population : int 25499881 25499881 25499881 25499881 25499881 25499881 2 38 33
</code>
<code> str(vaccination)
'data.frame': 187 obs. of 5 variables:
$ location : Factor w/ 187 levels "Afghanistan",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Doses.administered.per.100.people: int 17 102 35 64 237 73 162 229 207 137 ...
$ total_doses_administered : num 6.45e+06 2.91e+06 1.52e+07 2.04e+07 1.06e+08 ...
$ X..of.population.vaccinated : num 15 46 19 41 92 38 84 88 77 53 ...
$ perc_pop_vaccinated : num 13 44 16 22 84 33 78 86 75 48 ...
</code>
<code>str(df)  # merged data frame after using distinct
'data.frame': 5811 obs. of 11 variables:
$ location : Factor w/ 202 levels "Afghanistan",..: 1 2 3 4 5 6 7 8 8 8 ...
$ date : chr NA NA NA NA ...
$ total_cases : int NA NA NA NA NA NA NA 4 4 0 ...
$ new_cases : int NA NA NA NA NA NA NA 0 0 0 ...
$ total_deaths : num NA NA NA NA NA NA NA 0 0 0 ...
$ new_deaths : num NA NA NA NA NA NA NA 0 0 0 ...
$ gdp_per_capita : num NA NA NA NA NA ...
$ population : int NA NA NA NA NA NA NA 25499881 25499881 25499881 ...
$ lockdown_date : Date, format: "2020-03-24" "2020-03-08" "2020-03-24" "2020-03-16" ...
$ total_doses_administered: num 6445359 2906126 15205854 NA 20397115 ...
$ perc_pop_vaccinated : num 13 44 16 NA 22 84 33 86 86 86 ...
</code>
Some dates in the date column are repeated because the country data frame has multiple lockdown dates (rows) per country. For example, China 31/12/19 appears 3 times in the df data frame, whereas it appears only once in the covid data frame. How do I merge these data frames so that dates are not duplicated?
This is my merge:
<code>df <- merge(country, covid, by = "location", all = TRUE)
df <- merge(df, vaccination, by = "location", all = TRUE)
</code>
There should be a total of 1990 rows, but the merge produces 5811 rows, even after using distinct.
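A minimal sketch of what seems to be happening, using made-up toy data (the values below are hypothetical and not taken from the real data sets). When one location has several rows in country, merge() pairs each of them with every covid row for that location, so the dates are repeated. The last step is only one possible workaround, and it assumes that keeping the earliest lockdown date per country is acceptable.

```r
# Toy data: hypothetical values, only the column names mirror the real data
country <- data.frame(
  location      = c("China", "China", "China", "Italy"),
  lockdown_date = as.Date(c("2020-01-23", "2020-02-01", "2020-02-10", "2020-03-09"))
)
covid <- data.frame(
  location    = c("China", "China", "Italy"),
  date        = c("2019-12-31", "2020-01-01", "2019-12-31"),
  total_cases = c(27L, 27L, 0L)
)

# Same call as in the question: China has 3 rows in `country` and 2 in `covid`,
# so the merge yields 3 x 2 = 6 China rows, plus 1 row for Italy.
df <- merge(country, covid, by = "location", all = TRUE)
nrow(df)                                               # 7
sum(df$location == "China" & df$date == "2019-12-31")  # 3

# Possible workaround (assumption: the earliest lockdown per country is enough):
# keep one row per location before merging.
country_sorted <- country[order(country$location, country$lockdown_date), ]
first_lockdown <- country_sorted[!duplicated(country_sorted$location), ]

df_one <- merge(first_lockdown, covid, by = "location", all = TRUE)
nrow(df_one)                                           # 3, same as nrow(covid)
```

distinct() does not help in the first merge because the repeated rows differ in lockdown_date, so they are not exact duplicates.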