I’m quite new to programming and I am starting with R in my research on predicting dengue disease cases from climatic data.
I’m still cleaning my data, and this particular dataset has around 172,855 observations and 17 variables for each of the 23 files.
I want to keep only the observations and variables I need (the date, the municipality and the number of cases registered). I would like to do this automatically so I don’t have to repeat it for every file, but I didn’t quite understand how to do it using a loop from purrr or lapply.
Could anyone help me with this?
Example of the dataset with cases of infections
What I have written so far are these three lines, and they do basically what I need to keep only what I want.
(The file names are: dengue00, dengue01, dengue02,…,dengue23)
# Library (rename() comes from dplyr, not tidyr)
library(dplyr)
# Keep only the three columns I need out of the 17 in each file.
dengue00 <- subset(dengue00, select = c(1, 3, 8))
# Keep only the observations for the municipality I’ll use.
dengue00 <- dengue00[dengue00$ID_MUNICIP %in% c(3550308), ]
# Simplify the column names.
dengue00 <- dengue00 %>%
  rename(mn = ID_MUNICIP, dt = DT_NOTIFIC, uf = SG_UF_NOT)
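One way the three steps above could be wrapped into a single function and applied to every data frame with lapply is sketched below. This is only a rough illustration under a few assumptions: `clean_dengue` is a hypothetical helper name, the columns are selected by name rather than by position, the data frames are already loaded as dengue00, dengue01, and so on, and the two toy data frames contain made-up values that stand in for the real files.

```r
# Hypothetical helper: applies the three cleaning steps to one data frame.
clean_dengue <- function(df, municipality = 3550308) {
  # Keep only the three needed columns (by name instead of position)
  df <- df[, c("ID_MUNICIP", "DT_NOTIFIC", "SG_UF_NOT")]
  # Keep only the rows for the chosen municipality
  df <- df[df$ID_MUNICIP %in% municipality, , drop = FALSE]
  # Shorten the column names, in the same order as selected above
  names(df) <- c("mn", "dt", "uf")
  df
}

# Toy stand-ins for the real data (made-up values, for illustration only)
dengue00 <- data.frame(
  ID_MUNICIP = c(3550308, 3304557),
  DT_NOTIFIC = c("2000-01-05", "2000-01-06"),
  SG_UF_NOT  = c(35, 33),
  OTHER      = c("a", "b")
)
dengue01 <- data.frame(
  ID_MUNICIP = c(3304557, 3550308, 3550308),
  DT_NOTIFIC = c("2001-02-01", "2001-02-02", "2001-02-03"),
  SG_UF_NOT  = c(33, 35, 35),
  OTHER      = c("c", "d", "e")
)

# Build the object names and collect the data frames in a named list.
# Use 0:23 here for the full set of files.
file_names <- sprintf("dengue%02d", 0:1)
dengue_list <- mget(file_names)

# Apply the same cleaning to every data frame in the list
cleaned <- lapply(dengue_list, clean_dengue)
```

Each cleaned data frame can then be accessed by name, for example `cleaned$dengue00`. With purrr loaded, `map(dengue_list, clean_dengue)` should behave the same way as the lapply call.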
The idea is to end up with something like this
Thank you so much for any help.