I am very new to R and am trying to figure this problem out. I am using a loop to go through a list of tables that I imported and am trying to clean. I want to convert 4 of the columns with as.numeric(), because when the data was in .csv format some of the cells were stored as character. I have tried looking up the error itself, which is the following:
Error in eval(substitute(list(...)), `_data`, parent.frame()) :
  object ‘start_lat’ not found
However, searching for this error online only leads me to questions about predicting variables or other statistical-analysis issues. The code is as follows:
#' create separate tables to clean so original imported data isn't affected
#' in case of errors while cleaning.
tdcl_202307<- td_202307
tdcl_202308<- td_202308
tdcl_202309<- td_202309
tdcl_202310<- td_202310
tdcl_202311<- td_202311
tdcl_202312<- td_202312
tdcl_202401<- td_202401
tdcl_202402<- td_202402
tdcl_202403<- td_202403
tdcl_202404<- td_202404
tdcl_202405<- td_202405
tdcl_202406<- td_202406
#' create variable 'tblList' that formats so a loop can be run to cycle through
#' all of the tables and clean data simultaneously.
tblList<- list(tdcl_202307, tdcl_202308, tdcl_202309, tdcl_202310, tdcl_202311,
tdcl_202312, tdcl_202401, tdcl_202402, tdcl_202403, tdcl_202404,
tdcl_202405, tdcl_202406)
#' set a 'for' loop that will cycle through and change columns 'start_lat',
#' 'start_lng', 'end_lat', and 'end_lng' so that all values will be set to
#' 'as.numeric()'
for (i in 1:12) {
tbl <- tblList[[i]]
transform(i, start_lat = as.numeric(start_lat),start_lng =
as.numeric(start_lng),end_lat = as.numeric(end_lat), end_lng =
as.numeric(end_lng)
)
}
I’ve tried rearranging the loop and working out what the object issue is but, as I said, I am new to R and don't really know what I am doing here. There are probably fundamental gaps in my understanding of what the code is doing. I expect the loop to go through each table and correct the data in each specified column. All the tables have the same column names, although the columns do not all contain the same data. For my project I am trying to keep the tables separate, although I could potentially have combined them into one master table.
Here is an example of my data, the result of dput(head(td_202307)):
structure(list(ride_id = c("9340B064F0AEE130", "D1460EE3CE0D8AF8",
"DF41BE31B895A25E", "9624A293749EF703", "2F68A6A4CDB4C99A", "9AEE973E6B941A9C"
), rideable_type = c("electric_bike", "classic_bike", "classic_bike",
"electric_bike", "classic_bike", "classic_bike"), started_at = structure(c(1690142774,
1690131907, 1690107293, 1689928064, 1688831202, 1688978687), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), ended_at = structure(c(1690143764,
1690132717, 1690107869, 1689928360, 1688831888, 1688978981), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), start_station_name = c("Kedzie Ave & 110th St",
"Western Ave & Walton St", "Western Ave & Walton St", "Racine Ave & Randolph St",
"Clark St & Leland Ave", "Racine Ave & Randolph St"), start_station_id = c("20204",
"KA1504000103", "KA1504000103", "13155", "TA1309000014", "13155"
), end_station_name = c("Public Rack - Racine Ave & 109th Pl",
"Milwaukee Ave & Grand Ave", "Damen Ave & Pierce Ave", "Clinton St & Madison St",
"Montrose Harbor", "Sangamon St & Lake St"), end_station_id = c("877",
"13033", "TA1305000041", "TA1305000032", "TA1308000012", "TA1306000015"
), start_lat = c(41.692406058, 41.89841768945, 41.89841768945,
41.8841121666667, 41.9670878391842, 41.884069), start_lng = c(-87.700905323,
-87.6865960164, -87.6865960164, -87.6569435, -87.667290866375,
-87.656853), end_lat = c(41.694835, 41.891578, 41.9093960065,
41.8827519656856, 41.963982, 41.8857792524043), end_lng = c(-87.653041,
-87.648384, -87.6776919292, -87.641190290451, -87.638181, -87.6510246098041
), member_casual = c("member", "member", "member", "member",
"member", "member")), spec = structure(list(cols = list(ride_id = structure(list(), class = c("collector_character",
"collector")), rideable_type = structure(list(), class = c("collector_character",
"collector")), started_at = structure(list(format = ""), class = c("collector_datetime",
"collector")), ended_at = structure(list(format = ""), class = c("collector_datetime",
"collector")), start_station_name = structure(list(), class = c("collector_character",
"collector")), start_station_id = structure(list(), class = c("collector_character",
"collector")), end_station_name = structure(list(), class = c("collector_character",
"collector")), end_station_id = structure(list(), class = c("collector_character",
"collector")), start_lat = structure(list(), class = c("collector_double",
"collector")), start_lng = structure(list(), class = c("collector_double",
"collector")), end_lat = structure(list(), class = c("collector_double",
"collector")), end_lng = structure(list(), class = c("collector_double",
"collector")), member_casual = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), problems = <pointer: (nil)>, row.names = c(NA,
6L), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"))
A couple of issues with your code:
transform(i, ...) is trying to change the class types of columns in i, the integer loop counter (which it obviously doesn’t have 🙂), and not in the tbl data frame you create inside your loop. You need to apply transform() to the elements of tblList.
Even with that fixed, your code would not make any changes permanent, because the result of transform() is never assigned back to anything.
Also, there is no need for a for() loop; it is more concise to use something like lapply(). I have included both options below:
# Your example data
td_202307 <- structure(list(ride_id = c("9340B064F0AEE130", "D1460EE3CE0D8AF8",
"DF41BE31B895A25E", "9624A293749EF703", "2F68A6A4CDB4C99A", "9AEE973E6B941A9C"
), rideable_type = c("electric_bike", "classic_bike", "classic_bike",
"electric_bike", "classic_bike", "classic_bike"), started_at = structure(c(1690142774,
1690131907, 1690107293, 1689928064, 1688831202, 1688978687), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), ended_at = structure(c(1690143764,
1690132717, 1690107869, 1689928360, 1688831888, 1688978981), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), start_station_name = c("Kedzie Ave & 110th St",
"Western Ave & Walton St", "Western Ave & Walton St", "Racine Ave & Randolph St",
"Clark St & Leland Ave", "Racine Ave & Randolph St"), start_station_id = c("20204",
"KA1504000103", "KA1504000103", "13155", "TA1309000014", "13155"
), end_station_name = c("Public Rack - Racine Ave & 109th Pl",
"Milwaukee Ave & Grand Ave", "Damen Ave & Pierce Ave", "Clinton St & Madison St",
"Montrose Harbor", "Sangamon St & Lake St"), end_station_id = c("877",
"13033", "TA1305000041", "TA1305000032", "TA1308000012", "TA1306000015"
), start_lat = c(41.692406058, 41.89841768945, 41.89841768945,
41.8841121666667, 41.9670878391842, 41.884069), start_lng = c(-87.700905323,
-87.6865960164, -87.6865960164, -87.6569435, -87.667290866375,
-87.656853), end_lat = c(41.694835, 41.891578, 41.9093960065,
41.8827519656856, 41.963982, 41.8857792524043), end_lng = c(-87.653041,
-87.648384, -87.6776919292, -87.641190290451, -87.638181, -87.6510246098041
), member_casual = c("member", "member", "member", "member",
"member", "member")), spec = structure(list(cols = list(ride_id = structure(list(), class = c("collector_character",
"collector")), rideable_type = structure(list(), class = c("collector_character",
"collector")), started_at = structure(list(format = ""), class = c("collector_datetime",
"collector")), ended_at = structure(list(format = ""), class = c("collector_datetime",
"collector")), start_station_name = structure(list(), class = c("collector_character",
"collector")), start_station_id = structure(list(), class = c("collector_character",
"collector")), end_station_name = structure(list(), class = c("collector_character",
"collector")), end_station_id = structure(list(), class = c("collector_character",
"collector")), start_lat = structure(list(), class = c("collector_double",
"collector")), start_lng = structure(list(), class = c("collector_double",
"collector")), end_lat = structure(list(), class = c("collector_double",
"collector")), end_lng = structure(list(), class = c("collector_double",
"collector")), member_casual = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), row.names = c(NA,
6L), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"))
# Convert columns to as.character() to replicate the issue you described
td_202307 <- transform(td_202307,
start_lat = as.character(start_lat),
start_lng = as.character(start_lng),
end_lat = as.character(end_lat),
end_lng = as.character(end_lng))
# Make example dataframes
tdcl_202307 <- td_202307
tdcl_202308 <- td_202307
# Create list
tblList <- list(tdcl_202307, tdcl_202308)
# Check class types of first list element (for comparison)
str(tblList[[1]][9:12])
# 'data.frame': 6 obs. of 4 variables:
# $ start_lat: chr "41.692406058" "41.89841768945" "41.89841768945" "41.8841121666667" ...
# $ start_lng: chr "-87.700905323" "-87.6865960164" "-87.6865960164" "-87.6569435" ...
# $ end_lat : chr "41.694835" "41.891578" "41.9093960065" "41.8827519656856" ...
# $ end_lng : chr "-87.653041" "-87.648384" "-87.6776919292" "-87.641190290451" ...
# Corrected loop
for (i in seq_along(tblList)) {
tblList[[i]] <- transform(tblList[[i]],
start_lat = as.numeric(start_lat),
start_lng = as.numeric(start_lng),
end_lat = as.numeric(end_lat),
end_lng = as.numeric(end_lng))
}
str(tblList[[1]][9:12])
# 'data.frame': 6 obs. of 4 variables:
# $ start_lat: num 41.7 41.9 41.9 41.9 42 ...
# $ start_lng: num -87.7 -87.7 -87.7 -87.7 -87.7 ...
# $ end_lat : num 41.7 41.9 41.9 41.9 42 ...
# $ end_lng : num -87.7 -87.6 -87.7 -87.6 -87.6 ...
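If it helps to see the original error in isolation, here is a minimal sketch. It uses a tiny made-up data frame (`df`, not your real trip data) and shows that transform() fails on the loop counter, works on the data frame, and only takes effect once the result is assigned back:

```r
# Toy data frame standing in for one of the trip tables (hypothetical values)
df <- data.frame(start_lat = c("41.69", "41.89"), stringsAsFactors = FALSE)

i <- 1L

# transform() on the integer i: there is no 'start_lat' column to find,
# so this raises the "object ... not found" error, captured here as text
err <- tryCatch(
  transform(i, start_lat = as.numeric(start_lat)),
  error = function(e) conditionMessage(e)
)
print(err)

# transform() on the data frame, without assignment: df itself is unchanged
transform(df, start_lat = as.numeric(start_lat))
class(df$start_lat)   # still "character"

# Assigning the result back makes the change stick
df <- transform(df, start_lat = as.numeric(start_lat))
class(df$start_lat)   # now "numeric"
```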
Using the more concise lapply() option with a custom function:
f <- function(x) {
transform(x,
start_lat = as.numeric(start_lat),
start_lng = as.numeric(start_lng),
end_lat = as.numeric(end_lat),
end_lng = as.numeric(end_lng))
}
# Apply f function to each element in tblList
tblList <- lapply(tblList, f)
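Since you want to keep the tables separate, it may also help to name the list elements so each cleaned table stays identifiable. The following is only a sketch under a few assumptions: the two small data frames are made-up stand-ins for your monthly tables, and `cols` and `to_numeric` are names chosen purely for illustration. It converts the four columns by name instead of writing as.numeric() four times:

```r
# Made-up stand-ins for two of the monthly tables (hypothetical values)
td_202307 <- data.frame(
  ride_id   = c("A1", "B2"),
  start_lat = c("41.69", "41.89"),
  start_lng = c("-87.70", "-87.68"),
  end_lat   = c("41.69", "41.89"),
  end_lng   = c("-87.65", "-87.64"),
  stringsAsFactors = FALSE
)
td_202308 <- td_202307

# A named list keeps track of which table is which
tblList <- list(tdcl_202307 = td_202307, tdcl_202308 = td_202308)

# Columns to convert (illustrative helper variable)
cols <- c("start_lat", "start_lng", "end_lat", "end_lng")

# Convert the chosen columns of one data frame to numeric and return it
to_numeric <- function(x) {
  x[cols] <- lapply(x[cols], as.numeric)
  x
}

tblList <- lapply(tblList, to_numeric)

# Access a cleaned table by name
str(tblList$tdcl_202307[cols])

# Optional: copy the list elements back out as separate objects
# list2env(tblList, envir = globalenv())
```

Note that as.numeric() turns any value it cannot parse as a number into NA (with a warning), so it is worth checking the converted columns for new NA values afterwards.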