I’m exploring how to navigate data frames in R, using this example data frame:
planets_df <- data.frame(
name = c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune"),
type = c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant"),
diameter = c(0.382, 0.949, 1.000, 0.532, 11.209, 9.449, 4.007, 3.883),
rotation = c(58.64, -243.02, 1.00, 1.03, 0.41, 0.43, -0.72, 0.67),
rings = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
)
Printing planets_df gives something like:
name type diameter rotation rings
1 Mercury Terrestrial planet 0.382 58.64 FALSE
2 Venus Terrestrial planet 0.949 -243.02 FALSE
3 Earth Terrestrial planet 1.000 1.00 FALSE
4 Mars Terrestrial planet 0.532 1.03 FALSE
5 Jupiter Gas giant 11.209 0.41 TRUE
6 Saturn Gas giant 9.449 0.43 TRUE
7 Uranus Gas giant 4.007 -0.72 TRUE
8 Neptune Gas giant 3.883 0.67 TRUE
Now here’s the main issue that I’m trying to figure out. To better understand how row and column selection syntax works in R, I tried two test cases, one where I select rows from “rotation” and another where I select from the logical “rings” column.
Let me explain the latter first. I ran this line, and it works as expected, returning the rows where rings is TRUE.
planets_df[planets_df$rings, TRUE]
This is the output, which shows that the code works as intended.
name type diameter rotation rings
5 Jupiter Gas giant 11.209 0.41 TRUE
6 Saturn Gas giant 9.449 0.43 TRUE
7 Uranus Gas giant 4.007 -0.72 TRUE
8 Neptune Gas giant 3.883 0.67 TRUE
I’m not quite sure why the example above works, because it seems to subvert the idea of indexes being structured as [row, column]. Assuming the rings column is treated as a logical vector, I’m also not sure why a logical vector can use its own values as indices that are looked up directly, because the exact same syntax wouldn’t work with a numeric column!
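To make the question concrete, here is a minimal sketch of what each slot receives in that call. It assumes the planets_df from above, trimmed to three columns so it runs on its own; the names row_mask, ringed and same_as_empty are only illustrative.

```r
# Trimmed copy of the example data frame so the sketch is self-contained.
planets_df <- data.frame(
  name = c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune"),
  rotation = c(58.64, -243.02, 1.00, 1.03, 0.41, 0.43, -0.72, 0.67),
  rings = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
)

# The row slot receives a logical vector with one entry per row.
row_mask <- planets_df$rings
print(row_mask)

# A logical index keeps the positions where the mask is TRUE.
# The single TRUE in the column slot is recycled across every column.
ringed <- planets_df[row_mask, TRUE]
print(ringed)

# Leaving the column slot empty also keeps every column.
same_as_empty <- planets_df[row_mask, ]
print(all.equal(ringed, same_as_empty))

# which() shows the row positions that the mask stands for.
print(which(row_mask))
```

Read this way, the call still has the [row, column] shape: the values of rings are not looked up, they act as a keep/drop flag for each row position.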
Now, if I try a similar approach with a numeric column, I understand that it won’t work. For example (both incorrect):
planets_df[planets_df$rotation, 0.41]
# or
planets_df[planets_df$rotation, rotation == 0.41]
Instead I’d have to try a different approach, like:
planets_df[planets_df$rotation == 0.41, ]
I find it a bit strange that I’m using such different syntax to perform a similar filtering task. If I’m understanding this line of code correctly, it searches the rotation column for the rows that contain the value 0.41. This query seems reasonable, but it’s a bit odd too: wouldn’t there be a way to write it so that the [ , column] field is also used in the filtering line?
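As a rough sketch of how the two slots can both be used in one line: the comparison is evaluated first and produces a logical vector for the row slot, while the column slot can name the columns to keep. This assumes the same planets_df as above (trimmed so the block runs on its own); is_match, picked and picked_subset are illustrative names, and subset() is the base R helper shown as one possible alternative.

```r
# Trimmed copy of the example data frame so the sketch is self-contained.
planets_df <- data.frame(
  name = c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune"),
  rotation = c(58.64, -243.02, 1.00, 1.03, 0.41, 0.43, -0.72, 0.67),
  rings = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
)

# The comparison yields one TRUE/FALSE per row, the same kind of
# vector that the rings column already holds.
is_match <- planets_df$rotation == 0.41
print(is_match)

# Row slot: the logical mask. Column slot: the columns to return.
picked <- planets_df[is_match, c("name", "rotation")]
print(picked)

# subset() expresses the same request with a row condition and a
# column selection.
picked_subset <- subset(planets_df, rotation == 0.41, select = c(name, rotation))
print(picked_subset)

# A numeric vector in the row slot is read as row positions instead,
# so this returns rows 5 and 6 whatever they contain.
print(planets_df[c(5, 6), ])
```

In this reading the row slot decides which rows come back and the column slot decides which columns come back; the condition itself lives in the row slot because it describes rows.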
I think I have a decent idea of the basic numerical [a, b] system for choosing data frame indexes, but using indexes in a more abstract way is what confuses me when it comes to data frames.
I want to understand the logic behind why the rings column, as a logical vector, is so unique that its values can be used as indices in that specific order, and why my first test case with the logical vector works the way it does. When I’m learning syntax I like to work with the assumption that there’s a proper set of “rules” or a pattern to follow, so I also want to understand why the syntax conventions for my first case with the logical vector work so differently from the numeric case.