I keep running into a problem in R when I try to create a side-by-side boxplot and a scatterplot of three variables. I am using the “Boston” dataset from the ISLR2 package, and I am unsure why the graphs look so strange. For the side-by-side boxplot, I am plotting ‘medv’ by two variables, ‘cat_chas’ and ‘cat_rm’, which I created with mutate() as direct copies of the variables ‘chas’ and ‘rm’ in the Boston dataset. For the scatterplot, I am using the variable ‘age’ on the horizontal axis and ‘medv’ on the vertical axis, with the points colored by the variable ‘cat_rm’. Am I making a simple mistake somewhere?
Side-by-side Boxplot
library(ISLR2)
library(dplyr)
library(tidyverse)
data = data.frame(Boston)
data <- mutate(data, cat_chas = chas, cat_rm = rm)
ggplot(data, aes(x = cat_chas, y = medv)) +
  geom_boxplot(fill = "green", color = "black") +
  facet_wrap(~ cat_rm, ncol = 3) +
  labs(title = "Boxplot of medv by cat_chas and cat_rm",
       x = "cat_chas",
       y = "Median Value ($1000s)") +
  theme_minimal()
Boxplot output
Scatterplot
library(dplyr)
ggplot(BSTN, aes(x = age, y = medv, color = cat_rm)) +
  geom_point() +
  labs(title = "Scatter plot of MEDV", color = "Category RM", x = "Age", y = "MEDV") +
  theme(plot.title = element_text(color = "blue", size = 17),
        plot.background = element_rect(fill = "orange"))
Scatterplot output
I thought the problem could be that I am using a data frame instead of a data set, so I tried switching it around, but I still get the same result.
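One possible cause worth checking: mutate(data, cat_chas = chas, cat_rm = rm) only copies the columns, so ‘cat_chas’ and ‘cat_rm’ are still numeric. With a numeric ‘cat_rm’, facet_wrap() draws one panel per distinct ‘rm’ value and color = cat_rm gives a continuous gradient rather than discrete groups. The sketch below converts both to factors first. The name boston_cat, the cut points 6 and 7, and all of the labels are assumptions chosen for illustration, not something from the original assignment.

```r
library(ISLR2)    # provides the Boston data frame
library(dplyr)
library(ggplot2)

# Assumption: chas is treated as a 0/1 indicator, and rm is binned into
# three illustrative ranges. Break points and labels are hypothetical.
boston_cat <- Boston %>%
  mutate(
    cat_chas = factor(chas, levels = c(0, 1),
                      labels = c("Not on river", "On river")),
    cat_rm   = cut(rm, breaks = c(-Inf, 6, 7, Inf),
                   labels = c("6 or fewer", "Over 6 to 7", "Over 7"))
  )

# One box per cat_chas level, one panel per cat_rm bin
p_box <- ggplot(boston_cat, aes(x = cat_chas, y = medv)) +
  geom_boxplot(fill = "green", color = "black") +
  facet_wrap(~ cat_rm, ncol = 3) +
  labs(title = "Boxplot of medv by cat_chas and cat_rm",
       x = "cat_chas",
       y = "Median Value ($1000s)") +
  theme_minimal()

# Points colored by the discrete cat_rm bins
p_scatter <- ggplot(boston_cat, aes(x = age, y = medv, color = cat_rm)) +
  geom_point() +
  labs(title = "Scatter plot of MEDV", color = "Category RM",
       x = "Age", y = "MEDV")

print(p_box)
print(p_scatter)
```

Running str(boston_cat) afterwards should show ‘cat_chas’ and ‘cat_rm’ as Factor columns. If the real categories are defined differently, only the breaks and labels arguments would need to change.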