I am trying to plot a scatterplot of test scores in various states over time. The dataframe has many columns I am not using and two kinds of columns I am using. One is the state column, which holds two-letter state abbreviations. The other kind is the score columns, each titled with its year. Unneeded columns sit between the needed ones, and the data run through 2023. Additionally, there are two rows for each state, one for English and one for math; I would only like to use the English scores. For example:
State  name     subject  2016       2016sd    2016_eb   2016_eb_sd  2017     2017sd
AL     Alabama  Math     46.867684  30.09090  132.3482  50.976749   46.4759  20.9475937
AL     Alabama  ela      46.867684  30.09090  132.3482  50.976749   46.4759  20.9475937
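To make the layout concrete, here is a minimal base R sketch of picking out only the plain year columns and the English rows. The `scores` data frame, its values, and the second state are made up for illustration, and the column names `State` and `subject` are assumed to match the example above (the real data may use different names).

```r
# Toy data mimicking the layout above; all values are invented for illustration.
scores <- data.frame(
  State    = c("AL", "AL", "AK", "AK"),
  name     = c("Alabama", "Alabama", "Alaska", "Alaska"),
  subject  = c("Math", "ela", "Math", "ela"),
  `2016`   = c(46.9, 45.2, 50.1, 49.3),
  `2016sd` = c(30.1, 29.8, 28.4, 27.9),
  `2017`   = c(46.5, 45.9, 50.6, 49.8),
  `2017sd` = c(20.9, 21.3, 22.0, 21.7),
  check.names = FALSE  # keep column names that start with a digit
)

# Columns whose entire name is a four-digit year (skips 2016sd, 2016_eb, ...)
year_cols <- grep("^[0-9]{4}$", names(scores), value = TRUE)

# Keep only the English rows, then only State plus the year columns
ela <- scores[tolower(scores$subject) == "ela", c("State", year_cols)]
print(ela)
```

The regular expression is anchored at both ends, so the in-between columns are skipped no matter where they sit in the dataframe.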
I am trying to create a scatterplot that treats the state as the series, with the x axis being the year of the score (basically the title of the column it came from) and the y axis being the value of the score. I would like to draw lines between the data points and plot only 5 states, but for all the years available. The graph should look something like this.
[image: graph example]
I have been trying a lot of things. First I thought of using melt to make the data “long”, but I think that would break my data. I am also currently trying to use ggplot2 to create the graph, something like this:
library(ggplot2)

print(
  ggplot(School_data, aes(x = state.abb, y = ys_mn_2016_ol)) +
    geom_jitter(width = 0.1) +
    labs(title = "Average Test Scores in M",
         x = "Score",
         y = "Year")
)
The x and y are just placeholders because I know the setup is wrong. I was also trying to do something like this:
ggplot()+
geom_jitter(data=School_data, aes(x=2016,2017(SAME THING???), y=2016,2017(etc. IDK HOW TO DO THIS PART), color=state.abb),
alpha=0.5, width=0.2, height=0.0)+
stat_smooth(data=School_data, aes(x=2016,2017(SAME THING???), y=2016 etc.,color=state.abb),
method="lm", se=F)+
scale_color_manual(values=c("orange","mediumorchid", "turquoise"))+
theme_bw()+
xlab("Score")+
ylab("Year")
This also did not work; I was trying to add in the lines.
I know I may need to reorganize the data, but I felt it would be possible to call out the specific components directly. This isn't for an assignment or anything; I'm just trying to get to know R and would love data organization or graphing suggestions.
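For reference, one possible shape for the reorganized data is one row per state and year. The sketch below is a rough illustration under stated assumptions, not a confirmed solution: the toy `scores` data frame and its values are invented, the column names `State` and `subject` follow the example table rather than the real dataframe, and `keep_states` is a hypothetical vector that would hold five abbreviations in the real data. It uses `pivot_longer()` from tidyr instead of melt.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy data in the same wide layout; all values are invented for illustration.
scores <- data.frame(
  State    = c("AL", "AL", "AK", "AK"),
  subject  = c("Math", "ela", "Math", "ela"),
  `2016`   = c(46.9, 45.2, 50.1, 49.3),
  `2016sd` = c(30.1, 29.8, 28.4, 27.9),
  `2017`   = c(46.5, 45.9, 50.6, 49.8),
  check.names = FALSE  # keep column names that start with a digit
)

keep_states <- c("AL", "AK")  # hypothetical; five abbreviations in the real data

long <- scores %>%
  filter(tolower(subject) == "ela", State %in% keep_states) %>%
  select(State, matches("^[0-9]{4}$")) %>%          # only the plain year columns
  pivot_longer(-State, names_to = "year", values_to = "score") %>%
  mutate(year = as.integer(year))                   # column title becomes a number

# One colored series per state: points for the scores, lines joining them
p <- ggplot(long, aes(x = year, y = score, color = State)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = unique(long$year)) +
  labs(x = "Year", y = "Score") +
  theme_bw()

print(p)
```

Going long does not alter the values; each score just moves into its own row with the year it came from, which is the form `aes(x = year, y = score, color = State)` expects.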