I am analyzing the Medicaid Spending by Drug Data Dictionary dataset. Specifically, I want to fit a logistic regression in which the response y is the direction (“Up” or “Down”) of CAGR_Avg_Spnd_Per_Dsg_Unt_18_22.
Unfortunately, with my code the class and mode of that variable remain character.
My inspiration for the “Up” and “Down” approach comes from the following:
# The library comes from Introduction to Statistical Learning: With Applications in R
library(ISLR)
attach(Smarket)
summary(Smarket)
# desired output:
glm.fit=glm(Direction~Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
family=binomial,data=Smarket)
contrasts(Direction)
By using glm.fit, I can perform predictions, create a confusion matrix, and more.
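To make the goal concrete, here is a minimal sketch of that workflow on simulated data. The data frame `sim`, its columns `x` and `y`, and the 0.5 cutoff are assumptions for illustration only; none of them come from Smarket or the Medicaid file.

```r
set.seed(1)
n <- 200
x <- rnorm(n)

# Simulated response (made-up data): "Up" is more likely when x is large
y <- factor(ifelse(runif(n) < plogis(1.5 * x), "Up", "Down"),
            levels = c("Down", "Up"))
sim <- data.frame(x = x, y = y)

# With a factor response, glm() treats the first level ("Down") as failure
# and models the probability of the other level ("Up")
fit <- glm(y ~ x, family = binomial, data = sim)

# Predicted probability of "Up" for each training row
prob <- predict(fit, type = "response")
pred <- factor(ifelse(prob > 0.5, "Up", "Down"), levels = c("Down", "Up"))

# Confusion matrix: predicted vs. observed
table(Predicted = pred, Observed = sim$y)
mean(pred == sim$y)  # training accuracy
```

This only works because `y` is a factor; a character response is what I am stuck on in my own data.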
However, when I examine summary(drug.spending), my “Up” and “Down” values are reported as character, while the “Up” and “Down” values in the ISLR authors’ data appear as counts. The authors never show the code that does this for their data frame’s “Up” and “Down” observations!
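The difference I am describing can be reproduced with a small toy vector. The values below are made up, and the guess that a factor is what produces the counts is my assumption, not something I have confirmed for the ISLR data:

```r
# Toy labels (hypothetical values, not from the Medicaid file)
direction_chr <- c("Up", "Down", "Up", "Up", "Down")
direction_fct <- factor(direction_chr)

class(direction_chr)    # "character"
class(direction_fct)    # "factor"

summary(direction_chr)  # reports Length, Class and Mode only
summary(direction_fct)  # reports a count per level: Down 2, Up 3
```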
Here is my code:
library(dplyr)
library(tidyr)
library(psych)
library(leaps)
set.seed(1)
spending <- read.csv("medicaid_spending_by_drug_data_dictionary.csv")
drug.spending <- spending %>%
  na.omit() %>%
  filter(Mftr_Name == "Overall") %>%
  arrange(desc(Tot_Mftr)) %>%
  filter(duplicated(Gnrc_Name))
drug.spending <- drug.spending[!duplicated(drug.spending$Gnrc_Name),]
attach(drug.spending)
drug.spending <- drug.spending %>%
mutate(CAGR_Direction = ifelse(CAGR_Avg_Spnd_Per_Dsg_Unt_18_22 > 0, 'Up', 'Down'))
summary(drug.spending)
contrasts(CAGR_Direction) #gives an error
I have tried different coercions, such as as.numeric() and as.integer(). I am not exactly sure where I am going wrong…
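For reference, here is a small self-contained sketch of what those coercions do to the labels, next to a factor() conversion that I suspect is the missing step. The data frame `toy` and its `cagr` column are hypothetical stand-ins with made-up values, and I refer to the column with `$` rather than attach(), since a copy attached before mutate() would not contain the new column.

```r
# Hypothetical stand-in for the CAGR column; values are made up
toy <- data.frame(cagr = c(0.12, -0.05, 0.30, -0.20, 0.08, -0.01))

# Coercing the character labels directly only produces NA
suppressWarnings(as.numeric(c("Up", "Down")))  # NA NA

# Build the label and convert it to a factor in one step
toy$CAGR_Direction <- factor(ifelse(toy$cagr > 0, "Up", "Down"),
                             levels = c("Down", "Up"))

class(toy$CAGR_Direction)      # "factor"
contrasts(toy$CAGR_Direction)  # dummy coding: Down = 0, Up = 1
```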
Please reach out for clarifications.