I have a dataset in R with daily volatility estimates (in my case, just the standard deviation of minute returns on that day). The columns are:
Date, Volatility, DayType
The variable DayType is 0 or 1 (no other options – no NA either), and it denotes whether a date meets some specific criteria or not – let’s say that a day can either be normal or special. I want to run a simple test to see whether volatility on a special day is different from volatility on a normal day.
Since DayType can only take two values, a simple regression on DayType should be equivalent to a two-sample t-test. But I get two different p-values, and I am wondering what I’ve misunderstood.
First, I can run a simple regression:
MyModel1=lm(Volatility ~ DayType, data=MyData)
summary(MyModel1)
This will give me a p-value for DayType.
Alternatively, I can split the dataset into two datasets and run a t-test.
library(dplyr)
MyDataSpecial=MyData %>% filter(DayType==1)
MyDataNormal=MyData %>% filter(DayType==0)
t.test(MyDataSpecial$Volatility, MyDataNormal$Volatility, alternative="two.sided")
This gives a different p-value from the regression. Can you help me understand what is wrong here?
Here is a reproducible example with the mtcars dataset that shows the same discrepancy:
library(dplyr)
MyData=mtcars
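# Regression: the p-value of interest is the one on the vs coefficient
MyModel1=lm(mpg ~ vs, data=MyData)
summary(MyModel1)
# t-test: split by vs and compare mean mpg of the two groups
MyData1=MyData %>% filter(vs==1)
MyData2=MyData %>% filter(vs==0)
t.test(MyData1$mpg, MyData2$mpg, alternative="two.sided")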
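One thing that may be worth checking here, offered as a likely explanation rather than a confirmed diagnosis: t.test() defaults to var.equal = FALSE, which is Welch's unequal-variance test, whereas lm() assumes a single residual variance shared by both groups. The sketch below compares the three p-values on mtcars. The variable names (fit, p_lm, p_pooled, p_welch) are only illustrative, and it uses the formula interface of t.test() instead of splitting the data by hand.

```r
# Regression p-value for the vs coefficient
fit <- lm(mpg ~ vs, data = mtcars)
p_lm <- summary(fit)$coefficients["vs", "Pr(>|t|)"]

# Student's t-test with pooled variance (same equal-variance assumption as lm)
p_pooled <- t.test(mpg ~ vs, data = mtcars, var.equal = TRUE)$p.value

# Welch's t-test (the t.test default, var.equal = FALSE)
p_welch <- t.test(mpg ~ vs, data = mtcars)$p.value

# Print the three p-values side by side for comparison
print(c(lm = p_lm, pooled = p_pooled, welch = p_welch))

# TRUE if the pooled t-test reproduces the regression p-value
print(isTRUE(all.equal(p_lm, p_pooled)))
```

If the pooled-variance p-value matches the regression while the Welch one does not, the gap comes from the variance assumption and not from the data split.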
Thank you!