I have time series data for two crops, Guar and Bajra. This data is from Sentinel-2 channels. For example, let’s take only one channel – red
example my data
guar=structure(list(red = c(2659L, 2904L, 2793L, 2710L, 2630L, 2710L,
2411L, 1931L, 2616L, 2413L, 2480L, 2707L, 2478L, 2100L, 2194L,
2678L, 2653L, 2510L, 2397L, 2283L), date = structure(c(19144,
19144, 19144, 19144, 19144, 19144, 19144, 19144, 19144, 19144,
19144, 19144, 19144, 19144, 19144, 19144, 19144, 19144, 19144,
19144), class = "Date"), lat = c(27.17952, 27.177259, 27.16638,
27.14773, 27.20751, 27.129601, 27.129499, 27.06547, 27.059759,
27.056311, 27.05464, 27.066231, 27.17852, 27.20619, 27.206369,
27.21134, 27.178249, 27.17832, 27.17885, 27.317499), lon = c(74.746384,
74.675018, 74.891144, 74.80632, 74.918488, 74.727509, 74.727623,
74.753929, 74.753967, 74.765533, 74.763611, 74.749931, 74.668198,
74.707581, 74.708, 74.690781, 74.689583, 74.690369, 74.69648,
74.997597), gfid = 196001:196020), row.names = c(1L, 28L, 52L,
79L, 106L, 133L, 158L, 183L, 210L, 236L, 262L, 288L, 312L, 338L,
365L, 390L, 416L, 442L, 470L, 495L), class = "data.frame")
and
bajra=structure(list(red = c(2794L, 2616L, 2312L, 2822L, 2193L, 2531L,
2753L, 2838L, 2872L, 2951L, 2809L, 3034L, 2816L, 2405L, 2640L,
2624L, 2608L, 2408L, 2481L, 2281L), date = structure(c(19144,
19144, 19144, 19144, 19144, 19144, 19144, 19144, 19144, 19144,
19144, 19144, 19144, 19144, 19144, 19144, 19144, 19144, 19144,
19144), class = "Date"), lat = c(26.89764, 27.445471, 27.44931,
27.442499, 27.4447, 27.44932, 27.291901, 27.292749, 27.28216,
27.25679, 27.24082, 27.26885, 27.26919, 27.284309, 27.29034,
27.42202, 27.425949, 27.4286, 27.417879, 27.15659), lon = c(74.0746,
74.520081, 74.512459, 74.51445, 74.43338, 74.409729, 74.381493,
74.384361, 74.38755, 74.400467, 74.380951, 74.815201, 74.81868,
74.821083, 74.82, 74.833382, 74.841629, 74.850731, 74.817703,
74.519058), gfid = c(196297L,
196298L, 196299L, 196300L, 196301L,
196302L, 196303L, 196304L, 196305L, 196306L, 196307L, 196308L,
196309L, 196310L, 196311L, 196312L, 196313L, 196314L, 196315L,
196317L)), row.names = c(1L, 27L, 53L, 80L, 106L, 134L, 161L,
188L, 217L, 245L, 271L, 297L, 325L, 354L, 383L, 411L, 439L, 466L,
491L, 544L), class = "data.frame")
The very first question that worries me is whether my comparison methodology is correct?
I do this.
First I convert the red variable for each crop to ts
format
# Loading a library for working with dates
library(lubridate)
# Convert date variable to date format
guar$date <- dmy(guar$date)
# Ranking the date variable in ascending order
guar <- guar[order(guar$date), ]
# Convert the red variable to ts format
guar$red <- ts(guar$red)
# Display information about the guar dataset
str(guar)
# Convert date variable to date format
bajra$date <- dmy(bajra$date)
# Ranking the date variable in ascending order
bajra <- bajra[order(bajra$date), ]
# Convert the red variable to ts format
bajra$red <- ts(bajra$red)
# Display information about the bajra dataset
str(bajra)
# Loading a library for working with time series
library(forecast)
# Convert time series to time series (ts) format
bajra_ts <- ts(bajra$red)
guar_ts <- ts(guar$red)
Find the overlapping period of the time series (I have data of different sizes, there are more rows in guar)
common_period <- intersect(time(bajra_ts), time(guar_ts))
# Select only the overlapping period for each time series
bajra_ts_common <- window(bajra_ts, start = min(common_period), end = max(common_period))
guar_ts_common <- window(guar_ts, start = min(common_period), end = max(common_period))
# Calculate the correlation and p-value between time series
cor_test_result <- cor.test(bajra_ts_common, guar_ts_common)
# Output correlation and p-value
print(paste("Correlation between the two time series in the common period:", cor_test_result$estimate))
print(paste("P-value:", cor_test_result$p.value))
[1] "Correlation between the two time series in the common period: 0.421489764989799"
> print(paste("P-value:", cor_test_result$p.value))
[1] "P-value: 0"
We see a statistically significant correlation, is it correct to conclude that this series is similar between these cultures?
And most importantly, is the approach correct?
If not, please tell me how best to compare 2 time series, taking into account the fact that the data has a different number of rows
Additionally, I tried to compare the series using Wilcoxon t-Wilcoxon
> wilcox_test_result <- wilcox.test(bajra_ts_common, guar_ts_common, paired = TRUE)
> print("Wilcoxon Signed Rank Test:")
[1] "Wilcoxon Signed Rank Test:"
> print(paste("W statistic:", wilcox_test_result$statistic))
[1] "W statistic: 21385804"
> print(paste("p-value:", wilcox_test_result$p.value))
[1] "p-value: 2.04795007987592e-160"
>
Well, this is a very strange result, if the stat correlation is significant, then the Wilcoxon test should not be statistically significant.
In other words, I need a hint on how to compare 2 time series, it seems to me that I am doing something wrong. I am not so strong in methodology.
Thanks in advance for any help.