I have survey data with a record every 1 NM along parallel transects perpendicular to the coast. For each record I have information such as latitude, longitude, speed and bearing. Along the transects the speed is around 10 knots. I also have points along the inter-transects (where speed might be different and bearing definitely is), and points outside the transects wherever a trawl was carried out.
What I would like to do is group all the points belonging to the same transect (see figure):
[Figure: survey points colored by their currently assigned transect]
As you can see in the figure, this does not really work. Whenever there is an inter-transect (e.g. transect 5), it is grouped together with the transect itself. Also, in transect 13 the distance between two consecutive records is, for some reason, slightly larger than 1 NM, so the points are split into two transects (note the slightly different color).
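Both failure modes occur where consecutive records change character, so one option worth comparing with clustering is a rule-based segmentation of the time-ordered track: start a new segment when the heading turns sharply, or when the gap is clearly larger than the nominal 1 NM spacing, with enough tolerance that a gap slightly over 1 NM does not split a transect. Below is a minimal sketch; the helper `segment_track` and its thresholds (30°, 1.5 NM) are hypothetical, and it assumes bearing is in radians and `distance_nm` is the distance to the previous record in NM.

```r
# Hypothetical helper (not part of any package): assigns a segment id to each
# record of a time-ordered track. A new segment starts when the heading turns
# by more than `max_turn` radians, or when the gap to the previous record
# exceeds `max_gap_nm` nautical miles.
segment_track <- function(bearing, distance_nm,
                          max_turn = pi / 6, max_gap_nm = 1.5) {
  # Signed turn between consecutive bearings, wrapped into [-pi, pi)
  turn <- c(NA, diff(bearing))
  turn <- (turn + pi) %% (2 * pi) - pi

  # NA values (first record, missing data) never trigger a split on their own
  sharp_turn <- !is.na(turn) & abs(turn) > max_turn
  big_gap <- !is.na(distance_nm) & distance_nm > max_gap_nm

  new_segment <- sharp_turn | big_gap
  new_segment[1] <- TRUE  # the first record always opens a segment
  cumsum(new_segment)
}

# Toy example (made-up values): eastbound transect, short northbound
# inter-transect, westbound transect. The 1.1 NM gap does not cause a split.
bearing <- c(pi/2, pi/2, pi/2, pi/2, 0, 0, -pi/2, -pi/2, -pi/2)
distance_nm <- c(NA, 1, 1.1, 1, 1, 1, 1, 1, 1)
segment_track(bearing, distance_nm)
# [1] 1 1 1 1 2 2 3 3 3
```

Segments found this way would still need labelling, e.g. treating short segments, or segments whose heading is roughly parallel to the coast, as inter-transects; that labelling rule is also an assumption.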
Here is an example of the data frame:
| year (dbl) | datetime (dttm) | xkm (dbl) | ykm (dbl) | logdiff (dbl) | time_diff (drtn) | distance (dbl) | bearing (dbl) | speed (dbl) |
|------|---------------------|------|-------|---|----------|---|-------|----|
| 2023 | 2023-09-26 15:03:00 | 221. | 1606. | 1 | 360 secs | 1 | -1.58 | 10 |
| 2023 | 2023-09-26 15:08:00 | 223. | 1606. | 1 | 300 secs | 1 | -1.58 | 12 |
| 2023 | 2023-09-26 15:14:00 | 225. | 1606. | 1 | 360 secs | 1 | -1.58 | 10 |
| 2023 | 2023-09-26 15:19:00 | 227. | 1606. | 1 | 300 secs | 1 | -1.58 | 12 |
| 2023 | 2023-09-26 15:25:00 | 229. | 1606. | 1 | 360 secs | 1 | -1.84 | 10 |
| 2023 | 2023-09-26 15:30:00 | 231. | 1606. | 1 | 300 secs | 1 | -1.85 | 12 |
| 2023 | 2023-09-26 15:36:00 | 233. | 1606. | 1 | 360 secs | 1 | -1.58 | 10 |
| 2023 | 2023-09-26 15:41:00 | 234. | 1605. | 1 | 300 secs | 1 | -1.85 | 12 |
| 2023 | 2023-09-26 15:47:00 | 236. | 1605. | 1 | 360 secs | 1 | -1.58 | 10 |
| 2023 | 2023-09-26 15:52:00 | 238. | 1605. | 1 | 300 secs | 1 | -1.58 | 12 |
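The `distance`, `time_diff`, `speed` and `bearing` columns appear to describe the step from the previous record (1 NM in 300 s is 12 knots, 1 NM in 360 s is 10 knots). Below is a minimal sketch of deriving such step features from projected coordinates. It assumes `xkm`/`ykm` are planar coordinates in km, rows are in time order, and bearing is measured clockwise from north (+y) in radians; the sign convention of the original `bearing` column may differ. The toy values are made up for illustration.

```r
# Toy track (illustrative values, not the survey data): projected
# coordinates in km and timestamps, ordered in time.
track <- data.frame(
  datetime = as.POSIXct(c("2023-09-26 15:03:00", "2023-09-26 15:08:00",
                          "2023-09-26 15:14:00", "2023-09-26 15:20:00"),
                        tz = "UTC"),
  xkm = c(0, 1.852, 3.704, 3.704),
  ykm = c(0, 0, 0, 1.852)
)

km_per_nm <- 1.852  # 1 nautical mile = 1.852 km

# Differences between each record and the previous one (NA for the first row)
dx <- c(NA, diff(track$xkm))
dy <- c(NA, diff(track$ykm))
dt_h <- c(NA, diff(as.numeric(track$datetime))) / 3600  # seconds -> hours

track$distance_nm <- sqrt(dx^2 + dy^2) / km_per_nm
track$speed_kn <- track$distance_nm / dt_h  # NM per hour = knots
# Compass-style bearing in radians: 0 = north (+y), pi/2 = east (+x)
track$bearing <- atan2(dx, dy)
```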
What could be the best approach to tackle this in R? I thought about an unsupervised machine learning algorithm, such as clustering with DBSCAN, but I'm not sure I'm using it properly. Besides the distance between points, I would like to use other parameters, such as bearing and speed, to classify whether a point belongs to a transect or not.
My attempt:
```r
library(dplyr)
library(dbscan)

# Prepare data for clustering
clustering_data <- df %>% select(year, speed, bearing, xkm, ykm)

# Apply DBSCAN clustering
set.seed(123)
db <- dbscan(clustering_data, eps = 1.8, minPts = 5)
```
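In the attempt above, position (km), speed (knots) and bearing enter a single Euclidean distance on their raw scales, so whichever variable has the largest spread dominates `eps`. Bearing is also circular: assuming it is in radians, as the table values suggest, -pi and pi are the same heading but are about 6.28 apart numerically. A common workaround is to standardise the linear features and encode bearing as its cosine and sine. Below is a minimal sketch; `make_features` and `bearing_weight` are hypothetical names, and the weight would need tuning.

```r
# Hypothetical helper: builds a numeric feature matrix for distance-based
# clustering from columns named as in the table above.
make_features <- function(d, bearing_weight = 1) {
  # Standardise position and speed (mean 0, sd 1) so km and knots are comparable
  num <- scale(cbind(x = d$xkm, y = d$ykm, speed = d$speed))
  # Encode the circular bearing (radians) as a point on the unit circle;
  # bearing_weight controls how much direction counts relative to the rest
  dir <- bearing_weight * cbind(bearing_cos = cos(d$bearing),
                                bearing_sin = sin(d$bearing))
  cbind(num, dir)
}

# Two almost identical headings on either side of +/-pi:
b <- c(pi - 0.01, -pi + 0.01)
abs(diff(b))                               # raw difference, about 6.26
sqrt(sum(diff(cbind(cos(b), sin(b)))^2))   # encoded distance, about 0.02
```

The resulting matrix could then be passed to `dbscan::dbscan(make_features(df), eps = ..., minPts = 5)`, with `eps` re-tuned on the standardised scale. Constant columns (such as `year` within one survey) give `NaN` after `scale()`, which is one more reason to leave them out.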
Any suggestions?
Thanks,
Val