I am trying to filter out the species (ASV) in samples that have lower abundance counts than the same species in controls. Below is a simplified version of my tidy data table:
ASV | ID | Type | Batch | Counts |
---|---|---|---|---|
ASV1 | Control23 | Con1 | B1 | 5000 |
ASV2 | Control23 | Con1 | B1 | 2000 |
ASV1 | Sample21 | S | B1 | 4000 |
ASV2 | Sample21 | S | B1 | 6000 |
ASV1 | Sample28 | S | B1 | 10000 |
ASV2 | Control25 | Con25 | B2 | 25000 |
ASV3 | Control25 | Con25 | B2 | 9000 |
ASV2 | Sample12 | S | B2 | 70000 |
ASV3 | Sample12 | S | B2 | 26000 |
ASV2 | Sample10 | S | B2 | 3100 |
For example, I would want to remove ASV1 in Sample21 from the dataset because its counts are less than the Control23 that was used for Batch B1. Another example: I want to keep ASV2 in Sample12 because its counts are higher than the counts of the same species in Control25 in Batch B2.
I’ve been using tidyverse
to clean and group my data but I’m stuck as to how to subset the ASV count value for the control in each batch. I thought it would be simple to use filter
as follows:
df_ASV.Tidy.Batch <- df_ASV.Tidy %>%
group_by(Batch, Type, ID, ASV) %>%
filter((Type != "S" & Counts > 0) > (Type != "C" & Counts > 0))
But the output only acknowledged the argument for Type != "S" & Counts > 0
and not Type != "C"
& Counts > 0, and the output was the same even without Type != "C" & Counts > 0
.
I think I would likely need to create a new column by mutate
or summarise
to return TRUE or FALSE and then use filter
, but it returns back to the original problem where I can’t figure out how to specifically compare the count values of control vs sample of the same species and batch.
Any advice would be greatly appreciated. Thank you!