I'm trying to import a .txt file using read_tsv, but when filtering on columns I end up with a data frame of 0 obs. Using base R read.table, the resulting data frame gives the intended outcome: all rows minus those that match the criteria in filter(). I changed a column type to character after read_tsv generated a warning; however, this did not change the outcome of 0 obs. What can I do with read_tsv to ensure that filter() will produce the intended data frame, which removes the small number of rows that fulfill the criteria?
Data input here: https://file.io/QUOyiQEktNMS
spombe_protein_6plex_mq_output <- read_tsv("Galaxy116_MaxQuant_Protein_Groups_SP.txt") #attempt to read in using tidyverse
Rows: 4310 Columns: 84
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (21): Protein IDs, Majority protein IDs, Peptide counts (all), Peptide counts (razor+unique), Peptide counts (unique), P...
dbl (62): Number of proteins, Peptides, Razor + unique peptides, Unique peptides, Peptides E1, Razor + unique peptides E1, U...
lgl (1): Reverse
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Warning message:
One or more parsing issues, call `problems()` on your data frame for details, e.g.:
dat <- vroom(...)
problems(dat)
spombe_protein_6plex_mq_output2 <- spombe_protein_6plex_mq_output %>%
  mutate(across(Reverse, as.character))
spombe_test <- read.table("Galaxy116_MaxQuant_Protein_Groups_SP.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE) #base utils
spombe_protein_6plex_mq_output_filtered <- spombe_protein_6plex_mq_output2 %>%
  dplyr::filter(Reverse != "+") %>%
  dplyr::filter(`Only identified by site` != "+") %>%
  dplyr::filter(`Potential contaminant` != "+")
# returns 0 obs of 84 variables
spombe_test2 <- spombe_test %>%
  filter(Reverse != "+") %>%
  filter(`Only.identified.by.site` != "+") %>%
  filter(Potential.contaminant != "+")
# returns 4212 obs of 84 variables
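
To narrow down the difference between the two readers, here is a minimal sketch on toy data. It assumes (not verified against the real file) that the flag columns are empty for ordinary rows and "+" for flagged rows, and that read_tsv turns those empty cells into NA while read.table keeps them as "". The toy_tsv string and the toy, naive and kept names are made up for illustration.

```r
library(readr)
library(dplyr)

# Hypothetical stand-in for the MaxQuant file: flag columns are empty for
# ordinary rows and "+" for flagged rows (an assumption about the real data).
toy_tsv <- I("Protein IDs\tReverse\tPotential contaminant\nP1\t\t\nP2\t+\t\nP3\t\t+\nP4\t\t\n")

# Read every column as character so the flag columns are not guessed as
# logical; empty cells still become NA under readr's default `na` setting.
toy <- read_tsv(
  toy_tsv,
  col_types = cols(.default = col_character())
)

# NA != "+" evaluates to NA, and filter() drops rows whose condition is NA,
# so the unflagged rows are removed together with the flagged ones.
naive <- toy %>%
  filter(Reverse != "+") %>%
  filter(`Potential contaminant` != "+")

# Treating NA as "not flagged" keeps the unflagged rows.
kept <- toy %>%
  filter(is.na(Reverse) | Reverse != "+") %>%
  filter(is.na(`Potential contaminant`) | `Potential contaminant` != "+")

nrow(naive)
nrow(kept)
```

On this toy input the naive pipeline returns no rows, while the NA-aware version keeps the two unflagged rows (P1 and P4). If the real file behaves the same way, that would explain why the read_tsv result filters down to 0 obs while the read.table result does not.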