I have around 2000 text files. While running textstat_summary on them,
I hit the following error and am unsure what to do next. I was able to trace the problem to one specific file (there may be more).
Error in validObject(.Object) :
invalid class “dfm” object: first element of 'p' slot is not 0
Since the problem comes from a specific file, I have attached it here for reference: Link
Any suggestions on how to fix the error are appreciated.
This is my code:
library(quanteda)
library(quanteda.textstats)
library(tidyverse)
mlist <- list.files(pattern = "\\.txt$", full.names = TRUE)
file_names <- character()
contents <- character()
for (file in mlist) {
  # Skip the first 7 lines, then join the rest into one string per file
  content <- read_lines(file, skip = 7)
  content <- paste(content, collapse = "\n")
  file_names <- c(file_names, basename(file))
  contents <- c(contents, content)
}
cb_list <- data.frame(filename = file_names, content = contents, stringsAsFactors = FALSE)
cb_list <- cb_list |>
  mutate(co_cik = str_extract(filename, "\\d+_")) |>
  mutate(filing_date = str_extract(filename, "_....-..-.._"))
# Strip the underscores left over from the patterns above
cb_list$co_cik <- str_remove_all(cb_list$co_cik, "_")
cb_list$filing_date <- str_remove_all(cb_list$filing_date, "_")
crps <- corpus(cb_list, docid_field = "filename", text_field = "content")
text_stat_summary_cb_list <- textstat_summary(crps)
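
One way to check whether other files are affected might be to run the function on each document separately and record which ones fail. The following is only a minimal sketch of that idea: find_failing is a hypothetical helper name (not a quanteda function), and the toy input uses base R only so the block runs on its own.

```r
# Hypothetical helper (not part of quanteda): apply `f` to each element of
# `items` and return the positions where `f` raised an error.
find_failing <- function(items, f) {
  failed <- vapply(items, function(item) {
    tryCatch({
      f(item)
      FALSE                      # f() completed without an error
    }, error = function(e) TRUE) # f() raised an error for this item
  }, logical(1))
  which(unname(failed))
}

# Toy check in base R: log() works on numbers but errors on a string.
toy <- list(1, "a", 10)
bad <- find_failing(toy, log)
print(bad)
```

For the corpus above, a call such as find_failing(seq_len(ndoc(crps)), function(i) textstat_summary(crps[i])) should return the positions of the documents that raise the error, and docnames(crps) can map those positions back to file names. This assumes the error also appears when a problem document is processed on its own, which I have not confirmed.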