I want to use tokens_compound() to examine the frequency of phrases in the documents of a corpus. For illustration I used the corpus data_corpus_inaugural and selected some ngrams to search for, and I want to save the output to a CSV file. The code below produces an output file, but no phrase is identified. Advice on how to correctly identify the frequency of phrases via a dictionary is appreciated.
library("quanteda")
## Package version: 2.1.2
data(data_corpus_inaugural)
toks <- data_corpus_inaugural %>%
  tokens(remove_punct = TRUE,
         remove_symbols = TRUE,  # the argument is remove_symbols, not remove_symbol
         padding = TRUE) %>%
  tokens_tolower()
dfmat <- dfm(toks)  # renamed from "tokens" to avoid masking quanteda::tokens()
multiword <- c("the house of representatives", "the senate", "foreign legislative",
               "fellow citizens", "men of reflection", "total independence",
               "unlimited submission", "no middle course",
               "apprehension of danger", "formidable power")
comp_toks <- tokens_compound(toks, pattern = phrase(multiword))
dictx <- dictionary(list(govt = c("the_house_of_representatives", "the_senate", "foreign_legislative"),
                         people = c("fellow_citizens", "men_of_reflection"),
                         action = c("total_independence", "unlimited_submission"),
                         course = "no_middle_course",
                         energy = c("apprehension_of_danger", "formidable_power")))
test <- dfm_lookup(dfmat, dictionary = dictx)
test2 <- convert(test, to = "data.frame")
write.csv(test2, "D:/Test.csv")
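If I understand tokens_compound() correctly, the joined phrases only exist in comp_toks, while the dfm being passed to dfm_lookup() above was built from the uncompounded tokens, so the dictionary entries can never match. Below is a minimal sketch of the variant I would expect to work, building the dfm from the compounded tokens instead (a sketch assuming the quanteda 2.x API and a shortened phrase list; the pipe is the one quanteda re-exports):

```r
library("quanteda")

# Tokenize and lowercase as before
toks <- data_corpus_inaugural %>%
  tokens(remove_punct = TRUE,
         remove_symbols = TRUE,
         padding = TRUE) %>%
  tokens_tolower()

# Compound the multiword phrases into single tokens
multiword <- c("fellow citizens", "apprehension of danger")
comp_toks <- tokens_compound(toks, pattern = phrase(multiword))

# Key difference: the dfm is built from comp_toks, not toks, so the
# joined features ("fellow_citizens", ...) are present for lookup
dfmat <- dfm(comp_toks)
dictx <- dictionary(list(people = "fellow_citizens",
                         energy = "apprehension_of_danger"))
result <- convert(dfm_lookup(dfmat, dictionary = dictx), to = "data.frame")
write.csv(result, "D:/Test.csv", row.names = FALSE)
```

I have not verified the exact counts this produces, only that the compounded features need to be in the dfm before dfm_lookup() can count them.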