I’m trying to write a function called estimate_ngram_probabilities to estimate the conditional probabilities of N-grams, using the ConditionalFreqDist and ConditionalProbDist classes along with a probability distribution such as MLEProbDist from the nltk.probability module.
The function should take the following parameter:
- ngrams_list: A list of N-grams for which to estimate conditional probabilities.
The function should return:
- A ConditionalProbDist object representing the conditional probabilities of the input N-grams.
Using the function, I’m then trying to create two variables named bigram_prob_dist and fivegram_prob_dist. The bigram_prob_dist variable is created from a list of bigrams, and fivegram_prob_dist is created from a list of fivegrams.
The problem I’m having is that I get the error “ValueError: too many values to unpack (expected 2)” when I run the fivegram list through the function. The bigram list runs fine.
Here is my function so far:
from nltk.probability import ConditionalFreqDist, ConditionalProbDist, MLEProbDist

def estimate_ngram_probabilities(ngrams_list):
    cfdist = ConditionalFreqDist(ngrams_list)
    cpdist = ConditionalProbDist(cfdist, MLEProbDist)
    return cpdist
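A likely reading of the error, sketched in plain Python: ConditionalFreqDist appears to consume its input as (condition, sample) pairs, so a bigram unpacks cleanly while a five-element tuple does not. The helper name unpack_as_pairs below is hypothetical and only imitates that unpacking step. It is not NLTK code.

```python
def unpack_as_pairs(ngrams_list):
    """Imitate unpacking every item as a (condition, sample) pair."""
    pairs = []
    for condition, sample in ngrams_list:  # needs exactly two elements per item
        pairs.append((condition, sample))
    return pairs


bigrams = [('<s>', '1609'), ('1609', 'sonnets')]
fivegrams = [('<s>', '<s>', '<s>', '<s>', '1609')]

# Two-element tuples unpack without complaint.
bigram_pairs = unpack_as_pairs(bigrams)
print(bigram_pairs)

# Five-element tuples cannot be unpacked into two names.
error_message = None
try:
    unpack_as_pairs(fivegrams)
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)
```

If this reading is right, the bigram list only works because each bigram happens to have exactly two elements.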
The fivegram list that I’m trying to pass to the function is in this format:
[('<s>', '<s>', '<s>', '<s>', '1609'),
('<s>', '<s>', '<s>', '1609', 'sonnets'),
('<s>', '<s>', '1609', 'sonnets', 'william'),
('<s>', '1609', 'sonnets', 'william', 'shakespeare'),
('1609', 'sonnets', 'william', 'shakespeare', '1'),
('sonnets', 'william', 'shakespeare', '1', 'fairest'),
('william', 'shakespeare', '1', 'fairest', 'creatures'), ...
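One way tuples in this format could be reshaped, as a sketch rather than a confirmed fix: treat the first N-1 tokens as the condition and the last token as the sample, so every item becomes a pair. The helper name to_condition_sample_pairs is hypothetical. The resulting pairs would then be passed to ConditionalFreqDist in place of the raw N-grams.

```python
def to_condition_sample_pairs(ngrams_list):
    """Split each N-gram into (context tuple, final word)."""
    return [(tuple(ngram[:-1]), ngram[-1]) for ngram in ngrams_list]


fivegrams = [
    ('<s>', '<s>', '<s>', '<s>', '1609'),
    ('<s>', '<s>', '<s>', '1609', 'sonnets'),
]

pairs = to_condition_sample_pairs(fivegrams)
for context, word in pairs:
    print(context, '->', word)
```

With this shape, a condition is looked up by its context tuple, for example cpdist[('<s>', '<s>', '<s>', '<s>')].prob('1609'). Note that a bigram's condition then becomes a one-element tuple such as ('<s>',) rather than a bare string.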