
Tag Archive for pythonnltk

Adding special tokens to beginning and end of ngram function

I'm writing a function that takes in text and converts it into n-grams of a given order, n. So for bigrams n=2, for five-grams n=5, and so on. I'm trying to add special tokens at the beginning and end: I need to put n-1 special tokens at the beginning and 1 special token at the end.
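One way to approach this is to pad the token list first and then slide a window of width n over it. The sketch below is only an illustration: the function name `padded_ngrams` and the marker strings `<s>` and `</s>` are assumptions, not something given in the question.

```python
def padded_ngrams(tokens, n, start="<s>", end="</s>"):
    """Return the n-grams of `tokens` as tuples.

    Pads with n-1 start markers at the front and a single end marker
    at the back. The marker strings are placeholders chosen for this sketch.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    padded = [start] * (n - 1) + list(tokens) + [end]
    # Slide a window of width n across the padded sequence.
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


bigrams = padded_ngrams("the cat sat".split(), 2)
trigrams = padded_ngrams("the cat sat".split(), 3)
print(bigrams)
print(trigrams)
```

With n=1 no start marker is added, so the only padding is the end marker.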

Making a python dictionary with a for loop for tokens and their model score

I'm trying to make a Python dictionary that maps each word in my file to its model score. My issue is that I can't find a way to pass my loop variable, words, into the .score function without it literally giving me the score for the word "words". The score function returns a probability score for the word it is given, but I need it to cycle through each word in the file and give me the score for each one.
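A symptom like this often means the loop variable ended up inside quotes, so the string literal "words" was scored instead of the current word; that is a guess, since the asker's code is not shown. The sketch below uses a hypothetical `ToyUnigramModel` as a stand-in for the real model, assuming only that it exposes a `score(word)` method.

```python
from collections import Counter


class ToyUnigramModel:
    """Hypothetical stand-in for the asker's model (not a real library class)."""

    def __init__(self, tokens):
        self.counts = Counter(tokens)
        self.total = len(tokens)

    def score(self, word):
        # Relative frequency of `word`; 0.0 for unseen words or an empty corpus.
        return self.counts[word] / self.total if self.total else 0.0


tokens = "the cat sat on the mat".split()
model = ToyUnigramModel(tokens)

# Pass the loop variable itself (no quotes), so each distinct word is scored.
scores = {word: model.score(word) for word in set(tokens)}
print(scores)
```

Writing `model.score("word")` with quotes would instead look up the literal string on every iteration.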

What is the difference between MWETokenizer and CountVectorizer + ngram?

Looking through the documentation about n-grams and the different vectorizers, I came across the multi-word expression tokenizer (MWETokenizer), which locates phrases in a text and converts each one into a single token.
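The contrast can be sketched in plain Python: a multi-word expression tokenizer merges only the phrases it has been told about, whereas an n-gram vectorizer enumerates every contiguous n-gram. The two helpers below (`merge_phrases`, `all_bigrams`) are hypothetical illustrations of those two ideas, not the NLTK or scikit-learn implementations.

```python
def merge_phrases(tokens, phrases, separator="_"):
    """Greedily merge known multi-word phrases into single tokens."""
    phrase_set = {tuple(p) for p in phrases}
    max_len = max((len(p) for p in phrase_set), default=1)
    merged, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, down to length 2.
        for size in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + size]) in phrase_set:
                merged.append(separator.join(tokens[i:i + size]))
                i += size
                break
        else:
            # No known phrase starts here; keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


def all_bigrams(tokens):
    """Enumerate every adjacent pair, whether or not it is a meaningful phrase."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


tokens = "I live in New York".split()
print(merge_phrases(tokens, [("New", "York")]))
print(all_bigrams(tokens))
```

The first call keeps the sentence length close to the original and changes only the listed phrase; the second produces a feature for every pair, including ones like "in New" that carry little meaning on their own.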

Function that returns tuples composed of a Python dictionary

I'm trying to create a function that takes a list of tokenized words for a review, together with a label, and returns a list of tuples, each composed of a Python dictionary and the associated label.
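The question does not say what the dictionary should contain, so the sketch below assumes a bag-of-words layout (each lowercased token mapped to True) and returns one tuple per review inside a list, so that results for several reviews can be concatenated. The name `review_features` and the feature layout are assumptions made for illustration.

```python
def review_features(tokens, label):
    """Return [(feature_dict, label)] for a single tokenized review.

    Assumed feature layout: each distinct lowercased token maps to True.
    """
    features = {token.lower(): True for token in tokens}
    return [(features, label)]


labeled = review_features(["Great", "movie", "great", "cast"], "pos")
labeled += review_features(["Dull", "plot"], "neg")
print(labeled)
```

Repeated tokens collapse into a single key, so this layout records presence rather than counts.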

cannot import punkt nltk

Due to security settings at work, I cannot simply run nltk.download('punkt').
