Adding special tokens to the beginning and end in an n-gram function
I'm writing a function that takes in text and converts it into n-grams of a given order, n: bigrams for n=2, five-grams for n=5, and so on. I'm trying to add special tokens at the beginning and end. I need to put n-1 special tokens at the beginning and 1 special token at the end.
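One way the padding described above could be sketched in plain Python. The function name `padded_ngrams`, the token strings `<s>` and `</s>`, and the whitespace tokenization are all assumptions made for illustration, not details from the question.

```python
def padded_ngrams(text, n, start="<s>", end="</s>"):
    """Return the n-grams of `text` as tuples, padded with special tokens.

    Assumes whitespace tokenization; `start` and `end` are placeholder
    token strings chosen for this sketch.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    tokens = text.split()
    # n-1 start tokens in front, exactly one end token at the back.
    padded = [start] * (n - 1) + tokens + [end]
    # Slide a window of width n over the padded token list.
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


bigrams = padded_ngrams("the cat sat", 2)
trigrams = padded_ngrams("the cat sat", 3)
print(bigrams)
print(trigrams)
```

With this padding, every real word appears once as the last element of an n-gram, and the end token closes the final n-gram.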
Making a Python dictionary with a for loop for tokens and their model scores
So I'm trying to make a Python dictionary consisting of each word and its model score, for all of the words in my file. My issue is that I can't find a way to pass my loop variable, words, into the .score function without it literally giving me the score for the word "words". The score function gives you a probability score for the word it is given, but I need it to cycle through each word in the file and give me the score for each.
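The symptom described (getting the score of the literal word "words") usually means the loop variable was passed in quotes. A minimal sketch follows, using a hypothetical `UnigramModel` class as a stand-in for whatever model provides `.score`; the real model and its signature are not shown in the question, so treat this as an assumption.

```python
from collections import Counter


class UnigramModel:
    """Hypothetical stand-in for a model that exposes .score(word)."""

    def __init__(self, tokens):
        self.counts = Counter(tokens)
        self.total = sum(self.counts.values())

    def score(self, word):
        # Relative frequency of `word` in the training tokens.
        return self.counts[word] / self.total


tokens = "the cat sat on the mat".split()
model = UnigramModel(tokens)

scores = {}
for word in tokens:
    # Pass the variable itself, without quotes.
    # model.score("word") would score the literal string "word".
    scores[word] = model.score(word)

# Equivalent dictionary comprehension.
scores_alt = {word: model.score(word) for word in tokens}
print(scores)
```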
What is the difference between MWETokenizer and CountVectorizer + ngram?
Looking through the documentation about n-grams and the different vectorizers, I came across the multi-word expression tokenizer (MWETokenizer), which locates phrases in a text and converts each one into a single token.
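To make the contrast concrete, here is a pure-Python sketch of the two behaviors. It is not the NLTK or scikit-learn implementation: `merge_mwes` and `all_ngrams` are hypothetical helpers that only imitate the general idea of merging a predefined phrase list versus emitting every contiguous n-gram.

```python
def merge_mwes(tokens, mwes, separator="_"):
    """Merge only the listed multi-word expressions into single tokens."""
    # Try longer expressions first; drop empty ones to avoid looping forever.
    mwes = sorted((tuple(m) for m in mwes if m), key=len, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        for mwe in mwes:
            if tuple(tokens[i:i + len(mwe)]) == mwe:
                out.append(separator.join(mwe))
                i += len(mwe)
                break
        else:
            # No listed expression starts here; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


def all_ngrams(tokens, n):
    """Emit every contiguous n-gram, whether or not it is a meaningful phrase."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "I live in New York City".split()
merged = merge_mwes(tokens, [("New", "York")])
bigrams = all_ngrams(tokens, 2)
print(merged)
print(bigrams)
```

The first approach needs the phrase list up front and leaves everything else untouched; the second needs no list but produces many n-grams that are not real expressions.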
Function that returns tuples composed of a python diction
I'm trying to create a function that takes a list of tokenized words for a review and a label, and returns a list of tuples, each composed of a Python dictionary and the associated label.
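The excerpt does not say what the dictionary should contain, so the following is only one possible reading: one tuple per token, with a single-entry dictionary. The function name `review_to_tuples` and the `"word"` key are assumptions made for this sketch.

```python
def review_to_tuples(tokens, label):
    """Pair each token's feature dictionary with the review's label.

    Assumption: one tuple per token, with the dictionary holding the
    token under a hypothetical "word" key.
    """
    return [({"word": token}, label) for token in tokens]


pairs = review_to_tuples(["great", "movie"], "pos")
print(pairs)
```

If a single dictionary per review is wanted instead, the same shape can be kept by returning a one-element list whose dictionary covers all tokens.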
Cannot import punkt in NLTK
Due to security settings at work, I cannot simply do nltk.download('punkt')