Calculating document vectors in Python: sum or average the word2vec vectors?
I have a question about generating a dissimilarity matrix for a set of text documents using word vectors. I tokenise the text, remove out-of-vocabulary (OOV) tokens, and then sum the word vectors of the remaining words to get one vector per document. Then I compute the cosine distance between each pair of document vectors. Is this approach correct? Some people say the word vectors must be averaged, others say they should be summed. Which is correct?
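
For reference, here is a minimal sketch of the pipeline described above. It makes several assumptions: `toy_vectors` is a hypothetical hand-written lookup table standing in for a trained word2vec model, tokenisation is a plain whitespace split, and `doc_vector` and `cosine_distance_matrix` are illustrative helper names, not part of any library. It builds the distance matrix twice, once with summed and once with averaged word vectors, so the two can be compared directly. Since the average is just the sum divided by the token count, and cosine distance ignores positive scaling of a vector, the two matrices should agree.

```python
import numpy as np

# Hypothetical toy embeddings standing in for a trained word2vec model.
toy_vectors = {
    "cat": np.array([0.9, 0.1, 0.0]),
    "dog": np.array([0.8, 0.2, 0.1]),
    "car": np.array([0.1, 0.9, 0.3]),
    "road": np.array([0.0, 0.8, 0.5]),
}


def doc_vector(text, vectors, pool="sum"):
    """Pool the vectors of in-vocabulary tokens into one document vector."""
    # Naive whitespace tokenisation; OOV tokens are dropped.
    tokens = [t for t in text.lower().split() if t in vectors]
    if not tokens:
        raise ValueError("no in-vocabulary tokens in document")
    stacked = np.stack([vectors[t] for t in tokens])
    return stacked.sum(axis=0) if pool == "sum" else stacked.mean(axis=0)


def cosine_distance_matrix(doc_vecs):
    """Pairwise cosine distance (1 - cosine similarity) between rows."""
    m = np.stack(doc_vecs)
    # Normalise each row to unit length; assumes no all-zero document vector.
    unit = m / np.linalg.norm(m, axis=1, keepdims=True)
    sim = np.clip(unit @ unit.T, -1.0, 1.0)
    return 1.0 - sim


docs = ["the cat and the dog", "car on the road", "dog in a car"]
d_sum = cosine_distance_matrix([doc_vector(d, toy_vectors, "sum") for d in docs])
d_mean = cosine_distance_matrix([doc_vector(d, toy_vectors, "mean") for d in docs])

# Compare the two dissimilarity matrices element-wise.
print(np.allclose(d_sum, d_mean))
```

The choice between summing and averaging would matter for measures that depend on vector length, such as Euclidean distance or a raw dot product, so the comparison here applies only to the cosine case.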