Trying this project: text
This code works fine:
nlpl_zip = "C:/180.zip"
with zipfile.ZipFile(nlpl_zip, "r") as archive:
    stream = archive.open("model.bin")
    model = gensim.models.KeyedVectors.load_word2vec_format(stream, binary=True, unicode_errors='replace')
But when I download the model from http://vectors.nlpl.eu/repository/20/212.zip to C:/212.zip, the same code fails because there is no model.bin inside the archive. It contains only these files:
[screenshot: list of the files inside 212.zip]
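The archive contents can also be listed as text rather than as a screenshot. This is a minimal sketch using only the standard zipfile module; the helper name list_archive_members is made up for illustration, and the path is the one from the question.

```python
import zipfile


def list_archive_members(zip_path):
    """Return (file name, uncompressed size in bytes) for each entry in the zip."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


# Hypothetical usage with the path from the question:
# for name, size in list_archive_members("C:/212.zip"):
#     print(size, name)
```

Pasting that output into the question shows exactly which files the archive holds and how large each one is.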
When I try stream = archive.open("model.ckpt.data-00000-of-00001") instead, I get the following error. What am I doing wrong?
UnicodeDecodeError                        Traceback (most recent call last)
Cell In[11], line 9
      7 with zipfile.ZipFile(model_file, 'r') as archive:
      8     stream = archive.open('model.ckpt.data-00000-of-00001')
      9     model = gensim.models.KeyedVectors.load_word2vec_format(stream, binary=True, unicode_errors='replace')

File C:\ProgramData\anaconda3\lib\site-packages\gensim\models\keyedvectors.py:1719, in KeyedVectors.load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype, no_header)
   1672 @classmethod
   1673 def load_word2vec_format(
   1674         cls, fname, fvocab=None, binary=False, encoding='utf8', unicode_errors='strict',
   1675         limit=None, datatype=REAL, no_header=False,
   1676     ):
   1677     """Load KeyedVectors from a file produced by the original C word2vec-tool format.
   1678
   1679     Warnings
   (...)
   1717
   1718     """
   1719     return _load_word2vec_format(
   1720         cls, fname, fvocab=fvocab, binary=binary, encoding=encoding, unicode_errors=unicode_errors,
   1721         limit=limit, datatype=datatype, no_header=no_header,
   1722     )

File C:\ProgramData\anaconda3\lib\site-packages\gensim\models\keyedvectors.py:2058, in _load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype, no_header, binary_chunk_size)
   2056     fin = utils.open(fname, 'rb')
   2057 else:
   2058     header = utils.to_unicode(fin.readline(), encoding=encoding)
   2059     vocab_size, vector_size = [int(x) for x in header.split()]  # throws for invalid file format
   2060     if limit:

File C:\ProgramData\anaconda3\lib\site-packages\gensim\utils.py:365, in any2unicode(text, encoding, errors)
    363 if isinstance(text, str):
    364     return text
    365 return str(text, encoding, errors=errors)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xef in position 1: invalid continuation byte
I have tried many other ways, but they all failed.
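The traceback shows that the loader first reads one line from the stream, decodes it as UTF-8, and parses it as two integers (vocab_size and vector_size). A file that does not begin with such a header fails at exactly that step. The sketch below applies the same check to every file in an archive. The helper names are made up for illustration, it uses only the standard library, and it says nothing about whether 212.zip actually contains a file in word2vec format.

```python
import zipfile


def looks_like_word2vec_header(first_line):
    """True if the bytes decode as UTF-8 and hold two positive integers."""
    try:
        fields = first_line.decode("utf8").split()
        return len(fields) == 2 and all(int(field) > 0 for field in fields)
    except ValueError:  # UnicodeDecodeError is a subclass of ValueError
        return False


def find_word2vec_candidates(zip_path):
    """Return names of archive members whose first line looks like a word2vec header."""
    candidates = []
    with zipfile.ZipFile(zip_path, "r") as archive:
        for name in archive.namelist():
            if name.endswith("/"):  # skip directory entries
                continue
            with archive.open(name) as stream:
                # Cap the read so a large binary file without a newline is not loaded whole.
                if looks_like_word2vec_header(stream.readline(64)):
                    candidates.append(name)
    return candidates


# Hypothetical usage with the path from the question:
# print(find_word2vec_candidates("C:/212.zip"))
```

If the list comes back empty, no file in the archive starts with a word2vec-style header, so load_word2vec_format would be expected to fail on each of them regardless of the unicode_errors setting.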