So I was trying to import the EMNIST database, but I'm not able to call `extract_training_samples`. It keeps saying that the file is not a zip file. I'm following the documentation at https://pypi.org/project/emnist/, but it's still giving me an error. Does anyone know how I can fix it? I don't want to use Keras or other libraries that include MNIST, because all of my code was written specifically to use the database this way and I would have to edit my whole notebook.
```python
# STEP 1.1
!git clone https://github.com/sorki/python-mnist
!./python-mnist/bin/mnist_get_data.sh
%pip install emnist
from emnist import extract_training_samples
print("Imported the EMNIST libraries we need!")

# STEP 1.2

# Grab the data from the OpenML website
# X will be our images and y will be the labels
X, y = extract_training_samples('letters')

# Make sure that every pixel in all of the images is a value between 0 and 1
X = X / 255.

# Use the first 60000 instances as training and the next 10000 as testing
X_train, X_test = X[:60000], X[60000:70000]
y_train, y_test = y[:60000], y[60000:70000]

# Flatten each 28x28 image into a single row of 784 pixels,
# giving arrays of shape (number of samples, 784)
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
print("Extracted our samples and divided our training and testing data sets")
```
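The split and flatten steps in STEP 1.2 can be sanity-checked without downloading anything. This is a minimal sketch using random placeholder arrays (not EMNIST data), scaled down from 60,000/10,000 samples to 600/100 so it runs quickly. It assumes only that the real images come back as uint8 arrays of shape (n, 28, 28); the label range is a placeholder too.

```python
import numpy as np

# Scaled-down sizes (the notebook uses 60,000 and 10,000).
N_TRAIN, N_TEST = 600, 100

# Placeholder data shaped like the assumed extract_training_samples() output:
# uint8 images of shape (n, 28, 28) plus one integer label per image.
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(N_TRAIN + N_TEST, 28, 28), dtype=np.uint8)
y = rng.integers(1, 27, size=N_TRAIN + N_TEST)

# Scale pixels from 0..255 into 0.0..1.0 (the division produces float64).
X = X / 255.0

# First N_TRAIN samples for training, the next N_TEST for testing.
X_train, X_test = X[:N_TRAIN], X[N_TRAIN:N_TRAIN + N_TEST]
y_train, y_test = y[:N_TRAIN], y[N_TRAIN:N_TRAIN + N_TEST]

# Flatten each 28x28 image into a single row of 28 * 28 = 784 pixels.
# reshape(len(...), -1) lets NumPy infer 784 instead of hard-coding it.
X_train = X_train.reshape(len(X_train), -1)
X_test = X_test.reshape(len(X_test), -1)

print(X_train.shape, X_test.shape)  # (600, 784) (100, 784)
```

Using `reshape(len(X_train), -1)` keeps the cell working if the slice sizes ever change, whereas the hard-coded `reshape(60000, 784)` fails as soon as the slice is a different length.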
Here is the error:
```
BadZipFile                                Traceback (most recent call last)
Cell In[2], line 5
      1 # STEP 1.2
      2
      3 # Grab the data from the OpenML website
      4 # X will be our images and y will be the labels
----> 5 X, y = extract_training_samples('letters')
      7 # Make sure that every pixel in all of the images is a value between 0 and 1
      8 X = X / 255.

File /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/emnist/__init__.py:209, in extract_training_samples(dataset)
    206 def extract_training_samples(dataset):
    207     """Extract the training samples for a given dataset as a pair of numpy arrays, (images, labels). The dataset must be
    208     one of those listed by list_datasets(), e.g. 'digits' or 'mnist'."""
--> 209     return extract_samples(dataset, 'train')

File /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/emnist/__init__.py:199, in extract_samples(dataset, usage)
    196 def extract_samples(dataset, usage):
    197     """Extract the samples for a given dataset and usage as a pair of numpy arrays, (images, labels). The dataset must
    198     be one of those listed by list_datasets(), e.g. 'digits' or 'mnist'. Usage should be either 'train' or 'test'."""
--> 199     images = extract_data(dataset, usage, 'images')
    200     labels = extract_data(dataset, usage, 'labels')
    201     if len(images) != len(labels):

...

-> 1369 raise BadZipFile("File is not a zip file")
   1370 if self.debug > 1:
   1371 print(endrec)

BadZipFile: File is not a zip file
```
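The bottom frame of the traceback is Python's standard-library `zipfile` module giving up because it cannot find a zip directory in the bytes it was handed. That is consistent with the file on disk not being a zip archive at all, though the traceback alone doesn't say why. A minimal, standard-library-only sketch that reproduces the same message in memory; the HTML payload is a hypothetical example of a bad download, not something the emnist package is known to have saved:

```python
import io
import zipfile


def zip_error(data: bytes):
    """Return None if `data` parses as a zip archive, else the BadZipFile message."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)):
            return None
    except zipfile.BadZipFile as exc:
        return str(exc)


# Hypothetical stand-in for a download that saved an HTML page instead of an archive.
html_bytes = b"<!DOCTYPE html><html><body>Page moved</body></html>"

# A genuine (tiny) zip archive built in memory for comparison.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hi")
zip_bytes = buf.getvalue()

print(zip_error(html_bytes))  # File is not a zip file
print(zip_error(zip_bytes))   # None
```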
I tried calling `list_datasets()` to see whether the problem was specific to the 'letters' dataset, but it raised the same error: it couldn't read the file because it isn't a zip file.
Aanya Bhandari is a new contributor to this site.