Overall: I built a BERT model service with Cloud Build and Cloud Run. I saved the model weights and the metadata (labels) in GCP Cloud Storage. When the service loads the metadata.bin file with joblib.load(), I get an error. My metadata.bin file contains UTF-8 characters, but joblib.load apparently expects ASCII strings. Also, the default pickle protocol is 4 in my version, while the error message says protocol 0.
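For reference, the metadata file is produced and read back roughly like the sketch below; the dict contents are placeholders, not my real labels, but the dump/load calls mirror what I do:

```python
import pickle
import joblib

# Placeholder metadata; the real meta.bin holds the label mapping for the model.
meta = {"labels": ["negative", "positive"]}

print(pickle.DEFAULT_PROTOCOL)    # reports 4 on my setup (Python 3.8)
joblib.dump(meta, "meta.bin")     # no explicit protocol, so the default is used
restored = joblib.load("meta.bin")
assert restored == meta
```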
Related Dependencies: Python 3.8.0, joblib 1.1.1 (I have already tried upgrading to the most recent version), google-api-core==2.19.1, google-auth==2.32.0, google-cloud-core==2.4.1, google-cloud-storage==2.18.0
My Effort: I have already tried two environments.
- Local: downloading both model.bin and metadata.bin from GCP Cloud Storage and then loading them works.
- Docker: loading the metadata.bin and model.bin files inside the dockerized container also works.

Only the Cloud Run deployment fails, so I want to check what actually lands in /tmp there (a quick check is sketched below).
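This is the check I plan to add right after the download in load_bert_model; the expected leading bytes (b"\x80\x04" for an uncompressed protocol-4 pickle) are an assumption based on how pickle writes its header:

```python
import os

# Inspect what actually landed in /tmp on Cloud Run. An uncompressed
# protocol-4 pickle/joblib file should start with b"\x80\x04"; anything
# else (e.g. an HTML error page or a truncated file) would point to a
# bad download rather than a joblib problem.
metadata_path = "/tmp/meta.bin"
print("size:", os.path.getsize(metadata_path))
with open(metadata_path, "rb") as f:
    print("first bytes:", f.read(16))
```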
Error Details:
```
  File "./src_review/model_server.py", line 70, in load_bert_model
    metadata = joblib.load(metadata_path)
  File "/usr/local/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 658, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/usr/local/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 577, in _unpickle
    obj = unpickler.load()
  File "/usr/local/lib/python3.8/pickle.py", line 1210, in load
    dispatch[key[0]](self)
  File "/usr/local/lib/python3.8/pickle.py", line 1244, in load_persid
    raise UnpicklingError(
_pickle.UnpicklingError: persistent IDs in protocol 0 must be ASCII strings
```
My Code:
```python
def load_bert_model(config: argparse.Namespace):
    # storage_client, bucket_name, model_file, metadata_file, and log are
    # defined elsewhere in the module
    bucket = storage_client.bucket(bucket_name)
    model_blob = bucket.blob(model_file)
    metadata_blob = bucket.blob(metadata_file)

    local_model_path = '/tmp/pytorch_model.bin'
    metadata_path = '/tmp/meta.bin'

    print(f"Downloading model to {local_model_path}")
    model_blob.download_to_filename(local_model_path)
    log.info(f"Model downloaded to {local_model_path}")

    metadata_blob.download_to_filename(metadata_path)
    log.info(f"Metadata (labels) downloaded to {metadata_path}")

    metadata = joblib.load(metadata_path)  # <- line 70, where the error is raised
    ...
```
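A variation I am considering, to rule out anything specific to the temp file itself, is to unpickle straight from memory. The bucket and blob names below are placeholders standing in for the same values used in load_bert_model:

```python
import io

import joblib
from google.cloud import storage

# Placeholders: substitute the real bucket and blob names used in load_bert_model.
bucket_name = "my-model-bucket"
metadata_file = "meta.bin"

client = storage.Client()
raw = client.bucket(bucket_name).blob(metadata_file).download_as_bytes()
print(f"downloaded {len(raw)} bytes of metadata")

# joblib.load accepts a file-like object, so no /tmp file is needed.
metadata = joblib.load(io.BytesIO(raw))
```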
Upload code (adapted from the official GCP documentation):
```python
from pathlib import Path

from google.cloud.storage import transfer_manager


def upload_directory_with_transfer_manager(bucket_name, source_directory, workers=1):
    # create_bucket_if_not_exists is a helper (not shown) that returns the bucket
    bucket = create_bucket_if_not_exists(bucket_name)

    directory_as_path_obj = Path(source_directory)
    paths = directory_as_path_obj.rglob("*.bin")
    file_paths = [path for path in paths if path.is_file()]
    relative_paths = [path.relative_to(source_directory) for path in file_paths]
    string_paths = [str(path) for path in relative_paths]
    print("Found {} files.".format(len(string_paths)))

    results = transfer_manager.upload_many_from_filenames(
        bucket, string_paths, source_directory=source_directory,
        max_workers=workers, skip_if_exists=False
    )

    for name, result in zip(string_paths, results):
        if isinstance(result, Exception):
            print("Failed to upload {} due to exception: {}".format(name, result))
        else:
            print("Uploaded {} to {}.".format(name, bucket.name))
```
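For completeness, I call this helper roughly as follows from the build step; the bucket and directory names here are placeholders, not my real values:

```python
# Placeholder invocation: "my-bert-artifacts" and "./model_output" stand in
# for the real bucket and the directory holding pytorch_model.bin and meta.bin.
upload_directory_with_transfer_manager(
    bucket_name="my-bert-artifacts",
    source_directory="./model_output",
    workers=2,
)
```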
Expected Cause: I suspect the Cloud Run configuration is quite different from my test environments, but I could not pin down the main cause.
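To confirm the download itself is intact on Cloud Run, I am also thinking of comparing the size GCS reports for the blob with the size of the downloaded file, right after the download inside load_bert_model (sketch; metadata_blob, metadata_path, and log are the variables from the code above):

```python
import os

# Sketch: verify the downloaded metadata file against the size GCS reports.
metadata_blob.reload()                         # refresh blob metadata from GCS
local_size = os.path.getsize(metadata_path)
log.info(f"GCS size: {metadata_blob.size}, local size: {local_size}")
if metadata_blob.size != local_size:
    raise RuntimeError("meta.bin download appears truncated or corrupted")
```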
Thanks for your efforts!