I have a Python program that submits pipelines to Azure ML. The code typically runs on headless Linux VMs and authenticates with Azure using the DeviceCodeCredential flow. I want to cache these credentials so that I can run the script many times and only have to re-authenticate every so often (e.g. every hour).
According to the documentation, TokenCachePersistenceOptions should solve this; however, it seems to have no effect beyond writing a useless file. This is what my code looks like:
import os
from azure.ai.ml import MLClient
from azure.identity import DeviceCodeCredential, TokenCachePersistenceOptions
from .constants import AML_SUBSCRIPTION_ID, AML_RESOURCE_GROUP, AML_WORKSPACE
cache_path = os.path.expanduser("~/.azure/msal_token_cache.json")
token_cache_options = TokenCachePersistenceOptions(name=cache_path, allow_unencrypted_storage=True)
credential = DeviceCodeCredential(cache_persistence_options=token_cache_options)
client = MLClient(
    credential=credential,
    subscription_id=AML_SUBSCRIPTION_ID,
    resource_group_name=AML_RESOURCE_GROUP,
    workspace_name=AML_WORKSPACE,
)
# do something useless that requires making requests
jobs = client.jobs.list()
_ = [j.name for j in jobs]
I would expect that if I run this script, authenticate, and then run it a second time, it would not request authentication again. However, it does.
I have tried different credential scopes, as well as creating and manually serializing and deserializing all manner of “tokens” and “authorization records”, and passing those tokens and authorization records into methods that may or may not actually do anything at all with them, but nothing works. I have scoured the documentation for the Python SDK. I contacted Azure, who were useless.
Does anyone have suggestions?
The solution had two parts:
- Use AuthenticationRecord
- Make sure that when you create the AuthenticationRecord, it has the right scope
Working (minimal) example:
import os
from azure.ai.ml import MLClient
from azure.identity import DeviceCodeCredential, TokenCachePersistenceOptions, AuthenticationRecord
from .constants import AML_SUBSCRIPTION_ID, AML_RESOURCE_GROUP, AML_WORKSPACE
cache_path = os.path.expanduser("~/.azure/msal_token_cache.json")
auth_record_path = os.path.expanduser("~/.azure/auth_record.json")
token_cache_options = TokenCachePersistenceOptions(name=cache_path, allow_unencrypted_storage=True)
if os.path.exists(auth_record_path):
    # Reuse the saved record so the credential can find its account in the token cache
    print(f"Using cache: {auth_record_path}")
    with open(auth_record_path, 'r') as auth_in:
        record_json_str = auth_in.read()
    deserialized_record = AuthenticationRecord.deserialize(record_json_str)
    credential = DeviceCodeCredential(cache_persistence_options=token_cache_options, authentication_record=deserialized_record)
else:
    # First run: authenticate interactively and save the record for later runs
    print("Cache not found, authenticating")
    credential = DeviceCodeCredential(cache_persistence_options=token_cache_options)
    record = credential.authenticate(scopes=["https://ml.azure.com/.default"])
    record_json = record.serialize()
    with open(auth_record_path, 'w') as auth_out:
        auth_out.write(record_json)
client = MLClient(
    credential=credential,
    subscription_id=AML_SUBSCRIPTION_ID,
    resource_group_name=AML_RESOURCE_GROUP,
    workspace_name=AML_WORKSPACE,
)
# do something useless that requires making requests
jobs = client.jobs.list()
_ = [j.name for j in jobs]
print("Successfully ran the script")
More info can be found here
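If the load-or-authenticate logic is needed in more than one script, it could be pulled into a small helper along these lines. This is only a sketch: get_cached_credential and its parameters are hypothetical names, not part of azure-identity. The credential factory and the deserializer are passed in as callables so the helper can be exercised without contacting Azure, and the record file is created with owner-only permissions as an extra precaution.

```python
import os
from typing import Any, Callable, Sequence


def get_cached_credential(
    record_path: str,
    make_credential: Callable[..., Any],
    deserialize_record: Callable[[str], Any],
    scopes: Sequence[str],
) -> Any:
    """Return a credential, reusing a saved authentication record if one exists.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if os.path.exists(record_path):
        # A record from an earlier run lets the credential locate its account
        # in the persistent token cache instead of prompting again.
        with open(record_path, "r", encoding="utf-8") as record_in:
            record = deserialize_record(record_in.read())
        return make_credential(authentication_record=record)

    # First run: authenticate interactively, then save the record.
    credential = make_credential()
    record = credential.authenticate(scopes=list(scopes))
    os.makedirs(os.path.dirname(record_path) or ".", exist_ok=True)
    # Create the file so that only the current user can read or write it.
    fd = os.open(record_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as record_out:
        record_out.write(record.serialize())
    return credential


# Assumed wiring with the names from the example above (not executed here):
#
# from functools import partial
# credential = get_cached_credential(
#     auth_record_path,
#     partial(DeviceCodeCredential, cache_persistence_options=token_cache_options),
#     AuthenticationRecord.deserialize,
#     ["https://ml.azure.com/.default"],
# )
```

The returned credential can then be passed to MLClient exactly as in the example above.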