I have some Terraform where the service principal is Owner of the subscription, and it can create a compute instance on AML. I assign a user, and the user can connect to it.
But when I create a compute instance myself in the UI, with exactly the same settings as in my Terraform configuration, I fail to connect to it: User XXX does not have access to compute instance YYY
Here is the configuration:
I have no access to the terminal, Jupyter, or VS Code.
I have no idea why it does not work. When another user creates a compute instance and assigns it to me, it also does not work. But with the service principal, whose only role is Owner of the subscription, the assignment works.
Here are my RBAC roles (on the resource group, not on the subscription nor on the Azure Machine Learning workspace resource):
AZURE-Datascience-AML-dev-Contributor
AZURE-ReadOnly-dev-RG
AzureML Compute Operator
AzureML Data Scientist
AzureML Registry User
Cognitive Services Contributor
Cognitive Services Usages Reader
Cognitive Services User
Contributor
Data Factory Contributor
DocumentDB Account Contributor
Key Vault Contributor
Key Vault Secrets Officer
Network Contributor
Owner
Reader
Storage Account Key Operator Service Role
Storage Blob Data Contributor
Storage Queue Data Contributor
Support Request Contributor
The custom role AZURE-Datascience-AML-dev-Contributor contains this:
{
  "id": "/subscriptions/xxxxxxxxxxxx/providers/Microsoft.Authorization/roleDefinitions/xxxxxxxxx",
  "properties": {
    "roleName": "AZURE-Datascience-AML-dev-Contributor",
    "description": "This role is used for AML",
    "assignableScopes": [
      "/subscriptions/xxxxxx/resourceGroups/rg-xxxxxxx-01"
    ],
    "permissions": [
      {
        "actions": [
          "Microsoft.MachineLearningServices/workspaces/*/read",
          "Microsoft.MachineLearningServices/workspaces/*/action",
          "Microsoft.MachineLearningServices/workspaces/*/delete",
          "Microsoft.MachineLearningServices/workspaces/*/write",
          "Microsoft.Network/virtualNetworks/*/read",
          "Microsoft.Network/virtualNetworks/subnets/join/action"
        ],
        "notActions": [],
        "dataActions": [],
        "notDataActions": []
      }
    ]
  }
}
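To sanity-check which operations this custom role covers on its own, here is a minimal Python sketch. It assumes the `*` wildcard in a role action behaves like a simple case-insensitive glob, which is only an approximation of how Azure evaluates RBAC. The `role_allows` helper and the sample action strings are hypothetical and chosen for illustration; the sketch ignores `notActions`, data actions, and every other role assigned to me.

```python
from fnmatch import fnmatchcase

# Action patterns copied from the custom role definition above.
ROLE_ACTIONS = [
    "Microsoft.MachineLearningServices/workspaces/*/read",
    "Microsoft.MachineLearningServices/workspaces/*/action",
    "Microsoft.MachineLearningServices/workspaces/*/delete",
    "Microsoft.MachineLearningServices/workspaces/*/write",
    "Microsoft.Network/virtualNetworks/*/read",
    "Microsoft.Network/virtualNetworks/subnets/join/action",
]


def role_allows(action, patterns=ROLE_ACTIONS):
    """Return True if any pattern covers the action.

    Assumption: '*' matches any run of characters (including '/'),
    and matching is case-insensitive. This is a glob approximation,
    not the real Azure RBAC evaluator.
    """
    action = action.lower()
    return any(fnmatchcase(action, pattern.lower()) for pattern in patterns)


if __name__ == "__main__":
    # Sample action strings, used here for illustration only.
    samples = [
        "Microsoft.MachineLearningServices/workspaces/computes/start/action",
        "Microsoft.MachineLearningServices/workspaces/read",
    ]
    for sample in samples:
        print(sample, "->", role_allows(sample))
```

Under this approximation, a pattern such as `workspaces/*/read` covers operations on sub-resources of a workspace but not an operation directly on the workspace itself, because the pattern requires an extra path segment.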
For comparison, here is the Terraform code that works (keep in mind that an SP is deploying it, not my user, and its only role is Owner):
# Create a compute instance for each user
resource "azurerm_machine_learning_compute_instance" "aml_compute_instance" {
  name                          = "${var.user.mail_nickname}-${var.context.environment}-A8M-V2"
  machine_learning_workspace_id = var.machine_learning_workspace_id
  virtual_machine_size          = "STANDARD_A8M_V2"

  identity {
    type = "UserAssigned"
    identity_ids = [
      azurerm_user_assigned_identity.aml_user_assigned_identity.id
    ]
  }

  assign_to_user {
    object_id = var.user.object_id
    tenant_id = nonsensitive(var.secrets.TENANT_ID)
  }

  node_public_ip_enabled = false
  subnet_resource_id     = var.machine_learning_subnet_id
  description            = "Compute instance generated by Terraform for : ${var.user.mail_nickname}"
  tags                   = var.tags

  depends_on = [
    module.keyvault_policy_aml_user_assigned_identity,
    module.roles_aml_user_assigned_identity
  ]
}
I use the same subnet for the deployment of the workspace and the compute instances (and also clusters). I only use 20-30 IPs in my /24 subnet for now.
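To show that address exhaustion is unlikely to be the cause, here is a rough Python sketch of the subnet headroom. The address prefix `10.0.0.0/24` is a placeholder (my real prefix is not shown here), and `usable_ips` is a hypothetical helper; the only rule it relies on is that Azure reserves 5 addresses in every subnet.

```python
import ipaddress

# Azure reserves 5 addresses in each subnet (network address, gateway,
# two for Azure DNS, and the broadcast address).
AZURE_RESERVED_PER_SUBNET = 5


def usable_ips(cidr):
    """Return the number of addresses Azure can hand out in the subnet."""
    network = ipaddress.ip_network(cidr)
    return network.num_addresses - AZURE_RESERVED_PER_SUBNET


if __name__ == "__main__":
    subnet = "10.0.0.0/24"  # placeholder prefix, not my real one
    in_use = 30             # upper end of the 20-30 IPs currently used
    total = usable_ips(subnet)
    print(f"usable: {total}, free: {total - in_use}")
```

With a /24 this leaves well over 200 free addresses, so the subnet should not be full.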