I am trying to create an EMR cluster from an Airflow DAG with EmrCreateJobFlowOperator, using a role called dev-emr-ec2-profile-role as the JobFlowRole. The same role is used to provision EMR clusters with Terraform, and the role itself is created by Terraform code hosted in a GitLab repo. EMR clusters are provisioned successfully via Terraform. However, when I use the same role in an Airflow DAG, I get the error below:
botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the RunJobFlow operation: Invalid InstanceProfile: dev-emr-ec2-profile-role.
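Because the message complains about an InstanceProfile rather than a role, one way to narrow it down is to check whether dev-emr-ec2-profile-role exists as an IAM role, as an EC2 instance profile, or both. The sketch below assumes a boto3-style IAM client (`get_role`, `get_instance_profile`, and the modeled `NoSuchEntityException` on `client.exceptions`); the helper name `describe_iam_name` is hypothetical.

```python
from typing import Any


def describe_iam_name(iam_client: Any, name: str) -> dict[str, bool]:
    """Report whether `name` exists as an IAM role and/or as an instance profile.

    `iam_client` is assumed to behave like boto3.client("iam"); the
    "not found" exception is read from `iam_client.exceptions`.
    """
    not_found = iam_client.exceptions.NoSuchEntityException

    def exists(call: Any, **kwargs: str) -> bool:
        # A successful Get* call means the entity exists; NoSuchEntity means it does not.
        try:
            call(**kwargs)
            return True
        except not_found:
            return False

    return {
        "is_role": exists(iam_client.get_role, RoleName=name),
        "is_instance_profile": exists(
            iam_client.get_instance_profile, InstanceProfileName=name
        ),
    }
```

With real credentials it would be called as `describe_iam_name(boto3.client("iam"), "dev-emr-ec2-profile-role")`. A result with `is_role` true and `is_instance_profile` false would mean the name refers only to the role, not to an instance profile.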
Airflow code snippet:
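from typing import Any

from airflow.providers.amazon.aws.operators.emr import EmrCreateJobFlowOperator

# SPARK_STEPS is defined elsewhere in the DAG (not shown here).
JOB_FLOW_OVERRIDES: dict[str, Any] = {
    "Name": "PiCalc",
    "ReleaseLabel": "emr-7.1.0",
    "Applications": [{"Name": "Spark"}],
    "Instances": {
        "InstanceGroups": [
            {
                "Name": "Primary node",
                "Market": "ON_DEMAND",
                "InstanceRole": "MASTER",
                "InstanceType": "m5.xlarge",
                "InstanceCount": 1,
            },
        ],
        "KeepJobFlowAliveWhenNoSteps": False,
        "TerminationProtected": False,
    },
    "Steps": SPARK_STEPS,
    # JobFlowRole: the EC2 instance profile used by the cluster nodes.
    "JobFlowRole": "dev-emr-ec2-profile-role",
    # ServiceRole: the IAM role the EMR service itself assumes.
    "ServiceRole": "dev-emr-service-role",
}

create_job_flow = EmrCreateJobFlowOperator(
    task_id="create_job_flow",
    job_flow_overrides=JOB_FLOW_OVERRIDES,
)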
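If the name turns out to be a role but not an instance profile, the Terraform code may wrap the role in a separately named instance profile (for example an aws_iam_instance_profile resource; this is an assumption, since the Terraform code isn't shown). The sketch below lists the instance profiles that contain a given role and returns a copy of the overrides with JobFlowRole set to one of them. It assumes a boto3-style IAM client with the `list_instance_profiles_for_role` paginator; both helper names are hypothetical.

```python
import copy
from typing import Any


def instance_profiles_for_role(iam_client: Any, role_name: str) -> list[str]:
    """Return the names of all instance profiles that contain `role_name`."""
    names: list[str] = []
    paginator = iam_client.get_paginator("list_instance_profiles_for_role")
    for page in paginator.paginate(RoleName=role_name):
        # Each page carries a list of instance profile descriptions.
        names.extend(p["InstanceProfileName"] for p in page["InstanceProfiles"])
    return names


def with_instance_profile(overrides: dict[str, Any], profile_name: str) -> dict[str, Any]:
    """Return a deep copy of the job flow overrides with JobFlowRole replaced."""
    patched = copy.deepcopy(overrides)
    patched["JobFlowRole"] = profile_name
    return patched
```

Once the correct profile name is known, hard-coding it in JOB_FLOW_OVERRIDES is probably simpler than calling IAM every time the DAG file is parsed; the lookup is mainly useful as a one-off check.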
I also tried creating the cluster with the AWS CLI and with boto3, and the same error occurs.
CLI:
aws emr create-cluster \
    --name test-emr-cluster \
    --service-role dev-emr-service-role \
    --release-label emr-7.0.0 \
    --instance-count 3 \
    --instance-type m5.xlarge \
    --applications Name=Spark Name=Hadoop \
    --ec2-attributes InstanceProfile=dev-emr-ec2-profile-role \
    --log-uri <bucket_Name>
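The CLI, boto3 and EmrCreateJobFlowOperator all end up calling the same RunJobFlow API, which would explain why the error is identical in every case: as far as I can tell, the CLI's `--ec2-attributes InstanceProfile=...` value is sent as the `JobFlowRole` parameter. The sketch below spells out roughly equivalent boto3 `run_job_flow` keyword arguments for the CLI call above. It only builds the dictionary; the function name `cli_equivalent_kwargs` is hypothetical, and the instance layout is an approximation of what the CLI actually sends.

```python
from typing import Any


def cli_equivalent_kwargs(instance_profile: str, service_role: str, log_uri: str) -> dict[str, Any]:
    """Approximate the `aws emr create-cluster` call as boto3 run_job_flow kwargs."""
    return {
        "Name": "test-emr-cluster",
        "ReleaseLabel": "emr-7.0.0",
        "LogUri": log_uri,
        "Applications": [{"Name": "Spark"}, {"Name": "Hadoop"}],
        "Instances": {
            # --instance-count 3 --instance-type m5.xlarge: one primary node plus two core nodes
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            # The CLI keeps the cluster running unless --auto-terminate is given.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # --ec2-attributes InstanceProfile=... corresponds to JobFlowRole.
        "JobFlowRole": instance_profile,
        "ServiceRole": service_role,
    }
```

Passing the result to `boto3.client("emr").run_job_flow(**kwargs)` should reproduce the same ValidationException whenever `instance_profile` does not name an existing instance profile.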
Error:
Same error as above