Here is my scenario:
There are multiple disparate, separate cloud-based (Azure and AWS) resources. Some are fully deployed by IaC, some are deployed via click-ops. For various reasons (don't ask), my team and I don't have direct control over the actual deployment of resources.
We are responsible for ensuring that the resources are monitored, and we use Datadog as our monitoring tool. We currently have a Terraform script that we deploy per separate cloud instance (e.g. one DD instance for the Finance team and their cloud infra, one for the Sales team and their cloud infra). Before anyone asks: consolidation is not an option (been there, had that fight, too many egos at the senior level).
The way we currently do it is that we have pre-defined templates for each metric per resource type (e.g. for VMs we monitor CPU, RAM, disk space etc.) and create one monitor per resource type. This works well until we need to adjust thresholds on a specific resource (e.g. Server 1 has very peaky workloads, hitting 100% resource usage for short periods of time). Since there is just one monitor per resource type, we either adjust the threshold globally, or we create a click-ops monitor for that specific resource, exclude it from the global monitor, and have to remember to recreate it whenever we do a terraform apply in future.
We have a CSV export of the resources from each cloud environment.
What I would like to be able to do is the following:
- Have a Base Config
- Store a list of exception config options
- Pass to terraform said list with all the resources and the relevant config options (e.g. if no custom config, use base)
- Have Terraform either loop through in a for-each style operation or programmatically generate the Terraform code based on that
- Apply the terraform
- Sleep easy at night knowing all the resources are monitored on a per-resource basis and all the config options are stored in a single source of truth.
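To make the base-plus-exceptions idea concrete, here is a rough sketch of the merge step I have in mind, in Python. Everything in it is an assumption for illustration: the CSV column names (`name`, `type`), the config keys (`cpu_critical`, `cpu_window`, `disk_critical`) and the threshold values are made up, not taken from our real exports.

```python
import csv
import io
import json

# Hypothetical base settings per resource type (values are illustrative).
BASE_CONFIG = {
    "vm": {"cpu_critical": 90, "cpu_window": "last_5m", "disk_critical": 85},
}

# Hypothetical per-resource exceptions: only the keys that differ from base.
EXCEPTIONS = {
    "server1": {"cpu_critical": 98, "cpu_window": "last_30m"},
}

# Stand-in for the CSV export of resources; the column names are assumed.
CSV_EXPORT = """name,type
server1,vm
server2,vm
"""


def build_monitor_configs(csv_text, base, exceptions):
    """Return {resource_name: merged_config} for every row in the export."""
    configs = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        merged = dict(base[row["type"]])                  # start from the base config
        merged.update(exceptions.get(row["name"], {}))    # apply overrides, if any
        merged["type"] = row["type"]
        configs[row["name"]] = merged
    return configs


configs = build_monitor_configs(CSV_EXPORT, BASE_CONFIG, EXCEPTIONS)
print(json.dumps(configs, indent=2, sort_keys=True))
```

The resulting JSON could be written to a file and read inside Terraform with `jsondecode(file(...))`, giving a map keyed by resource name that a `for_each` can iterate over.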
We are probably going to make a little internal Web GUI tool to do most of the above – the bit I have a question on:
What is the best way to handle the iteration part? Is it to generate the main.tf file on the fly and then apply that, or is it to pass Terraform a CSV, JSON or other form of structured data and do the for-each iteration inside Terraform, calling a module? Is there an even better way of doing this?
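For the "generate on the fly" option, this is roughly what I picture, again as a Python sketch under assumptions rather than something we run today. Terraform accepts configuration in JSON syntax (files ending in `.tf.json`), so the generator can emit plain JSON instead of templating HCL. The `datadog_monitor` resource and its `name`, `type`, `message`, `query` and `monitor_thresholds` arguments come from the Datadog provider; the metric name, the resource labels and the per-resource settings are hypothetical, and resource names are assumed to be valid Terraform identifiers.

```python
import json

# Hypothetical merged per-resource settings (base config plus exceptions).
RESOURCES = {
    "server1": {"cpu_critical": 98, "cpu_window": "last_30m"},
    "server2": {"cpu_critical": 90, "cpu_window": "last_5m"},
}


def render_terraform_json(resources):
    """Build a Terraform JSON document with one CPU monitor per resource."""
    monitors = {}
    for name, cfg in sorted(resources.items()):
        # The threshold in the query must match the critical threshold below.
        query = (
            f"avg({cfg['cpu_window']}):avg:system.cpu.user"
            f"{{host:{name}}} > {cfg['cpu_critical']}"
        )
        monitors[f"cpu_{name}"] = {
            "name": f"CPU usage on {name}",
            "type": "metric alert",
            "message": f"CPU usage is high on {name}",
            "query": query,
            "monitor_thresholds": {"critical": cfg["cpu_critical"]},
        }
    return {"resource": {"datadog_monitor": monitors}}


document = render_terraform_json(RESOURCES)
main_tf_json = json.dumps(document, indent=2)
print(main_tf_json)  # this text would be saved as main.tf.json before the apply
```

The trade-off I see is that the generated file is easy to inspect and diff, but the monitor definition then lives in the generator rather than in a reusable Terraform module.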