I’m trying to configure HashiCorp Vault in Oracle Cloud. The setup includes two compute instances and a network load balancer for traffic distribution. I’m using the Ansible role (https://github.com/ansible-community/ansible-vault) for configuration management of the instances. Unsealing should be performed with a key stored in Oracle KMS.
Setup
Network load balancer (private IP address: 10.20.68.87)
Instance 1 (private IP address: 10.20.67.46), vault_main.hcl config file:
# Ansible managed
cluster_name = "dc1"
max_lease_ttl = "768h"
default_lease_ttl = "768h"
disable_clustering = "False"
cluster_addr = "http://10.20.67.46:8201"
api_addr = "http://10.20.67.46:8200"
plugin_directory = "/usr/local/lib/vault/plugins"
listener "tcp" {
address = "0.0.0.0:8200"
cluster_address = "0.0.0.0:8201"
tls_disable = "true"
}
storage "raft" {
path = "/var/vault"
node_id = "inst-yddqp-instancepool20240612050658"
retry_join {
leader_api_addr = "http://10.20.68.87:8200"
}
}
// HashiCorp recommends disabling mlock when using Raft.
disable_mlock = true
ui = true
seal "ocikms" {
key_id = "<key-id-here>"
auth_type_api_key = "False"
crypto_endpoint = "https://<endpoint-value-here>"
management_endpoint = "https://<endpoint-value-here>"
}
Instance 2 (private IP address: 10.20.6.15), vault_main.hcl config file:
# Ansible managed
cluster_name = "dc1"
max_lease_ttl = "768h"
default_lease_ttl = "768h"
disable_clustering = "False"
cluster_addr = "http://10.20.6.15:8201"
api_addr = "http://10.20.6.15:8200"
plugin_directory = "/usr/local/lib/vault/plugins"
listener "tcp" {
address = "0.0.0.0:8200"
cluster_address = "0.0.0.0:8201"
tls_disable = "true"
}
storage "raft" {
path = "/var/vault"
node_id = "inst-jncko-instancepool20240612050658"
retry_join {
leader_api_addr = "http://10.20.68.87:8200"
}
}
// HashiCorp recommends disabling mlock when using Raft.
disable_mlock = true
ui = true
seal "ocikms" {
key_id = "<key-id-here>"
auth_type_api_key = "False"
crypto_endpoint = "https://<endpoint-value-here>"
management_endpoint = "https://<endpoint-value-here>"
}
I’ve gone through the HashiCorp documentation and it seems that I have all the pieces in place. However, the role fails on the "Vault API reachable?" step with this error:
fatal: [inst-yddqp-instancepool20240612050658]: FAILED! => {"attempts": 6, "changed": false, "elapsed": 0, "msg": "Status code was -1 and not [200, 429, 472, 473, 501, 503]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://10.20.67.46:8200/v1/sys/health"}
fatal: [inst-jncko-instancepool20240612050658]: FAILED! => {"attempts": 6, "changed": false, "elapsed": 0, "msg": "Status code was -1 and not [200, 429, 472, 473, 501, 503]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://10.20.6.15:8200/v1/sys/health"}
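For reference, the failing step can be reproduced by hand on each instance. Errno 111 (connection refused) usually means that nothing accepted the TCP connection on port 8200 at the instance address, so the sketch below queries the same health URL, compares the result with the status codes listed in the error message, and then looks at the local listener and service logs. The function names are made up for illustration, and the systemd unit name "vault" is an assumption about what the role installs.

```shell
#!/bin/sh
# Sketch: reproduce the "Vault API reachable?" check by hand on one instance.
# Assumption: the role installs a systemd unit called "vault"; adjust if not.

# Status codes the Ansible task accepts, copied from the error message.
ACCEPTED_CODES="200 429 472 473 501 503"

# Succeeds if the given HTTP status code is in the accepted list.
is_accepted_code() {
  case " $ACCEPTED_CODES " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Queries the health endpoint of the given address over plain HTTP and
# prints the status code; curl prints "000" when it gets no HTTP response.
vault_health_code() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://$1:8200/v1/sys/health"
}

# Runs the check for this instance's own address and, on failure,
# shows whether anything listens on 8200 and what the service logged.
check_node() {
  code=$(vault_health_code "$1")
  if is_accepted_code "$code"; then
    echo "$1: Vault answered with $code"
  else
    echo "$1: unexpected result $code"
    ss -tln | grep ':8200' || echo "$1: no listener on port 8200"
    systemctl status vault --no-pager      # assumed unit name
    journalctl -u vault -n 50 --no-pager   # assumed unit name
  fi
}
```

Running `check_node 10.20.67.46` on instance 1 and `check_node 10.20.6.15` on instance 2 should show whether the Vault process is running at all and, if it is not, why it exited.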
Remarks
- I’d expect it to try to reach the load balancer IP address in this step, not the IP address of each instance
- running
curl -v http://10.20.68.87:8200/v1/sys/health
on any of the instances returns:
* Trying 10.20.68.87:8200...
* Connected to 10.20.68.87 (10.20.68.87) port 8200 (#0)
> GET /v1/sys/health HTTP/1.1
> Host: 10.20.68.87:8200
> User-Agent: curl/7.81.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
* HTTP 1.0, assume close after body
< HTTP/1.0 400 Bad Request
<
Client sent an HTTP request to an HTTPS server.
* Closing connection 0
- running
telnet 10.20.68.87 8200
from any of the instances connects properly to the load balancer
- access to the OCI KMS key is granted via instance_principal, through a dynamic group with the identity policy specified as
allow dynamic-group ${oci_identity_dynamic_group.servers.name} to use keys in compartment id ${var.compartment_ocid}
- OCI KMS key can be accessed by both of the instances
- OCI load balancer routing policy condition is specified as
any(http.request.url.path ew (i '/v1/sys/health'))
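The 400 response in the curl output above says that whatever answers on 10.20.68.87:8200 expects HTTPS, even though both listeners are configured with tls_disable = "true". One way to narrow this down is to probe the load balancer and each instance with both schemes and compare the results. This is only a sketch: the function names are hypothetical, and -k skips certificate verification, which is acceptable only for a diagnostic probe.

```shell
#!/bin/sh
# Sketch: compare how an endpoint answers plain HTTP and HTTPS.

# Builds the health URL for a scheme (http or https) and a host:port pair.
health_url() {
  echo "$1://$2/v1/sys/health"
}

# Prints the HTTP status code for one scheme; "000" means no HTTP response.
probe() {
  curl -sk -o /dev/null -w '%{http_code}' --max-time 5 "$(health_url "$1" "$2")"
}

# Probes the same host:port with both schemes and prints one line per scheme.
compare_schemes() {
  for scheme in http https; do
    echo "$scheme://$1 -> $(probe "$scheme" "$1")"
  done
}
```

For example, `compare_schemes 10.20.68.87:8200` checks the load balancer, and `compare_schemes 10.20.67.46:8200` checks instance 1 directly. If only the load balancer address answers over HTTPS, the TLS endpoint is probably not one of these two Vault listeners.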
At this point, I have no idea what’s misconfigured. During my research, I came across the question "High Available Hashicorp Vault Cluster Installation on VMWare". I’m more or less mimicking that setup.
Any ideas on what I should run/check/prove to troubleshoot this issue would be much appreciated.