I’m facing an issue with Prometheus while trying to scrape metrics from my .NET Core microservice. I’ve got my service running in Docker, and I can access the /health_metrics endpoint just fine through my browser at http://192.168.161.74:2011/health_metrics. It displays metrics without any issue.
However, Prometheus is returning a 503 Service Unavailable error when attempting to scrape this endpoint. Here’s a snippet of my prometheus.yml configuration:
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'health-metrics'
    metrics_path: /health_metrics
    static_configs:
      - targets: ['192.168.161.74:2011']
Additional Information:
- My microservice setup includes various tools and dashboards like:
  - GraphQL Client Tools: Portal Banana, Portal Voyager, Adaptor Banana, Adaptor Voyager
  - REST Client Tools: Swagger
  - Hangfire Dashboard
  - Health Check: Health UI, Health API
- The /health_metrics endpoint is accessible via browser and shows metrics such as GC collection count, process start time, memory usage, etc.
Here is a sample of the metrics exposed by /health_metrics:
# HELP dotnet_collection_count_total GC collection count
# TYPE dotnet_collection_count_total counter
dotnet_collection_count_total{generation="1"} 0
dotnet_collection_count_total{generation="0"} 1
dotnet_collection_count_total{generation="2"} 0
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1721278066.591546
...
Troubleshooting steps I’ve tried so far:
- Verified that the endpoint is accessible from the Prometheus server (from inside and outside).
- Ensured there are no firewall rules blocking traffic between Prometheus and the microservice.
- Checked Docker logs for any errors, but found none related to the /health_metrics endpoint.
Questions for the community:
- What could be causing Prometheus to return a 503 Service Unavailable error when the endpoint is accessible via a browser?
- Are there specific configurations in Prometheus or Docker that might need adjustment to resolve this issue?
- Could there be any resource constraints or timeout settings causing this behavior?
- Is there a way to get more detailed logging from Prometheus to diagnose the root cause of this issue?
I also tested this on Windows with my .NET Core microservices, and the error occurs there as well. I wrote a small Go program that fetches the content of /health_metrics and exposes it to Prometheus at /metric on another port, and that worked, which makes me think a delay in the response might be the problem.
Any insights or suggestions would be greatly appreciated!