
Metrics Reference

Angos exposes Prometheus metrics at the /metrics endpoint.
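As a quick way to inspect the endpoint without a Prometheus server, the text exposition format can be parsed directly. The following is a minimal sketch using only the Python standard library. The helper names `parse_samples` and `total` and the sample values are hypothetical, and the parser deliberately ignores details such as escape sequences inside label values.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces; escape sequences are left as-is.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def total(samples, name, **wanted):
    """Sum the values of all samples with this name and these label values."""
    return sum(
        value
        for sample_name, labels, value in samples
        if sample_name == name
        and all(labels.get(k) == v for k, v in wanted.items())
    )


# Hypothetical scrape output; real values and label sets will differ.
SAMPLE_TEXT = """\
# TYPE http_requests_total counter
http_requests_total{method="GET",route="get-manifest",status="200"} 120
http_requests_total{method="GET",route="get-manifest",status="404"} 3
http_requests_total{method="PUT",route="put-blob",status="201"} 7
# TYPE http_requests_in_flight gauge
http_requests_in_flight 2
"""

samples = parse_samples(SAMPLE_TEXT)
print(total(samples, "http_requests_total", route="get-manifest"))
print(total(samples, "http_requests_in_flight"))
```

To run this against a live registry, the sample text could be replaced with the body returned by `urllib.request.urlopen("http://registry:5000/metrics")`, where the host and port are an assumption taken from the example Prometheus configuration in this page.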


HTTP Metrics

http_requests_total

Total number of HTTP requests.

Type      Labels
Counter   method, route, status

Labels:

  • method: HTTP method (GET, POST, PUT, DELETE, etc.)
  • route: Route action (e.g., get-manifest, put-blob, list-tags)
  • status: HTTP status code (200, 404, 500, etc.)

Example:

# Request rate over 5 minutes
rate(http_requests_total[5m])

# Error rate (5xx responses)
rate(http_requests_total{status=~"5.."}[5m])

# Requests by route
sum by (route) (rate(http_requests_total[5m]))

# GET requests for manifests
rate(http_requests_total{method="GET", route="get-manifest"}[5m])

http_request_duration_ms

HTTP request latency in milliseconds.

Type        Labels
Histogram   method, route

Example:

# 95th percentile latency across all methods and routes
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_ms_bucket[5m])))

# Average latency
rate(http_request_duration_ms_sum[5m]) / rate(http_request_duration_ms_count[5m])

# Latency by route
histogram_quantile(0.99, sum by (route, le) (rate(http_request_duration_ms_bucket[5m])))

# Manifest pull latency
histogram_quantile(0.95, rate(http_request_duration_ms_bucket{route="get-manifest"}[5m]))
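The quantile queries above return estimates, not exact values, because a histogram only records how many requests fell at or below each bucket boundary. The sketch below illustrates the linear interpolation that Prometheus documents for classic histograms, in simplified form. The function name `estimate_quantile` and the bucket boundaries (100, 250 and 500 ms) are hypothetical; the actual boundaries used by Angos are not stated here.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate a quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) pairs, sorted by
    upper_bound and ending with the +Inf bucket.
    """
    if not 0 < q < 1:
        raise ValueError("q must be strictly between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total  # how many observations lie at or below the quantile
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket, so the best
                # available answer is the highest finite boundary.
                return prev_bound
            # Assume observations are spread evenly inside this bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative counts for http_request_duration_ms_bucket.
buckets = [(100, 50), (250, 90), (500, 98), (math.inf, 100)]
print(estimate_quantile(0.95, buckets))  # lands between 250 and 500 ms
print(estimate_quantile(0.99, buckets))  # capped at the last finite bound
```

With these numbers the 95th percentile lands inside the 250 to 500 ms bucket, and the 99th percentile is reported as 500 because it falls in the +Inf bucket. The accuracy of any percentile therefore depends on how closely the bucket boundaries bracket the latencies of interest.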

http_requests_in_flight

Current number of HTTP requests being processed.

Type    Labels
Gauge   none

Example:

# Current in-flight requests
http_requests_in_flight

# Max in-flight over time
max_over_time(http_requests_in_flight[1h])

Route Values

The route label uses action names from the OCI Distribution API:

Route               Description
healthz             Health check
metrics             Prometheus metrics
get-api-version     API version check
get-blob            Download blob
delete-blob         Delete blob
start-upload        Start blob upload
update-upload       Chunk upload
complete-upload     Complete upload
get-upload          Upload status
cancel-upload       Cancel upload
get-manifest        Pull manifest
put-manifest        Push manifest
delete-manifest     Delete manifest
list-tags           List tags
list-catalog        List repositories
get-referrers       Get referrers
ui-asset            UI static files
ui-config           UI configuration
list-repositories   Extension API
list-namespaces     Extension API
list-revisions      Extension API
list-uploads        Extension API
unknown             Unrecognized route

Authentication Metrics

auth_attempts_total

Total number of authentication attempts.

Type      Labels
Counter   method, result

Labels:

  • method: Authentication method (basic, mtls, oidc), not the HTTP method
  • result: Outcome of the attempt (success, failed)

Example:

# Authentication success rate
sum(rate(auth_attempts_total{result="success"}[5m])) /
sum(rate(auth_attempts_total[5m]))

# Failed auth attempts by method
sum by (method) (rate(auth_attempts_total{result="failed"}[5m]))

Webhook Metrics

webhook_authorization_requests_total

Total webhook authorization requests.

Type      Labels
Counter   webhook, result

Labels:

  • webhook: Name of the webhook
  • result: allow, deny, cached_allow, cached_deny

Example:

# Authorization request rate by webhook
sum by (webhook) (rate(webhook_authorization_requests_total[5m]))

# Cache effectiveness
sum(rate(webhook_authorization_requests_total{result=~"cached_.*"}[5m])) /
sum(rate(webhook_authorization_requests_total[5m]))

# Denial rate by webhook
sum by (webhook) (rate(webhook_authorization_requests_total{result=~".*deny"}[5m]))
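The two regex matchers used above select overlapping subsets of the four result values, which is easy to misread. The sketch below spells out which values each pattern picks and how the resulting ratios are computed. It relies on the fact that PromQL regex matchers are anchored at both ends, which `re.fullmatch` mimics; the helper names and the request counts are hypothetical.

```python
import re

# The four result values listed for webhook_authorization_requests_total.
RESULTS = ["allow", "deny", "cached_allow", "cached_deny"]


def matching(pattern):
    """Return the result values a PromQL =~ matcher with this pattern selects."""
    # PromQL anchors regex matchers at both ends, like re.fullmatch.
    return [r for r in RESULTS if re.fullmatch(pattern, r)]


def ratio(counts, pattern):
    """Share of all requests whose result matches the pattern."""
    selected = sum(counts[r] for r in matching(pattern))
    return selected / sum(counts.values())


print(matching("cached_.*"))  # decisions served from the cache
print(matching(".*deny"))     # denials, whether cached or not

# Hypothetical per-result request counts over some time window.
counts = {"allow": 40, "deny": 10, "cached_allow": 140, "cached_deny": 10}
print(ratio(counts, "cached_.*"))  # cache hit ratio: 150 of 200
print(ratio(counts, ".*deny"))     # denial ratio: 20 of 200
```

Note that `cached_deny` is counted in both ratios, so the two figures are not expected to add up to anything meaningful.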

webhook_authorization_duration_seconds

Webhook authorization request duration.

Type        Labels
Histogram   webhook

Example:

# 95th percentile webhook latency
histogram_quantile(0.95, rate(webhook_authorization_duration_seconds_bucket[5m]))

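# Slow webhook detection: fraction of requests taking longer than 1s, by webhook
1 - sum by (webhook) (rate(webhook_authorization_duration_seconds_bucket{le="1"}[5m])) /
sum by (webhook) (rate(webhook_authorization_duration_seconds_count[5m]))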

Example Prometheus Configuration

scrape_configs:
  - job_name: 'angos'
    static_configs:
      - targets: ['registry:5000']
    metrics_path: /metrics
    scheme: http # or https

Example Grafana Dashboard Queries

Overview

# Request rate
sum(rate(http_requests_total[5m]))

# Error rate percentage
100 * sum(rate(http_requests_total{status=~"5.."}[5m])) /
sum(rate(http_requests_total[5m]))

# P95 latency
histogram_quantile(0.95, sum(rate(http_request_duration_ms_bucket[5m])) by (le))

# Request rate by route
sum by (route) (rate(http_requests_total[5m]))

# Manifest operations latency
histogram_quantile(0.95, sum(rate(http_request_duration_ms_bucket{route=~".*-manifest"}[5m])) by (le))

Authentication

# Auth success rate
100 * sum(rate(auth_attempts_total{result="success"}[5m])) /
sum(rate(auth_attempts_total[5m]))

# Auth method distribution
sum by (method) (rate(auth_attempts_total[5m]))

Webhooks

# Webhook cache hit rate
100 * sum(rate(webhook_authorization_requests_total{result=~"cached_.*"}[5m])) /
sum(rate(webhook_authorization_requests_total[5m]))

# Webhook denial rate
100 * sum(rate(webhook_authorization_requests_total{result=~".*deny"}[5m])) /
sum(rate(webhook_authorization_requests_total[5m]))

Alerting Examples

groups:
  - name: angos
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m])) /
          sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate on Angos"

      - alert: HighLatency
        expr: |
          histogram_quantile(0.95, sum(rate(http_request_duration_ms_bucket[5m])) by (le)) > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High latency on Angos"

      - alert: AuthFailures
        expr: |
          sum(rate(auth_attempts_total{result="failed"}[5m])) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High authentication failure rate"