Consumer-Reported Dependency Health
Posted on December 8, 2025 • 5 min read • 856 words
An in-depth exploration of the CRDH practice, where consumers become distributed probes that report the real health of their dependencies. A modern and reliable approach to monitoring, observability, and incident detection.

In modern distributed architectures, a system’s health depends at least as much on the state of its dependencies as on its own internal state. Yet most monitoring strategies still rely on synthetic or dedicated healthchecks: /health endpoints, liveness/readiness probes, external scripts, and similar mechanisms.
These techniques work, but they miss the essential point:
the real experience of the consumers.
There is a simpler, more robust, naturally distributed alternative:
Consumer-Reported Dependency Health (CRDH).
Consumer-Reported Dependency Health (CRDH) means that consumers themselves indicate whether a dependency is functioning properly, based directly on what they observe during their real calls.
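The principle: every consumer records the outcome (success or failure) and the latency of each real call it makes to a dependency, labeled by consumer and dependency.
In practice, this creates a distributed health matrix that quickly reveals whether a problem is global (every consumer of a dependency fails) or local (only one consumer is affected).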
A /health endpoint often checks “SELECT 1”, “PING Redis”, or “GET /status”.
But real business traffic is far more complex (permissions, payloads, batching, etc.).
A service may appear “healthy” in a healthcheck but be unusable in reality.
A consumer may fail for reasons that only its own calls reveal,
while the dependency’s healthcheck still shows “OK”.
With dedicated healthchecks, each new service must implement, expose, and maintain its own health endpoint and probes.
CRDH eliminates this burden.
Thanks to CRDH metrics, a global view emerges:
| Dependency | Consumer A | Consumer B | Consumer C | Global Status |
|---|---|---|---|---|
| Redis | FAIL | FAIL | FAIL | ❌ Global outage |
| Payment API | OK | FAIL | OK | ⚠️ Local issue (B) |
| S3 | OK | OK | OK | ✓ Healthy |
This matrix is a powerful tool for detecting, diagnosing, and resolving incidents.
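The Global Status column follows a simple rule: if every consumer of a dependency reports failures, it is a global outage; if only some do, it is a local issue. Below is a minimal Go sketch of that rule. The `classify` helper and the OK/FAIL map are hypothetical names, not part of any library, and the sketch adds one assumption of its own: it answers "unknown" when no consumer reported anything, since CRDH cannot judge a dependency without traffic.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// classify derives the "Global Status" of one dependency from per-consumer
// reports (true = the consumer reports OK). Hypothetical helper for illustration.
func classify(reports map[string]bool) string {
	var failing []string
	for consumer, ok := range reports {
		if !ok {
			failing = append(failing, consumer)
		}
	}
	sort.Strings(failing) // map iteration order is random; sort for stable output

	switch {
	case len(reports) == 0:
		return "unknown (no traffic)" // assumption: no reports means no verdict
	case len(failing) == 0:
		return "healthy"
	case len(failing) == len(reports):
		return "global outage"
	default:
		return "local issue (" + strings.Join(failing, ", ") + ")"
	}
}

func main() {
	// The matrix from the table: dependency -> consumer -> reported OK?
	matrix := map[string]map[string]bool{
		"Redis":       {"A": false, "B": false, "C": false},
		"Payment API": {"A": true, "B": false, "C": true},
		"S3":          {"A": true, "B": true, "C": true},
	}
	for _, dep := range []string{"Redis", "Payment API", "S3"} {
		fmt.Printf("%s: %s\n", dep, classify(matrix[dep]))
	}
}
```

Run against the table's data, it reports a global outage for Redis, a local issue limited to consumer B for the Payment API, and a healthy S3. In a real deployment, the OK/FAIL inputs would come from thresholds on the CRDH error rates rather than hand-written booleans.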
Each consumer exports three metrics for every dependency it calls, for example:

```promql
crdh_dependency_success_total{
  consumer="order-service",
  dependency="payment-api",
  method="POST",
  status="200"
}

crdh_dependency_error_total{
  consumer="order-service",
  dependency="payment-api",
  error="timeout",
  status="504"
}

crdh_dependency_latency_ms_bucket{
  consumer="web",
  dependency="db",
  le="100"
}
```

Availability is not exported as a metric of its own; it is always calculated with PromQL:
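```promql
100 * sum(rate(crdh_dependency_success_total[5m]))
/
(sum(rate(crdh_dependency_success_total[5m])) + sum(rate(crdh_dependency_error_total[5m])))
```

Each rate is summed before the two are added: the success and error series carry different label sets (`method` versus `error`), so adding the raw vectors directly would match no series and return an empty result.

Instrumenting a real call in Go with the Prometheus client looks like this:

```go
import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// CRDHSuccess and CRDHErrors (*prometheus.CounterVec) and CRDHLatency
// (*prometheus.HistogramVec) are assumed to be declared and registered
// elsewhere with the labels "consumer" and "dependency".
func CallPaymentAPI(ctx context.Context) error {
	start := time.Now()
	err := doRealPaymentCall(ctx) // the real business call, not a probe
	duration := time.Since(start)

	labels := prometheus.Labels{
		"consumer":   "order-service",
		"dependency": "payment-api",
	}
	if err != nil {
		CRDHErrors.With(labels).Inc()
	} else {
		CRDHSuccess.With(labels).Inc()
		// The histogram is named *_ms, so observe milliseconds, not seconds.
		CRDHLatency.With(labels).Observe(float64(duration) / float64(time.Millisecond))
	}
	return err
}
```

And that’s it. No dedicated healthcheck is required.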
| Practice | Advantages | Limitations |
|---|---|---|
| Dedicated healthchecks | Simple to implement | Do not reflect real traffic |
| Synthetic checks | Great for external monitoring | Limited business context |
| Distributed tracing | Excellent granularity | Complex, requires heavy instrumentation |
| CRDH | Realistic, scalable, simple, self-sustained | Traffic-based (requires minimal volume) |
CRDH is not a replacement but a natural complement:
it adds the business context missing from traditional probes.
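Alert rules follow directly from these metrics. One consumer sees more than 5 errors per second on one dependency (a local issue):

```promql
sum(rate(crdh_dependency_error_total{consumer="serviceA", dependency="redis"}[5m])) > 5
```

More than two consumers each see more than 5 errors per second on the same dependency (a likely global outage):

```promql
count by (dependency) (
  sum(rate(crdh_dependency_error_total[5m])) by (consumer, dependency) > 5
) > 2
```

The p95 latency of a dependency, as seen by a given consumer, exceeds 200 ms:

```promql
histogram_quantile(0.95, sum by (le, consumer, dependency) (rate(crdh_dependency_latency_ms_bucket[5m]))) > 200
```

CRDH is particularly effective when dependencies receive steady real traffic from several consumers.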
| Limitation | Solution |
|---|---|
| Low traffic → low metric quality | Add light synthetic probing |
| Unused dependencies remain invisible | Add lightweight synthetic probes for them |
| High cardinality due to too many labels | Restrict labels to standardized consumer and dependency names |
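The first row of the table can be made explicit in code: below a minimum call volume, consumer reports are too sparse to trust, so the dependency is handed to a lightweight synthetic probe instead of being declared healthy or broken. This is a minimal sketch; `judge`, `Verdict`, `minCalls`, and `minAvailability` are hypothetical names, and the thresholds in `main` are illustrative, not recommendations from this article.

```go
package main

import "fmt"

// Verdict is a hypothetical health verdict for one dependency over a window.
type Verdict string

const (
	Healthy   Verdict = "healthy"
	Degraded  Verdict = "degraded"
	NeedProbe Verdict = "insufficient traffic: run a lightweight synthetic probe"
)

// judge trusts consumer reports only when the window holds at least minCalls
// real calls; below that, a handful of results says little, so it defers to
// a probe. minCalls and minAvailability are assumed, tunable thresholds.
func judge(success, failures, minCalls int, minAvailability float64) Verdict {
	total := success + failures
	if total < minCalls {
		return NeedProbe
	}
	if 100*float64(success)/float64(total) < minAvailability {
		return Degraded
	}
	return Healthy
}

func main() {
	fmt.Println(judge(3, 0, 20, 99))     // only 3 calls: defer to a probe
	fmt.Println(judge(990, 10, 20, 99))  // 99.0% meets a 99% target: healthy
	fmt.Println(judge(900, 100, 20, 99)) // 90.0%: degraded
}
```

Returning an explicit "not enough traffic" verdict keeps silence from being mistaken for health, which is the main risk of a purely traffic-based signal.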
CRDH assumes that developers understand the real behavior of their dependency calls: retries, timeouts, fallbacks, logical errors, and what constitutes true business success.
If not, CRDH metrics may become incorrect or misleading.
To avoid this, use a shared middleware or SDK that standardizes metric emission, and document clearly what counts as “success” and “failure” for each dependency.
CRDH works because it relies on a simple principle:
The best measure of a system’s health is the real experience of the services that use it.
CRDH turns every consumer into part of a distributed probe that is free, realistic, and self-maintaining.
It is the backend counterpart of user-reported health on the frontend, applied to microservices.
CRDH represents a powerful shift in perspective:
it is no longer the responsibility of services to “prove” that they are alive;
it is their consumers who report what they actually observe.
It is simple.
It is robust.
It reflects reality.
And it significantly improves how we detect, diagnose, and resolve incidents in distributed architectures.