Latency: Understanding, Perceiving, and Mastering an Invisible Delay
Posted on January 28, 2026 • 6 min read • 1,150 words

Latency is often confused with slowness or a lack of performance. This article provides a complete, professional view of latency: general and technical definitions, user perception, architectural trade-offs, and strategies to master it.

Latency is one of those ubiquitous terms in computing, often used to explain a negative feeling — “it lags”, “it’s slow” — without its real meaning being clearly understood.
Yet latency is neither a bug nor a simple performance issue: it is a structural constraint of modern computer systems.
This article offers a comprehensive view of latency: what it is, how users perceive it, what it costs in distributed architectures, and how it can be mastered.
In its simplest sense, latency refers to the waiting time between an action and the system’s first reaction.
A user clicks, presses a key, or triggers an action; the system takes some time before it begins to respond.
That delay — even if very short — is latency.
It is essential to understand that latency does not describe the total duration of processing, but the time before something starts to happen.
From a technical standpoint, latency is the time elapsed between the emission of a request and the reception of the first usable response, along the system’s critical path.
It is the sum of several components: network transit, queuing, serialization and deserialization, coordination between components, and the computation itself.
In modern systems, I/O and coordination largely dominate the cost of pure computation.
Overall latency is therefore never that of a single component, but that of the longest blocking path.
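This "longest blocking path" idea can be sketched numerically. In the toy dependency graph below, the stage names and timings are hypothetical; the point is that total work (the sum of all stages) and end-to-end latency (the critical path) are different numbers:

```python
from functools import lru_cache

# Sketch: end-to-end latency is the longest blocking path through the
# system, not the sum of every component. Stage names and timings (ms)
# are hypothetical; each stage lists the stages it must wait for.
STAGES = {
    "dns":     (5,  []),
    "tls":     (20, ["dns"]),
    "backend": (40, ["tls"]),
    "cache":   (15, ["tls"]),        # runs in parallel with "backend"
    "render":  (10, ["backend", "cache"]),
}

@lru_cache(maxsize=None)
def finish_time(name):
    # A stage can only start once its slowest dependency has finished.
    cost, deps = STAGES[name]
    return cost + max((finish_time(d) for d in deps), default=0)

total_work = sum(cost for cost, _ in STAGES.values())
print(f"sum of all stages: {total_work} ms")              # 90 ms of work...
print(f"critical path:     {finish_time('render')} ms")   # ...but 75 ms of latency
```

Here the system performs 90 ms of work in total, yet the user waits only 75 ms, because the cache lookup sits off the longest blocking path; optimizing it would change nothing.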
A common mistake is to equate latency with performance.
A system may be capable of handling a very high number of requests per second while still exhibiting high latency.
Latency measures a delay, whereas throughput measures a processing capacity.
Increasing raw power or parallelism does not mechanically reduce latency, and can even worsen it as the system approaches saturation.
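The saturation effect can be made concrete with the textbook M/M/1 queueing model, in which average latency is 1/(μ − λ) for service rate μ and arrival rate λ. The rates below are illustrative, not measurements:

```python
# Sketch: classic M/M/1 queue. Throughput (arrival rate) can approach
# the service rate, but average latency explodes near saturation.
def mm1_latency_ms(arrival_per_s: float, service_per_s: float) -> float:
    assert arrival_per_s < service_per_s, "system is saturated"
    return 1000.0 / (service_per_s - arrival_per_s)

# A server that can process 1,000 requests/second:
for load in (100, 500, 900, 990):
    print(f"{load:>4} req/s -> {mm1_latency_ms(load, 1000):6.1f} ms average latency")
```

At 10% load the average wait is about 1.1 ms; at 99% load it is 100 ms, even though throughput has kept climbing the whole time. Capacity and responsiveness are simply different axes.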
“Lived time is not the time of clocks.”
— Henri Bergson
Latency is first and foremost a perceptual phenomenon, long before it is a technical metric.
What users experience is not the time measured by the system, but the subjective time of waiting — the delay between their intention and the confirmation that this intention has been acknowledged.
The human brain is particularly sensitive to feedback delay: below roughly 100 ms a response feels instantaneous; around one second the delay is noticeable but the flow of thought is preserved; beyond about ten seconds attention drifts and users disengage.
However, these thresholds are not absolute. A stable, predictable, and explained latency is often better tolerated than a lower but irregular one. What most degrades the experience is not waiting itself, but the uncertainty it creates.
When the system provides no immediate feedback, users begin to doubt: was the action understood? Should it be repeated? Is the system frozen?
At that point, latency ceases to be a simple delay and becomes a break in trust.
This is why it is essential to distinguish actual latency from perceived latency. A technically fast system may feel slow if it does not communicate, while a slower system may feel responsive if it clearly guides users during the wait.
From this perspective, latency is not only an optimization problem: it becomes a global design concern, at the intersection of software architecture, ergonomics, and cognitive psychology.
In distributed architectures, latency is inevitable.
Each technical boundary introduces a cost: a network round trip, serialization and deserialization, connection handshakes, queuing, retries.
Distribution brings many benefits — scalability, resilience, team independence — but it has a clear price: latency.
Latency cannot be eliminated, only reduced, shifted, or consciously accepted.
A well-known example in the industry comes from an internal Amazon study reported by Greg Linden in 2006.
Teams had observed that adding roughly 100 milliseconds of latency to the loading time of certain pages resulted in a measurable drop in conversion rate, on the order of 1%.
At Amazon’s scale, this represented hundreds of millions of dollars in potential revenue.
These measurements, obtained through controlled experiments (A/B tests), deeply influenced internal architectural decisions.
Latency was no longer viewed as a purely technical indicator, but as a direct economic variable.
In this context, every new boundary introduced into the architecture — an additional microservice, a network call, an abstraction layer — had to explicitly justify its impact on the user request’s critical path.
The central question became: what is the cost of this decision in milliseconds, and is that cost acceptable given the value it provides?
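One practical way to answer that question is a latency budget: the critical path gets a fixed millisecond target, and every boundary must fit inside it. A minimal sketch, with hypothetical service names and costs:

```python
# Sketch of a latency budget: every boundary on the user request's
# critical path must justify its cost in milliseconds.
# Service names and numbers are hypothetical.
BUDGET_MS = 200  # target end-to-end latency for the user request

critical_path = {
    "edge/tls":        15,
    "api-gateway":     10,
    "auth-service":    20,
    "product-service": 60,
    "database":        40,
}

spent = sum(critical_path.values())
print(f"spent {spent} ms, {BUDGET_MS - spent} ms remaining")
# Adding a new microservice hop means taking its cost out of the remainder,
# and the budget makes that trade-off explicit at design time.
```

The value of the exercise is less the arithmetic than the conversation it forces: a new hop is no longer free by default.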
This case illustrates a fundamental point: in a distributed architecture, latency does not suddenly appear as a bug.
It accumulates progressively, often through small, invisible additions, until it becomes noticeable — and then harmful.
For this reason, latency cannot be treated as an implementation detail.
It must be considered an architectural constraint from the design phase, just like security, resilience, or scalability.
Latency is tightly coupled with other fundamental system properties.
Reducing latency often implies caching and replicating data, relaxing consistency guarantees, and accepting stale reads.
Conversely, guaranteeing strong consistency or strict transactions requires coordination between nodes, locking, and waiting, and therefore higher latency.
There is no universal solution: each system must explicitly state the trade-offs it accepts.
A professional approach to latency relies on several structural principles.
First, it is essential to trace the end-to-end critical path.
Optimizing a component outside that path has no impact on overall latency.
Second, it is necessary to reduce synchronous calls and parallelize independent dependencies.
Two 50 ms operations executed in parallel cost 50 ms, not 100.
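This arithmetic is easy to demonstrate. The sketch below, using Python's `asyncio` with simulated 50 ms calls, times the sequential and concurrent versions of the same two independent operations:

```python
import asyncio
import time

# Sketch: two independent 50 ms operations. Awaited one after the
# other they cost ~100 ms; run concurrently they cost ~50 ms.
async def call(ms: int) -> int:
    await asyncio.sleep(ms / 1000)  # stands in for a network call
    return ms

async def sequential():
    await call(50)
    await call(50)

async def parallel():
    await asyncio.gather(call(50), call(50))

for variant in (sequential, parallel):
    start = time.perf_counter()
    asyncio.run(variant())
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{variant.__name__}: ~{elapsed_ms:.0f} ms")
```

The caveat, of course, is that only genuinely independent dependencies can be gathered this way; anything on the critical path still has to be awaited in order.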
It is also important to move computation closer to the data, limit network traversals, and reduce unnecessary coordination.
Finally, measurement must focus on high percentiles (P95, P99), because these are what users actually experience.
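The gap between the mean and the tail is easy to see on synthetic data. The sample values below are invented, and the percentile function uses the simple nearest-rank method:

```python
# Sketch: averages hide tail latency; P95/P99 show what users actually hit.
# Sample latencies (ms) are synthetic: mostly fast, with a slow tail.
samples = [12] * 90 + [80] * 8 + [900] * 2

def percentile(values, p):
    # Nearest-rank percentile: smallest value with at least p% of
    # samples at or below it.
    ordered = sorted(values)
    rank = max(0, round(p / 100 * len(ordered)) - 1)
    return ordered[rank]

mean = sum(samples) / len(samples)
print(f"mean={mean:.0f} ms  P95={percentile(samples, 95)} ms  "
      f"P99={percentile(samples, 99)} ms")
```

Here the mean is a reassuring 35 ms, yet one user in a hundred waits 900 ms. Reporting only the average would hide exactly the requests that damage trust.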
In practice, the goal is not always to eliminate latency, but to make it acceptable.
This involves providing immediate feedback, showing progress during long operations, making waits predictable, and communicating clearly when the system is working.
A slightly slow but coherent and predictable system is often far better accepted than a fast but erratic one.
Latency is neither a minor technical detail nor an accidental defect.
It is a fundamental property of computer systems, directly tied to architectural, consistency, and distribution choices.
Mastering latency does not mean piling up local optimizations, but understanding where it comes from, measuring it along the critical path, making its trade-offs explicit, and designing the experience of waiting.
Effective systems are not those that compute the fastest,
but those that intelligently minimize the time during which they do not respond.