Latency: Understanding, Perceiving, and Mastering an Invisible Delay
Posted on January 28, 2026 • 6 min read • 1,150 words

Latency is often confused with slowness or a lack of performance. This article provides a complete, professional view of latency: general and technical definitions, user perception, architectural trade-offs, and strategies to master it.

Latency is one of those ubiquitous terms in computing, often used to explain a negative feeling — “it lags”, “it’s slow” — without its real meaning being clearly understood.
Yet latency is neither a bug nor a simple performance issue: it is a structural constraint of modern computer systems.
This article offers a comprehensive view of latency: what it is, how users perceive it, what it costs in distributed architectures, and how it can be mastered.
In its simplest sense, latency refers to the waiting time between an action and the system’s first reaction.
A user clicks, presses a key, or triggers an action; the system takes some time before it begins to respond.
That delay — even if very short — is latency.
It is essential to understand that latency does not describe the total duration of processing, but the time before something starts to happen.
From a technical standpoint, latency is the time elapsed between the emission of a request and the reception of the first usable response, along the system’s critical path.
It is the sum of several components: network transit, queuing, serialization and deserialization, coordination between components, and the computation itself.
In modern systems, I/O and coordination largely dominate the cost of pure computation.
Overall latency is therefore never that of a single component, but that of the longest blocking path.
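This "longest blocking path" idea can be sketched numerically. In the toy dependency graph below, the stage names and timings are hypothetical; the point is that total work (the sum of all stages) and end-to-end latency (the critical path) are different numbers:

```python
from functools import lru_cache

# Sketch: end-to-end latency is the longest blocking path through the
# system, not the sum of every component. Stage names and timings (ms)
# are hypothetical; each stage lists the stages it must wait for.
STAGES = {
    "dns":     (5,  []),
    "tls":     (20, ["dns"]),
    "backend": (40, ["tls"]),
    "cache":   (15, ["tls"]),        # runs in parallel with "backend"
    "render":  (10, ["backend", "cache"]),
}

@lru_cache(maxsize=None)
def finish_time(name):
    # A stage can only start once its slowest dependency has finished.
    cost, deps = STAGES[name]
    return cost + max((finish_time(d) for d in deps), default=0)

total_work = sum(cost for cost, _ in STAGES.values())
print(f"sum of all stages: {total_work} ms")              # 90 ms of work...
print(f"critical path:     {finish_time('render')} ms")   # ...but 75 ms of latency
```

Here the system performs 90 ms of work in total, yet the user waits only 75 ms, because the cache lookup sits off the longest blocking path; optimizing it would change nothing.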
A common mistake is to equate latency with performance.
A system may be capable of handling a very high number of requests per second while still exhibiting high latency.
Latency measures a delay, whereas throughput measures a processing capacity.
Increasing raw power or parallelism does not mechanically reduce latency, and can even worsen it as the system approaches saturation.
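The saturation effect can be made concrete with the textbook M/M/1 queueing model, in which average latency is 1/(μ − λ) for service rate μ and arrival rate λ. The rates below are illustrative, not measurements:

```python
# Sketch: classic M/M/1 queue. Throughput (arrival rate) can approach
# the service rate, but average latency explodes near saturation.
def mm1_latency_ms(arrival_per_s: float, service_per_s: float) -> float:
    assert arrival_per_s < service_per_s, "system is saturated"
    return 1000.0 / (service_per_s - arrival_per_s)

# A server that can process 1,000 requests/second:
for load in (100, 500, 900, 990):
    print(f"{load:>4} req/s -> {mm1_latency_ms(load, 1000):6.1f} ms average latency")
```

At 10% load the average wait is about 1.1 ms; at 99% load it is 100 ms, even though throughput has kept climbing the whole time. Capacity and responsiveness are simply different axes.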
“Lived time is not the time of clocks.”
— Henri Bergson
Latency is first and foremost a perceptual phenomenon, long before it is a technical metric.
What users experience is not the time measured by the system, but the subjective time of waiting — the delay between their intention and the confirmation that this intention has been acknowledged.
The human brain is particularly sensitive to feedback delay: below roughly 100 ms a response feels instantaneous; around one second the delay is noticeable but the flow of thought is preserved; beyond about ten seconds attention drifts and users disengage.
However, these thresholds are not absolute. A stable, predictable, and explained latency is often better tolerated than a lower but irregular one. What most degrades the experience is not waiting itself, but the uncertainty it creates.
When the system provides no immediate feedback, users begin to doubt: was the action understood? Should it be repeated? Is the system frozen?
At that point, latency ceases to be a simple delay and becomes a break in trust.
This is why it is essential to distinguish actual latency from perceived latency. A technically fast system may feel slow if it does not communicate, while a slower system may feel responsive if it clearly guides users during the wait.
From this perspective, latency is not only an optimization problem: it becomes a global design concern, at the intersection of software architecture, ergonomics, and cognitive psychology.
In distributed architectures, latency is inevitable.
Each technical boundary introduces a cost: a network round trip, serialization and deserialization, connection handshakes, queuing, retries.
Distribution brings many benefits — scalability, resilience, team independence — but it has a clear price: latency.
Latency cannot be eliminated, only reduced, shifted, or consciously accepted.
A well-known example in the industry comes from an internal Amazon study reported by Greg Linden in 2006.
Teams had observed that adding roughly 100 milliseconds of latency to the loading time of certain pages resulted in a measurable drop in conversion rate, on the order of 1%.
At Amazon’s scale, this represented hundreds of millions of dollars in potential revenue.
These measurements, obtained through controlled experiments (A/B tests), deeply influenced internal architectural decisions.
Latency was no longer viewed as a purely technical indicator, but as a direct economic variable.
In this context, every new boundary introduced into the architecture — an additional microservice, a network call, an abstraction layer — had to explicitly justify its impact on the user request’s critical path.
The central question became: what is the cost of this decision in milliseconds, and is that cost acceptable given the value it provides?
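One practical way to answer that question is a latency budget: the critical path gets a fixed millisecond target, and every boundary must fit inside it. A minimal sketch, with hypothetical service names and costs:

```python
# Sketch of a latency budget: every boundary on the user request's
# critical path must justify its cost in milliseconds.
# Service names and numbers are hypothetical.
BUDGET_MS = 200  # target end-to-end latency for the user request

critical_path = {
    "edge/tls":        15,
    "api-gateway":     10,
    "auth-service":    20,
    "product-service": 60,
    "database":        40,
}

spent = sum(critical_path.values())
print(f"spent {spent} ms, {BUDGET_MS - spent} ms remaining")
# Adding a new microservice hop means taking its cost out of the remainder,
# and the budget makes that trade-off explicit at design time.
```

The value of the exercise is less the arithmetic than the conversation it forces: a new hop is no longer free by default.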
This case illustrates a fundamental point: in a distributed architecture, latency does not suddenly appear as a bug.
It accumulates progressively, often through small, invisible additions, until it becomes noticeable — and then harmful.
For this reason, latency cannot be treated as an implementation detail.
It must be considered an architectural constraint from the design phase, just like security, resilience, or scalability.
Latency is tightly coupled with other fundamental system properties.
Reducing latency often implies caching and replicating data, relaxing consistency guarantees, and accepting stale reads.
Conversely, guaranteeing strong consistency or strict transactions requires coordination between nodes, locking, and waiting, and therefore higher latency.
There is no universal solution: each system must explicitly state the trade-offs it accepts.
A professional approach to latency relies on several structural principles.
First, it is essential to trace the end-to-end critical path.
Optimizing a component outside that path has no impact on overall latency.
Second, it is necessary to reduce synchronous calls and parallelize independent dependencies.
Two 50 ms operations executed in parallel cost 50 ms, not 100.
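This arithmetic is easy to demonstrate. The sketch below, using Python's `asyncio` with simulated 50 ms calls, times the sequential and concurrent versions of the same two independent operations:

```python
import asyncio
import time

# Sketch: two independent 50 ms operations. Awaited one after the
# other they cost ~100 ms; run concurrently they cost ~50 ms.
async def call(ms: int) -> int:
    await asyncio.sleep(ms / 1000)  # stands in for a network call
    return ms

async def sequential():
    await call(50)
    await call(50)

async def parallel():
    await asyncio.gather(call(50), call(50))

for variant in (sequential, parallel):
    start = time.perf_counter()
    asyncio.run(variant())
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{variant.__name__}: ~{elapsed_ms:.0f} ms")
```

The caveat, of course, is that only genuinely independent dependencies can be gathered this way; anything on the critical path still has to be awaited in order.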
It is also important to move computation closer to the data, limit network traversals, and reduce unnecessary coordination.
Finally, measurement must focus on high percentiles (P95, P99), because these are what users actually experience.
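The gap between the mean and the tail is easy to see on synthetic data. The sample values below are invented, and the percentile function uses the simple nearest-rank method:

```python
# Sketch: averages hide tail latency; P95/P99 show what users actually hit.
# Sample latencies (ms) are synthetic: mostly fast, with a slow tail.
samples = [12] * 90 + [80] * 8 + [900] * 2

def percentile(values, p):
    # Nearest-rank percentile: smallest value with at least p% of
    # samples at or below it.
    ordered = sorted(values)
    rank = max(0, round(p / 100 * len(ordered)) - 1)
    return ordered[rank]

mean = sum(samples) / len(samples)
print(f"mean={mean:.0f} ms  P95={percentile(samples, 95)} ms  "
      f"P99={percentile(samples, 99)} ms")
```

Here the mean is a reassuring 35 ms, yet one user in a hundred waits 900 ms. Reporting only the average would hide exactly the requests that damage trust.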
In practice, the goal is not always to eliminate latency, but to make it acceptable.
This involves providing immediate feedback, showing progress during long operations, making waits predictable, and communicating clearly when the system is working.
A slightly slow but coherent and predictable system is often far better accepted than a fast but erratic one.
Latency is neither a minor technical detail nor an accidental defect.
It is a fundamental property of computer systems, directly tied to architectural, consistency, and distribution choices.
Mastering latency does not mean piling up local optimizations, but understanding where it comes from, measuring it along the critical path, making its trade-offs explicit, and designing the experience of waiting.
Effective systems are not those that compute the fastest,
but those that intelligently minimize the time during which they do not respond.