What Should You Really Cache in a CI/CD Pipeline?

Posted on March 11, 2026

Adding a cache to a CI/CD pipeline is not about storing random files. This article offers a pragmatic framework for deciding what to cache, why it matters, and how to avoid a fragile or counterproductive cache.

What Should You Really Cache in a CI/CD Pipeline? — Photo by Helene Hemmerter

I. Introduction

When trying to speed up a CI/CD pipeline, the first idea that almost always comes up is:

“we need to add a cache”

But very quickly, another question appears:
what exactly are we caching?

Files? Folders? Docker images? Dependencies?
And above all: which cache has a real impact, and which one just makes the system more complex?

This article offers a simple and pragmatic framework for answering that question.

II. The Classic Trap: Caching Random Files

Many pipelines start like this:

cache node_modules
cache .pnpm-store or .npm
cache build folders
sometimes even the entire workspace

The result:

it sometimes works
it often breaks
and nobody really knows why

The problem is not caching itself.
The problem is what we are trying to cache.

III. The Key Principle: We Do Not Cache Files, We Cache Work

A CI/CD pipeline is not a sequence of files, it is a sequence of tasks:

install dependencies
generate code
compile
test
package

Every useful cache corresponds to a well-defined task, with:

inputs
outputs

If the inputs have not changed, the work does not need to be done again.

That is the logic that should guide any effective caching strategy.
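
To make the idea concrete, here is a minimal sketch of a task-level cache, assuming every input can be read as bytes and the cache is a plain in-memory dictionary. The names `input_key` and `run_cached` are hypothetical and not taken from any CI tool; a real setup would use the cache store of your CI system instead.

```python
import hashlib
from typing import Callable, Dict, Tuple


def input_key(task_name: str, inputs: Dict[str, bytes]) -> str:
    """Hash the task name plus every declared input, in a stable order."""
    digest = hashlib.sha256(task_name.encode())
    for name in sorted(inputs):
        digest.update(b"\0" + name.encode() + b"\0")
        digest.update(hashlib.sha256(inputs[name]).digest())
    return digest.hexdigest()


def run_cached(
    task_name: str,
    inputs: Dict[str, bytes],
    work: Callable[[Dict[str, bytes]], bytes],
    store: Dict[str, bytes],
) -> Tuple[bytes, bool]:
    """Return (output, cache_hit). The work only runs when the inputs changed."""
    key = input_key(task_name, inputs)
    if key in store:
        return store[key], True
    output = work(inputs)
    store[key] = output
    return output, False
```

The point of the sketch is that nothing here mentions a folder: the key is derived from what the task consumes, and the stored value is what the task produces.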

IV. The Main Categories of Cache (and Their Real Value)

1. Download Cache (the bare minimum)

It avoids downloading again what has already been fetched. It is worth enabling in almost every pipeline.

Examples:

Node dependencies (pnpm / npm / yarn)
Go modules
Foundry / Solidity dependencies
Maven / Gradle dependencies

Value

immediate gain
low risk
easy to set up

Limitation

does not eliminate build work
only removes network cost
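
As an illustration, a download cache is usually keyed on the lockfile. The sketch below assumes a key made of the OS, the package manager name, and a hash of the lockfile, plus a prefix fallback similar in spirit to the fallback keys that several CI systems offer. The function names and the key format are hypothetical.

```python
import hashlib
from typing import List, Optional, Tuple


def download_cache_keys(os_name: str, tool: str, lockfile: bytes) -> Tuple[str, str]:
    """Return (exact_key, fallback_prefix) for a dependency download cache."""
    prefix = f"{os_name}-{tool}-"
    return prefix + hashlib.sha256(lockfile).hexdigest()[:16], prefix


def pick_cache(available: List[str], exact: str, prefix: str) -> Optional[str]:
    """Prefer the exact key; otherwise take the newest entry sharing the prefix.

    Assumption: `available` is ordered from oldest to newest.
    """
    if exact in available:
        return exact
    partial = [key for key in available if key.startswith(prefix)]
    return partial[-1] if partial else None
```

A fallback is reasonable here precisely because this cache only saves network cost: the install step still runs against the current lockfile, so a slightly outdated store typically means a few extra downloads rather than a wrong result.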

2. Generation Cache (codegen, intermediate artifacts)

This is an excellent candidate that is often overlooked.

Examples:

GraphQL generation
binding generation
contract generation
code produced by tools

Value

tasks are often deterministic
expensive to repeat
ideal for caching

Key condition

inputs must be clearly identified
(schemas, source files, tool versions)
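
One way to honor that condition is to store a fingerprint of the inputs next to the generated files. The following sketch assumes the schema paths are relative to the repository root (so the fingerprint is stable across machines) and that the generator is passed in as a callable. The stamp file name `.codegen-stamp` and both function names are made up for the example.

```python
import hashlib
from pathlib import Path
from typing import Callable, List


def codegen_fingerprint(schemas: List[Path], tool_version: str) -> str:
    """Hash the generator version together with every schema path and content."""
    digest = hashlib.sha256(tool_version.encode())
    for path in sorted(schemas):
        digest.update(b"\0" + path.as_posix().encode() + b"\0" + path.read_bytes())
    return digest.hexdigest()


def generate_if_stale(
    schemas: List[Path],
    tool_version: str,
    out_dir: Path,
    generate: Callable[[Path], object],
) -> bool:
    """Run the generator only when the fingerprint differs from the stored stamp."""
    stamp = out_dir / ".codegen-stamp"  # hypothetical stamp file name
    fingerprint = codegen_fingerprint(schemas, tool_version)
    if stamp.exists() and stamp.read_text() == fingerprint:
        return False  # inputs unchanged, generated code is reused
    out_dir.mkdir(parents=True, exist_ok=True)
    generate(out_dir)
    stamp.write_text(fingerprint)
    return True
```

Including the tool version in the fingerprint matters: upgrading the generator can change its output even when no schema changed.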

3. Build Cache (compilation)

This is where things become interesting… and delicate.

Examples:

TypeScript build
Go build
Solidity build
frontend build

Value

potentially huge gains
major CI time reduction

Risk

if cache invalidation is wrong, you cache bugs: a stale output is reused as if it were fresh
if the cache is too coarse, almost any change invalidates it and it is rarely reused

Good practice

cache by logical unit (project, package, service)
avoid a “global build cache”

It is very useful, but it requires discipline.
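
Caching by logical unit can be sketched as one key per project, where each key also covers the keys of the projects it depends on. The example assumes an acyclic dependency graph and in-memory sources. The project names in the usage note and the `project_keys` helper are hypothetical.

```python
import hashlib
from typing import Dict, List


def project_keys(
    sources: Dict[str, Dict[str, bytes]],
    deps: Dict[str, List[str]],
) -> Dict[str, str]:
    """Compute one build cache key per project.

    Assumption: `deps` describes an acyclic graph (project -> its dependencies).
    """
    keys: Dict[str, str] = {}

    def key_of(project: str) -> str:
        if project not in keys:
            digest = hashlib.sha256(project.encode())
            for name in sorted(sources[project]):
                digest.update(b"\0" + name.encode() + b"\0" + sources[project][name])
            # A project must be rebuilt when any of its dependencies changes.
            for dep in sorted(deps.get(project, [])):
                digest.update(b"\0dep:" + key_of(dep).encode())
            keys[project] = digest.hexdigest()
        return keys[project]

    for project in sources:
        key_of(project)
    return keys
```

With three projects `lib`, `api` (depending on `lib`), and `web`, a change in `lib` changes the keys of `lib` and `api` while the key of `web` stays the same. A single global key would have invalidated all three.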

4. Test Cache

Often counterintuitive, but sometimes relevant. It should be used carefully.

Examples:

purely deterministic unit tests
tests based only on source code

Value

huge for long test suites
greatly improves the feedback loop

Caution

tests depending on time, order, or environment → poor candidates
integration tests → generally no
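
The caution above can be encoded directly in the skip decision. In this sketch a suite is skipped only if it is explicitly marked deterministic and the same inputs have already passed, and failures are never recorded. The `Suite` type, the `deterministic` flag, and the key format are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Suite:
    name: str
    deterministic: bool  # no dependence on time, ordering, network, or environment


def should_run(suite: Suite, input_key: str, passed: Set[str]) -> bool:
    """Skip only deterministic suites whose exact inputs already passed."""
    if not suite.deterministic:
        return True
    return f"{suite.name}:{input_key}" not in passed


def record_result(suite: Suite, input_key: str, ok: bool, passed: Set[str]) -> None:
    """Remember successes only, so a failing suite is always run again."""
    if ok and suite.deterministic:
        passed.add(f"{suite.name}:{input_key}")
```

Here `input_key` stands for a hash of the source code and test files the suite depends on, computed however the rest of the pipeline computes its keys.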

5. Docker Cache (useful, but misunderstood)

Docker's layer cache is linear: each layer is built on top of the previous one. It is excellent for packaging, but poor as the main cache for application logic:

a change in one layer invalidates all the following ones
it does not understand the notion of a “project” or a “task”

What Docker caches well

system dependencies
packaging steps
reproducible images

What Docker's layer cache cannot do

reuse a specific application build
understand that one service is unaffected
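
The linear behavior can be illustrated with a simplified model in which each layer key is a hash of the parent key and the current step. This is a toy model for reasoning, not Docker's actual implementation; the step strings in the usage note, including the `hash=` placeholders standing for file contents, are invented.

```python
import hashlib
from typing import List


def layer_keys(steps: List[str]) -> List[str]:
    """Chain the keys: every layer key depends on all the steps before it."""
    keys: List[str] = []
    parent = ""
    for step in steps:
        parent = hashlib.sha256((parent + "\n" + step).encode()).hexdigest()
        keys.append(parent)
    return keys


def reused_layers(old_steps: List[str], new_steps: List[str]) -> int:
    """Count how many leading layers can be reused between two builds."""
    count = 0
    for old, new in zip(layer_keys(old_steps), layer_keys(new_steps)):
        if old != new:
            break
        count += 1
    return count
```

Take the steps `FROM node:20`, `COPY package.json (hash=aaa)`, `RUN pnpm install`, `COPY services/api (hash=111)`, `COPY services/web (hash=222)`, `RUN pnpm build`. If only the `services/api` content changes, the first three layers are reused and everything after them is rebuilt, including the `services/web` copy, whose own content is identical.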

V. The Breakdown You Should Aim For

An effective pipeline caches:

Type of work	Recommended cache
Downloading	Yes
Code generation	Yes
Per-project build	Yes
Deterministic tests	Sometimes
Docker images	Yes (but not alone)

Most importantly: each cache should correspond to an explicit task, not an arbitrary folder.

VI. Why So Many Pipelines Become Unmanageable

Because they stack:

a CI cache
a Docker cache
ad hoc scripts
implicit rules

Without ever answering the fundamental question:

What work am I trying to avoid doing again?

When that answer is not clear, the cache becomes:

fragile
poorly understood
and quickly disabled “temporarily”… forever

VII. The Right Success Indicator

A good CI/CD cache has one simple property:

A developer can predict what will be reused without reading the CI configuration.

If that is not the case, the cache is too implicit.
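
One way to get closer to that property is to declare the inputs of each task in one place and let anyone ask what a change would trigger. The sketch below assumes inputs are declared as glob patterns. The task names, the patterns, and the `explain` helper are all hypothetical.

```python
from fnmatch import fnmatch
from typing import Dict, List

# Hypothetical declaration: each task lists the files it consumes.
TASK_INPUTS: Dict[str, List[str]] = {
    "install": ["pnpm-lock.yaml"],
    "codegen": ["schema/*.graphql"],
    "build-api": ["services/api/*", "schema/*.graphql"],
    "build-web": ["services/web/*"],
}


def explain(
    changed_files: List[str],
    task_inputs: Dict[str, List[str]] = TASK_INPUTS,
) -> Dict[str, str]:
    """Report, per task, whether a set of changed files forces a rerun."""
    report: Dict[str, str] = {}
    for task, patterns in task_inputs.items():
        # fnmatch's "*" also matches "/", so "services/api/*" covers nested files.
        hit = any(fnmatch(path, pattern) for path in changed_files for pattern in patterns)
        report[task] = "rerun" if hit else "reuse"
    return report
```

Calling `explain` with a single changed file under `services/web/` marks `build-web` as `rerun` and every other task as `reuse`, which is exactly the kind of answer a developer should be able to guess before looking.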

VIII. Conclusion

Adding a cache to a CI/CD pipeline is not a tooling question.
It is a matter of modeling work.

Cache what is expensive
Cache what is deterministic
Cache what is clearly bounded
Avoid caching what you cannot explain

The tools come after that. Always.

🔗 Useful links

Why Is Docker Cache Insufficient for a Monorepo?
