Lab 43
Resilient AI Systems

Caching in an AI Agent — Every Layer

Where does caching live in an agent? At every layer: the browser, the API gateway, a semantic cache before the LLM, the tool result, and the database. Toggle a cache at each layer and watch where the request gets served, how cost and latency collapse, and why caching earlier wins. Then change the upstream data and meet caching's one danger, an agent that confidently serves a stale answer, along with the TTL that bounds how long it can stay stale.

The contract-risk agent answers the same questions a hundred ways, and every layer it touches costs money and time.
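
To make "where the request gets served" concrete, here is a minimal sketch of a layered lookup in which the earliest enabled cache holding the answer wins. The layer order comes from the text above; everything else is an assumption for illustration: the `Cache` class, the `serve` function, the stand-in `origin`, and the choice to treat only the first three layers as sitting in front of the LLM. Exact-match keys stand in for the similarity matching a real semantic cache would do.

```python
from dataclasses import dataclass, field

# Layer order taken from the text. Treating the first three as sitting in
# front of the LLM is an assumption of this sketch.
LAYERS = ["browser", "api_gateway", "semantic_cache", "tool_result", "database"]
BEFORE_LLM = {"browser", "api_gateway", "semantic_cache"}


@dataclass
class Cache:
    """Hypothetical per-layer cache with an on/off toggle."""
    name: str
    enabled: bool = False
    store: dict = field(default_factory=dict)


def serve(question, caches, origin):
    """Return (answer, served_by, hit_llm); the earliest enabled hit wins."""
    for cache in caches:
        if cache.enabled and question in cache.store:
            # A hit behind the LLM (tool result, database) still pays for the model.
            return cache.store[question], cache.name, cache.name not in BEFORE_LLM
    answer = origin(question)  # the slow, paid path
    for cache in caches:
        if cache.enabled:
            cache.store[question] = answer  # fill every enabled layer
    return answer, "origin", True


caches = [Cache(name) for name in LAYERS]
caches[2].enabled = True  # toggle the semantic cache on


def origin(question):
    """Stand-in for the LLM + tools + DB call."""
    return f"answer to {question!r}"


first = serve("Is clause 7 risky?", caches, origin)   # miss: goes to the origin
second = serve("Is clause 7 risky?", caches, origin)  # hit: served by the cache
print(first[1], second[1])
```

In this sketch the second request never reaches the model. Enabling only `tool_result` instead would still report `hit_llm` as true, which is one way to read why caching earlier wins.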
Cost / request: $0.014 (full price)
Latency: 1.90 s
Hits the LLM? yes (paying for the model)
Answer: fresh (up to date)
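
The readout above gives the full price of one uncached request. As a rough sketch of how the averages move once a cache in front of the LLM starts hitting, the arithmetic below blends hit and miss costs. The hit rate, the zero-cost and zero-latency hit defaults, and the function name `expected` are all assumptions for illustration; only `$0.014` and `1.90 s` come from the readout.

```python
FULL_COST = 0.014     # dollars per request, from the readout
FULL_LATENCY = 1.90   # seconds per request, from the readout


def expected(hit_rate, hit_cost=0.0, hit_latency=0.0):
    """Average cost and latency per request for a given cache hit rate.

    Assumes a miss pays the full origin price and a hit pays only
    hit_cost / hit_latency (idealized as zero by default).
    """
    miss_rate = 1.0 - hit_rate
    cost = hit_rate * hit_cost + miss_rate * FULL_COST
    latency = hit_rate * hit_latency + miss_rate * FULL_LATENCY
    return cost, latency


# Hypothetical 50% hit rate: both averages halve under these assumptions.
cost, latency = expected(0.5)
print(f"${cost:.4f} per request, {latency:.2f} s")
```

Real hits are not free, so passing a nonzero `hit_cost` and `hit_latency` gives a more honest estimate.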
Request path (toggle a cache at any layer): 👤 User asks the agent → 🧠 LLM + 🛠️ tools + 🗄️ DB (the slow, paid origin)
The catch — staleness
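
The introduction names the TTL as what bounds a stale answer. A minimal sketch of that behavior, assuming a simple in-memory cache with an injectable clock: the `TTLCache` class, the `contract-42` key, the risk labels, and the 60-second TTL are all hypothetical and chosen only to make the example deterministic.

```python
import time


class TTLCache:
    """Hypothetical cache whose entries expire ttl_seconds after being stored."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (value, stored_at)

    def get(self, key, load):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]  # possibly stale, but stored less than ttl seconds ago
        value = load(key)    # expired or missing: go back to the source
        self._store[key] = (value, now)
        return value


# Fake clock and upstream record so the run is repeatable.
now = [0.0]
upstream = {"contract-42": "low risk"}
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])

a = cache.get("contract-42", upstream.__getitem__)  # miss: loads "low risk"
upstream["contract-42"] = "high risk"               # upstream data changes
now[0] = 30.0
b = cache.get("contract-42", upstream.__getitem__)  # within TTL: stale "low risk"
now[0] = 61.0
c = cache.get("contract-42", upstream.__getitem__)  # TTL expired: "high risk"
print(a, b, c, sep=" | ")
```

The stale answer at 30 seconds looks exactly like a fresh one to the caller. The TTL does not prevent staleness; it only caps how long a changed upstream value can go unnoticed.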