Product · Jun 3, 2026 · 4 min read

Indirect prompt injection lives in your knowledge base

Khub Team · Jun 3, 2026

Prompt injection is usually discussed as something a user types. But agents do not only read what the user says. They read knowledge bases, documents, search results, and tool output, and every one of those is a channel someone can write to. When an agent reads an entry, that text becomes part of its reasoning. So a knowledge base is not a passive store. It is an instruction surface.

This is indirect prompt injection, and it is the version that is easy to miss, because the harmful text never comes from the person in the conversation. It comes from content the agent was told to trust.

Why the knowledge layer is where it lands

An attacker who can write to a base does not need access to your session. They need access to your content. A leaked key, an open ingestion path, or a poisoned page picked up by a scraper is enough. They plant instructions dressed as ordinary knowledge, the agent retrieves it as part of an answer, and it may act on it as though you had asked.

The planted text usually gives itself away in shape. It reads like a command rather than a fact. It refers to an updated protocol or an override file that does not exist, dressed up as authority the agent already trusts. And very often it asks the reader to do something quietly, to not mention it, to keep it between them.

That last one is the clearest signal of all. Legitimate knowledge never asks to be concealed. There is no benign reason for a stored entry to request secrecy or non-disclosure. When content asks to be hidden, it is almost always hostile, and it is one of the easiest hostile patterns to catch.

How to defend it

Defending this well comes down to one principle: treat retrieved content as data, never as commands, and make every change to that content visible. Khub's Trust Layer applies that in four places.

Reject poisoned content before it is stored

Every write is scanned before it lands. Content that tries to instruct an agent rather than inform it is rejected at the door, so it never becomes an answer. Requests for secrecy, the signal above, are treated as hostile on their own, because the false-positive cost is close to nothing and the value of catching them is high.

Sign every change so nothing moves unseen

Every change is recorded in an append-only log, bound to the actor who made it and to a hash of the content, and signed in a chain so that altering any past record breaks the chain. This is the same idea as a commit history for your knowledge. You get attribution for every change and the ability to roll any of them back, and you get it in a form that cannot be quietly rewritten.

Neutralise content on the way out

When knowledge leaves Khub, it is wrapped and marked as data, suspicious passages are defused, hidden characters are stripped, and anything hostile is withheld rather than served. This matters because you cannot rely on the consuming model to ignore an instruction it has already been handed. The safe move is to defuse the payload before it is delivered, not to hope it is disregarded after. Each answer also carries its provenance, so the agent and the operator can see how far to trust it.

Make tampering detectable

Each entry's current content is checked against the hash that was signed at its last recorded write. A match means the served content is exactly what was signed. A mismatch means the content changed outside the audited path, which is the signature of a direct database edit, and it is flagged on the very next read. A silent change becomes a visible one.

What this does not do

It is worth being precise, because precision is what makes a security claim worth anything. The Trust Layer cannot control what a model does with text once that text is delivered. Nothing at this layer can. The metadata that marks content as data is advisory. The scanning, neutralisation, and withholding are enforced. The goal is to shrink the attack surface to something small and visible, not to claim immunity. A defence you can describe exactly is one you can actually build on.

Where to start

The baseline, scanning every write, signing every change, and detecting tampering on read, is on every Khub plan, including the free tier. Business adds classifier-grade scanning, provenance tiers on every answer, protected collections that require review before a change publishes, and automatic holds on unusual write activity.

If you are running agents against knowledge that other people, services, or scrapers can write to, this is the layer that decides whether a poisoned entry becomes an incident or never lands at all.

Try Khub free, or see what Business adds.