How We Deliver Story to a Factory Agent
Retrieval is useful, but operational AI systems need more than a bag of context. In Factory Agent, we separate stable identity, tribal knowledge, bounded snapshots, conversational contracts, and living story objects so the model receives the right shape of truth for the job.
Everyone talks about RAG as if it solves the context problem.
Take the documents. Chunk them. Embed them. Retrieve the relevant pieces. Give them to the model.
That helps. It is also not enough for a real operational system.
The reason is simple: context is not one thing.
When people say “give the model more context,” they usually collapse together several very different substances:
- live data
- stable system facts
- tribal knowledge
- language rules
- question-specific grounding
- historical memory
- and the thing most people barely model at all: story
In a factory environment, these are not interchangeable.
If you treat them as one big retrieval problem, the model gets words. It does not necessarily get the right contract with reality.
Retrieval is one delivery primitive, not the whole architecture
We are not anti-retrieval.
Retrieval is useful whenever the model needs to pull the right knowledge slice at the right moment.
But retrieval alone does not decide:
- what should always be present
- what should be interpreted as stable truth versus current observation
- what should be delivered as measured fact versus human rule
- which question families deserve their own interface to the truth
- when many facts should become one operational story instead of many separate facts
That is why we think about the problem as context delivery, not only context retrieval.
The interesting question is not:
“How do I fetch more relevant text?”
It is:
“What shape of truth should this runtime receive for this kind of question?”
That is a much more productive design question.
In Factory Agent, we do not deliver one context blob
The Factory Agent stack we are building for Gezer Shalit uses several different context forms on purpose.
Not because complexity is impressive. Because different truths need different delivery contracts.
At the highest level, we separate at least five layers:
- declarative factory identity and operating rules
- tribal knowledge and language knowledge
- bounded live operational snapshots
- canonical conversational contracts
- living story objects
Once you see the system this way, “just use RAG” starts sounding too flat.
Layer 1: Declarative identity is not retrieval material
Some truths are so central that they should not depend on search at all.
Factory identity, process stages, machine labels, shift structure, goals, baselines, and watchdog thresholds belong in a declarative layer.
That layer answers questions like:
- what is this factory
- what are the named stages
- which machines exist
- what shift windows are currently configured
- what goals and baselines shape interpretation
This is not “memory” in the fuzzy sense. It is system identity.
If the model has to rediscover those facts via retrieval every time, you are already paying the wrong tax.
Some truths should arrive as configuration, not as search results.
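To make the contrast concrete, here is a minimal sketch of what a declarative identity layer can look like as configuration loaded at startup. Every name and field here (FactoryIdentity, the stage names, the thresholds) is an illustrative assumption, not the actual Factory Agent schema.

```python
from dataclasses import dataclass

# Illustrative sketch of a declarative identity layer.
# Field names and values are invented for this example.
@dataclass(frozen=True)  # frozen: no runtime should mutate identity
class FactoryIdentity:
    name: str
    stages: tuple[str, ...]           # named process stages, in order
    machines: tuple[str, ...]         # machine labels as people use them
    shift_windows: dict[str, str]     # shift name -> configured time window
    baselines: dict[str, float]       # goals and baselines that shape interpretation
    watchdog_thresholds: dict[str, float]

IDENTITY = FactoryIdentity(
    name="example-factory",
    stages=("receiving", "sorting", "packing"),
    machines=("sorter-1", "packer-5"),
    shift_windows={"morning": "06:00-14:00", "evening": "14:00-22:00"},
    baselines={"sorting_rate_per_hour": 1200.0},
    watchdog_thresholds={"stop_minutes_alert": 15.0},
)

# Loaded once, injected into every runtime's context.
# Never fetched through retrieval, never rediscovered per turn.
```

The design choice that matters is the `frozen=True`: identity arrives as configuration with a guarantee of stability, which is exactly what a search index cannot promise.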
Layer 2: Tribal knowledge is not just more text
The next layer is the one most people oversimplify.
Factories run on a huge amount of knowledge that is not present in the raw telemetry:
- what people actually call things
- which machine behavior is real versus misleading
- which bottleneck claims are safe and which are too strong
- what managers care about
- what blind spots exist today
- what the process means in practice
This is not raw data. It is operational knowledge.
We keep a lot of this in human-readable markdown files and structured tribal-knowledge records, but the key point is not the storage format.
The key point is that this layer is authored and reviewed as knowledge.
It has categories, provenance, priorities, corrections, and even tombstones for retired rules.
That is already very different from “throw documents in a vector store and hope retrieval finds the right paragraph.”
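A sketch of what an authored, reviewed record can look like, assuming hypothetical field names (`category`, `source`, `priority`, `superseded_by`) rather than the real Factory Agent schema:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative tribal-knowledge record. Field names are assumptions.
@dataclass
class TribalKnowledgeRecord:
    id: str
    category: str                        # e.g. "naming", "machine-behavior", "blind-spot"
    text: str                            # the rule itself, in plain language
    source: str                          # provenance: who stated it, and when
    priority: int                        # higher wins when rules conflict
    superseded_by: Optional[str] = None  # tombstone: retired rules point at their successor

def active_rules(records):
    """Keep only rules that have not been retired, highest priority first."""
    live = [r for r in records if r.superseded_by is None]
    return sorted(live, key=lambda r: r.priority, reverse=True)
```

The point of the tombstone field is that a retired rule stays visible as history instead of silently disappearing from the corpus, which is something a plain vector store does not give you.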
Layer 3: Live data should arrive as bounded snapshots, not raw sprawl
The model should not normally read the factory like a tourist wandering through raw tables.
Instead, the runtime builds bounded operational bundles:
- recent sorting state
- recent packing state
- machine stop patterns
- active shift context
- active ERP miyun (Hebrew for “sorting”) context

- merged factory context
- signal health
- blind spots
- confidence penalties when important sources are missing
This matters because the live question is rarely “what rows exist?”
The real question is closer to:
- what is happening now
- which signals are trustworthy
- what is missing
- how much confidence should we have in the picture
That is a snapshot problem, not a retrieval problem.
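A minimal sketch of the snapshot idea: the builder returns bounded state plus explicit blind spots and a confidence score, instead of raw tables. Source names and penalty weights are invented for illustration.

```python
# Illustrative source set and confidence penalties for missing sources.
REQUIRED_SOURCES = {"sorting": 0.3, "packing": 0.3, "machine_stops": 0.2, "erp": 0.2}

def build_snapshot(available: dict) -> dict:
    """available maps source name -> bounded recent state for that source."""
    missing = [s for s in REQUIRED_SOURCES if s not in available]
    # Confidence degrades explicitly when important sources are absent.
    confidence = 1.0 - sum(REQUIRED_SOURCES[s] for s in missing)
    return {
        "state": {s: v for s, v in available.items() if s in REQUIRED_SOURCES},
        "blind_spots": missing,          # delivered to the model, not hidden
        "confidence": round(confidence, 2),
    }
```

The model receives "here is the picture, here is what is missing, here is how much to trust it," which is exactly the shape the four questions above require.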
Layer 4: Different question families deserve different contracts
One of the biggest mistakes in AI products is assuming the model should see a giant bag of tools and figure out the right path every turn.
We prefer a smaller interface:
- current factory overview
- machine situation
- sorting snapshot
- packing snapshot
- flow constraint picture
- shift change summary
- shift comparison summary
Each of these is a canonical conversational contract.
That means:
- a defined question family
- a defined scope
- a bounded grounded source set
- explicit honesty rules
- a delivery shape designed for that family
This is important because “what is happening now in the factory?” and “why is machine 5 slow?” and “what changed since morning?” are not the same question wearing different clothes.
They should not all be answered by the same generic retrieval routine.
Routing matters. Question-specific delivery matters. The model should not improvise the ontology on every turn.
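One way to sketch canonical contracts in code, with made-up question families, source names, and honesty rules standing in for the real ones:

```python
# Illustrative contract table. Families, sources, and rules are assumptions.
CONTRACTS = {
    "factory_overview": {
        "sources": ["merged_factory_context", "signal_health"],
        "honesty": "state blind spots before conclusions",
    },
    "machine_situation": {
        "sources": ["machine_stop_patterns", "tribal_knowledge"],
        "honesty": "never claim a root cause without a measured stop",
    },
    "shift_comparison": {
        "sources": ["active_shift_context", "shift_history"],
        "honesty": "compare like shift windows only",
    },
}

def route(question_family: str) -> dict:
    """The router picks the contract; the model never improvises the ontology."""
    contract = CONTRACTS.get(question_family)
    if contract is None:
        raise ValueError(f"no canonical contract for {question_family!r}")
    return contract
```

Failing loudly on an unknown family is deliberate: a question without a contract should become a design task, not an improvised answer.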
Layer 5: Story is not a summary. It is a first-class object.
This is the part I find most interesting.
We are not only storing facts and knowledge. We are also storing stories.
Not fiction. Operational stories.
For example:
- one machine stop that persists over time
- one sustained rate gap
- one evolving factory issue that should update instead of appearing as many separate cards
A good story object carries distinct layers:
- measured facts
- operational context
- interpretation
- first thing to check
- confidence and blind spots
And it has identity and lifecycle:
- open
- updated
- resolved
- reopened
That is a completely different object from a retrieved paragraph or a raw event row.
It is closer to a living operational narrative unit.
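A sketch of a story object with stable identity and an explicit lifecycle. The state machine and field names are assumptions for illustration, not the actual implementation.

```python
from dataclasses import dataclass, field

# Illustrative lifecycle: which transitions are legal from each state.
ALLOWED = {
    "open":     {"updated", "resolved"},
    "updated":  {"updated", "resolved"},
    "resolved": {"reopened"},
    "reopened": {"updated", "resolved"},
}

@dataclass
class StoryObject:
    story_id: str                                  # stable identity across updates
    measured_facts: list = field(default_factory=list)
    operational_context: str = ""
    interpretation: str = ""                       # kept separate from measured facts
    first_check: str = ""
    blind_spots: list = field(default_factory=list)
    state: str = "open"

    def transition(self, new_state: str) -> None:
        """One evolving story instead of many separate cards."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go {self.state} -> {new_state}")
        self.state = new_state
```

The stable `story_id` is what lets "the same story as before, but it got worse" update in place instead of spawning a new card.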
Why does that matter?
Because real users do not think in isolated measurements. They think in issues:
- this machine has been down for twenty minutes
- this line is still under rate
- this seems upstream, not local
- this is the same story as before, but it got worse
If the system only stores raw data, the model has to recreate story every time. If the system stores only prose, the story drifts away from measurement.
The better pattern is to build story objects that remain tied to facts while behaving like evolving operational realities.
The feed, the chat, and the analyst should not eat the same context the same way
Another useful consequence of this design is that not every runtime consumes the same truth in the same form.
- the watchdog wants cheap bounded signals
- the analyst wants evidence bundles, recent events, recent snapshots, the process model, and knowledge docs
- the conversationalist wants curated grounding plus canonical contracts
- the feed wants canonical surviving stories, not endpoint sprawl
This is the opposite of the “one context pack for everything” instinct.
And it is usually better.
Different runtimes are good at different jobs. So they should receive different shapes of context.
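The same idea can be written down as a simple mapping from runtime to context shape. Runtime and shape names here are illustrative, not the real configuration:

```python
# Illustrative: each runtime declares the context shapes it consumes.
RUNTIME_CONTEXT = {
    "watchdog": ["cheap_bounded_signals"],
    "analyst": [
        "evidence_bundle", "recent_events", "recent_snapshots",
        "process_model", "knowledge_docs",
    ],
    "conversationalist": ["curated_grounding", "canonical_contracts"],
    "feed": ["surviving_stories"],
}

def context_for(runtime: str) -> list:
    """Each consumer gets its declared shapes, never one blob for everything."""
    return RUNTIME_CONTEXT[runtime]
```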
Story is how raw measurement becomes usable operational knowledge
I think this is the deeper point.
Data and knowledge are not enough by themselves.
A system can have excellent live data and still be hard to work with. It can have rich knowledge files and still answer in a way that feels flat or generic.
What makes the system useful is often the layer in between:
the conversion from measured state plus knowledge plus uncertainty into an operational story that a human can act on.
That is why we care so much about:
- separating fact from interpretation
- preserving blind spots
- storing stable story identity
- and delivering the right contract for the right question family
Without that, the model is always rebuilding the world from scratch.
With it, the system starts to behave more like a colleague with the right bounded understanding.
The point is not to win a RAG argument
This is not really a post against RAG.
It is a post against flattening everything into the same mental model.
Retrieval is useful. Embeddings are useful. Search is useful.
But operational AI systems need a richer grammar than “documents in, chunks out.”
They need to decide:
- what should be configured
- what should be curated
- what should be retrieved
- what should be snapped
- what should be routed
- and what should be remembered as story
That is the level where the system starts becoming trustworthy.
Not when it can quote the right paragraph. When it can receive the right shape of truth.