How We Deliver Story to a Factory Agent
Retrieval is useful, but operational AI systems need more than a bag of context. In Factory Agent, we separate stable identity, tribal knowledge, bounded snapshots, conversational contracts, and living story objects so the model receives the right shape of truth for the job.
Everyone talks about RAG as if it solves the context problem.
Take the documents. Chunk them. Embed them. Retrieve the relevant pieces. Give them to the model.
That helps. It is also not enough for a real operational system.
The reason is simple: context is not one thing.
When people say “give the model more context,” they usually collapse together several very different substances:
- live data
- stable system facts
- tribal knowledge
- language rules
- question-specific grounding
- historical memory
- and the thing most people barely model at all: story
In a factory environment, these are not interchangeable.
If you treat them as one big retrieval problem, the model gets words. It does not necessarily get the right contract with reality.
Retrieval is one delivery primitive, not the whole architecture
We are not anti-retrieval.
Retrieval is useful whenever the model needs to pull the right knowledge slice at the right moment.
But retrieval alone does not decide:
- what should always be present
- what should be interpreted as stable truth versus current observation
- what should be delivered as measured fact versus human rule
- which question families deserve their own interface to the truth
- when many facts should become one operational story instead of many separate facts
That is why we think about the problem as context delivery, not only context retrieval.
The interesting question is not:
“How do I fetch more relevant text?”
It is:
“What shape of truth should this runtime receive for this kind of question?”
That is a much more productive design question.
In Factory Agent, we do not deliver one context blob
The Factory Agent stack we are building for Gezer Shalit uses several different context forms on purpose.
Not because complexity is impressive. Because different truths need different delivery contracts.
At the highest level, we separate at least five layers:
- declarative factory identity and operating rules
- tribal knowledge and language knowledge
- bounded live operational snapshots
- canonical conversational contracts
- living story objects
Once you see the system this way, “just use RAG” starts sounding too flat.
Layer 1: Declarative identity is not retrieval material
Some truths are so central that they should not depend on search at all.
Factory identity, process stages, machine labels, shift structure, goals, baselines, and watchdog thresholds belong in a declarative layer.
That layer answers questions like:
- what is this factory
- what are the named stages
- which machines exist
- what shift windows are currently configured
- what goals and baselines shape interpretation
This is not “memory” in the fuzzy sense. It is system identity.
If the model has to rediscover those facts via retrieval every time, you are already paying the wrong tax.
Some truths should arrive as configuration, not as search results.
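To make the contrast concrete, here is a minimal sketch of what a declarative identity layer can look like as configuration loaded at startup. Every name and field here (FactoryIdentity, the stage names, the thresholds) is an illustrative assumption, not the actual Factory Agent schema.

```python
from dataclasses import dataclass

# Illustrative sketch of a declarative identity layer.
# Field names and values are invented for this example.
@dataclass(frozen=True)  # frozen: no runtime should mutate identity
class FactoryIdentity:
    name: str
    stages: tuple[str, ...]           # named process stages, in order
    machines: tuple[str, ...]         # machine labels as people use them
    shift_windows: dict[str, str]     # shift name -> configured time window
    baselines: dict[str, float]       # goals and baselines that shape interpretation
    watchdog_thresholds: dict[str, float]

IDENTITY = FactoryIdentity(
    name="example-factory",
    stages=("receiving", "sorting", "packing"),
    machines=("sorter-1", "packer-5"),
    shift_windows={"morning": "06:00-14:00", "evening": "14:00-22:00"},
    baselines={"sorting_rate_per_hour": 1200.0},
    watchdog_thresholds={"stop_minutes_alert": 15.0},
)

# Loaded once, injected into every runtime's context.
# Never fetched through retrieval, never rediscovered per turn.
```

The design choice that matters is the `frozen=True`: identity arrives as configuration with a guarantee of stability, which is exactly what a search index cannot promise.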
Layer 2: Tribal knowledge is not just more text
The next layer is the one most people oversimplify.
Factories run on a huge amount of knowledge that is not present in the raw telemetry:
- what people actually call things
- which machine behavior is real versus misleading
- which bottleneck claims are safe and which are too strong
- what managers care about
- what blind spots exist today
- what the process means in practice
This is not raw data. It is operational knowledge.
We keep a lot of this in human-readable markdown files and structured tribal-knowledge records, but the key point is not the storage format.
The key point is that this layer is authored and reviewed as knowledge.
It has categories, provenance, priorities, corrections, and even tombstones for retired rules.
That is already very different from “throw documents in a vector store and hope retrieval finds the right paragraph.”
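A sketch of what an authored, reviewed record can look like, assuming hypothetical field names (`category`, `source`, `priority`, `superseded_by`) rather than the real Factory Agent schema:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative tribal-knowledge record. Field names are assumptions.
@dataclass
class TribalKnowledgeRecord:
    id: str
    category: str                        # e.g. "naming", "machine-behavior", "blind-spot"
    text: str                            # the rule itself, in plain language
    source: str                          # provenance: who stated it, and when
    priority: int                        # higher wins when rules conflict
    superseded_by: Optional[str] = None  # tombstone: retired rules point at their successor

def active_rules(records):
    """Keep only rules that have not been retired, highest priority first."""
    live = [r for r in records if r.superseded_by is None]
    return sorted(live, key=lambda r: r.priority, reverse=True)
```

The point of the tombstone field is that a retired rule stays visible as history instead of silently disappearing from the corpus, which is something a plain vector store does not give you.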
Layer 3: Live data should arrive as bounded snapshots, not raw sprawl
The model should not normally read the factory like a tourist wandering through raw tables.
Instead, the runtime builds bounded operational bundles:
- recent sorting state
- recent packing state
- machine stop patterns
- active shift context
- active ERP miyun (Hebrew for “sorting”) context

- merged factory context
- signal health
- blind spots
- confidence penalties when important sources are missing
This matters because the live question is rarely “what rows exist?”
The real question is closer to:
- what is happening now
- which signals are trustworthy
- what is missing
- how much confidence should we have in the picture
That is a snapshot problem, not a retrieval problem.
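A minimal sketch of the snapshot idea: the builder returns bounded state plus explicit blind spots and a confidence score, instead of raw tables. Source names and penalty weights are invented for illustration.

```python
# Illustrative source set and confidence penalties for missing sources.
REQUIRED_SOURCES = {"sorting": 0.3, "packing": 0.3, "machine_stops": 0.2, "erp": 0.2}

def build_snapshot(available: dict) -> dict:
    """available maps source name -> bounded recent state for that source."""
    missing = [s for s in REQUIRED_SOURCES if s not in available]
    # Confidence degrades explicitly when important sources are absent.
    confidence = 1.0 - sum(REQUIRED_SOURCES[s] for s in missing)
    return {
        "state": {s: v for s, v in available.items() if s in REQUIRED_SOURCES},
        "blind_spots": missing,          # delivered to the model, not hidden
        "confidence": round(confidence, 2),
    }
```

The model receives "here is the picture, here is what is missing, here is how much to trust it," which is exactly the shape the four questions above require.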
Layer 4: Different question families deserve different contracts
One of the biggest mistakes in AI products is assuming the model should see a giant bag of tools and figure out the right path every turn.
We prefer a smaller interface:
- current factory overview
- machine situation
- sorting snapshot
- packing snapshot
- flow constraint picture
- shift change summary
- shift comparison summary
Each of these is a canonical conversational contract.
That means:
- a defined question family
- a defined scope
- a bounded grounded source set
- explicit honesty rules
- a delivery shape designed for that family
This is important because “what is happening now in the factory?” and “why is machine 5 slow?” and “what changed since morning?” are not the same question wearing different clothes.
They should not all be answered by the same generic retrieval routine.
Routing matters. Question-specific delivery matters. The model should not improvise the ontology on every turn.
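One way to sketch canonical contracts in code, with made-up question families, source names, and honesty rules standing in for the real ones:

```python
# Illustrative contract table. Families, sources, and rules are assumptions.
CONTRACTS = {
    "factory_overview": {
        "sources": ["merged_factory_context", "signal_health"],
        "honesty": "state blind spots before conclusions",
    },
    "machine_situation": {
        "sources": ["machine_stop_patterns", "tribal_knowledge"],
        "honesty": "never claim a root cause without a measured stop",
    },
    "shift_comparison": {
        "sources": ["active_shift_context", "shift_history"],
        "honesty": "compare like shift windows only",
    },
}

def route(question_family: str) -> dict:
    """The router picks the contract; the model never improvises the ontology."""
    contract = CONTRACTS.get(question_family)
    if contract is None:
        raise ValueError(f"no canonical contract for {question_family!r}")
    return contract
```

Failing loudly on an unknown family is deliberate: a question without a contract should become a design task, not an improvised answer.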
Layer 5: Story is not a summary. It is a first-class object.
This is the part I find most interesting.
We are not only storing facts and knowledge. We are also storing stories.
Not fiction. Operational stories.
For example:
- one machine stop that persists over time
- one sustained rate gap
- one evolving factory issue that should update instead of appearing as many separate cards
A good story object carries distinct layers:
- measured facts
- operational context
- interpretation
- first thing to check
- confidence and blind spots
And it has identity and lifecycle:
- open
- updated
- resolved
- reopened
That is a completely different object from a retrieved paragraph or a raw event row.
It is closer to a living operational narrative unit.
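A sketch of a story object with stable identity and an explicit lifecycle. The state machine and field names are assumptions for illustration, not the actual implementation.

```python
from dataclasses import dataclass, field

# Illustrative lifecycle: which transitions are legal from each state.
ALLOWED = {
    "open":     {"updated", "resolved"},
    "updated":  {"updated", "resolved"},
    "resolved": {"reopened"},
    "reopened": {"updated", "resolved"},
}

@dataclass
class StoryObject:
    story_id: str                                  # stable identity across updates
    measured_facts: list = field(default_factory=list)
    operational_context: str = ""
    interpretation: str = ""                       # kept separate from measured facts
    first_check: str = ""
    blind_spots: list = field(default_factory=list)
    state: str = "open"

    def transition(self, new_state: str) -> None:
        """One evolving story instead of many separate cards."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go {self.state} -> {new_state}")
        self.state = new_state
```

The stable `story_id` is what lets "the same story as before, but it got worse" update in place instead of spawning a new card.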
Why does that matter?
Because real users do not think in isolated measurements. They think in issues:
- this machine has been down for twenty minutes
- this line is still under rate
- this seems upstream, not local
- this is the same story as before, but it got worse
If the system only stores raw data, the model has to recreate story every time. If the system stores only prose, the story drifts away from measurement.
The better pattern is to build story objects that remain tied to facts while behaving like evolving operational realities.
The feed, the chat, and the analyst should not eat the same context the same way
Another useful consequence of this design is that not every runtime consumes the same truth in the same form.
- the watchdog wants cheap bounded signals
- the analyst wants evidence bundles, recent events, recent snapshots, the process model, and knowledge docs
- the conversationalist wants curated grounding plus canonical contracts
- the feed wants canonical surviving stories, not endpoint sprawl
This is the opposite of the “one context pack for everything” instinct.
And it is usually better.
Different runtimes are good at different jobs. So they should receive different shapes of context.
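The same idea can be written down as a simple mapping from runtime to context shape. Runtime and shape names here are illustrative, not the real configuration:

```python
# Illustrative: each runtime declares the context shapes it consumes.
RUNTIME_CONTEXT = {
    "watchdog": ["cheap_bounded_signals"],
    "analyst": [
        "evidence_bundle", "recent_events", "recent_snapshots",
        "process_model", "knowledge_docs",
    ],
    "conversationalist": ["curated_grounding", "canonical_contracts"],
    "feed": ["surviving_stories"],
}

def context_for(runtime: str) -> list:
    """Each consumer gets its declared shapes, never one blob for everything."""
    return RUNTIME_CONTEXT[runtime]
```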
Story is how raw measurement becomes usable operational knowledge
I think this is the deeper point.
Data and knowledge are not enough by themselves.
A system can have excellent live data and still be hard to work with. It can have rich knowledge files and still answer in a way that feels flat or generic.
What makes the system useful is often the layer in between:
the conversion from measured state plus knowledge plus uncertainty into an operational story that a human can act on.
That is why we care so much about:
- separating fact from interpretation
- preserving blind spots
- storing stable story identity
- and delivering the right contract for the right question family
Without that, the model is always rebuilding the world from scratch.
With it, the system starts to behave more like a colleague with the right bounded understanding.
The point is not to win a RAG argument
This is not really a post against RAG.
It is a post against flattening everything into the same mental model.
Retrieval is useful. Embeddings are useful. Search is useful.
But operational AI systems need a richer grammar than “documents in, chunks out.”
They need to decide:
- what should be configured
- what should be curated
- what should be retrieved
- what should be snapped
- what should be routed
- and what should be remembered as story
That is the level where the system starts becoming trustworthy.
Not when it can quote the right paragraph. When it can receive the right shape of truth.