LLMs are useful for maintaining a personal or team knowledge base, but only when the boundaries are explicit.
Without boundaries, the knowledge base turns into a blended layer of raw source, interpretation, outdated memory, and generated prose. It may look organized while becoming harder to trust.
The failure mode is subtle. The notes get cleaner while the system gets less reliable.
Separate source from synthesis
I like to keep three layers separate:
- Raw source material: emails, exports, meeting notes, documents, transcripts.
- Working wiki: synthesized notes that are useful for daily work.
- Public garden: selected notes rewritten for outside readers.
Each layer has a different job. Raw source should preserve evidence. The working wiki should support action. The public garden should communicate ideas without leaking private context.
The layers also have different permissions:
- sources should be append-only or read-only,
- wiki pages can be rewritten when supported by evidence,
- public notes should be opt-in and deliberately sanitized.
When those boundaries collapse, the agent can accidentally turn a private source into public prose, rewrite evidence instead of interpretation, or preserve a stale conclusion because the original source was no longer visible.
Generated notes need provenance
When an LLM creates or updates a note, the useful question is not only “does this sound right?”
Better questions:
- What source material supports this?
- What did the model infer?
- What was omitted?
- What may have changed since the source was written?
- Is this note safe to share?
The model can help answer these questions, but the system should make them easy to ask.
A good generated note should leave fingerprints:
- links to source notes,
- labels for inference versus confirmation,
- dates for time-sensitive claims,
- visible uncertainty where evidence is incomplete,
- a clear owner for human judgment.
This is less about citation aesthetics and more about future repair. When the note is wrong six months later, provenance tells you whether the error came from the source, the interpretation, or a change in reality.
Boundaries make automation safer
Automation becomes safer when the folders and rules match the risk level.
For example:
- Source folders are read-only inputs.
- Working wiki pages can be updated, but should preserve links to sources.
- Public notes are opt-in and curated.
- Sensitive documents are excluded from ingest and publish workflows.
The point is not to slow everything down. The point is to make the safe path obvious.
The same idea applies to agent memory. Not every useful fact belongs in always-loaded context. Not every source belongs in retrieval. Not every private conclusion belongs in a public essay.
The boundary is what lets the system grow without becoming an undifferentiated pile.
The minimum useful rule set
For a small knowledge base, I would start with rules like these:
- Raw sources are never edited by the agent.
- Generated summaries must link back to sources.
- Contradictions are flagged, not silently resolved.
- Public notes cannot link to private notes.
- Sensitive folders are excluded by default.
- Claims with dates or external facts are marked for later review.
- Humans own judgment; the model owns bookkeeping.
This is the smaller sibling of the three-layer pattern. The pattern describes the architecture. The boundary rules keep it trustworthy.
LLM-maintained notes are useful when they reduce entropy. They become dangerous when they hide it behind fluent prose.