rafcip/lore

lore — knowledge your AI agents can act on

Turn your team's hard-won experience into knowledge your AI agents can act on. Institutional memory that doesn't die with the person who earned it — captured, governed, and made actionable for LLM agents.

By Raffaele Cipro — Principal Product Manager, OpenText · enterprise & agentic AI. A practitioner field report, not a maintained framework: no SDK to install, no support promised. No domain data here — just the architecture and the methodology. (LinkedIn)

Most "knowledge bases for agents" are a pile of documents wired to vector search. But the knowledge that actually wins the work usually isn't in the documents — it's in the heads of the people who know how the work is really done, and it walks out the door when they do. lore is a methodology for capturing that experience — codified facts and the practitioner's craft — and turning it into knowledge an agent can retrieve, navigate, and apply reliably. It comes from running such a system in production in a regulated professional domain.

Three ideas do the work: route by question type across cooperating layers, restructure long documents instead of chunking them (Book-to-Skill), and govern reliability so "is this still true, who validated it" is answerable — including for the tacit, experiential knowledge that has no external source to check against (how experience becomes agent-ready).

Take the ideas, adapt the schemas, ignore what doesn't fit.


Where this sits relative to the Open Knowledge Format (OKF)

In June 2026 Google Cloud published the Open Knowledge Format (OKF) — a vendor-neutral standard that represents knowledge as a directory of markdown files with YAML frontmatter, formalizing the "LLM wiki" pattern popularized by Andrej Karpathy. OKF standardizes the substrate: how a single knowledge file is shaped and how files link.

This methodology is complementary and sits above that substrate. OKF answers "what does one knowledge file look like, and how do files reference each other?" The patterns here answer the next questions:

  • How do you route a query to the right kind of knowledge (structured lookup vs. thematic navigation vs. verbatim source)?
  • How do you turn a long reference document into something an agent navigates deterministically instead of chunking it badly (Book-to-Skill)?
  • How do you govern reliability — what's verified, what's contested, what's superseded — across the whole corpus (LLM Wiki governance)?
  • How do you capture tacit expertise (the practitioner's craft), not just codified facts?

If you adopt OKF for your files, the patterns in methodology/ tell you what to build on top of them. They predate OKF in this system but map onto it cleanly.


The three jobs, three layers

Naive RAG over a pile of PDFs is easy to demo and expensive to operate: chunk roulette (the answer depends on which fragments happen to be retrieved), non-determinism where you want a single right answer, and no governance layer to answer "is this still true?" The core move is to stop asking one vector index to be good at everything and to separate three jobs into three cooperating layers:

                ┌──────────────────────────────────────────────┐
   agent asks   │  router: which kind of question is this?     │
  ────────────► │   structured?   thematic?   verbatim?        │
                └───────┬───────────────┬───────────────┬──────┘
                        ▼               ▼               ▼
                ┌─────────────┐ ┌───────────────┐ ┌──────────────┐
                │ L1: DB +    │ │ L2: Skill     │ │ L3: Files    │
                │ vector      │ │ files         │ │ (source of   │
                │ (hybrid)    │ │ (book-to-     │ │  truth)      │
                │             │ │  skill)       │ │              │
                └─────────────┘ └───────────────┘ └──────────────┘
  • L1 — Structured store + vectors (hybrid search). Relational DB + a vector extension, exposed to agents through a small set of typed tools (e.g. an MCP server), not raw SQL. Answers "how many of X?", "look up Y", ranked semantic search with a score you can threshold on.
  • L2 — Skill files (navigable knowledge). Long documents are restructured, not chunked, into a small file tree the agent navigates deterministically. See Book-to-Skill.
  • L3 — Filesystem (verbatim source of truth). Original documents as plain files with YAML frontmatter. Everything upstream is a rendering of this layer; this is what you cite.
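
The routing step in the diagram can be pictured with a small sketch. It is only an illustration under stated assumptions: the cue patterns, the `Layer` enum, and the `route` function are hypothetical names, and a keyword heuristic stands in for whatever classifier a real router would use (for example an LLM call).

```python
import re
from enum import Enum


class Layer(Enum):
    L1_STRUCTURED = "L1: DB + vector (hybrid)"
    L2_SKILL = "L2: skill files"
    L3_FILES = "L3: files (source of truth)"


# Hypothetical cue patterns, chosen for illustration only.
VERBATIM_CUES = re.compile(r"\b(quote|verbatim|exact (text|wording)|cite)\b", re.I)
STRUCTURED_CUES = re.compile(r"\b(how many|count|list all|look up|find)\b", re.I)


def route(question: str) -> Layer:
    """Pick a layer from the kind of question, not from its topic."""
    if VERBATIM_CUES.search(question):
        return Layer.L3_FILES       # the caller wants the source text itself
    if STRUCTURED_CUES.search(question):
        return Layer.L1_STRUCTURED  # lookups, counts, ranked search
    return Layer.L2_SKILL           # default: thematic navigation
```

Checking the verbatim cues first is a design choice in this sketch: a question that asks for exact wording goes to the source files even if it also looks like a lookup.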

The hard part — capturing the practitioner's craft

The three layers handle documents and facts. But the knowledge that actually wins the work is often tacit: how an expert sequences an argument, what to concede, which move backfires. It has no external source to verify against — and it normally dies with the person who holds it.

lore treats that experiential knowledge as a first-class, governed stream: captured as claims that start unverified and become trusted only once the expert validates them, and kept honest by an adherence loop that surfaces when an agent's output drifts from the captured craft. This is what turns "we knew that" into "the system knows that" — and it's what the whole methodology is ultimately for.

See methodology/04-tacit-expertise-as-knowledge.md: capturing the practitioner's craft as a governed, applicable knowledge stream.
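
As a rough illustration of the claim lifecycle described above, the sketch below models a claim that starts unverified and becomes trusted only after a named expert validates it. The field names, the `Status` values, and the `trusted` helper are assumptions made for this example, not the schema shipped in templates/. The adherence loop is not modeled here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Status(Enum):
    UNVERIFIED = "unverified"
    VERIFIED = "verified"
    CONTESTED = "contested"
    SUPERSEDED = "superseded"


@dataclass
class Claim:
    claim_id: str
    text: str
    captured_from: str                   # practitioner who holds the experience
    status: Status = Status.UNVERIFIED   # every claim starts untrusted
    validated_by: Optional[str] = None
    superseded_by: Optional[str] = None

    def validate(self, expert: str) -> None:
        """Promote the claim once a named expert signs off."""
        if self.status is not Status.UNVERIFIED:
            raise ValueError(f"cannot validate a {self.status.value} claim")
        self.status = Status.VERIFIED
        self.validated_by = expert

    def supersede(self, newer_claim_id: str) -> None:
        """Record which claim replaces this one instead of deleting it."""
        self.status = Status.SUPERSEDED
        self.superseded_by = newer_claim_id


def trusted(claims: List[Claim]) -> List[Claim]:
    """Only verified claims are handed to agents as craft to apply."""
    return [c for c in claims if c.status is Status.VERIFIED]
```

Keeping the validator and the successor as explicit fields means "who validated it" and "what replaced it" can be answered by reading the record rather than by asking around.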


Adopt it

Want to use this in your own system? → GETTING-STARTED.md — a 5-step adoption checklist, with:

  • templates/ — copy-paste skeletons: source frontmatter, SKILL.md, chapters, the claims schema, and Layer-1 typed-tool stubs (templates/mcp-tools/: hybrid-search schema.sql + tools.py).
  • starter-example/ — a complete, navigable mini-KB (synthetic domain) showing all three layers working together for one document.
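
To give a feel for what a Layer-1 typed tool might look like, here is an in-memory sketch of hybrid search with a score you can threshold on. It is not the content of templates/mcp-tools/tools.py: the `Hit` type, the `hybrid_search` signature, the `alpha` weighting, and the keyword-overlap scoring are all assumptions for illustration, and a real deployment would run the equivalent query against the database and its vector extension.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float  # fused score in [0, 1]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_overlap(query: str, text: str) -> float:
    """Fraction of query terms that appear in the text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_search(
    query: str,
    query_vec: Sequence[float],
    rows: Iterable[Tuple[str, str, Sequence[float]]],  # (doc_id, text, embedding)
    alpha: float = 0.5,      # weight of the semantic score vs. the keyword score
    min_score: float = 0.3,  # hits below this threshold are dropped
    limit: int = 5,
) -> List[Hit]:
    hits = []
    for doc_id, text, emb in rows:
        semantic = max(cosine(query_vec, emb), 0.0)  # clamp negatives to 0
        score = alpha * semantic + (1 - alpha) * keyword_overlap(query, text)
        if score >= min_score:
            hits.append(Hit(doc_id, round(score, 4)))
    # Sort by score, then doc_id, so equal scores come back in a stable order.
    return sorted(hits, key=lambda h: (-h.score, h.doc_id))[:limit]
```

The agent only ever sees the typed parameters and the `Hit` results, which is the point of exposing tools instead of raw SQL.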

Read in this order

| Doc | What it covers |
| --- | --- |
| methodology/01-three-layer-kb.md | The router and the three layers; how they cooperate at ingestion ("triple") |
| methodology/02-book-to-skill.md | Turning long documents into navigable, progressive-disclosure skill bundles |
| methodology/03-llm-wiki-governance.md | Claims, reliability, and deterministic dashboards: from human memory to institutional memory |
| methodology/04-tacit-expertise-as-knowledge.md | Capturing the practitioner's craft as a governed, applicable knowledge stream |
| reference/frontmatter-schema.md | Frontmatter conventions and a worked, synthetic example |
| GETTING-STARTED.md · templates/ · starter-example/ | Adoption checklist, copy-paste templates, and a runnable-shaped worked example |

Design principles that generalized well

  • Route by question type. Three jobs, three layers — don't overload one index.
  • Determinism where there is a right answer. Navigation inside a document is a table the model follows; reserve vectors for discovery, not traversal.
  • Budget tokens explicitly. Master files are cheap to keep present; chapters are pay-per-use.
  • Provenance and supersession are fields, not vibes.
  • Separate application from governance. Operational knowledge for agents; explanatory, traceable knowledge for humans.
  • Restructure, don't chunk, the documents that matter.
  • Humans in the loop at the seams — ingestion decisions and writes to the shared store sit behind explicit authorization.
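
Two of the principles above, navigation as a table and explicit token budgets, can be sketched together. The chapter paths, topics, and token counts below are invented for illustration, and `chapters_for` is a hypothetical helper, not part of the Book-to-Skill project.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Chapter:
    path: str
    topics: Tuple[str, ...]
    tokens: int  # estimated cost of loading this chapter


# Hypothetical navigation table, as a master skill file might list it.
NAV_TABLE = (
    Chapter("chapters/01-scope.md", ("scope", "definitions"), 1200),
    Chapter("chapters/02-deadlines.md", ("deadlines", "notice"), 1800),
    Chapter("chapters/03-exceptions.md", ("exceptions",), 2500),
)


def chapters_for(topic: str, budget: int) -> List[str]:
    """Return matching chapter paths in table order, skipping any chapter
    that would push the running total over the token budget."""
    chosen, spent = [], 0
    for ch in NAV_TABLE:
        if topic in ch.topics and spent + ch.tokens <= budget:
            chosen.append(ch.path)
            spent += ch.tokens
    return chosen
```

The lookup is an exact match against the table, so the same topic and budget always select the same chapters; no similarity search is involved in traversal.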

Prior art & credits

This is a field report on assembling and operating existing ideas in production, not a claim to have invented the primitives. (Established standards and vocabulary — agent skills, MCP, RAG, hybrid search, embeddings — are used as-is and assumed familiar.) Two patterns this report builds on but did not originate:

  • Book-to-Skill — turning a long document into a progressive-disclosure SKILL.md-style bundle — is an existing community pattern, not original to this report. The name and the conversion approach come from the book-to-skill project by @virgiliojr94 (which builds on the Agent Skills standard). Doc 02 documents how it's applied inside a multi-layer KB.
  • The "LLM wiki" pattern — curated, linked, maintainable markdown over repeated document search — was popularized by Andrej Karpathy and standardized by Google Cloud's Open Knowledge Format (2026). The governance layer here (doc 03) builds on top of that pattern.

What this report actually contributes is the integration and the operational lessons — not the building blocks: routing by question type across three cooperating layers; a reliability/governance model (claims with status/confidence/evidence, deterministic dashboards, source↔synthesis separation); and a tacit-expertise / adherence-loop framing — all drawn from running a multi-agent KB in a regulated domain.

What this is not

  • Not a library or SDK. There is no code to import.
  • Not a maintained product. Treat issues as discussion, not a support channel.
  • Not a data release. Nothing here contains client, case, or domain-specific content — only architecture and method.

Author & license

© 2026 Raffaele Cipro (Principal Product Manager, OpenText). Documentation is released under CC BY 4.0 — reuse freely with attribution. See DISCLAIMER.md for scope and confidentiality boundaries.
