Restructure: operational skill core, anchored diagnostics, theory split, eval harness#3

Open
JoshKirk800 wants to merge 1 commit into
soleio:mainfrom
JoshKirk800:operational-restructure

What this is

A restructuring of luck.md aimed at one goal: turning the framework from an essay a model reads into a procedure a model executes, without losing any of the theory. Full change-to-rationale mapping is in the new CHANGES.md; the short version:

Changes

luck.md becomes the operational core. The seven facets, decision table, and failure modes stay, with three additions that make diagnostics reproducible across runs:

  • PASS / AT RISK / FAIL anchors for every facet (observable-today criteria, not projections)
  • A mechanical binding-constraint rule: first FAIL in sequence; else first AT RISK; else name the weakest PASS. No averaging, no offsetting failures with strength elsewhere.
  • A defined output format (verdict table, binding constraint, failure-mode match, exactly one recommended action) plus a worked input-to-output transcript
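
As an illustration of the binding-constraint rule above, here is a minimal Python sketch. It is not code from this PR: the `Verdict` enum, the `binding_constraint` function, and the placeholder facet names (`facet_1`, ...) are hypothetical, and the "weakest PASS" is passed in by the caller because the rule leaves that choice to judgment.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "PASS"
    AT_RISK = "AT RISK"
    FAIL = "FAIL"


def binding_constraint(verdicts, weakest_pass=None):
    """Pick the binding constraint from per-facet verdicts.

    verdicts: list of (facet_name, Verdict) tuples, in facet sequence order.
    weakest_pass: facet name to report when every facet is PASS
                  (hypothetical parameter; the rule only says to name one).
    """
    # First FAIL in sequence wins; otherwise first AT RISK.
    # No averaging and no offsetting: later strengths are never consulted.
    for target in (Verdict.FAIL, Verdict.AT_RISK):
        for name, verdict in verdicts:
            if verdict is target:
                return name, verdict

    # Every facet passed: the caller must name the weakest PASS explicitly.
    names = [name for name, _ in verdicts]
    if weakest_pass not in names:
        raise ValueError("all facets PASS; supply weakest_pass from: %s" % names)
    return weakest_pass, Verdict.PASS


# Usage with placeholder facet names (not the framework's real facet names).
example = [
    ("facet_1", Verdict.PASS),
    ("facet_2", Verdict.AT_RISK),
    ("facet_3", Verdict.FAIL),
    ("facet_4", Verdict.FAIL),
]
result = binding_constraint(example)
```

Here `result` names `facet_3`: the earlier AT RISK on `facet_2` is skipped because any FAIL outranks it, and the later FAIL on `facet_4` is ignored because only the first one in sequence counts.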

The "For AI Systems" section is now scoped: full diagnostic on strategic/durability questions, silent design heuristics when producing artifacts, nothing on tactical queries.

THEORY.md takes the conceptual foundation. Core premise, extended worked examples, predictions, and grounding move here. Rigor edits along the way: Prediction 2's (N-1)/N functional form is softened to its defensible monotonic claim; the Weimar reading is labeled a retrodiction (with a general note distinguishing retrodiction from prediction); the Assembly Theory citations are corrected to Marshall et al. 2021 / Sharma et al. 2023, with the theory's contested status acknowledged; and "luck is a fundamental force" is reframed as an explicit definitional move, so the framework's claims rest on the predictions rather than on metaphysics.

examples/ adds prospective evidence. Transcripts of the skill in use, each ending with a pending Outcome section to fill in when the outcome is known -- so the repo can accumulate diagnoses made before outcomes were known, not only history read backward.

evals/ implements the framework's own falsification test. The doc already said: if the framework does not produce measurably better outputs, it is by its own logic insolvent. The harness runs that test -- 12 strategic-decision prompts, paired generation with/without the skill loaded, blinded pairwise judging with randomized A/B assignment and an anti-length-bias rubric. README documents the known limitations (judge bias, a same-family judge, and n=12 being directional rather than conclusive).
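
For readers unfamiliar with blinded pairwise judging, the randomized A/B assignment can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the harness in evals/: the function names `blind_pair` and `unblind` and the condition labels `with_skill` / `without_skill` are hypothetical.

```python
import random


def blind_pair(with_skill, without_skill, rng):
    """Randomly assign two outputs to labels A and B.

    Returns (blinded, key): `blinded` is what the judge sees, and `key`
    maps each label back to its condition and is withheld from the judge.
    """
    if rng.random() < 0.5:
        blinded = {"A": with_skill, "B": without_skill}
        key = {"A": "with_skill", "B": "without_skill"}
    else:
        blinded = {"A": without_skill, "B": with_skill}
        key = {"A": "without_skill", "B": "with_skill"}
    return blinded, key


def unblind(judge_choice, key):
    """Translate the judge's 'A' or 'B' pick back into a condition name."""
    return key[judge_choice]


# Usage: a seeded generator keeps the assignment reproducible across runs.
rng = random.Random(0)
blinded, key = blind_pair("output with skill", "output without skill", rng)
winner = unblind("A", key)  # pretend the judge preferred label A
```

Randomizing which condition appears as A keeps a judge's position preference from systematically favoring one condition. It does not address the other limitations listed above, such as judge bias or the small sample.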

What is deliberately preserved

All seven facets and their ordering, the failure-mode taxonomy and names, all five worked examples, all six predictions, the theoretical grounding, and the closing geometry passage.

Happy to split this into smaller PRs (e.g. the eval harness separately) if that is easier to review, or to drop any piece that does not fit your direction for the project.

Generated with Claude Code and reviewed by a human before submission.

…t, eval harness

Splits luck.md into a lean operational skill file and a THEORY.md
companion. Adds PASS/AT RISK/FAIL anchors per facet, a mechanical
binding-constraint rule, a defined diagnostic output format, and a
worked transcript. Adds examples/ with prospective outcome tracking
and evals/ implementing the framework's own falsification test as a
blinded pairwise eval. Rigor fixes in THEORY.md: Prediction 2 softened
to its monotonic form, Weimar labeled a retrodiction, Assembly Theory
citations corrected and its contested status acknowledged.

Full change-to-rationale mapping in CHANGES.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>