Restructure: operational skill core, anchored diagnostics, theory split, eval harness #3
Open
JoshKirk800 wants to merge 1 commit into
…t, eval harness

Splits luck.md into a lean operational skill file and a THEORY.md companion. Adds PASS/AT RISK/FAIL anchors per facet, a mechanical binding-constraint rule, a defined diagnostic output format, and a worked transcript. Adds examples/ with prospective outcome tracking and evals/ implementing the framework's own falsification test as a blinded pairwise eval.

Rigor fixes in THEORY.md: Prediction 2 softened to its monotonic form, Weimar labeled a retrodiction, Assembly Theory citations corrected and its contested status acknowledged.

Full change-to-rationale mapping in CHANGES.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
What this is
A restructuring of luck.md aimed at one goal: turning the framework from an essay a model reads into a procedure a model executes, without losing any of the theory. Full change-to-rationale mapping is in the new CHANGES.md; the short version:
Changes
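luck.md becomes the operational core. The seven facets, decision table, and failure modes stay, with three additions that make diagnostics reproducible across runs: PASS/AT RISK/FAIL anchors per facet, a mechanical binding-constraint rule, and a defined diagnostic output format.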
The "For AI Systems" section is now scoped: full diagnostic on strategic/durability questions, silent design heuristics when producing artifacts, nothing on tactical queries.
THEORY.md takes the conceptual foundation. Core premise, extended worked examples, predictions, and grounding move here. Rigor edits along the way: Prediction 2's (N-1)/N functional form softened to its defensible monotonic claim, the Weimar reading labeled a retrodiction (with a general note distinguishing retrodiction from prediction), Assembly Theory citations corrected to Marshall et al. 2021 / Sharma et al. 2023 with its contested status acknowledged, and "luck is a fundamental force" reframed as an explicit definitional move so the framework's claims rest on the predictions rather than on metaphysics.
examples/ adds prospective evidence. Transcripts of the skill in use, each ending with a pending Outcome section to fill in when the outcome is known -- so the repo can accumulate diagnoses made before outcomes were known, not only history read backward.
evals/ implements the framework's own falsification test. The doc already said: if the framework does not produce measurably better outputs, it is by its own logic insolvent. The harness runs that test -- 12 strategic-decision prompts, paired generation with/without the skill loaded, blinded pairwise judging with randomized A/B assignment and an anti-length-bias rubric. README documents the known limitations (judge bias, same-family judge, n=12 is directional).
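For reviewers who want the blinding step in concrete terms, here is a minimal sketch of randomized A/B assignment and win-rate tallying. It is illustrative only and is not the code in evals/: the names `BlindedPair`, `blind_pair`, and `skill_win_rate` are hypothetical, and the judge is assumed to return "A", "B", or "tie" for each pair.

```python
import random
from dataclasses import dataclass


@dataclass
class BlindedPair:
    """One prompt's two outputs, shuffled so the judge cannot tell which used the skill."""
    prompt_id: int
    shown_a: str
    shown_b: str
    skill_slot: str  # "A" or "B": where the with-skill output landed (never shown to the judge)


def blind_pair(prompt_id, with_skill, without_skill, rng):
    """Randomly assign the with-skill output to slot A or B with equal probability."""
    if rng.random() < 0.5:
        return BlindedPair(prompt_id, with_skill, without_skill, "A")
    return BlindedPair(prompt_id, without_skill, with_skill, "B")


def skill_win_rate(pairs, verdicts):
    """Fraction of decided (non-tie) verdicts that picked the with-skill slot.

    Returns None when every verdict is a tie, since no rate is defined.
    """
    decided = [(p, v) for p, v in zip(pairs, verdicts) if v in ("A", "B")]
    if not decided:
        return None
    wins = sum(1 for p, v in decided if v == p.skill_slot)
    return wins / len(decided)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the assignment is reproducible
    # Placeholder outputs standing in for the 12 paired generations.
    pairs = [blind_pair(i, f"with-{i}", f"without-{i}", rng) for i in range(12)]
    # Hypothetical judge that always answers "A", to show the tally mechanics.
    verdicts = ["A"] * len(pairs)
    print(skill_win_rate(pairs, verdicts))
```

Keeping `skill_slot` out of anything sent to the judge is what makes the comparison blind; the mapping is only consulted afterwards, when verdicts are tallied.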
What is deliberately preserved
All seven facets and their ordering, the failure-mode taxonomy and names, all five worked examples, all six predictions, the theoretical grounding, and the closing geometry passage.
Happy to split this into smaller PRs (e.g. the eval harness separately) if that is easier to review, or to drop any piece that does not fit your direction for the project.
Generated with Claude Code and reviewed by a human before submission.