PM Evaluation Framework

A working library of frameworks, templates, rubrics, and Claude skills for product managers. Built around the artifacts that actually go up the chain: strategy memos, PRDs, launch checklists, metrics dashboards, kill / continue decisions. Not for case-interview prep.

The material covers the full lifecycle, from framing a problem and running customer discovery through launch readiness and post-launch metrics. Two design choices distinguish it from a generic PM checklist library.

Discovery is treated as technique, not philosophy. The repo includes a Mom-Test customer-interview coach (with the rules behind why "would you use this?" produces fluff and "when's the last time that happened?" produces signal), a Lean-Startup-grounded value-hypothesis stress-tester, and substantive framework docs underneath each. The principles are tactical: what to ask, what to ignore, what counts as evidence.

Every primary skill chains into an adversarial second pass. The pm-red-team skill takes the output of any other skill (or any external AI critique, or your own draft) and re-reviews it under a different lens. Most artifacts that "look fine" do so because the first reviewer applied the comfortable lens. The second pass picks a different one. The point matches a working principle from my chief-of-staff setup: critically evaluate AI output, don't defer to it.

The repo organizes around one question: what does a good PM actually do at each stage of the work, and how do we know? The five evaluation criteria below answer it. Everything else (frameworks, decision-making docs, cross-functional traps, rubrics, templates, skills) feeds those five.

Personalize this for your own use

If you forked or cloned this repo, do these steps in order. Each one is concrete — you should be able to copy-paste and run.

Fork the repo and re-clone your fork. On GitHub, fork kalyvask/pm-evaluation-framework, then git clone git@github.com:<you>/pm-evaluation-framework.git. Edit LICENSE to your name and update the intro of this README to reflect your perspective.
Install the Claude skills so they work in every project, not just this repo. The twenty skills under .claude/skills/ auto-load when Claude Code is opened inside this repo. To make them available everywhere, copy them into your user-level skills directory:
```
mkdir -p ~/.claude/skills
cp -r .claude/skills/* ~/.claude/skills/
```
Restart Claude Code. Run /skills to confirm the eighteen pm-* skills are listed. If you only want a subset, copy individual skill folders.
Add your own artifact templates. Put new templates in templates/ alongside prd-template.md, decision-memo.md, decision-log.md, launch-criteria.md, and blameless-postmortem.md. Match the existing voice (imperative, section-headed, no jargon) so the skills can find and reuse them.
Customize rubric weights and criteria for your team's bar. The three rubrics in rubrics/ — pm-evaluation-rubric.md, strategy-memo-rubric.md, product-review-rubric.md — are deliberately editable. Re-weight criteria, add team-specific ones (e.g. "addresses regulated-data path"), or replace the five top-level questions in the The five evaluation criteria section below. The skills cite these files by path, so they pick up your edits automatically.
Extend or override skills with your own voice and style. To add a new skill: copy SKILL.md.tmpl into a new folder under .claude/skills/<your-skill>/SKILL.md and fill in the front-matter description (this is what Claude pattern-matches against). To change the voice of an existing skill: edit its SKILL.md directly — tighten the prose, add your own anti-patterns, or point it at frameworks you added in steps above.
Keep your private material local — it's already gitignored. .gitignore already excludes drafts/, local/, private/, outputs/, .cache/, *.draft.md, *.local.md, CVs, PDFs, DOCX, and .env*. Use these for unredacted notes, in-flight memo drafts, your CV, and any private artifact you want a skill to reason over without committing it. Sanity-check with git status before every commit.
Smoke-test that personalization worked. From inside Claude Code, ask: "Use the pm-evaluator skill to grade this draft memo: [paste]" — confirm the skill triggers and cites your edited rubric. Then ask: "Use pm-framework-selector for [your situation]" — confirm the skill routes you to the right framework.

How to use this repo

If you want to...	Start here
Pick the right framework for the decision in front of you	`frameworks/00-overview.md`
Sharpen how you frame a problem before jumping to solutions	`decision-making/problem-framing.md`
Think clearly about reversibility before committing	`decision-making/risk-and-reversibility.md`
Decide between user research methods (personas vs. JTBD vs. Kano)	`decision-making/research-methods.md`
Run a customer-discovery interview that actually produces signal	`decision-making/customer-interviews.md`
Stress-test the value hypothesis behind a product before committing	`decision-making/value-hypothesis.md`
Argue defensibility honestly — which moat, with what evidence	`decision-making/competitive-moat.md`
Navigate the user-vs-chooser split in enterprise sales	`decision-making/user-vs-chooser.md`
Adversarially re-review an existing critique before acting on it	`pm-red-team` skill
Prioritize a backlog	`decision-making/prioritization.md`
Define what to measure	`decision-making/metrics.md`
Identify the aha moment and design the path to it	`decision-making/activation.md`
Critique an activation or conversion funnel against the right model	`decision-making/conversion.md`
Review the user-facing design layer of a PRD or feature	`decision-making/behavioral-design.md`
Put AI into a product without bolting on a chatbot	`decision-making/ai-integration.md`
Work better with engineering	`cross-functional/engineering-partnership.md`
Run a launch where some failure is expected	`cross-functional/failure-management.md`
Understand how AI is changing the PM role	`ai-and-pm.md`
Evaluate a PM's reasoning on a case	`rubrics/pm-evaluation-rubric.md`
Evaluate a strategy memo before it goes up the chain	`rubrics/strategy-memo-rubric.md`
Show up well in a product / exec review	`rubrics/product-review-rubric.md`
Use a PRD / decision-memo / decision-log / postmortem template	`templates/`
Turn the framework into a daily PM agent partner (morning briefs, meeting prep, weekly review)	Operate-stage setup

Claude skills

The repo ships with eighteen Claude Code skills under .claude/skills/. They use the frameworks in this repo as their substrate, and they're organized by where you are in the product lifecycle.

Thirteen of them are reactive — invoke when you need critique on a specific artifact or decision. The other seven are operate-stage skills that turn the framework into a daily PM partner: morning brief, meeting prep, meeting debrief, weekly review, inbox triage, stakeholder tracker, and the context loader they all chain to. The operate-stage skills read from a pm-state/ folder you maintain (see below).

Stage	Skill	Use when
Frame	`pm-framework-selector`	You have a decision in front of you and don't know which framework to reach for
Frame / decide	`pm-decision-coach`	You want to be walked through a decision: framing → research → prioritization → risk
Frame / decide	`pm-design-process-router`	You're scoping a feature and need to decide whether it runs through the full Research → PRD → Concept → Detailed → Code pipeline or the Express Lane (verbal sign-off, no Figma). Routes by scope (XS/S/M/L/XL) with moderators for novelty, reversibility, and visibility
Discover	`pm-customer-interview-coach`	You're prepping or debriefing customer interviews and want them stress-tested against the Mom Test rules
Discover	`pm-value-hypothesis-tester`	You're about to launch, fundraise, or scale, and you want the underlying what / who / how bet pressure-tested
Build	`pm-prd-drafter`	You're starting a PRD, have a draft to critique, or are stuck on problem / success / scope
Build / Review	`pm-design-critic`	You have a PRD or design spec drafted and want the user-facing surface critiqued against behavioral and UX principles (defaults, friction, choice architecture, AI surfaces)
Build / Review	`pm-persona-stress-tester`	You have a design or flow and want it walked through step-by-step from a specific persona's point of view — what they see, think, expect, and where they bounce. 10-minute usability audit before scheduling real user research
Launch	`pm-launch-reviewer`	You have a launch coming up and want a gate-by-gate pre-flight against a real readiness bar
Measure	`pm-north-star-selector`	You're picking one North Star and weighing simple behavioral (MAU/DAU) vs. value-delivered vs. revenue. Forces the explainability, lead/lag, gameability comparison most teams skip
Measure	`pm-metrics-critic`	You're locking success criteria, debating a North Star, or staring at a dashboard that "looks fine"
Measure / Grow	`pm-funnel-critic`	You have an activation funnel, conversion funnel, paywall, or trial design and want the funnel-layer logic pressure-tested (binding stage, drop-off driver, model fit)
Review	`pm-progress-auditor`	You're about to send a status update, exec review, board email, or dashboard callout and want every claim pressure-tested for credibility leaks (overstated verbs, cherry-picked windows, vanity-metric-as-validation, "on track" with no threshold)
Review	`pm-evaluator`	You want a strategy memo, PRD, or analysis graded against the five-criterion rubric before it goes up
Challenge	`pm-red-team`	You have an existing critique (from another skill, an external AI, or yourself) and want it adversarially re-reviewed before you act on it
Operate	`pm-context-loader`	You're starting a PM workflow and need the agent to load your `pm-state/` (personal style, active projects, stakeholders) before doing anything else. Most operate-stage skills chain to this first.
Operate	`pm-morning-brief`	First thing in the morning. Pulls calendar, inbox, project todos, and yesterday's transcripts; surfaces the top 3 today plus slipped commitments. Writes to `pm-state/inbox/`
Operate	`pm-meeting-prep`	Before a meeting that matters. Pulls attendees, prior transcripts, project state, stakeholder context; drafts a one-page brief including the question you don't want asked
Operate	`pm-meeting-debrief`	Right after a meeting. Extracts commitments and decisions from a Granola transcript, drafts follow-ups, proposes writes to project todos and decisions files
Operate	`pm-inbox-triage`	When the inbox is backed up. Classifies threads, drafts replies for the substantive ones, surfaces stale threads where you're awaiting or being awaited
Operate	`pm-weekly-review`	Friday afternoon. What got done, what slipped, what's hanging, what to renegotiate. Catches the patterns the daily brief misses
Operate	`pm-stakeholder-tracker`	Weekly or before any high-stakes push. Who's going cold, who has open commitments hanging, who needs proactive attention

To use them locally, drop the repo in a directory Claude Code can see — the skills will appear automatically. New skills follow the format in SKILL.md.tmpl.

Operate-stage setup (the agent partner layer)

The seven operate-stage skills (morning brief, meeting prep, meeting debrief, inbox triage, weekly review, stakeholder tracker, context loader) read from a pm-state/ folder you maintain. This is what turns the reactive skill set into a daily PM agent partner.

To set up pm-state/:

Pick a location. Default: ~/pm-state/. If you want it synced (recommended), use a cloud-synced path like ~/OneDrive/Documents/pm-state/ and point the skills at that path. The operate-stage skills currently reference C:/Users/alexa/OneDrive/Documents/GSB/pm-state/; fork the repo and update the path for your install.
Scaffold the structure outside this repo. The folder layout is:
- you.md — personal PM context (style, role, active projects, cross-project stakeholders). The agent reads this at the start of every PM session.
- INSTRUCTIONS.md — overview of the folder for the agent and any human reader
- stakeholders.md — global cross-project stakeholder list
- decisions-log.md — cross-project decisions worth remembering
- lessons.md — cross-project lessons learned
- inbox/ — where the agent writes daily and weekly briefs (e.g. 2026-05-18-morning-brief.md)
- projects/_template/ — copy this when starting a new project
- projects/<name>/ — one folder per active project, containing status.md, decisions.md, open-questions.md, stakeholders.md, todos.md
Fill in you.md. This is the single most important file — the agent reads it at the start of every PM session.
Populate one or two active projects. Don't try to capture every project on day one; start with the two that matter most this week.
(Optional) Schedule the morning brief and weekly review as recurring jobs so they run automatically (e.g., 7:30am weekdays for the morning brief, 4pm Fridays for the weekly review).

The maintenance contract: 5 minutes a day updating todos.md and decisions.md for the active project, 15 minutes a week refreshing you.md and stakeholder lists. If you stop maintaining the state, the agent's outputs go stale and you stop trusting them. The maintenance is what makes the partnership work.

The pm-state/ folder itself is not checked into this repo. It contains personal context, stakeholder names, and decision history that should stay private. Each user maintains their own outside the repo.

Relationship with chief-of-staff

The seven operate-stage skills run standalone in any Claude Code session that can see a pm-state/ folder. If you want a full runtime around them, with Gmail and Google Calendar integration, a JSONL work queue with provenance, a permission engine with tier checks, scheduled overnight /email-triage and /calendar-prep jobs, conformance audits encoded as code, and a typed-link graph over stakeholders and decisions, see chief-of-staff. It is the runtime layer that wires this skill library (and four specialist subagents of its own) into a daily PM loop.

The split: pm-evaluation-framework is the public library you can drop into any repo. chief-of-staff is the personal runtime you install once.

The five evaluation criteria

Every framework in this repo ladders up to five questions that travel with you across any product decision:

Did you identify the right problem before jumping to solutions?
Did you apply a framework to remove bias and surface stakeholder perspectives?
Did you account for bundle effects and the full user journey — including procurement, admin, and downstream teams?
Did you define what success and failure look like upfront?
Did you prototype or validate before committing real resources?

The full rubric is in rubrics/pm-evaluation-rubric.md.

License

MIT. Use it, fork it, ship better products.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PM Evaluation Framework

Personalize this for your own use

How to use this repo

Claude skills

Operate-stage setup (the agent partner layer)

Relationship with chief-of-staff

The five evaluation criteria

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
.claude/skills		.claude/skills
cross-functional		cross-functional
decision-making		decision-making
frameworks		frameworks
rubrics		rubrics
templates		templates
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
SKILL.md.tmpl		SKILL.md.tmpl
ai-and-pm.md		ai-and-pm.md
what-makes-a-good-pm.md		what-makes-a-good-pm.md

Folders and files

Latest commit

History

Repository files navigation

PM Evaluation Framework

Personalize this for your own use

How to use this repo

Claude skills

Operate-stage setup (the agent partner layer)

Relationship with chief-of-staff

The five evaluation criteria

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages