feat: Mutation caching and transitive dependency tracking #509

Open
nicklafleur wants to merge 3 commits into boxed:main from lyft:nicklafleur/function_hashing

Collaborator

nicklafleur commented Apr 26, 2026

Summary

Adds incremental mutation testing to mutmut by skipping mutants in unchanged code, with transitive invalidation via a runtime call graph. On re-runs, only mutants in functions whose source (or whose dependencies' source) changed are re-tested.

High-level

  • Incremental mutation testing cuts mutation run duration roughly linearly with the fraction of code changed: the less code changes, the faster the run.
    • In practice, in large codebases this means a >95% reduction in runtime on average, since the amount of unchanged code far outweighs the amount of changed code.
    • Utility functions are particularly susceptible to "cache busting": even a no-op syntactic change that modifies the AST invalidates every call chain that relies on those functions (technically correct, since the code did change, but something to be aware of).
  • UI support will come in a future PR

Commit Breakdown:

  1. feat: add function hashing for incremental mutation testing
  • Foundation: hash each function's source (SHA-256, 12 chars) and persist via hash_by_function_name on SourceFileMutationData. On subsequent runs, compare old vs. new hashes and reset mutant results to None for changed functions only.
  • Introduces MutationMetadata (line number, mutation type, human-readable description) carried on every Mutation and serialized to JSON, plus an OPERATOR_TO_TYPE mapping and helpers (_determine_mutation_type, _describe_mutation).
  • Tightens naming conventions (private helpers prefixed with _, MUTATION_OPERATORS constant) and adds explicit type annotations.
  • New e2e_projects/benchmark_1k/ project (1000 mutants across a broad range of mutation types) with configurable delays via BENCHMARK_IMPORT_DELAY, BENCHMARK_CONFTEST_DELAY, and BENCHMARK_TEST_DELAY.
  2. refactor: relocate formatting utils
  • Consolidates formatting helpers previously scattered across __main__.py, file_mutation.py, and trampoline_templates.py into src/mutmut/utils/format_utils.py.
  • Pure code move with no behavior change; tests updated to import from the new location.
  3. feat: Add dependency tracking with function hash persistence
  • Builds the transitive invalidation layer on top of (1):
    • New MutmutState dataclass + state() singleton (state.py) consolidating old_function_hashes, current_function_hashes, and function_dependencies instead of leaking module-level globals.
    • New core.py with MutmutCallStack (backed by ContextVar for async/thread safety) and a relocated record_trampoline_hit that now records caller→callee edges during stats collection.
    • Trampoline updated to track call depth and emit dependency edges.
    • load_stats/save_stats extended to persist function_hashes and dependencies in mutmut-stats.json.
    • _cleanup_stale_stats and _invalidate_stale_dependency_edges prune state for functions that no longer exist or whose hashes changed, and transitively invalidate callers of changed callees.
    • New config options: track_dependencies and dependency_tracking_depth.
    • README/docs updated to describe the feature.
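
For reviewers who want a mental model of the transitive step in (3), here is a minimal sketch, not the code in this PR. It assumes the recorded edges are available as a mapping from each callee to the set of its callers; the names `functions_to_retest` and `callers_of` are hypothetical.

```python
from __future__ import annotations

from collections import deque


def functions_to_retest(changed: set[str], callers_of: dict[str, set[str]]) -> set[str]:
    """Return the changed functions plus every direct or indirect caller.

    `callers_of` maps a callee name to the names of the functions that
    called it (a hypothetical shape for the recorded dependency edges).
    """
    stale = set(changed)
    queue = deque(changed)
    while queue:
        callee = queue.popleft()
        for caller in callers_of.get(callee, ()):
            if caller not in stale:  # the visited check also guards against cycles
                stale.add(caller)
                queue.append(caller)
    return stale


# Hypothetical call graph: b calls a, a calls util, d calls c.
example_edges = {"util": {"a"}, "a": {"b"}, "c": {"d"}}
stale_functions = functions_to_retest({"util"}, example_edges)
```

In this example a change to `util` marks `util`, `a`, and `b` as stale, while `c` and `d` keep their cached results.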

Known Issues

  • Because we only track dependencies at runtime through the trampoline logic, un-mutated functions are omitted from the dependency graph that is built. The resulting call graph covers mutated functions only, not the whole program.
  • We end up looping over all walkable files a few times, pushing time complexity higher than before. This is still a smaller penalty than the caching gain, but definitely something that can be improved.
  • The "cache" is a JSON file right now, which is horrifically inefficient for the sparse reads/writes typical of this workflow. Moving to an SQLite-based store for the state could unlock significant storage and parallelism gains.
    • I have a follow-up PR that will branch out into different forking strategies that could be extended to include easy hookups for this kind of reporting strategy.
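
To make the SQLite idea above concrete, here is a rough sketch of what a keyed store for function hashes could look like. Nothing here exists in this PR: the table name, column names, and helper functions are all assumptions for illustration.

```python
from __future__ import annotations

import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical hash store with one row per function."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS function_hashes ("
        "name TEXT PRIMARY KEY, hash TEXT NOT NULL)"
    )
    return conn


def put_hash(conn: sqlite3.Connection, name: str, digest: str) -> None:
    """Write a single function hash without rewriting the rest of the state."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO function_hashes (name, hash) VALUES (?, ?)",
            (name, digest),
        )


def get_hash(conn: sqlite3.Connection, name: str) -> str | None:
    """Read a single function hash, or None if the function is unknown."""
    row = conn.execute(
        "SELECT hash FROM function_hashes WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

The point of the sketch is that a single-row read or write touches one key, whereas the JSON file has to be parsed and rewritten as a whole.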

This commit implements function-level hashing to skip re-testing unchanged
mutants, along with fixes for mypy type errors and architectural improvements.

A follow-up commit will implement transitive invalidation of mutants based on
function call graphs and the new hashing mechanism.

INCREMENTAL MUTATION TESTING
- Add _compute_function_hashes() in file_mutation.py to generate SHA-256 hashes
  (truncated to 12 chars) for each mutated function's source code
- Store hash_by_function_name in SourceFileMutationData for persistence
- On subsequent runs, compare old vs new hashes to identify changed functions
- Reset mutant results to None (needs re-testing) when function hash changes
- Return changed_functions and current_hashes from create_mutants_for_file()
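
The hash-and-compare steps above can be sketched roughly as follows. This is an illustration under assumptions, not the body of _compute_function_hashes(): the helper names, the `function_of` mapping from mutant name to function name, and hashing the raw source text as UTF-8 are all hypothetical.

```python
from __future__ import annotations

import hashlib

HASH_LENGTH = 12  # the commit message says hashes are truncated to 12 chars


def hash_function_source(source: str) -> str:
    """Return a truncated SHA-256 hex digest of a function's source text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()[:HASH_LENGTH]


def find_changed_functions(old_hashes: dict[str, str], new_hashes: dict[str, str]) -> set[str]:
    """Return names that are new or whose hash differs from the previous run."""
    return {name for name, digest in new_hashes.items() if old_hashes.get(name) != digest}


def reset_changed_results(
    results: dict[str, int | None],
    function_of: dict[str, str],
    changed: set[str],
) -> None:
    """Set results to None (needs re-testing) for mutants of changed functions."""
    for mutant_name in results:
        if function_of.get(mutant_name) in changed:
            results[mutant_name] = None
```

Mutants whose function hash is unchanged keep their stored result, so they are not re-tested on the next run.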

MUTATION METADATA TRACKING
- Add MutationMetadata dataclass with line_number, mutation_type, and description
- Each Mutation now carries metadata about what changed and where
- Add OPERATOR_TO_TYPE mapping to categorize mutations (number, string, boolean, etc.)
- Add _determine_mutation_type() to disambiguate operator categories
- Add _describe_mutation() for human-readable mutation descriptions
- Serialize/deserialize metadata to JSON via to_dict()/from_dict()
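
As a rough picture of the metadata round trip, here is a sketch of a dataclass with the three fields named above. The field types, the frozen setting, and the example values are assumptions; the real class in this PR may differ.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MutationMetadata:
    line_number: int    # assumed to be an int
    mutation_type: str  # e.g. "number", "string", "boolean"
    description: str    # human-readable summary of the change

    def to_dict(self) -> dict[str, object]:
        """Convert to a plain dict that json.dumps can serialize."""
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict[str, object]) -> MutationMetadata:
        """Rebuild the metadata from a dict produced by to_dict()."""
        return cls(
            line_number=int(data["line_number"]),
            mutation_type=str(data["mutation_type"]),
            description=str(data["description"]),
        )


# Round trip through JSON with made-up example values.
original = MutationMetadata(line_number=42, mutation_type="number", description="1 -> 2")
restored = MutationMetadata.from_dict(json.loads(json.dumps(original.to_dict())))
```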

NAMING AND CONVENTIONS
- Rename public functions to private (_create_mutations, _combine_mutations_to_source, etc.)
- Rename mutation_operators to MUTATION_OPERATORS (constant naming convention)
- Add explicit type annotations throughout (dict[str, MutationMetadata], etc.)

NEW BENCHMARK PROJECT
- Add e2e_projects/benchmark_1k/ with ~1000 mutants for testing
- Includes modules: numbers, strings, booleans, operators, comparisons,
  arguments, returns, complex (recursion, higher-order functions)
- Configurable delays via BENCHMARK_IMPORT_DELAY, BENCHMARK_CONFTEST_DELAY,
  BENCHMARK_TEST_DELAY environment variables
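
One way a benchmark module could read these delay variables is sketched below. It assumes the values are seconds expressed as floats and that unset or malformed values mean no delay; the commit message does not state either, and the helper names are hypothetical.

```python
import os
import time


def delay_from_env(name: str, default: float = 0.0) -> float:
    """Read a non-negative delay in seconds (an assumed unit) from the environment."""
    raw = os.environ.get(name, "")
    try:
        return max(0.0, float(raw))
    except ValueError:
        # Unset or malformed values fall back to the default (no delay).
        return default


def apply_delay(name: str) -> float:
    """Sleep for the configured delay and return the number of seconds slept."""
    seconds = delay_from_env(name)
    if seconds:
        time.sleep(seconds)
    return seconds
```

A test module could then call `apply_delay("BENCHMARK_TEST_DELAY")` at the start of each test to simulate a slow suite.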

Introduce MutmutState class to more easily manage runtime state for dependency
tracking (old_function_hashes, current_function_hashes, function_dependencies).
Persist hashes and dependencies to mutmut-stats.json for incremental runs.

Changes:
- Add state.py with MutmutState dataclass and state() singleton accessor
- Add core.py with MutmutCallStack (ContextVar-based) for async-safe tracking
- Move record_trampoline_hit to core.py, now tracks caller->callee edges
- Update trampoline to track call depth and record dependencies during stats
- Extend load_stats/save_stats to persist function_hashes and dependencies
- Add _cleanup_stale_stats and _invalidate_stale_dependency_edges functions
- Add track_dependencies and dependency_tracking_depth config options
- Update documentation describing the dependency tracking feature
nicklafleur changed the title from "Nicklafleur/function hashing" to "feat: Mutation caching and transitive dependency tracking" on Apr 26, 2026