fix: audit findings — reversible initials, instance tracking, live flags (v2.5.2–2.5.7)#33

Merged: click0 merged 6 commits into main from claude/refactor-data-masking-lIcWN on Jun 11, 2026

click0 (Owner) commented Jun 11, 2026

Fixes from project audit (one version per commit)

| Version | Change |
|---------|--------|
| 2.5.2 | Initials stored in mapping (mask→unmask roundtrip restores them); regexes no longer match across newlines; guard against re-masking already-masked surnames; +27 tests |
| 2.5.3 | Repeated text dates track instances [1,2,...]; engine date masking now deterministic; removed duplicated implementation |
| 2.5.4 | Live MASK_* flags through the wrapper (restores pre-refactoring write behavior); dead imports removed |
| 2.5.5 | modules/rank_data.py re-exports root copy (was 636-line duplicate) |
| 2.5.6 | README: package architecture, initials, threat model section |
| 2.5.7 | Generated passwords printed to stderr, not stdout |

Test plan

  • 426 tests pass (397 existing + 29 new)

https://claude.ai/code/session_01XT6iUWaQgahXDB9TWX9Bq7


Generated by Claude Code

claude added 6 commits June 11, 2026 20:28
- New 'initials' mapping category; mask->unmask roundtrip now restores
  initials (was irreversible)
- Initials regexes no longer match across newlines
- Guard main PIB parser against re-masking already-masked surnames
- Mapping written in document order for correct instance tracking
- 27 new tests (tests/test_initials.py); version asserts made dynamic

https://claude.ai/code/session_01XT6iUWaQgahXDB9TWX9Bq7
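
A minimal sketch of the reversible-initials idea from the commit above. Only the 'initials' mapping category and the no-newline rule come from the commit message; the `[INITIALS_n]` placeholder format, the function names, and the Latin-letter pattern are assumptions for illustration and are not the project's actual code.

```python
import re

# Hypothetical pattern for two initials such as "T. H." (Latin letters only).
# [ \t] is used instead of \s so a match can never span a line break.
INITIALS_RE = re.compile(r"\b([A-Z])\.[ \t]?([A-Z])\.")


def mask_initials(text, mapping):
    """Replace initials with placeholders and record originals under 'initials'."""
    table = mapping.setdefault("initials", {})          # placeholder -> original
    reverse = {orig: ph for ph, orig in table.items()}  # original -> placeholder

    def repl(match):
        original = match.group(0)
        if original not in reverse:
            placeholder = f"[INITIALS_{len(reverse) + 1}]"  # assumed format
            reverse[original] = placeholder
            table[placeholder] = original
        return reverse[original]

    return INITIALS_RE.sub(repl, text)


def unmask_initials(text, mapping):
    """Restore every placeholder recorded in the 'initials' category."""
    for placeholder, original in mapping.get("initials", {}).items():
        text = text.replace(placeholder, original)
    return text


mapping = {}
source = "Report by Shevchenko T. H.\nApproved: A.\nB. Kovalenko"
masked = mask_initials(source, mapping)
restored = unmask_initials(masked, mapping)
```

In this sketch "A." at the end of one line and "B." at the start of the next are left alone, because the separator class excludes the newline. The roundtrip works only because the original initials are written to the mapping at mask time.
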
- Repeated text dates now restore all occurrences on unmask (instances
  tracked via add_to_mapping instead of hardcoded [1])
- Engine's date_text path is deterministic now (duplicate copy never
  seeded the RNG)
- _mask_date_text is an alias of mask_date_text (removed ~50-line dup)

https://claude.ai/code/session_01XT6iUWaQgahXDB9TWX9Bq7
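
The instance tracking and RNG seeding described in the commit above could look roughly like the following. The mapping layout, the `add_to_mapping` signature shown here, the date pattern, and the SHA-256 seeding are all assumptions made for this sketch; the real helpers may differ.

```python
import hashlib
import random
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
DATE_RE = re.compile(r"\b\d{1,2} (?:" + "|".join(MONTHS) + r") \d{4}\b")


def mask_date_text(original):
    """Deterministic stand-in: the RNG is seeded from the original text."""
    digest = hashlib.sha256(original.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return f"{rng.randint(1, 28)} {rng.choice(MONTHS)} {rng.randint(1990, 2020)}"


def add_to_mapping(mapping, category, masked, original):
    """Hypothetical signature: record one more occurrence of `masked`."""
    entry = mapping.setdefault(category, {}).setdefault(
        masked, {"original": original, "instances": []})
    entry["instances"].append(len(entry["instances"]) + 1)


def mask(text, mapping):
    """Mask every text date and track each occurrence in document order."""
    def repl(match):
        masked = mask_date_text(match.group(0))
        add_to_mapping(mapping, "date_text", masked, match.group(0))
        return masked
    return DATE_RE.sub(repl, text)


def unmask(text, mapping):
    """Restore only the occurrences whose index is listed in 'instances'."""
    for masked, entry in mapping.get("date_text", {}).items():
        wanted = set(entry["instances"])
        seen = 0

        def repl(match):
            nonlocal seen
            seen += 1  # 1-based index of this occurrence of the masked value
            return entry["original"] if seen in wanted else match.group(0)

        text = re.sub(re.escape(masked), repl, text)
    return text
```

With a hardcoded `[1]` in place of the tracked list, `unmask` in this sketch would restore only the first of two identical dates, which is the failure mode the commit describes.
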
- data_masking.MASK_* / DEBUG_MODE / PRESERVE_CASE / HASH_ALGORITHM
  delegate reads (PEP 562 __getattr__) and writes (module class swap)
  to masking.constants — restores pre-refactoring behavior where
  setting data_masking.MASK_NAMES=False actually disabled masking
- Remove unused imports in masking/cli.py
- 2 regression tests

https://claude.ai/code/session_01XT6iUWaQgahXDB9TWX9Bq7
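
The two mechanisms named in the commit above (PEP 562 `__getattr__` for reads, a module class swap for writes) can be demonstrated in isolation. The sketch below uses two throwaway module objects as stand-ins for `masking.constants` and `data_masking`; the names `constants`, `wrapper`, `_DELEGATED`, and `_FlagProxyModule` are hypothetical and chosen only for this example.

```python
import types

# Stand-in for masking.constants (contents assumed for the demo).
constants = types.ModuleType("constants_demo")
constants.MASK_NAMES = True

# Names the wrapper should forward instead of storing locally.
_DELEGATED = {"MASK_NAMES"}


class _FlagProxyModule(types.ModuleType):
    """Module subclass that forwards writes of delegated names to `constants`."""

    def __setattr__(self, name, value):
        if name in _DELEGATED:
            setattr(constants, name, value)
        else:
            super().__setattr__(name, value)


# Stand-in for the data_masking wrapper module.
wrapper = types.ModuleType("data_masking_demo")


def _module_getattr(name):
    # PEP 562 hook: called only when normal module attribute lookup fails.
    if name in _DELEGATED:
        return getattr(constants, name)
    raise AttributeError(
        f"module {wrapper.__name__!r} has no attribute {name!r}")


wrapper.__getattr__ = _module_getattr
# Class swap: inside a real module this would be
# sys.modules[__name__].__class__ = _FlagProxyModule
wrapper.__class__ = _FlagProxyModule
```

The class swap matters because a plain `wrapper.MASK_NAMES = False` would otherwise create a new attribute on the wrapper and leave the value in `constants` untouched, so code reading the flag from `constants` would keep masking.
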
Root rank_data.py is the single source of truth; modules/rank_data.py
was an identical 636-line copy with no sync mechanism. Direction chosen
because importing the modules package pulls security/cryptography,
which must stay optional.

https://claude.ai/code/session_01XT6iUWaQgahXDB9TWX9Bq7
- README: masking/ and unmasking/ structure, initials feature,
  __main__.py invocation, version pointer to CHANGELOG
- New threat model section: pseudonymization limits stated explicitly
- Security recommendations: prefer --password-env over --password
- Fix python -m mention in wrapper docstrings

https://claude.ai/code/session_01XT6iUWaQgahXDB9TWX9Bq7
Generated encryption passwords no longer land in redirected stdout,
pipes or CI logs; shown once on stderr with an explicit warning.

https://claude.ai/code/session_01XT6iUWaQgahXDB9TWX9Bq7
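
A small sketch of the stderr behavior described in the commit above. The function names, the password length, and the warning text are assumptions for illustration; only the choice of stderr over stdout comes from the commit message.

```python
import secrets
import sys


def generate_password(nbytes=24):
    """Return a random URL-safe password (length chosen for the demo)."""
    return secrets.token_urlsafe(nbytes)


def announce_password(password, stream=None):
    """Show the password once on stderr so it stays out of redirected stdout."""
    # Resolve sys.stderr at call time so later redirection is respected.
    stream = sys.stderr if stream is None else stream
    print("WARNING: this password is shown only once; store it securely.",
          file=stream)
    print(f"Generated password: {password}", file=stream)
```

With this arrangement a command such as `tool > masked.txt` (hypothetical invocation) keeps the password out of the output file, because only stdout is redirected.
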
@click0 click0 merged commit c8938e4 into main Jun 11, 2026
1 check passed