Merged
22 changes: 22 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,28 @@ All notable changes to this project will be documented in this file.
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 
+## [2.6.0] - 2026-06
+
+Release rollup of 2.5.2–2.5.8 (reversible initials, instance tracking fixes,
+live `MASK_*` flags, O(n) replacements, threat model docs, generated passwords on stderr).
+
+### Changed
+- Version strings synchronized across all file headers (were stuck at 2.5.1)
+- Historical "extracted during vX refactoring" phrases pinned to v2.5.0
+  so they no longer drift with version bumps
+
+## [2.5.8] - 2026-06
+
+### Performance
+- Replacement loops (mask engine, initials phase, both unmask passes) build
+  the result via segment join instead of rebuilding the whole string per
+  replacement: O(n) instead of O(n²) on large documents
+
+### Fixed
+- Mask engine processes items in document order: instance numbers now match
+  occurrence order (previously reversed, so the wrong original could be restored
+  when two different values masked to the same string)
+
 ## [2.5.7] - 2026-06
 
 ### Security
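A minimal sketch of the segment-join pattern described in the 2.5.8 Performance entry, assuming replacements arrive as `(start, end, new_text)` spans measured against the original string. `replace_spans` is a hypothetical helper, not a function from this repository (the real loops work on item dicts). Copying each untouched slice once keeps the copying linear in the text length, whereas `text = text[:start] + new + text[end:]` per replacement copies the whole string every time.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, replacement), offsets into the original text


def replace_spans(text: str, spans: List[Span]) -> str:
    """Apply non-overlapping replacements in one left-to-right pass.

    Each untouched slice of `text` is copied exactly once, so copying is
    O(len(text)) plus the cost of sorting the spans, instead of one
    full-string copy per replacement. Spans overlapping an applied span are skipped.
    """
    segments: List[str] = []
    prev_end = 0
    for start, end, replacement in sorted(spans, key=lambda s: s[0]):
        if start < prev_end:  # overlaps the previous replacement
            continue
        segments.append(text[prev_end:start])  # unchanged text before the span
        segments.append(replacement)
        prev_end = end
    segments.append(text[prev_end:])  # tail after the last span
    return "".join(segments)


print(replace_spans("order 123 and 456", [(14, 17, "BBB"), (6, 9, "AAA")]))
# order AAA and BBB
```

Because offsets always refer to the original string, the spans can be processed front to back without the "replace from the end so positions don't shift" trick.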
8 changes: 4 additions & 4 deletions config_example.py
@@ -1,15 +1,15 @@
 #!/usr/bin/env python3
 # -*- coding: utf-8 -*-
 """
-Configuration example for data_masking.py v2.5.1
+Configuration example for data_masking.py v2.6.0
 
 Demonstrates all available configuration options using dataclasses.
 No external dependencies required: uses only the Python standard library.
 
 Author: Vladyslav V. Prodan
 Contact: github.com/click0
 Phone: +38(099)6053340
-Version: 2.5.1
+Version: 2.6.0
 License: BSD 3-Clause "New" or "Revised" License
 Year: 2025-2026
 """
@@ -25,7 +25,7 @@
 @dataclass
 class SystemConfig:
     """System settings."""
-    version: str = "v2.5.1"
+    version: str = "v2.6.0"
     hash_algorithm: str = "blake2b"
     hash_digest_size: int = 8
     encoding: str = "utf-8"
@@ -172,7 +172,7 @@ def to_dict(self) -> dict:
     data = config.to_dict()
 
     print("=" * 70)
-    print(" Data Masking Configuration Example v2.5.1")
+    print(" Data Masking Configuration Example v2.6.0")
     print("=" * 70)
 
     for section_name, section_data in data.items():
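The loop above walks a nested `{section: {option: value}}` dictionary. One plausible way for `to_dict()` to produce that shape is `dataclasses.asdict`, which recurses into nested dataclasses. In this sketch only the `SystemConfig` fields come from the diff; `Config` and `OutputConfig` are hypothetical containers added for illustration.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class SystemConfig:
    """System settings (fields as shown in config_example.py)."""
    version: str = "v2.6.0"
    hash_algorithm: str = "blake2b"
    hash_digest_size: int = 8
    encoding: str = "utf-8"


@dataclass
class OutputConfig:
    """Hypothetical second section, for illustration only."""
    indent: int = 2


@dataclass
class Config:
    """Hypothetical top-level container grouping the sections."""
    system: SystemConfig = field(default_factory=SystemConfig)
    output: OutputConfig = field(default_factory=OutputConfig)

    def to_dict(self) -> dict:
        # asdict() recurses into nested dataclasses: {section: {key: value}}
        return asdict(self)


config = Config()
data = config.to_dict()
for section_name, section_data in data.items():
    print(f"[{section_name}]")
    for key, value in section_data.items():
        print(f"  {key} = {value!r}")
```

`field(default_factory=...)` is required here because a dataclass instance is a mutable default.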
6 changes: 3 additions & 3 deletions config_example.yaml
@@ -1,5 +1,5 @@
 # ==========================================================================
-# Example configuration for the data masking system v2.5.1
+# Example configuration for the data masking system v2.6.0
 #
 # Copy this file to config.yaml and adjust it to your needs.
 # Generate the default configuration:
@@ -8,7 +8,7 @@
 # Author: Vladyslav V. Prodan
 # Contact: github.com/click0
 # Phone: +38(099)6053340
-# Version: 2.5.1
+# Version: 2.6.0
 # License: BSD 3-Clause "New" or "Revised" License
 # Year: 2025-2026
 # ==========================================================================
@@ -18,7 +18,7 @@
 # --------------------------------------------------------------------------
 system:
   # Configuration version (for compatibility)
-  version: "v2.5.1"
+  version: "v2.6.0"
 
   # Hashing algorithm for deterministic masking.
   # The same input always produces the same masked output.
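A minimal sketch of the determinism property these comments describe, using the `hash_algorithm`, `hash_digest_size`, and `encoding` defaults shown in `SystemConfig` in config_example.py. `deterministic_digest` is a hypothetical helper; how the project turns a digest into a masked value is not part of this diff.

```python
import hashlib


def deterministic_digest(value: str, digest_size: int = 8, encoding: str = "utf-8") -> str:
    """Hex digest of `value` under unkeyed BLAKE2b.

    The same input gives the same digest on every run and machine, which is
    what lets repeated occurrences of a value mask consistently.
    """
    return hashlib.blake2b(value.encode(encoding), digest_size=digest_size).hexdigest()


first = deterministic_digest("1234567890")
second = deterministic_digest("1234567890")
print(first == second, len(first))  # True 16
```

An unkeyed hash of a short identifier can be recomputed by anyone who enumerates candidate inputs; `hashlib.blake2b` also accepts a `key=` argument for keyed hashing when that matters.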
23 changes: 15 additions & 8 deletions data_masking.py
@@ -2,19 +2,26 @@
 # -*- coding: utf-8 -*-
 
 """
-Data Masking Script v2.5.1
+Data Masking Script v2.6.0
 Locally consistent masking of confidential data with INSTANCE TRACKING
 
-UPDATED IN v2.5.1:
-- Refactoring: split into the masking/ package (constants, helpers, language,
-  context, mask_personal, mask_military, engine, cli)
-- Added __main__.py: run from the repo root with python . mask / python . unmask
-- Backward compatibility: all imports from data_masking keep working
+UPDATED IN v2.6.0:
+- Reversible initials: full names such as "Іванов П.А." are stored in the mapping
+  (initials category) and fully restored on unmask
+- Instance tracking in document order; all repeated text dates
+  are restored
+- "Live" MASK_* flags: data_masking.MASK_NAMES = False takes effect again
+- O(n) replacements instead of O(n^2) on large files
+- Generated passwords are printed to stderr
 
+Architecture (since v2.5.0): a thin wrapper over the masking/ package
+(constants, helpers, language, context, mask_personal, mask_military,
+engine, cli); run from the repo root with python . mask / python . unmask
+
 Author: Vladyslav V. Prodan
 Contact: github.com/click0
 Phone: +38(099)6053340
-Version: 2.5.1
+Version: 2.6.0
 License: BSD 3-Clause "New" or "Revised" License
 Year: 2025-2026
 """
@@ -23,7 +30,7 @@
 # Re-exports from masking package for backward compatibility
 # ============================================================================
 
-__version__ = "2.5.7"
+__version__ = "2.6.0"
 
 from masking.constants import (
     __version__, __author__, __contact__, __phone__, __license__, __year__,
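Re-exports like the one above copy name bindings, which is a common reason a flag set on a wrapper module stops reaching the code that reads it. The diff does not show how v2.6.0 made the `MASK_*` flags live again. This sketch, built entirely in memory with `types.ModuleType`, shows the pitfall and one possible remedy: forwarding assignments through a module subclass, which Python 3.5+ allows via `__class__` assignment on a module. `constants`, `wrapper`, and `ForwardingModule` are hypothetical names.

```python
import types

# Hypothetical in-memory modules: `constants` owns the flag and `wrapper`
# re-exports it the way `from constants import MASK_NAMES` would.
constants = types.ModuleType("constants")
constants.MASK_NAMES = True

wrapper = types.ModuleType("wrapper")
wrapper.MASK_NAMES = constants.MASK_NAMES  # a from-import copies the binding


def engine_should_mask_names() -> bool:
    # The engine reads the flag from the module that owns it, at call time.
    return constants.MASK_NAMES


wrapper.MASK_NAMES = False                 # rebinds wrapper's name only
before_fix = engine_should_mask_names()    # True: the engine never sees it


class ForwardingModule(types.ModuleType):
    """Module type that mirrors MASK_* assignments into `constants`."""

    def __setattr__(self, name: str, value: object) -> None:
        if name.startswith("MASK_"):
            setattr(constants, name, value)
        super().__setattr__(name, value)


# A real wrapper could switch itself over with
# sys.modules[__name__].__class__ = ForwardingModule
wrapper.__class__ = ForwardingModule
wrapper.MASK_NAMES = False
after_fix = engine_should_mask_names()     # False

print(before_fix, after_fix)  # True False
```

Module-level `__getattr__` (PEP 562) only handles reads of missing names, so it cannot intercept assignments; that is why this sketch uses a module subclass.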
2 changes: 1 addition & 1 deletion diagnose_mapping.py
@@ -6,7 +6,7 @@
 Author: Vladyslav V. Prodan
 Contact: github.com/click0
 Phone: +38(099)6053340
-Version: 2.5.1
+Version: 2.6.0
 License: BSD 3-Clause "New" or "Revised" License
 Year: 2025-2026
 
2 changes: 1 addition & 1 deletion masking/__init__.py
@@ -4,7 +4,7 @@
 """
 Masking package: data masking with instance tracking.
 
-Refactored from monolithic data_masking.py in v2.5.1.
+Refactored from monolithic data_masking.py (v2.5.0).
 """
 
 from masking.constants import __version__, __author__, __contact__, __license__, __year__
2 changes: 1 addition & 1 deletion masking/cli.py
@@ -4,7 +4,7 @@
 """
 CLI entry point and orchestration for data masking.
 
-Extracted from data_masking.py during v2.5.1 refactoring.
+Extracted from data_masking.py during the package refactoring (v2.5.0).
 """
 
 import json
4 changes: 2 additions & 2 deletions masking/constants.py
@@ -4,7 +4,7 @@
 """
 Masking constants, patterns, and configuration flags.
 
-Extracted from data_masking.py during v2.5.1 refactoring.
+Extracted from data_masking.py during the package refactoring (v2.5.0).
 """
 
 import re
@@ -26,7 +26,7 @@
 # ============================================================================
 # METADATA
 # ============================================================================
-__version__ = "2.5.7"
+__version__ = "2.6.0"
 __author__ = "Vladyslav V. Prodan"
 __contact__ = "github.com/click0"
 __phone__ = "+38(099)6053340"
2 changes: 1 addition & 1 deletion masking/context.py
@@ -4,7 +4,7 @@
 """
 Context analysis and line parsing functions.
 
-Extracted from data_masking.py during v2.5.1 refactoring.
+Extracted from data_masking.py during the package refactoring (v2.5.0).
 """
 
 import re
69 changes: 38 additions & 31 deletions masking/engine.py
@@ -4,7 +4,7 @@
 """
 Main masking engine: context-aware text masking and JSON processing.
 
-Extracted from data_masking.py during v2.5.1 refactoring.
+Extracted from data_masking.py during the package refactoring (v2.5.0).
 """
 
 import random
@@ -133,10 +133,12 @@ def _mask_initials_pib(text: str, masking_dict: Dict, instance_counters: Dict) -
         if not any(c[0] < k[1] and c[1] > k[0] for k in kept):
             kept.append(c)
 
-    # Phase 2: mask in document order and record the mapping
+    # Phase 2: in document order, mask, record the mapping
+    # and assemble the result from segments (O(n))
     masking_dict["mappings"].setdefault("initials", {})
     kept.sort(key=lambda x: x[0])
-    replacements = []
+    segments = []
+    prev_end = 0
     for start, end, surname, initials, has_space, ini_first in kept:
         ms = mask_surname(surname, masking_dict, instance_counters)
         sep = '. ' if has_space else '.'
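The two context lines at the top of this hunk are the tail of Phase 1, which keeps a candidate only if it overlaps none already kept; Phase 2 then sorts the survivors by start offset. A minimal sketch of that selection, assuming `(start, end)` half-open offsets. The real tuples also carry surname, initials, and formatting flags, and the longest-first priority in the usage lines is an assumption, since the hunk does not show how candidates are ordered.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open [start, end) character offsets


def overlaps(a: Span, b: Span) -> bool:
    # Same test as the Phase 1 condition: a.start < b.end and a.end > b.start
    return a[0] < b[1] and a[1] > b[0]


def select_non_overlapping(candidates: List[Span]) -> List[Span]:
    """Keep candidates in priority order, skipping any that overlaps a kept one,
    then return the survivors in document order."""
    kept: List[Span] = []
    for c in candidates:
        if not any(overlaps(c, k) for k in kept):
            kept.append(c)
    kept.sort(key=lambda s: s[0])
    return kept


# Assumed priority: longer matches first.
candidates = [(0, 6), (0, 12), (13, 20), (18, 25)]
by_length = sorted(candidates, key=lambda s: s[1] - s[0], reverse=True)
print(select_non_overlapping(by_length))  # [(0, 12), (13, 20)]
```

Adjacent spans such as `(0, 3)` and `(3, 5)` do not overlap under this test, so both are kept.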
@@ -147,11 +149,13 @@ def _mask_initials_pib(text: str, masking_dict: Dict, instance_counters: Dict) -
         masked_ini = add_to_mapping(masking_dict, instance_counters,
                                     "initials", orig_ini, masked_ini)
         new_text = f"{masked_ini} {ms}" if ini_first else f"{ms} {masked_ini}"
-        replacements.append((start, end, new_text))
+        segments.append(text[prev_end:start])
+        segments.append(new_text)
+        prev_end = end
 
-    # Replace from the end of the text so earlier positions stay valid
-    for start, end, new_text in reversed(replacements):
-        text = text[:start] + new_text + text[end:]
+    if segments:
+        segments.append(text[prev_end:])
+        text = ''.join(segments)
 
     return text
 
@@ -265,43 +269,46 @@ def mask_text_context_aware(text: str, masking_dict: Dict, instance_counters: Di
         if not skip:
             items_to_mask.append({'type': 'date_text', 'full_text': match.group(0), 'number_part': match.group(0), 'start': match.start(), 'end': match.end()})
 
-    items_to_mask.sort(key=lambda x: x['start'], reverse=True)
+    # Traverse in document order: instance tracking then matches the order
+    # of occurrences (required for unmask), and replacements are collected
+    # as segments: O(n) instead of a quadratic text[:i] + ... + text[j:] per item
+    items_to_mask.sort(key=lambda x: x['start'])
 
+    segments = []
+    prev_end = 0
     for item in items_to_mask:
-        masked = ""
+        if item['start'] < prev_end: continue  # overlap: skip
         if text[item['start']:item['end']] != item['full_text']: continue
-        if item['type'] == 'ipn': masked = mask_ipn(item['number_part'], masking_dict, instance_counters)
-        elif item['type'] == 'passport_id': masked = mask_passport_id(item['number_part'], masking_dict, instance_counters)
-        elif item['type'] == 'military_id': masked = mask_military_id(item['number_part'], masking_dict, instance_counters)
-        elif item['type'] == 'military_unit': masked = mask_military_unit(item['number_part'], masking_dict, instance_counters)
 
+        replacement = None
+        if item['type'] == 'ipn': replacement = mask_ipn(item['number_part'], masking_dict, instance_counters)
+        elif item['type'] == 'passport_id': replacement = mask_passport_id(item['number_part'], masking_dict, instance_counters)
+        elif item['type'] == 'military_id': replacement = mask_military_id(item['number_part'], masking_dict, instance_counters)
+        elif item['type'] == 'military_unit': replacement = mask_military_unit(item['number_part'], masking_dict, instance_counters)
         elif item['type'] == 'brigade_number':
-            masked = mask_brigade_number(item['full_text'], masking_dict, instance_counters)
-            text = text[:item['start']] + masked + text[item['end']:]
-            continue
+            replacement = mask_brigade_number(item['full_text'], masking_dict, instance_counters)
         elif item['type'] == 'date':
-            masked = mask_date(item['full_text'], masking_dict, instance_counters)
-            text = text[:item['start']] + masked + text[item['end']:]
-            continue
+            replacement = mask_date(item['full_text'], masking_dict, instance_counters)
         elif item['type'] == 'date_text':
-            masked = _mask_date_text(item['full_text'], masking_dict, instance_counters)
-            text = text[:item['start']] + masked + text[item['end']:]
-            continue
+            replacement = _mask_date_text(item['full_text'], masking_dict, instance_counters)
         elif item['type'] == 'order_simple':
             masked = mask_order_number(item['number_part'], masking_dict, instance_counters)
-            new_full = item['full_text'].replace(item['number_part'], masked, 1)
-            text = text[:item['start']] + new_full + text[item['end']:]
-            continue
+            replacement = item['full_text'].replace(item['number_part'], masked, 1)
         elif item['type'] == 'order_with_letters':
             masked = mask_order_number_with_letters(item['number_part'], masking_dict, instance_counters)
-            new_full = item['full_text'].replace(item['number_part'], masked, 1)
-            text = text[:item['start']] + new_full + text[item['end']:]
-            continue
+            replacement = item['full_text'].replace(item['number_part'], masked, 1)
         elif item['type'] in ['br_complex', 'br_with_slashes', 'br_with_suffix', 'br_standalone']:
-            masked = mask_br_number(item['full_text'], masking_dict, instance_counters)
-            text = text[:item['start']] + masked + text[item['end']:]
+            replacement = mask_br_number(item['full_text'], masking_dict, instance_counters)
+
+        if replacement is None or replacement == "":
+            continue
+        segments.append(text[prev_end:item['start']])
+        segments.append(replacement)
+        prev_end = item['end']
 
-        if masked: text = text[:item['start']] + masked + text[item['end']:]
+    if segments:
+        segments.append(text[prev_end:])
+        text = ''.join(segments)
 
     lines = text.split('\n')
     masked_lines = []
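A toy model of why the sort direction in this hunk matters. It assumes instance numbers are counted per masked string and that unmask counts occurrences front to back; the real mapping format is not part of this diff. `mask_with_instances`, `unmask_in_order`, and `toy_mask` are hypothetical, and the masking function is deliberately collision-prone so that two different originals share one masked value.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical table layout: (masked value, instance number) -> original value.
Table = Dict[Tuple[str, int], str]


def mask_with_instances(originals: List[str], mask: Callable[[str], str],
                        reverse: bool = False) -> Table:
    """Number instances per masked value in the order occurrences are processed."""
    counters: Dict[str, int] = defaultdict(int)
    table: Table = {}
    order = range(len(originals) - 1, -1, -1) if reverse else range(len(originals))
    for i in order:
        masked = mask(originals[i])
        counters[masked] += 1
        table[(masked, counters[masked])] = originals[i]
    return table


def unmask_in_order(masked_doc: List[str], table: Table) -> List[str]:
    """Walk masked occurrences front to back, numbering them the same way."""
    counters: Dict[str, int] = defaultdict(int)
    restored = []
    for masked in masked_doc:
        counters[masked] += 1
        restored.append(table[(masked, counters[masked])])
    return restored


def toy_mask(value: str) -> str:
    # Deliberately collision-prone: both sample surnames map to "IVA".
    return value[:3].upper()


originals = ["Ivanov", "Ivanenko"]             # occurrences in document order
masked_doc = [toy_mask(v) for v in originals]  # ['IVA', 'IVA']
print(unmask_in_order(masked_doc, mask_with_instances(originals, toy_mask)))
# ['Ivanov', 'Ivanenko']
print(unmask_in_order(masked_doc, mask_with_instances(originals, toy_mask, reverse=True)))
# ['Ivanenko', 'Ivanov']: the two originals come back swapped
```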
2 changes: 1 addition & 1 deletion masking/helpers.py
@@ -4,7 +4,7 @@
 """
 Base helper functions for masking operations.
 
-Extracted from data_masking.py during v2.5.1 refactoring.
+Extracted from data_masking.py during the package refactoring (v2.5.0).
 """
 
 import hashlib
2 changes: 1 addition & 1 deletion masking/language.py
@@ -4,7 +4,7 @@
 """
 Language analysis functions: gender detection, grammatical case, declension.
 
-Extracted from data_masking.py during v2.5.1 refactoring.
+Extracted from data_masking.py during the package refactoring (v2.5.0).
 """
 
 import random
2 changes: 1 addition & 1 deletion masking/mask_military.py
@@ -4,7 +4,7 @@
 """
 Military data masking: ranks, units, orders, BR numbers, dates.
 
-Extracted from data_masking.py during v2.5.1 refactoring.
+Extracted from data_masking.py during the package refactoring (v2.5.0).
 """
 
 import random
2 changes: 1 addition & 1 deletion masking/mask_personal.py
@@ -4,7 +4,7 @@
 """
 Personal data masking functions: IPN, passport, military ID, names.
 
-Extracted from data_masking.py during v2.5.1 refactoring.
+Extracted from data_masking.py during the package refactoring (v2.5.0).
 """
 
 import random
2 changes: 1 addition & 1 deletion modules/__init__.py
@@ -2,7 +2,7 @@
 # -*- coding: utf-8 -*-
 
 """
-Data Masking Modules Package v2.5.1
+Data Masking Modules Package v2.6.0
 
 Modules of the data masking system.
 """
4 changes: 2 additions & 2 deletions modules/config.py
@@ -2,7 +2,7 @@
 # -*- coding: utf-8 -*-
 
 """
-Configuration Module v2.5.1 for data_masking.py
+Configuration Module v2.6.0 for data_masking.py
 
 Provides YAML + ENV + CLI configuration loading with priority resolution:
 CLI > ENV > config.yaml > config.py > Default
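The priority order in this docstring maps naturally onto `collections.ChainMap`, which searches its mappings left to right. This is a sketch under stated assumptions, not the module's implementation: the `MASKING_` environment prefix, `env_layer`, and `resolve` are hypothetical, and the defaults are the `SystemConfig` values from config_example.py.

```python
import os
from collections import ChainMap
from typing import Any, Dict, Mapping, Optional

# Hypothetical built-in defaults (values taken from SystemConfig in config_example.py).
DEFAULTS: Dict[str, Any] = {"hash_algorithm": "blake2b", "hash_digest_size": 8, "encoding": "utf-8"}


def env_layer(prefix: str = "MASKING_",
              environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect PREFIX_<KEY>=value variables as lowercase keys (values stay strings)."""
    source = os.environ if environ is None else environ
    return {k[len(prefix):].lower(): v for k, v in source.items() if k.startswith(prefix)}


def resolve(cli: Mapping[str, Any], env: Mapping[str, Any],
            yaml_cfg: Mapping[str, Any], py_cfg: Mapping[str, Any]) -> ChainMap:
    # argparse leaves unset options as None; drop them so lower layers show through.
    cli_set = {k: v for k, v in cli.items() if v is not None}
    # ChainMap searches its maps left to right, so earlier layers win:
    # CLI > ENV > config.yaml > config.py > defaults.
    return ChainMap(cli_set, dict(env), dict(yaml_cfg), dict(py_cfg), DEFAULTS)


cfg = resolve(
    cli={"encoding": None},
    env=env_layer(environ={"MASKING_HASH_DIGEST_SIZE": "16"}),
    yaml_cfg={"hash_digest_size": 12, "encoding": "cp1251"},
    py_cfg={},
)
print(cfg["hash_digest_size"], cfg["encoding"], cfg["hash_algorithm"])  # 16 cp1251 blake2b
```

Environment values arrive as strings (`"16"` above), so a real loader would still need to coerce them to the expected types.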
@@ -16,7 +16,7 @@
 Year: 2025-2026
 """
 
-__version__ = "2.5.1"
+__version__ = "2.6.0"
 
 import os
 import logging
4 changes: 2 additions & 2 deletions modules/masking_logger.py
@@ -2,7 +2,7 @@
 # -*- coding: utf-8 -*-
 
 """
-Logging Module v2.5.1 for data_masking.py
+Logging Module v2.6.0 for data_masking.py
 
 Provides structured logging with JSON and colored console output
 for masking operations.
@@ -22,7 +22,7 @@
 from typing import Any, Dict, Optional
 
 
-__version__ = "2.5.1"
+__version__ = "2.6.0"
 
 
 class JsonFormatter(logging.Formatter):
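The diff only shows that `JsonFormatter` subclasses `logging.Formatter`. A minimal sketch of how such a formatter can work, using only documented `logging.Formatter` methods (`format`, `formatTime`, `formatException`). `MinimalJsonFormatter` and the `fields` extra key are hypothetical and do not reflect the module's actual output schema.

```python
import json
import logging


class MinimalJsonFormatter(logging.Formatter):
    """Render each LogRecord as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Optional structured fields, passed as logger.info(..., extra={"fields": {...}})
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            payload.update(fields)
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        # ensure_ascii=False keeps Cyrillic text readable in the log
        return json.dumps(payload, ensure_ascii=False)


handler = logging.StreamHandler()
handler.setFormatter(MinimalJsonFormatter())
logger = logging.getLogger("masking.demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("file masked", extra={"fields": {"items_masked": 42}})
```

One object per line keeps the output easy to process with line-oriented tools.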