find_duplicated_keys misses escape-spelled duplicate keys (does not decode `.strings` escapes)

## TL;DR

`StringsFileValidationHelper.find_duplicated_keys` compares the **literal source bytes** of a quoted key, but `plutil` (which `L10nHelper.read_strings_file_as_hash` uses to build the comparison hash) **decodes `.strings` escape sequences first**. Two entries whose keys are spelled differently but decode to the same bytes — e.g. `"\U0041"` and `"A"` — are *one* duplicate to `plutil` (which silently keeps only the last value) and *two distinct* keys to the scanner. The scanner neither reports the duplicate nor raises, so a placeholder change hidden by that duplicate is silently missed.

## Root cause

`helper/ios/ios_strings_file_validation_helper.rb` buffers a quoted key's bytes verbatim (it writes the backslash and the following character as-is) and never decodes `.strings` escapes, while `plutil` normalizes them. So the scanner's key string differs from `plutil`'s canonical key whenever the spelling isn't byte-identical.

## Confirmed repro

```
"\U0041" = "first";
"A"      = "second";
```

`plutil -convert json` → `{"A":"second"}` (collapsed, last value wins). `find_duplicated_keys` → `{}` (no duplicate detected, no raise). A placeholder change between `first` and `second` would be invisible.

## Scope & realism

- **Pre-existing.** The escape handling predates #738 (`git blame` → `3cda14f1` / `821594e0`). The new `ios_lint_localization_placeholder_changes` action inherits the gap by newly depending on the scanner; the same gap already exists in `ios_lint_localizations`' `check_duplicate_keys` path.
- **≈ nil in practice.** It needs the *same* logical key present *twice in one base file*, spelled with *inconsistent* escapes (`"\U0041"` and `"A"`). Real `.strings` keys are identifiers or English source text; an actual duplicate would be spelled identically both times and *is* caught. Nobody hand-writes `\U0041` for `A`. Filing this for completeness, not urgency.
- **Survives the fail-closed change.** PR #738's "fail closed when a `.strings` file can't be scanned" only made scanner *raises* safe; here the scanner returns `{}` without raising, so it's a silent miss the fail-closed path doesn't cover.

## Decode spec (what a faithful canonicalizer must reproduce)

Derived empirically against `plutil` (macOS 26.x). The surprises are what make full decoding non-trivial:

- **Octal `\NNN`** — greedy, **max 3 digits**; overflow wraps `& 0xFF` (`\501`→`A`, `\400`→NUL); high bytes 0x80–0xFD decode via the **NeXTSTEP** string encoding, **not Latin-1** (decisive proof: `\271`→`„` U+201E, where Latin-1 would be `¹`); **0xFE/0xFF are undefined → NUL** (`\377`→`\0`). Requires shipping a 256-entry NeXTSTEP→codepoint table — no Ruby built-in encoding reproduces it.
- **Unicode `\U`** — uppercase only; **variable 0–4 hex digits** (not fixed 4); 0 digits → NUL; BMP only; **surrogate pairs combine** (`\UD83D\UDE00`→😀); a lone/broken surrogate **rejects the whole file**. (`\U0001F600` is *not* 😀 — it's `\x01` + literal `F600`.)
- **Strip rule** — `\` + anything not recognized → drop the backslash, keep the char. So **`\u` lowercase is literal `u`**, **`\x41` is literal `x41`** (not C hex), **`\e` is not ESC**.
- **Recognized single-char escapes** — exactly `\a \b \f \n \r \t \v \" \\` (plus `\U`, octal). C's set minus `\e`, `\?`, `\x`.
- **Backslash runs** — `\\` → one `\`; an **odd run immediately before the closing quote rejects the file**.
- **No Unicode normalization** — `plutil` keeps NFC and NFD distinct, so compare raw decoded bytes. Adding `unicode_normalize` would *manufacture* false-positive duplicates.

### Differential examples (the scanner currently misses every `≡` pair)

| Spelling | `plutil` canonical key | Note |
|---|---|---|
| `"\U0041"` ≡ `"\101"` ≡ `"A"` | `A` | unicode ≡ octal ≡ literal |
| `"\U9"` | TAB (`0x09`) | `\U` variable width, 1 digit |
| `"\UD83D\UDE00"` ≡ `"😀"` | 😀 | surrogate pair combines |
| `"A"` | literal `u0041` | lowercase `\u` is not unicode |
| `"\x41"` ≡ `"x41"` (≠ `"A"`) | `x41` | `\x` is not hex |
| `"\e"` ≡ `"e"` (≠ ESC) | `e` | `\e` not recognized |
| `"\200"` | U+00A0 | NeXTSTEP, not Latin-1 |
| `"\271"` | U+201E `„` | decisive NeXTSTEP proof |
| `"\377"` ≡ `"\400"` ≡ `"\0"` | NUL | 0xFE/0xFF undefined; `& 0xFF` overflow |
| `"a\tb"` ≡ literal-TAB key | `a⇥b` | escaped ≡ literal control char |
| `"\z"` ≡ `"z"`, `"\."` ≡ `"."`, `"\$"` ≡ `"$"` | letter/punct only | strip rule |
| NFC `é` vs NFD `é` | 2 distinct keys | no normalization |

## Already safe (no change needed)

- **UTF-16 / BOM** duplicate detection works (`read_utf8_lines` uses `rb:BOM|UTF-8`).
- **No Unicode normalization** needed — the byte-comparing scanner already matches `plutil`.
- **Structural syntax is a strict fail-closed subset** — across the full ASCII-punctuation sweep, quoted/unquoted collisions, comment hazards and multi-semicolon cases there is zero silent divergence and zero false duplicate; the only gap (`$` `/` `:` in unquoted keys) **RAISES** (safe). The few fail-*open* edges (trailing backslash before EOF, value-side escaped quote, empty/BOM-only file) are all mooted because `read_strings` (plutil) now runs *before* the duplicate check and rejects those files first.

## Proposed fix

**Option A — count-mismatch backstop (recommended, right-sized).** The action already has `plutil`'s distinct-key count (`new_strings.size`). Have the scanner also return its **total parsed-entry count**; if `scanner_entries > plutil_distinct_keys`, `plutil` collapsed a duplicate the scanner's literal compare missed → fail closed, reusing the existing abort path. Across the full corpus this flags **every** missed-dup case with **zero** false positives (distinct-but-similar keys, NFC/NFD, and `\U0001F600` vs 😀 all keep equal counts). No escape decoding, no NeXTSTEP table, no surrogate logic. Can live in the action without changing the scanner's raise behavior, so `ios_lint_localizations` is unaffected.

**Option B — full canonicalization.** Decode keys to match `plutil` byte-for-byte (the spec above) inside `find_duplicated_keys`, fixing both callers. Correct, but heavy (NeXTSTEP table, surrogate handling, backslash-run parity, reject conditions) — a faithful subset of CoreFoundation's old-style plist string decoder. Justified mainly if we *also* want cross-platform parsing off the macOS-only `plutil` shell-out, and it should be gated on a differential-parity test against `plutil`.

## Acceptance criteria

- The repro above is no longer a silent miss (detected as a duplicate, or fails closed).
- A differential test corpus against `plutil` covering the spec rows (unicode / octal / named / strip / backslash-run / literal-vs-escaped), asserting no false positives on distinct-but-similar keys and NFC/NFD.

## Related

- Follows #738 (`Add ios_lint_localization_placeholder_changes action`); original feature request #736.
- Pre-existing code: `find_duplicated_keys` in `lib/fastlane/plugin/wpmreleasetoolkit/helper/ios/ios_strings_file_validation_helper.rb`, also used by `ios_lint_localizations` (`check_duplicate_keys`).


Spelling	`plutil` canonical key	Note
`"\U0041"` ≡ `"\101"` ≡ `"A"`	`A`	unicode ≡ octal ≡ literal
`"\U9"`	TAB (`0x09`)	`\U` variable width, 1 digit
`"\UD83D\UDE00"` ≡ `"😀"`	😀	surrogate pair combines
`"A"`	literal `u0041`	lowercase `\u` is not unicode
`"\x41"` ≡ `"x41"` (≠ `"A"`)	`x41`	`\x` is not hex
`"\e"` ≡ `"e"` (≠ ESC)	`e`	`\e` not recognized
`"\200"`	U+00A0	NeXTSTEP, not Latin-1
`"\271"`	U+201E `„`	decisive NeXTSTEP proof
`"\377"` ≡ `"\400"` ≡ `"\0"`	NUL	0xFE/0xFF undefined; `& 0xFF` overflow
`"a\tb"` ≡ literal-TAB key	`a⇥b`	escaped ≡ literal control char
`"\z"` ≡ `"z"`, `"\."` ≡ `"."`, `"\$"` ≡ `"$"`	letter/punct only	strip rule
NFC `é` vs NFD `é`	2 distinct keys	no normalization

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

find_duplicated_keys misses escape-spelled duplicate keys (does not decode `.strings` escapes) #740

TL;DR

Root cause

Confirmed repro

Scope & realism

Decode spec (what a faithful canonicalizer must reproduce)

Differential examples (the scanner currently misses every `≡` pair)

Already safe (no change needed)

Proposed fix

Acceptance criteria

Related

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

find_duplicated_keys misses escape-spelled duplicate keys (does not decode .strings escapes) #740

Description

TL;DR

Root cause

Confirmed repro

Scope & realism

Decode spec (what a faithful canonicalizer must reproduce)

Differential examples (the scanner currently misses every ≡ pair)

Already safe (no change needed)

Proposed fix

Acceptance criteria

Related

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

find_duplicated_keys misses escape-spelled duplicate keys (does not decode `.strings` escapes) #740

Differential examples (the scanner currently misses every `≡` pair)