You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
StringsFileValidationHelper.find_duplicated_keys compares the literal source bytes of a quoted key, but plutil (which L10nHelper.read_strings_file_as_hash uses to build the comparison hash) decodes .strings escape sequences first. Two entries whose keys are spelled differently but decode to the same bytes — e.g. "\U0041" and "A" — are one duplicate to plutil (which silently keeps only the last value) and two distinct keys to the scanner. The scanner neither reports the duplicate nor raises, so a placeholder change hidden by that duplicate is silently missed.
Root cause
helper/ios/ios_strings_file_validation_helper.rb buffers a quoted key's bytes verbatim (it writes the backslash and the following character as-is) and never decodes .strings escapes, while plutil normalizes them. So the scanner's key string differs from plutil's canonical key whenever the spelling isn't byte-identical.
Confirmed repro
"\U0041" = "first";
"A" = "second";
plutil -convert json → {"A":"second"} (collapsed, last value wins). find_duplicated_keys → {} (no duplicate detected, no raise). A placeholder change between first and second would be invisible.
Scope & realism
Pre-existing. The escape handling predates Add ios_lint_localization_placeholder_changes action #738 (git blame → 3cda14f1 / 821594e0). The new ios_lint_localization_placeholder_changes action inherits the gap by newly depending on the scanner; the same gap already exists in ios_lint_localizations' check_duplicate_keys path.
≈ nil in practice. It needs the same logical key present twice in one base file, spelled with inconsistent escapes ("\U0041" and "A"). Real .strings keys are identifiers or English source text; an actual duplicate would be spelled identically both times and is caught. Nobody hand-writes \U0041 for A. Filing this for completeness, not urgency.
Survives the fail-closed change. PR Add ios_lint_localization_placeholder_changes action #738's "fail closed when a .strings file can't be scanned" only made scanner raises safe; here the scanner returns {} without raising, so it's a silent miss the fail-closed path doesn't cover.
Decode spec (what a faithful canonicalizer must reproduce)
Derived empirically against plutil (macOS 26.x). The surprises are what make full decoding non-trivial:
Octal \NNN — greedy, max 3 digits; overflow wraps & 0xFF (\501→A, \400→NUL); high bytes 0x80–0xFD decode via the NeXTSTEP string encoding, not Latin-1 (decisive proof: \271→„ U+201E, where Latin-1 would be ¹); 0xFE/0xFF are undefined → NUL (\377→\0). Requires shipping a 256-entry NeXTSTEP→codepoint table — no Ruby built-in encoding reproduces it.
Unicode \U — uppercase only; variable 0–4 hex digits (not fixed 4); 0 digits → NUL; BMP only; surrogate pairs combine (\UD83D\UDE00→😀); a lone/broken surrogate rejects the whole file. (\U0001F600 is not 😀 — it's \x01 + literal F600.)
Strip rule — \ + anything not recognized → drop the backslash, keep the char. So \u lowercase is literal u, \x41 is literal x41 (not C hex), \e is not ESC.
Backslash runs — \\ → one \; an odd run immediately before the closing quote rejects the file.
No Unicode normalization — plutil keeps NFC and NFD distinct, so compare raw decoded bytes. Adding unicode_normalize would manufacture false-positive duplicates.
Differential examples (the scanner currently misses every ≡ pair)
Spelling
plutil canonical key
Note
"\U0041" ≡ "\101" ≡ "A"
A
unicode ≡ octal ≡ literal
"\U9"
TAB (0x09)
\U variable width, 1 digit
"\UD83D\UDE00" ≡ "😀"
😀
surrogate pair combines
"A"
literal u0041
lowercase \u is not unicode
"\x41" ≡ "x41" (≠ "A")
x41
\x is not hex
"\e" ≡ "e" (≠ ESC)
e
\e not recognized
"\200"
U+00A0
NeXTSTEP, not Latin-1
"\271"
U+201E „
decisive NeXTSTEP proof
"\377" ≡ "\400" ≡ "\0"
NUL
0xFE/0xFF undefined; & 0xFF overflow
"a\tb" ≡ literal-TAB key
a⇥b
escaped ≡ literal control char
"\z" ≡ "z", "\." ≡ ".", "\$" ≡ "$"
letter/punct only
strip rule
NFC é vs NFD é
2 distinct keys
no normalization
Already safe (no change needed)
UTF-16 / BOM duplicate detection works (read_utf8_lines uses rb:BOM|UTF-8).
No Unicode normalization needed — the byte-comparing scanner already matches plutil.
Structural syntax is a strict fail-closed subset — across the full ASCII-punctuation sweep, quoted/unquoted collisions, comment hazards and multi-semicolon cases there is zero silent divergence and zero false duplicate; the only gap ($/: in unquoted keys) RAISES (safe). The few fail-open edges (trailing backslash before EOF, value-side escaped quote, empty/BOM-only file) are all mooted because read_strings (plutil) now runs before the duplicate check and rejects those files first.
Proposed fix
Option A — count-mismatch backstop (recommended, right-sized). The action already has plutil's distinct-key count (new_strings.size). Have the scanner also return its total parsed-entry count; if scanner_entries > plutil_distinct_keys, plutil collapsed a duplicate the scanner's literal compare missed → fail closed, reusing the existing abort path. Across the full corpus this flags every missed-dup case with zero false positives (distinct-but-similar keys, NFC/NFD, and \U0001F600 vs 😀 all keep equal counts). No escape decoding, no NeXTSTEP table, no surrogate logic. Can live in the action without changing the scanner's raise behavior, so ios_lint_localizations is unaffected.
Option B — full canonicalization. Decode keys to match plutil byte-for-byte (the spec above) inside find_duplicated_keys, fixing both callers. Correct, but heavy (NeXTSTEP table, surrogate handling, backslash-run parity, reject conditions) — a faithful subset of CoreFoundation's old-style plist string decoder. Justified mainly if we also want cross-platform parsing off the macOS-only plutil shell-out, and it should be gated on a differential-parity test against plutil.
Acceptance criteria
The repro above is no longer a silent miss (detected as a duplicate, or fails closed).
A differential test corpus against plutil covering the spec rows (unicode / octal / named / strip / backslash-run / literal-vs-escaped), asserting no false positives on distinct-but-similar keys and NFC/NFD.
Pre-existing code: find_duplicated_keys in lib/fastlane/plugin/wpmreleasetoolkit/helper/ios/ios_strings_file_validation_helper.rb, also used by ios_lint_localizations (check_duplicate_keys).
TL;DR
StringsFileValidationHelper.find_duplicated_keyscompares the literal source bytes of a quoted key, butplutil(whichL10nHelper.read_strings_file_as_hashuses to build the comparison hash) decodes.stringsescape sequences first. Two entries whose keys are spelled differently but decode to the same bytes — e.g."\U0041"and"A"— are one duplicate toplutil(which silently keeps only the last value) and two distinct keys to the scanner. The scanner neither reports the duplicate nor raises, so a placeholder change hidden by that duplicate is silently missed.Root cause
helper/ios/ios_strings_file_validation_helper.rbbuffers a quoted key's bytes verbatim (it writes the backslash and the following character as-is) and never decodes.stringsescapes, whileplutilnormalizes them. So the scanner's key string differs fromplutil's canonical key whenever the spelling isn't byte-identical.Confirmed repro
plutil -convert json→{"A":"second"}(collapsed, last value wins).find_duplicated_keys→{}(no duplicate detected, no raise). A placeholder change betweenfirstandsecondwould be invisible.Scope & realism
ios_lint_localization_placeholder_changesaction #738 (git blame→3cda14f1/821594e0). The newios_lint_localization_placeholder_changesaction inherits the gap by newly depending on the scanner; the same gap already exists inios_lint_localizations'check_duplicate_keyspath."\U0041"and"A"). Real.stringskeys are identifiers or English source text; an actual duplicate would be spelled identically both times and is caught. Nobody hand-writes\U0041forA. Filing this for completeness, not urgency.ios_lint_localization_placeholder_changesaction #738's "fail closed when a.stringsfile can't be scanned" only made scanner raises safe; here the scanner returns{}without raising, so it's a silent miss the fail-closed path doesn't cover.Decode spec (what a faithful canonicalizer must reproduce)
Derived empirically against
plutil(macOS 26.x). The surprises are what make full decoding non-trivial:\NNN— greedy, max 3 digits; overflow wraps& 0xFF(\501→A,\400→NUL); high bytes 0x80–0xFD decode via the NeXTSTEP string encoding, not Latin-1 (decisive proof:\271→„U+201E, where Latin-1 would be¹); 0xFE/0xFF are undefined → NUL (\377→\0). Requires shipping a 256-entry NeXTSTEP→codepoint table — no Ruby built-in encoding reproduces it.\U— uppercase only; variable 0–4 hex digits (not fixed 4); 0 digits → NUL; BMP only; surrogate pairs combine (\UD83D\UDE00→😀); a lone/broken surrogate rejects the whole file. (\U0001F600is not 😀 — it's\x01+ literalF600.)\+ anything not recognized → drop the backslash, keep the char. So\ulowercase is literalu,\x41is literalx41(not C hex),\eis not ESC.\a \b \f \n \r \t \v \" \\(plus\U, octal). C's set minus\e,\?,\x.\\→ one\; an odd run immediately before the closing quote rejects the file.plutilkeeps NFC and NFD distinct, so compare raw decoded bytes. Addingunicode_normalizewould manufacture false-positive duplicates.Differential examples (the scanner currently misses every
≡pair)plutilcanonical key"\U0041"≡"\101"≡"A"A"\U9"0x09)\Uvariable width, 1 digit"\UD83D\UDE00"≡"😀""A"u0041\uis not unicode"\x41"≡"x41"(≠"A")x41\xis not hex"\e"≡"e"(≠ ESC)e\enot recognized"\200""\271"„"\377"≡"\400"≡"\0"& 0xFFoverflow"a\tb"≡ literal-TAB keya⇥b"\z"≡"z","\."≡".","\$"≡"$"évs NFDéAlready safe (no change needed)
read_utf8_linesusesrb:BOM|UTF-8).plutil.$/:in unquoted keys) RAISES (safe). The few fail-open edges (trailing backslash before EOF, value-side escaped quote, empty/BOM-only file) are all mooted becauseread_strings(plutil) now runs before the duplicate check and rejects those files first.Proposed fix
Option A — count-mismatch backstop (recommended, right-sized). The action already has
plutil's distinct-key count (new_strings.size). Have the scanner also return its total parsed-entry count; ifscanner_entries > plutil_distinct_keys,plutilcollapsed a duplicate the scanner's literal compare missed → fail closed, reusing the existing abort path. Across the full corpus this flags every missed-dup case with zero false positives (distinct-but-similar keys, NFC/NFD, and\U0001F600vs 😀 all keep equal counts). No escape decoding, no NeXTSTEP table, no surrogate logic. Can live in the action without changing the scanner's raise behavior, soios_lint_localizationsis unaffected.Option B — full canonicalization. Decode keys to match
plutilbyte-for-byte (the spec above) insidefind_duplicated_keys, fixing both callers. Correct, but heavy (NeXTSTEP table, surrogate handling, backslash-run parity, reject conditions) — a faithful subset of CoreFoundation's old-style plist string decoder. Justified mainly if we also want cross-platform parsing off the macOS-onlyplutilshell-out, and it should be gated on a differential-parity test againstplutil.Acceptance criteria
plutilcovering the spec rows (unicode / octal / named / strip / backslash-run / literal-vs-escaped), asserting no false positives on distinct-but-similar keys and NFC/NFD.Related
ios_lint_localization_placeholder_changesaction #738 (Add ios_lint_localization_placeholder_changes action); original feature request New action: localization placeholder-compatibility check (catch placeholder-breaking changes to existing keys) #736.find_duplicated_keysinlib/fastlane/plugin/wpmreleasetoolkit/helper/ios/ios_strings_file_validation_helper.rb, also used byios_lint_localizations(check_duplicate_keys).