Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 6 additions & 2 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,11 +14,15 @@ _None_

### Bug Fixes

_None_
- `StringsFileValidationHelper.find_duplicated_keys` now parses *unquoted* keys and values (valid `.strings` syntax, common in `InfoPlist.strings`) instead of raising `Invalid character`, matching the character set `plutil` accepts for unquoted strings (alphanumerics plus `_ . - $ : /`). This lets `ios_lint_localizations`' `check_duplicate_keys` work on `InfoPlist.strings`-style files. [#741]
- `StringsFileValidationHelper.find_duplicated_keys` now also accepts comments placed *between* the tokens of a statement (e.g. `"key" /* note */ = "value";`), which `plutil` allows, instead of raising `Invalid character` on the `/`. [#741]
- `ios_lint_localizations`' `check_duplicate_keys` no longer crashes the lane on a file that parses as a property list but isn't a flat `.strings` (e.g. a nested-dictionary value); it now warns via `UI.important` and skips that file. [#741]
- `L10nHelper.merge_strings` now applies the key prefix to unquoted keys containing `. - $ : /` and to lines with an *unquoted* value (e.g. `CFBundleName = WordPress;`), matching the unquoted-string grammar `plutil` accepts. Previously those keys were written to the merged file without the prefix while still being bookkept *with* it, leaving the output inconsistent with the reported keys (which could resurface the very collisions the prefix avoids and break downstream key extraction). [#741]

### Internal Changes

_None_
- Centralized `.strings` duplicate-key detection behind `StringsFileValidationHelper.scan_for_duplicate_keys`, which detects the file format and returns a `[:scanned | :unsupported_format | :unscannable, payload]` tri-state that callers map to their own policy (warn-and-skip vs fail-closed). `ios_lint_localizations` now uses it. [#741]
- `L10nHelper.strings_file_type` (and `StringsFileValidationHelper.scan_for_duplicate_keys`) accept `assume_valid:` to skip a redundant `plutil -lint` when the caller has already parsed the file. [#741]

## 14.7.0

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -56,15 +56,20 @@ def self.find_duplicated_keys(params)
language = File.basename(File.dirname(file), '.lproj')
path = File.join(params[:input_dir], file)

file_type = Fastlane::Helper::Ios::L10nHelper.strings_file_type(path: path)
if file_type == :text
duplicates = Fastlane::Helper::Ios::StringsFileValidationHelper.find_duplicated_keys(file: path)
duplicate_keys[language] = duplicates.map { |key, value| "`#{key}` was found at multiple lines: #{value.join(', ')}" } unless duplicates.empty?
else
status, payload = Fastlane::Helper::Ios::StringsFileValidationHelper.scan_for_duplicate_keys(file: path)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice appraoch.

case status
when :scanned
duplicate_keys[language] = payload.map { |key, value| "`#{key}` was found at multiple lines: #{value.join(', ')}" } unless payload.empty?
when :unsupported_format
UI.important <<~WRONG_FORMAT
File `#{path}` is in #{file_type} format, while finding duplicate keys only make sense on files that are in ASCII-plist format.
Since your files are in #{file_type} format, you should probably disable the `check_duplicate_keys` option from this `#{action_name}` call.
File `#{path}` is in #{payload} format, while finding duplicate keys only make sense on files that are in ASCII-plist format.

@mokagio mokagio Jun 26, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, even though it was in the previous version "only makes sense" is a bit condescending IMHO.

Suggested change
File `#{path}` is in #{payload} format, while finding duplicate keys only make sense on files that are in ASCII-plist format.
File `#{path}` is in #{payload} format, while finding duplicate keys can only occurr on files that are in ASCII-plist format.

Since your files are in #{payload} format, you should probably disable the `check_duplicate_keys` option from this `#{action_name}` call.
WRONG_FORMAT
when :unscannable
UI.important <<~UNSCANNABLE
Could not check `#{path}` for duplicate keys: #{payload.strip}
The file parses as a property list but isn't a flat `.strings` file the duplicate-key scanner understands, so duplicate detection was skipped for it.
UNSCANNABLE
end
end

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,18 +15,24 @@ class L10nHelper
# Returns the type of a `.strings` file (XML, binary or ASCII)
#
# @param [String] path The path to the `.strings` file to check
# @param [Boolean] assume_valid Skip the `plutil -lint` validity check when the caller has already
# confirmed the file parses (e.g. via `read_strings_file_as_hash`), avoiding a redundant
# `plutil` invocation. Only the format detection (`file`) then runs.
# @return [Symbol] The file format used by the `.strings` file. Can be one of:
# - `:text` for the ASCII-plist file format (containing typical `"key" = "value";` lines)
# - `:xml` for XML plist file format (can be used if machine-generated, especially since there's no official way/tool to generate the ASCII-plist file format as output)
# - `:binary` for binary plist file format (usually only true for `.strings` files converted by Xcode at compile time and included in the final `.app`/`.ipa`)
# - `nil` if the file does not exist or is neither of those format (e.g. not a `.strings` file at all)
#
def self.strings_file_type(path:)
def self.strings_file_type(path:, assume_valid: false)
return :text if File.empty?(path) # If completely empty file, consider it as a valid `.strings` files in textual format

# Start by checking it seems like a valid property-list file (and not e.g. an image or plain text file)
_, status = Open3.capture2('/usr/bin/plutil', '-lint', path)
return nil unless status.success?
# Start by checking it seems like a valid property-list file (and not e.g. an image or plain text file).
# A caller that has already parsed the file can skip this redundant check via `assume_valid: true`.
unless assume_valid
_, status = Open3.capture2('/usr/bin/plutil', '-lint', path)
return nil unless status.success?
end

# If it is a valid property-list file, determine the actual format used
format_desc, status = Open3.capture2('/usr/bin/file', path)
Expand Down Expand Up @@ -96,9 +102,17 @@ def self.merge_strings(paths:, output_path:)
# Read line-by-line to reduce memory footprint during content copy
read_utf8_lines(input_file).each do |line|
unless prefix.nil? || prefix.empty?
# The `/u` modifier on the RegExps is to make them UTF-8
# The `/u` modifier on the RegExps is to make them UTF-8.
# The unquoted-key character set below matches what `plutil` accepts (alphanumerics plus
# `_ . - $ : /`, as in `StringsFileValidationHelper::UNQUOTED_STRING_CHARACTER`) so that keys
# containing `. - $ : /` (e.g. reverse-DNS keys) are prefixed like any other.
line.gsub!(/^(\s*")/u, "\\1#{prefix}") # Lines starting with a quote are considered to be start of a key; add prefix right after the quote
line.gsub!(/^(\s*)([A-Z0-9_]+)(\s*=\s*")/ui, "\\1\"#{prefix}\\2\"\\3") # Lines starting with an identifier followed by a '=' are considered to be an unquoted key (typical in InfoPlist.strings files for example)
# Lines starting with an unquoted key followed by a `=` and a *quoted* value (typical in `InfoPlist.strings`).
line.gsub!(%r{^(\s*)([a-zA-Z0-9_.$:/-]+)(\s*=\s*")}u, "\\1\"#{prefix}\\2\"\\3")
# Lines starting with an unquoted key followed by a `=` and an *unquoted* value (e.g. `CFBundleName = WordPress;`).
# The value is required to be a single unquoted token ending in `;` — an unquoted string can't contain spaces, so
# this won't mistake comment prose like `and = a red herring for when…` (spaces in the "value") for a key/value pair.
line.gsub!(%r{^(\s*)([a-zA-Z0-9_.$:/-]+)(\s*=\s*)([a-zA-Z0-9_.$:/-]+\s*;)}u, "\\1\"#{prefix}\\2\"\\3\\4")
end

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My AI noticed that merge_strings doesn't handle inter-token comments.

Could this logic handle the same inter-token comment placements that StringsFileValidationHelper now accepts? (https://github.com/wordpress-mobile/release-toolkit/pull/741/changes#diff-6e464dfe8ccd3974026649f8aaa02f9f8518261ef391f13bdb9535ef46d2b908R111-R116)

For example, CFBundleName /* c */ = WordPress; is accepted by plutil and by the duplicate-key scanner after this PR, but merge_strings still writes it unprefixed while bookkeeping it as prefixed.
That can reintroduce collisions in the merged file.

I added failing specs in #742 to pin the gap.

AI suggests: A tokenizer-backed rewrite would be ideal, but a comment-aware interim rewrite would also close the regression.

tmp_file.write(line)
end
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,32 +5,83 @@ module Helper
module Ios
class StringsFileValidationHelper
# context can be one of:
# :root, :maybe_comment_start, :in_line_comment, :in_block_comment,
# :maybe_block_comment_end, :in_quoted_key,
# :after_quoted_key_before_eq, :after_quoted_key_and_equal,
# :in_quoted_value, :after_quoted_value
State = Struct.new(:context, :buffer, :in_escaped_ctx, :found_key)
# :root, :maybe_comment_start, :maybe_comment_or_value, :in_line_comment, :in_block_comment,
# :maybe_block_comment_end, :in_quoted_key, :in_unquoted_key,
# :after_quoted_key_before_eq, :after_quoted_key_and_eq,
# :in_quoted_value, :in_unquoted_value, :after_quoted_value
#
# `resume_context` holds the context to return to once a comment ends. Comments are valid not only at
# the top level but also *between* the tokens of a statement (e.g. `"key" /* note */ = "value";`), so a
# comment must resume the state it interrupted rather than always dropping back to `:root`.
State = Struct.new(:context, :buffer, :in_escaped_ctx, :found_key, :resume_context)

# Characters allowed in an *unquoted* string — a key or a value. Unquoted strings are valid
# `.strings` syntax (the old-style ASCII property-list format) and are common in `InfoPlist.strings`
# (e.g. `CFBundleName = WordPress;`). `plutil` accepts alphanumerics plus `_ . - $ : /` in an
# unquoted string, so we match the same set: this scanner only ever runs on input `plutil` has
# already accepted, and matching its grammar keeps a file it parses from tripping the scanner.
# An unquoted key runs until the first whitespace or `=`; an unquoted value until whitespace or `;`.
UNQUOTED_STRING_CHARACTER = %r{[a-zA-Z0-9_.$:/-]}u

# Enter a comment from an inter-token position, remembering where to resume once it ends.
# (`state.context` is still the originating state when a transition lambda runs — it's only reassigned
# to the lambda's return value afterwards — so this captures the state the comment interrupts.)
ENTER_COMMENT = lambda do |state, _c|
state.resume_context = state.context
:maybe_comment_start
end

# A `/` between `=` and the value is ambiguous: it can start a comment (`/* … */` or `// …`) or be the
# first character of an unquoted value (e.g. a path like `/usr/bin`). Defer the decision by one character
# via `:maybe_comment_or_value`.
ENTER_COMMENT_OR_VALUE = lambda do |state, _c|
state.resume_context = state.context
:maybe_comment_or_value
end

# Restore the context a comment interrupted (defaults to `:root`), then clear the saved context.
RESUME_AFTER_COMMENT = lambda do |state, _c|
resume = state.resume_context || :root
state.resume_context = :root
resume
end

TRANSITIONS = {
root: {
/\s/u => :root,
'/' => :maybe_comment_start,
'"' => :in_quoted_key
'"' => :in_quoted_key,
# An unquoted key, e.g. `CFBundleName = "…";` as used by `InfoPlist.strings`.
UNQUOTED_STRING_CHARACTER => lambda do |state, c|
state.buffer.write(c)
:in_unquoted_key
end
},
maybe_comment_start: {
'/' => :in_line_comment,
/\*/u => :in_block_comment
},
# Reached only from `:after_quoted_key_and_eq`, where a leading `/` might begin a comment or an
# unquoted value. A `*` or `/` confirms a comment; anything else means the `/` was the value's first
# character (values aren't buffered, so we just continue scanning the value).
maybe_comment_or_value: {
/\*/u => :in_block_comment,
'/' => :in_line_comment,
/./mu => lambda do |state, _c|
state.resume_context = :root
:in_unquoted_value
end
},
in_line_comment: {
"\n" => :root,
"\n" => RESUME_AFTER_COMMENT,
/./u => :in_line_comment
},
in_block_comment: {
/\*/ => :maybe_block_comment_end,
/./mu => :in_block_comment
},
maybe_block_comment_end: {
'/' => :root,
'/' => RESUME_AFTER_COMMENT,
/./mu => :in_block_comment
},
in_quoted_key: {
Expand All @@ -44,19 +95,48 @@ class StringsFileValidationHelper
:in_quoted_key
end
},
in_unquoted_key: {
# The key ends at the first whitespace or `=`. Whitespace still expects an `=` next;
# an `=` moves straight on to the value.
/[\s=]/u => lambda do |state, c|
state.found_key = state.buffer.string.dup
state.buffer = StringIO.new
c == '=' ? :after_quoted_key_and_eq : :after_quoted_key_before_eq
end,
UNQUOTED_STRING_CHARACTER => lambda do |state, c|
state.buffer.write(c)
:in_unquoted_key
end
},
after_quoted_key_before_eq: {
# A comment may sit between the key and the `=` (e.g. `"key" /* note */ = "value";`).
'/' => ENTER_COMMENT,
/\s/u => :after_quoted_key_before_eq,
'=' => :after_quoted_key_and_eq
},
after_quoted_key_and_eq: {
# A `/` here may start a comment or an unquoted value (which can contain `/`); disambiguate one char
# later. This entry must precede `UNQUOTED_STRING_CHARACTER` below, which also matches `/`.
'/' => ENTER_COMMENT_OR_VALUE,
/\s/u => :after_quoted_key_and_eq,
'"' => :in_quoted_value
'"' => :in_quoted_value,
# An unquoted value, e.g. `CFBundleName = WordPress;` as used by `InfoPlist.strings`.
UNQUOTED_STRING_CHARACTER => :in_unquoted_value
},
in_quoted_value: {
'"' => :after_quoted_value,
/./mu => :in_quoted_value
},
in_unquoted_value: {
# The value ends at the first whitespace or the terminating `;`. Its contents are irrelevant
# to duplicate-key detection, so — unlike a key — we don't buffer it.
';' => :root,
/\s/u => :after_quoted_value,
UNQUOTED_STRING_CHARACTER => :in_unquoted_value
},
after_quoted_value: {
# A comment may sit between the value and the terminating `;` (e.g. `"key" = "value" /* note */;`).
'/' => ENTER_COMMENT,
/\s/u => :after_quoted_value,
';' => :root
}
Expand All @@ -70,7 +150,7 @@ class StringsFileValidationHelper
def self.find_duplicated_keys(file:)
keys_with_lines = Hash.new { |h, k| h[k] = [] }

state = State.new(context: :root, buffer: StringIO.new, in_escaped_ctx: false, found_key: nil)
state = State.new(context: :root, buffer: StringIO.new, in_escaped_ctx: false, found_key: nil, resume_context: :root)

# Using our `each_utf8_line` helper instead of `File.readlines` ensures we can also read files that are
# encoded in UTF-16, yet process each of their lines as a UTF-8 string, so that `RegExp#match?` don't throw
Expand Down Expand Up @@ -106,6 +186,28 @@ def self.find_duplicated_keys(file:)

keys_with_lines.keep_if { |_, lines| lines.count > 1 }
end

# Detects the file format and, when applicable, scans for duplicate keys — in one step, so
# callers don't each re-implement the "`:text`-only" gate. `find_duplicated_keys` only
# understands the flat ASCII-plist syntax; an xml/binary plist can't be tokenized by it (though
# `plutil` collapses any duplicate to its last value when parsing those anyway).
#
# @param [String] file The path to the `.strings` file to inspect.
# @param [Boolean] assume_valid Forwarded to `strings_file_type`: skip the redundant `plutil -lint`
# when the caller has already confirmed the file parses.
# @return [Array] A `[status, payload]` pair, one of:
# - `[:scanned, { key => [lines] }]` — a `:text` file we tokenized (hash empty if none).
# - `[:unsupported_format, format]` — not a `:text` file (`:xml`, `:binary`, or `nil`); not scanned.
# - `[:unscannable, error_message]` — a `:text` file the tokenizer couldn't read.
# Each caller decides how to react (warn-and-skip, fail closed, …) from this one source of truth.
def self.scan_for_duplicate_keys(file:, assume_valid: false)
format = Fastlane::Helper::Ios::L10nHelper.strings_file_type(path: file, assume_valid: assume_valid)
return [:unsupported_format, format] unless format == :text

[:scanned, find_duplicated_keys(file: file)]
rescue StandardError => e
[:unscannable, e.message]
end
end
end
end
Expand Down
30 changes: 30 additions & 0 deletions spec/ios_l10n_helper_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,14 @@ def file_encoding(path)
expect(File).to exist(invalid_fixture)
expect(described_class.strings_file_type(path: invalid_fixture)).to be_nil
end

it 'skips the redundant `plutil -lint` check when `assume_valid: true`, still detecting the format' do
text_fixture = fixture('Localizable-utf16.strings')
allow(Open3).to receive(:capture2).and_call_original
expect(Open3).not_to receive(:capture2).with('/usr/bin/plutil', '-lint', anything)

expect(described_class.strings_file_type(path: text_fixture, assume_valid: true)).to eq(:text)
end
end

describe '#merge_strings' do
Expand Down Expand Up @@ -92,6 +100,28 @@ def file_encoding(path)
end
end

it 'prefixes unquoted keys with unquoted values and keys containing `. - $ : /`' do
# Regression: prefixing must cover the full unquoted-string grammar `plutil` accepts — unquoted *values*
# (e.g. `CFBundleName = WordPress;`) and keys containing `. - $ : /` — so the written file stays consistent
# with the prefixed keys we report. (Previously only `[A-Z0-9_]` keys with a *quoted* value were prefixed,
# leaving these written without the prefix and resurfacing the very collisions the prefix avoids.)
content = <<~STRINGS
CFBundleName = WordPress;
com.automattic.app-id = "X";
"QuotedKey" = "Y";
STRINGS
Dir.mktmpdir('a8c-release-toolkit-l10n-helper-tests-') do |tmp_dir|
input_file = File.join(tmp_dir, 'InfoPlist.strings')
File.write(input_file, content)
output_file = File.join(tmp_dir, 'output.strings')
described_class.merge_strings(paths: { input_file => 'pfx.' }, output_path: output_file)
merged = File.read(output_file)
expect(merged).to include('"pfx.CFBundleName" = WordPress;')
expect(merged).to include('"pfx.com.automattic.app-id" = "X";')
expect(merged).to include('"pfx.QuotedKey" = "Y";')
end
end

it 'returns duplicate keys found' do
paths = { fixture('Localizable-utf16.strings') => nil, fixture('non-latin-utf16.strings') => nil }
Dir.mktmpdir('a8c-release-toolkit-l10n-helper-tests-') do |tmp_dir|
Expand Down
Loading