Skip to content

Tolerate UTF8 byte order marker in file parsing#831

Open
meekee7 wants to merge 1 commit into
CycloneDX:masterfrom
meekee7:utf8bom
Open

Tolerate UTF8 byte order marker in file parsing#831
meekee7 wants to merge 1 commit into
CycloneDX:masterfrom
meekee7:utf8bom

Conversation

@meekee7
Copy link
Copy Markdown

@meekee7 meekee7 commented May 12, 2026

UTF8-encoded text files may start with a UTF8 byte order marker (also abbreviated BOM) indicating that it is a UTF8 file. The byte sequence is EF BB BF.

This change ensures that files with a UTF8BOM can be read and parsed. The heuristic of BomParserFactory in particular would previously have rejected all files with a UTF8BOM.
Test cases for both parsing and validation are added.

The test files were created by copying and saving them with UTF8BOM encoding in VS Code. The BOM character is not displayed in text previews on GitHub, but can be seen in a hex editor.

@meekee7 meekee7 requested a review from a team as a code owner May 12, 2026 07:25
@codacy-production
Copy link
Copy Markdown

codacy-production Bot commented May 12, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

Signed-off-by: Michael Kuerbis <michael_kuerbis@web.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant