Reduce false positives in privacy policy check #305

Closed
rbowen wants to merge 1 commit into apache:master from rbowen:fix/privacy-regex
Conversation

@rbowen
Contributor

@rbowen rbowen commented May 7, 2026

Problem: The privacy policy site check flags 17 projects as non-compliant even though they link to valid privacy policy pages hosted on their own *.apache.org subdomain. That is 17 of the 19 warnings this check raises, an 89% false positive rate.

Fix: Expand the CHECK_VALIDATE regex to also accept any URL on *.apache.org that contains "privac" in the path.

Projects that will move from WARN → PASS: beam, bookkeeper, bval, helix, hudi, johnzon, karaf, knox, openjpa, opennlp, pig, shiro, systemds, tomee, uima, unomi, zookeeper

Still correctly rejected: policies.google.com/privacy (parquet), github.com/apache/privacy-website (dataprivacy)

Many projects host their own privacy policy page on their *.apache.org subdomain (e.g., beam.apache.org/privacy_policy, karaf.apache.org/privacy.html). These pages typically mirror or link to the canonical ASF privacy policy, but are currently flagged as non-compliant because the validation regex only accepts two exact canonical URLs.

This change adds a third alternative that accepts any *.apache.org URL containing 'privac' (covering privacy, privacy-policy, privacypolicy, etc.).

This eliminates 17 of 19 privacy warnings as false positives while still correctly rejecting links to non-ASF domains (e.g., policies.google.com).
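
As an illustration only (this is not the patch itself), the three-alternative pattern described above could be sketched roughly as follows. The constant names, the helper `privacy_link_ok?`, and the two canonical URLs are placeholders assumed for the example; the actual values live in the Whimsy site check and may differ.

```ruby
# Hypothetical sketch: names and canonical URLs are placeholders,
# not the actual values used by the Whimsy site check.
CANONICAL_PRIVACY_URLS = [
  'https://privacy.apache.org/policies/privacy-policy-public.html',
  'https://www.apache.org/foundation/policies/privacy.html'
].freeze

# Third alternative: any host under apache.org whose URL contains
# "privac" somewhere after the host name (case-insensitive).
SUBDOMAIN_PRIVACY = %r{https?://(?:[a-z0-9-]+\.)+apache\.org/\S*privac\S*}i

# Whole-string match against either an exact canonical URL or the
# subdomain pattern. Regexp.union escapes the literal URLs.
PRIVACY_VALIDATE =
  /\A(?:#{Regexp.union(CANONICAL_PRIVACY_URLS)}|#{SUBDOMAIN_PRIVACY})\z/

def privacy_link_ok?(url)
  PRIVACY_VALIDATE.match?(url)
end
```

Under these assumptions, `privacy_link_ok?('https://beam.apache.org/privacy_policy')` and `privacy_link_ok?('https://karaf.apache.org/privacy.html')` are accepted, while `https://policies.google.com/privacy` and `https://github.com/apache/privacy-website` are rejected. That is because the host must end in `.apache.org` and the pattern is anchored to the whole string.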

Also adds rspec tests for the privacy check.

@sebbASF
Contributor

sebbASF commented May 8, 2026

-1

According to https://www.apache.org/foundation/marks/pmcs.html#navigation, projects must link to the privacy website. There is no option for alternatives. Whimsy needs to follow the policy.

@sebbASF
Contributor

sebbASF commented May 8, 2026

Or the policy needs to be changed

@rbowen
Contributor Author

rbowen commented May 8, 2026

Noted.

@rbowen rbowen closed this May 8, 2026
@rbowen rbowen deleted the fix/privacy-regex branch May 8, 2026 14:07