feat(build): RAT license-header enforcement as a buildSrc convention plugin (stacked on #138) #150

Merged
epugh merged 29 commits into apache:main from adityamparikh:add-rat-validation-buildsrc on Jun 15, 2026

Conversation

adityamparikh commented Jun 13, 2026

Contributor

What

Adds Apache RAT (Release Audit Tool) license-header enforcement, implemented as an
org.apache.solr.mcp.rat buildSrc convention plugin rather than inline build logic.
RAT is wired into check, so ./gradlew build audits that every scanned file carries an
ASF license header (report at build/reports/rat/index.html).

Fixes #141.

Relationship to other PRs

Because #138 isn't merged yet, the "Files changed" diff
here includes #138's commits too. The RAT-only delta is the single commit 9ec8521
(see the Commits tab). After #138 merges this will rebase down to just that commit.

Design

| Piece | Role |
| --- | --- |
| buildSrc/src/main/kotlin/org.apache.solr.mcp.rat.gradle.kts | Convention plugin: applies RAT and configures its excludes. |
| buildSrc/src/main/kotlin/.../RatExcludes.kt | Pure, Gradle-free helper that translates .gitignore entries into RAT (Ant) globs. |
| buildSrc/src/test/kotlin/.../RatExcludesTest.kt | Unit tests for the translation. |

Exclusions come from two sources (same approach as the inline version): .gitignore
reused as the single source of truth for ignored/build output, plus an explicit list of
tracked files that legitimately carry no header (binaries, data, docs, infra config,
LICENSE/NOTICE).
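
As a rough illustration of the two-source approach, the sketch below merges globs derived from .gitignore with an explicit list. The function name `collectExcludes`, its parameters, and the `translate` callback are hypothetical and chosen for illustration; the plugin's real wiring may look different.

```kotlin
import java.io.File

// Hypothetical sketch: names are illustrative, not taken from the PR.
// `translate` stands in for whatever turns one .gitignore line into Ant-style globs.
fun collectExcludes(
    gitignore: File,
    explicit: List<String>,
    translate: (String) -> List<String>,
): List<String> {
    // Source 1: ignored/build output, derived from .gitignore when the file exists.
    val fromGitignore =
        if (gitignore.isFile) gitignore.readLines().flatMap(translate) else emptyList()
    // Source 2: tracked files that legitimately carry no header (passed in by the caller).
    return (fromGitignore + explicit).distinct()
}
```

Keeping the translation behind a function parameter is one way to keep this step free of Gradle types, so it can be unit-tested on its own.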

Improvements over the inline approach (#149)

Extracting the translation into a testable helper surfaced and fixed two .gitignore
semantics gaps:

  1. Interior-slash anchoring. Git anchors a pattern with an interior slash (e.g.
    src/generated) to the repo root. The inline version prefixed every non-/-leading
    entry with **/, turning it into an any-depth match. The helper now distinguishes
    root-anchored (leading or interior slash) from any-depth (no separator / trailing
    slash only). Covered by tests.
  2. Local developer-tooling dirs. Added .claude/** and **/.kotlin/** excludes
    (Claude Code worktrees + Kotlin compiler caches) so a local ./gradlew build doesn't
    fail on untracked tooling dirs — analogous to the already-gitignored .idea/.gradle.
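
The anchoring rule in item 1 can be sketched roughly as follows. This is a minimal illustration, not the PR's actual RatExcludes code: the function name `toRatGlobs` is hypothetical, and negated entries (`!pattern`) are simply skipped here, which is an assumption of this sketch.

```kotlin
// Hypothetical sketch; the real RatExcludes helper may differ in names and edge cases.
fun toRatGlobs(line: String): List<String> {
    val entry = line.trim()
    // Blank lines and comments carry no pattern; negations are skipped in this sketch.
    if (entry.isEmpty() || entry.startsWith("#") || entry.startsWith("!")) return emptyList()

    val dirOnly = entry.endsWith("/")
    val body = entry.trim('/')
    if (body.isEmpty()) return emptyList()

    // Git anchors a pattern to the repo root when it has a leading or interior slash.
    // A trailing slash alone does not anchor, so it is stripped before checking.
    val anchored = entry.removeSuffix("/").contains('/')
    val base = if (anchored) body else "**/$body"

    // A trailing slash matches directories only, so exclude their contents.
    // Otherwise the entry may name a file or a directory, so cover both.
    return if (dirOnly) listOf("$base/**") else listOf(base, "$base/**")
}
```

Under these assumptions, `toRatGlobs("src/generated")` yields `src/generated` and `src/generated/**`, whereas prefixing every entry with `**/` would have produced an any-depth match.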

ASF headers were added to the three application*.properties and gradle/libs.versions.toml
so they pass the audit.

Verification

  • ./gradlew build passes (full suite, incl. Testcontainers integration tests).
  • ./gradlew rat reports a clean audit, 0 unapproved.
  • ./gradlew :buildSrc:test passes (RatExcludes translation tests).
  • ./gradlew spotlessCheck passes.

🤖 Generated with Claude Code

claude and others added 27 commits June 4, 2026 21:34
Add the top-level Apache License 2.0 text and NOTICE file required by
ASF release policy, and bundle them into the META-INF directory of every
JAR produced by the build (main, bootJar, sources, javadoc).

See https://www.apache.org/legal/release-policy.html#licensing-documentation
Captures decisions made during brainstorming: CycloneDX over SPDX,
embed-in-bootJar via Spring Boot's native CycloneDX integration, full
build + Docker + Release coverage, no cosign attestation in this PR.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Step-by-step bite-sized tasks covering: version catalog, Gradle plugin
wiring, actuator endpoint enablement, focused HTTP integration test,
CI workflow uploads, README + CLAUDE.md docs, final verification.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Plugin will be applied in the next commit. Adding the catalog entry
first keeps build.gradle.kts changes reviewable in isolation.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Apply org.cyclonedx.bom Gradle plugin 2.4.1. Spring Boot 3.5's
CycloneDxPluginAction auto-wires bootJar to embed the generated SBOM at
META-INF/sbom/application.cdx.json, so every distribution (JAR, Jib JVM
image, both Paketo native images) ships the embedded SBOM via bootJar
packaging — no per-image wiring.

Plugin version note: 1.10.0 breaks against Gradle 9.4 with
UnsupportedOperationException (ImmutableCollection.removeAll). 2.4.1 is
the latest v1.x-compatible class layout (CycloneDxPlugin /
CycloneDxTask) that Spring Boot's auto-integration recognizes; v3.x
renamed the classes (CyclonedxPlugin) and is incompatible until Spring
Boot adopts the new shape.

projectType is set explicitly to Component.Type.APPLICATION because
v2.4.1 changed the property from Property<String> to
Property<Component.Type>; Spring Boot's `.convention("application")`
would store a raw String and break the task at execution time.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
`sbom` was already in management.endpoints.web.exposure.include; this
makes the endpoint enablement explicit so the file conveys intent
without relying on Spring Boot defaults.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
…sions

- Drop the planned SbomEndpointIntegrationTest: /actuator/sbom is stock
  Spring Boot functionality; our only project-specific addition is two
  property lines. The build itself fails if cyclonedxBom breaks
  (Spring Boot's bootJar auto-depends on it).
- Update plugin version note to 2.4.1 and explain why both 1.10.0 (Gradle
  9.4 bug) and 3.x (Spring Boot class-name change) are unsuitable.
- CycloneDX schema 1.6 (plugin default) replaces the originally-noted 1.5.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Earlier edit lost the detail by accident. Restored as part of the Tool
choice section so the spec stands on its own.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Mirrors the existing JAR/test-results/coverage upload pattern. Retains
the SBOM for 30 days (vs the standard 7) since supply-chain
investigations often happen well after a build.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
The existing Generate SBOM step swallowed errors with `|| echo "..."`,
masking failures now that the plugin is wired. Removes the fallback,
uploads the SBOM as a 90-day workflow artifact, and attaches it to the
v<version> GitHub Release when one exists (graceful fallback otherwise
since the source release of record lives at dist.apache.org, not GitHub).

RELEASE_VERSION is already validated by validate-release; routing it
through an env var instead of inline ${{ }} interpolation is
defence-in-depth against actions-injection.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
New 'Supply chain & SBOM' section covers all four distribution
channels (embedded in JAR/image, /actuator/sbom endpoint, GitHub
Release asset, CI workflow artifact) and shows trivy/grype usage.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Spring Boot 3.5.14's CycloneDxPluginAction already sets outputName,
outputFormat, projectType, and wires bootJar embedding — matching what
Spring Initializr generates for the same dependency set. Verified that
applying the plugin alone produces a valid CycloneDX 1.6 SBOM at
META-INF/sbom/application.cdx.json inside the bootJar with
component type=application.

The earlier projectType override + includeConfigs/skipConfigs were
defensive but unnecessary; let the framework defaults work.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
CLAUDE.md symlinks to AGENTS.md; edit lands on the real file.

Records the cyclonedxBom command and how the SBOM flows through
bootJar → actuator → Docker images, so future agents have the
mental model when working on related code.

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
The base LICENSE/NOTICE are correct for the source release, but the binary
release (the Spring Boot fat bootJar) bundles third-party bytecode and so per
https://infra.apache.org/licensing-howto.html must additionally enumerate each
bundled dependency's license and lift bundled ASF dependencies' NOTICE snippets.

Stacks on the CycloneDX SBOM work and reuses it as the source of dependency
license data:

- generateBinaryLicense: base Apache-2.0 + an appendix listing every
  productionRuntimeClasspath dependency with a link to its license, read from the
  bundled SBOM (META-INF/sbom/application.cdx.json). The SBOM resolves a license
  for every component, including Gradle-module-metadata-only ASF artifacts
  (solr-solrj/solr-api) that POM-only scanners miss, so no per-dependency list is
  hand-maintained. It also gates the build: a bundled module missing from the SBOM,
  or carrying a license not in config/license-policy.json, fails the build.
- generateBinaryNotice: base NOTICE + the META-INF/NOTICE files lifted verbatim and
  de-duplicated from the bundled jars (the Shade ApacheNoticeResourceTransformer
  approach), so ASF dependency notices stay current automatically.

config/license-policy.json holds the allowedLicenses set plus overrides
(group:name -> SPDX id) correcting the few components CycloneDX mislabels
(mcp-server-security -> Apache-2.0; ANTLR ST4/antlr-runtime -> BSD-3-Clause).
Source-form jars keep the base LICENSE/NOTICE.

Verified: ./gradlew build green; fat jar META-INF/LICENSE lists 158 deps
(incl. SolrJ) and META-INF/NOTICE aggregates 21 upstream notices.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Move the inline LICENSE/NOTICE logic out of the root build.gradle.kts into a
buildSrc convention plugin (org.apache.solr.mcp.license-notice) backed by two
typed tasks:

- GenerateBinaryLicense / GenerateBinaryNotice are proper DefaultTask types with
  @InputFile/@InputFiles/@OutputFile, so they're incremental and (being real .kt
  files) free of the kts-script-compiler limitations that forced the previous
  Pair-based workarounds — the logic now reads as plain Kotlin with data classes.
- The root build.gradle.kts drops ~250 lines and three imports, and just applies
  `id("org.apache.solr.mcp.license-notice")`.

Behaviour is unchanged: the bootJar still bundles a LICENSE with the SBOM-derived
158-dependency appendix (incl. SolrJ) and a NOTICE aggregating 21 upstream
notices; source-form jars keep the base files; `check` still runs the gate.
The tasks now live in buildSrc, so they can be unit-tested with Gradle TestKit.

Verified: ./gradlew build green; fat-jar META-INF/LICENSE and NOTICE identical
to the pre-refactor output.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Add ProjectBuilder-based tests for the two convention-plugin tasks (now possible
since they live in buildSrc as typed tasks). Covers the correctness-critical
behaviour without needing the full spring-boot + cyclonedx stack:

- generateBinaryLicense: appendix lists bundled deps with SPDX links, applies a
  policy override to correct a mislabelled SBOM license, and preserves the base
  LICENSE text; the gate fails on a disallowed license and on a bundled coordinate
  absent from the SBOM.
- generateBinaryNotice: aggregates bundled META-INF/NOTICE files verbatim,
  de-duplicates identical notices, attributes each to its module, and emits just
  the project NOTICE when no dependency notices exist.
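
The NOTICE behaviour listed above could look roughly like the following plain-Kotlin sketch. The function name `aggregateNotices`, the `group:name` map keys, and the `--- module ---` separator format are all assumptions for illustration; the real GenerateBinaryNotice task reads the notices from the bundled jars.

```kotlin
// Hypothetical sketch of the aggregation rules; not the PR's GenerateBinaryNotice task.
// `moduleNotices` maps a module coordinate to the text of its META-INF/NOTICE.
fun aggregateNotices(projectNotice: String, moduleNotices: Map<String, String>): String {
    // Group modules by notice text so each distinct notice is emitted once.
    // Texts are whitespace-trimmed for comparison; this sketch emits the trimmed form.
    val modulesByText = LinkedHashMap<String, MutableList<String>>()
    for ((module, text) in moduleNotices) {
        val normalized = text.trim()
        if (normalized.isNotEmpty()) modulesByText.getOrPut(normalized) { mutableListOf() }.add(module)
    }
    // With no dependency notices, the project NOTICE is returned unchanged.
    if (modulesByText.isEmpty()) return projectNotice

    val out = StringBuilder(projectNotice.trimEnd()).append("\n")
    for ((text, modules) in modulesByText) {
        // Attribute each notice to every module that carried it.
        out.append("\n--- ").append(modules.joinToString(", ")).append(" ---\n")
        out.append(text).append("\n")
    }
    return out.toString()
}
```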

buildSrc's test task runs as part of `./gradlew build`, so these are enforced on
every build.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Add step-by-step comments to GenerateBinaryLicense/GenerateBinaryNotice walking
through what each phase does (load policy, index the SBOM, resolve+gate each
shipped dependency, write the file; and notice matching/de-dup/attribution).

Expand the AGENTS.md "Release LICENSE / NOTICE" section with where the tasks are
unit-tested and a short runbook for what to do when the license gate fails
(add an override for an SBOM mislabel, or allow a genuinely new license) instead
of silencing it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
…batim

apache/solr has no license allow-list (it uses a per-dependency licenses/ folder,
which JanHoy said not to replicate), and the binary LICENSE is a disclosure, not a
license policy. Remove config/license-policy.json and the allow-list gate +
override corrections it powered.

generateBinaryLicense now lists each shipped dependency with the license the
CycloneDX SBOM reports, verbatim — so a few imprecise-but-permissive upstream
labels appear as-is (mcp-server-security: Apache-1.0; ANTLR: BSD-4-Clause / BSD
licence). The appendix preamble says licenses are as-reported and links each one.

The remaining gate is completeness only: fail if a bundled dependency is absent
from the SBOM, so nothing is silently omitted from the LICENSE. Tests updated to
assert verbatim SBOM labels and SBOM name/URL handling.
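
A completeness-only gate of this kind might be sketched as below. The function names and the use of plain `group:name` strings are assumptions for illustration; the actual task derives both sides from Gradle's resolved artifacts and the CycloneDX SBOM.

```kotlin
// Hypothetical sketch: coordinates are plain "group:name" strings here.
fun missingFromSbom(bundled: Collection<String>, sbomComponents: Set<String>): List<String> =
    bundled.filterNot { it in sbomComponents }.distinct().sorted()

fun requireComplete(bundled: Collection<String>, sbomComponents: Set<String>) {
    val missing = missingFromSbom(bundled, sbomComponents)
    // Fail loudly instead of silently omitting a dependency from the LICENSE.
    check(missing.isEmpty()) { "Bundled dependencies absent from the SBOM: $missing" }
}
```

Note that nothing in this check inspects the license value itself, which matches the commit's intent of treating the appendix as a disclosure rather than a policy.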

Verified: ./gradlew build green; fat-jar LICENSE still lists 158 deps and NOTICE
aggregates 21 upstream notices.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Add a line to the appendix preamble noting the machine-readable bill of
materials (component versions, hashes, licenses) is bundled at
META-INF/sbom/application.cdx.json — the inline appendix stays the
human-readable disclosure, with the SBOM offered for tooling.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Add a 'where / when they appear' note to the Release LICENSE / NOTICE section:
both binary files are regenerated on every build (tasks run ahead of bootJar and
in check), land at META-INF/LICENSE and META-INF/NOTICE in the fat jar and thus
in every published Docker image, and are also written to build/generated/license/
for local viewing; source-form jars carry the repo-root base files.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
…e readers

Reviewers who don't work with Gradle had no easy way into buildSrc. Add:

- buildSrc/README.md: what buildSrc is, a short glossary of the Gradle concepts
  the code uses (Task, @TaskAction, the input/output annotations, Property/Provider
  types, convention plugin, productionRuntimeClasspath), and the end-to-end flow.
- KDoc on GenerateBinaryLicense / GenerateBinaryNotice: a "for readers new to
  Gradle" orientation on each class plus a note on every annotated property
  explaining what the input/output annotation does (up-to-date checking, ordering).
- A note on the convention plugin header explaining precompiled script plugins,
  and a comment on buildSrc/build.gradle.kts explaining what it builds.

Documentation only; no behaviour change. ./gradlew :buildSrc:test green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Add plain-language inline comments through the plugin body explaining the parts
that are opaque without Gradle background: what a 'configuration' is and why
productionRuntimeClasspath equals 'what ships', how the lazy provider chains
(flatMap/map over resolvedArtifacts) derive the coordinate list and the
jar-name->coordinate map, what tasks.register/.set wiring does, and how metaInf
from(...) plus dependsOn bundle the generated files into the bootJar while the
source-form jars keep the base files. Comments only; code unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Add Apache RAT (Release Audit Tool) header enforcement as an `org.apache.solr.mcp.rat` buildSrc convention plugin, stacked on the license-notice plugin from apache#138. RAT is wired into `check`, so `./gradlew build` audits that every scanned file carries an ASF header (report at build/reports/rat/index.html).

The .gitignore-to-RAT-glob translation lives in a pure, unit-tested `RatExcludes` helper rather than inline in build.gradle.kts. Moving it to buildSrc fixes two gitignore-semantics gaps from the inline approach: interior-slash patterns (e.g. src/generated) are now root-anchored instead of matched at any depth, and the negation/anchoring rules are documented and tested.

Local developer-tooling dirs (.claude worktrees, .kotlin caches) are excluded so contributors don't hit spurious audit failures. ASF headers are added to the three application*.properties and libs.versions.toml so they pass the audit.

Supersedes the inline approach in apache#149. Stacked on apache#138. Fixes apache#141.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
@adityamparikh

Contributor Author

Closing in favor of adityamparikh/solr-mcp#91, which is based on the #138 branch (add-apache-license-notice) so the diff is only the RAT changes (one commit) instead of #138 + RAT combined. Once #138 merges, I'll open the upstream PR against main.

adityamparikh commented Jun 13, 2026

Contributor Author

@epugh — reopening this as the apache-side PR for the buildSrc / convention-plugin take on RAT, so you can review it here. It's the alternative to the inline tasks.rat { } configuration in #149.

A couple of notes for reviewing:

What it does differently from #149: adds an org.apache.solr.mcp.rat convention plugin alongside #138's license-notice, and moves the .gitignore→glob translation into a pure, unit-tested RatExcludes helper. That testability caught a bug in the inline version — interior-slash patterns like src/generated were matched at any depth instead of being root-anchored.

./gradlew build passes (incl. Testcontainers integration tests) and ./gradlew rat is a clean audit. Happy to go whichever way you prefer — inline (#149) or buildSrc (here).

@epugh epugh merged commit 518bc17 into apache:main Jun 15, 2026
1 of 4 checks passed

Development

Successfully merging this pull request may close these issues.

Add Apache RAT for license scanning

3 participants