Skip to content

feat(scanning): audit GitHub Actions workflows across repositories#33

Open
nakberli841-bot wants to merge 22 commits into
mainfrom
feat/10-github-actions-workflow-audit
Open

feat(scanning): audit GitHub Actions workflows across repositories#33
nakberli841-bot wants to merge 22 commits into
mainfrom
feat/10-github-actions-workflow-audit

Conversation

@nakberli841-bot

@nakberli841-bot nakberli841-bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

What does this change?

Description

Adds automated auditing of GitHub Actions workflows across all connected repositories.

What was done

  • Created workflow_scan table via Flyway V2 migration with FK to repo table
  • Added WorkflowScanEntity and WorkflowScanRepository to persist scan results
  • Added RepoRepository for repository-level DB access
  • Implemented WorkflowParser in libs/scanning to parse workflow YAML files and flag:
    • Unpinned actions (missing SHA pin)
    • Broad permissions (write-all / read-all)
    • Missing OIDC (id-token: write)
  • Implemented risk scoring (0–100) based on flagged issues
  • Implemented WorkflowScanService to fetch workflow YAMLs from GitHub API and persist results
  • Added WorkflowScanJob in apps/worker as a scheduled job that scans all repos hourly
  • Exposed GET /api/repos/{repoId}/workflow-scans endpoint in apps/api

Tests

  • WorkflowParserTest — unit tests covering unpinned actions, broad permissions, missing OIDC and clean workflow scenarios
  • WorkflowScanRepositoryTest — integration tests with Testcontainers verifying DB read/write and delete operations
  • WorkflowScanControllerTest — full stack integration tests verifying endpoint response for populated and empty scan results

Related modules

  • libs/persistence
  • libs/scanning
  • apps/worker
  • apps/api

Related issue

Closes # 10

Area

  • Frontend (apps/web)
  • Backend
  • Docs
  • Other

Screenshots

If this touches the UI, drop before and after screenshots here.

Checklist

  • I ran bun run typecheck && bun run build in apps/web (for frontend changes)
  • I bumped the version if this is a meaningful change
  • Commits are small and focused, with no AI attribution lines
  • I updated docs or AGENTS.md if behavior or structure changed

Summary by CodeRabbit

  • New Features
    • Added a new API endpoint to fetch the latest workflow scan results for a repository (including workflow path, risk score, and detected risk signals).
    • The worker now runs on a fixed interval to scan repository workflows and persist results.
  • Bug Fixes
    • Workflow scans now refresh stored results for a repository each run, and processing continues even if an individual repository fails.
  • Tests
    • Added integration tests for the repository endpoint and workflow-scan persistence behavior.

- Add V2 Flyway migration for workflow_scan table with FK to repo

- Add WorkflowScanEntity and WorkflowScanRepository

- Add RepoRepository

- Implement WorkflowParser to flag unpinned actions, broad permissions

  and missing OIDC with 0-100 risk scoring

- Implement WorkflowScanService to fetch workflow YAMLs from GitHub

  and persist scan results

- Add WorkflowScanJob scheduled worker to scan all repos hourly

- Expose GET /api/repos/{repoId}/workflow-scans endpoint

- Add unit tests for WorkflowParser

- Add integration tests for WorkflowScanRepository and WorkflowScanController
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 4933f90b-a904-4014-8750-446b315c3a0c

📥 Commits

Reviewing files that changed from the base of the PR and between 269ac17 and e431f7a.

📒 Files selected for processing (1)
  • backend/libs/scanning/src/test/java/dev/cleat/scanning/WorkflowParserTest.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/libs/scanning/src/test/java/dev/cleat/scanning/WorkflowParserTest.java

📝 Walkthrough

Walkthrough

Adds workflow scan storage, parsing, scheduled execution, and an API endpoint to read stored scans for a repository. It also updates persistence mappings, migrations, worker wiring, and test configuration to support the new flow.

Changes

Workflow scan feature

Layer / File(s) Summary
Persistence model and schema
backend/libs/persistence/src/main/java/dev/cleat/persistence/entity/AccountEntity.java, RepoEntity.java, WorkflowScanEntity.java, backend/libs/persistence/src/main/java/dev/cleat/persistence/repository/WorkflowScanRepository.java, backend/libs/persistence/src/main/resources/db/migration/V1__...sql, V2__...sql, V3__...sql, backend/libs/persistence/src/test/java/dev/cleat/persistence/WorkflowScanRepositoryTest.java
Adds installationId to AccountEntity, moves RepoEntity to the entity subpackage, introduces WorkflowScanEntity mapped to workflow_scan, adds WorkflowScanRepository with find/delete methods, creates the table migration, and verifies both repository behaviors with Testcontainers integration tests.
Workflow YAML parsing and scan service
backend/libs/scanning/build.gradle.kts, backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowAnalysis.java, WorkflowParser.java, WorkflowScanService.java, backend/libs/scanning/src/test/java/dev/cleat/scanning/WorkflowParserTest.java
Adds SnakeYAML, defines the WorkflowAnalysis record, implements WorkflowParser with pinned-SHA detection and risk scoring, implements WorkflowScanService to fetch, parse, and persist scan rows, and covers parser behaviors with unit tests.
Worker app wiring and scheduling
backend/apps/worker/src/main/java/dev/cleat/worker/CleatWorkerApplication.java, WorkflowScanJob.java, backend/apps/worker/src/main/resources/application.yml, backend/apps/worker/src/test/java/dev/cleat/worker/AbstractIntegrationTest.java, backend/apps/worker/src/test/resources/application.yml, backend/libs/github-client/src/main/java/dev/cleat/githubclient/config/GitHubConfig.java
Adds explicit entity, repository, and component scanning to the worker app, introduces the scheduled WorkflowScanJob, sets cleat.workflow-scan.interval-ms, marks the RedisTemplate bean @Primary, and aligns worker test configuration.
API read endpoint and integration tests
backend/apps/api/src/main/java/dev/cleat/api/controller/WorkflowScanController.java, backend/apps/api/src/test/java/dev/cleat/api/AbstractIntegrationTest.java, WorkflowScanControllerTest.java, backend/apps/api/src/test/resources/application.yml
Adds GET /api/repos/{repoId}/workflow-scans returning a list of WorkflowScanResponse records, switches the test base class to RANDOM_PORT, configures in-memory test security credentials, and covers both populated and empty cases with integration tests.

Sequence Diagram(s)

sequenceDiagram
  participant WorkflowScanJob
  participant AccountRepository
  participant WorkflowScanService
  participant GitHubClient
  participant WorkflowParser
  participant WorkflowScanRepository

  WorkflowScanJob->>AccountRepository: findAll()
  loop each account repo
    WorkflowScanJob->>WorkflowScanService: scanRepo(repo, installationId)
    WorkflowScanService->>GitHubClient: fetch .github/workflows listing
    WorkflowScanService->>WorkflowScanRepository: deleteByRepoId(repoId)
    loop each .yml/.yaml file
      WorkflowScanService->>GitHubClient: download raw YAML
      WorkflowScanService->>WorkflowParser: parse(workflowPath, yamlContent)
      WorkflowScanService->>WorkflowScanRepository: save(WorkflowScanEntity)
    end
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Suggested reviewers

  • martian56

Poem

🐇 I hop through workflows, quick and bright,
Parsing YAML by moonlit light.
Stored scans pile up, neat and true,
One hop, one score, one API view.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: auditing GitHub Actions workflows across repositories.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/10-github-actions-workflow-audit

Comment @coderabbitai help to get the list of available commands.

@nakberli841-bot nakberli841-bot self-assigned this Jun 26, 2026
@nakberli841-bot nakberli841-bot marked this pull request as draft June 26, 2026 19:39
@nakberli841-bot nakberli841-bot added the area: backend Backend service or library label Jun 26, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (3)
backend/libs/persistence/src/main/resources/db/migration/V2__create_workflow_scan_table.sql (1)

17-17: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Consider a composite index aligned with the read query.

The primary read path findByRepoIdOrderByScannedAtDesc filters on repo_id and orders by scanned_at DESC. A composite (repo_id, scanned_at DESC) index lets Postgres satisfy both the filter and the sort without a separate sort step, which the current repo_id-only index does not.

⚡ Suggested index
-CREATE INDEX idx_workflow_scan_repo_id ON workflow_scan(repo_id);
+CREATE INDEX idx_workflow_scan_repo_id_scanned_at
+    ON workflow_scan(repo_id, scanned_at DESC);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@backend/libs/persistence/src/main/resources/db/migration/V2__create_workflow_scan_table.sql`
at line 17, The current workflow_scan index only covers repo_id, but the main
read path in the repository query findByRepoIdOrderByScannedAtDesc also sorts by
scanned_at DESC. Update the migration to use a composite index on workflow_scan
that matches the filter and ordering, referencing the existing
idx_workflow_scan_repo_id definition so Postgres can satisfy both without an
extra sort.
backend/libs/persistence/src/test/java/dev/cleat/persistence/WorkflowScanRepositoryTest.java (1)

54-82: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Ordering is not actually asserted.

savesAndFindsWorkflowScanByRepoId persists a single scan, so findByRepoIdOrderByScannedAtDesc would pass regardless of sort direction. Consider inserting at least two scans with distinct scannedAt values and asserting the newest comes first to cover the OrderByScannedAtDesc contract that the API endpoint relies on.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@backend/libs/persistence/src/test/java/dev/cleat/persistence/WorkflowScanRepositoryTest.java`
around lines 54 - 82, The test in WorkflowScanRepositoryTest for
savesAndFindsWorkflowScanByRepoId does not actually verify the
OrderByScannedAtDesc behavior because it only persists one WorkflowScanEntity.
Update this test to save at least two scans for the same RepoEntity with
different scannedAt values, then assert that
workflowScanRepository.findByRepoIdOrderByScannedAtDesc returns the newest
WorkflowScanEntity first. Keep the existing repository method name and add
assertions on the first and second results to cover the intended ordering
contract.
backend/libs/persistence/src/main/java/dev/cleat/persistence/repo/WorkflowScanRepository.java (1)

12-12: 🩺 Stability & Availability | 🔵 Trivial

Consider a bulk delete here deleteByRepoId loads matching WorkflowScanEntity rows and removes them one by one. A custom @Modifying @query("delete ...") would avoid the extra reads if this repo can grow large.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@backend/libs/persistence/src/main/java/dev/cleat/persistence/repo/WorkflowScanRepository.java`
at line 12, The deleteByRepoId method in WorkflowScanRepository currently relies
on entity-by-entity removal, which can cause unnecessary reads for large
repositories. Update this repository method to use a bulk delete approach with a
custom `@Modifying` query so matching WorkflowScanEntity rows are deleted directly
by repoId. Keep the method name as the entry point, but change the repository
contract/annotations so the delete executes as a single bulk operation rather
than loading entities first.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/apps/api/src/main/java/dev/cleat/api/CleatApiApplication.java`:
- Around line 7-8: The API application only enables repository scanning via
CleatApiApplication, so JPA entities under dev.cleat.persistence.entity are not
being discovered. Add entity scanning alongside the existing
`@EnableJpaRepositories` on CleatApiApplication, using `@EntityScan` with the
dev.cleat.persistence base package (matching CleatWorkerApplication) or
equivalent JPA entity scan configuration.

In
`@backend/apps/worker/src/main/java/dev/cleat/worker/CleatWorkerApplication.java`:
- Around line 12-14: The worker application scan setup is missing the
github-client module, so `WorkflowScanService` cannot resolve
`dev.cleat.githubclient.service.GitHubClient` at startup. Update
`CleatWorkerApplication` so its component scanning includes the
`dev.cleat.githubclient` package (or imports the github-client configuration
explicitly), alongside the existing `dev.cleat.persistence` and
`dev.cleat.scanning` setup, to ensure `WorkflowScanJob` can wire its
dependencies.

In `@backend/apps/worker/src/main/java/dev/cleat/worker/WorkflowScanJob.java`:
- Around line 25-38: `WorkflowScanJob.run()` is iterating detached
`AccountEntity`/`RepoEntity` instances, so the lazy `account.getRepos()` and
later `repo.getAccount().getLogin()` access can fail outside an active
persistence context. Fix this by loading the account/repo graph eagerly in
`accountRepository.findAll()` (for example with a `JOIN FETCH` or
`@EntityGraph`) or by moving the entire scan loop in `run()` into a
transactional boundary so `workflowScanService.scanRepo(...)` receives managed
entities.

In `@backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowParser.java`:
- Around line 36-37: The permissions analysis in WorkflowParser has two coverage
gaps: job-level broad permission presets are not detected, and top-level scalar
write-all/read-all is treated as missing OIDC. Update the logic around the
permissions parsing and broadPermissions/missingOidc computation so both the
top-level permissions block and each job’s permissions are checked for
write-all/read-all, and treat write-all as implying id-token: write when setting
missingOidc. Use the existing WorkflowParser permissions handling and the
job-level inspection code to centralize this detection so the same rules apply
consistently.
- Around line 20-21: The YAML parsing in WorkflowParser currently uses the
default Yaml constructor and load path, which is not safe for untrusted workflow
files. Update WorkflowParser’s YAML creation to use SnakeYAML’s safe constructor
or safe loader configuration, and keep the parsing logic in the same place where
yaml.load(yamlContent) is called so external repository content is handled with
the restricted type set.

In
`@backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowScanService.java`:
- Around line 42-48: In WorkflowScanService, stale scan rows are left behind
because the empty/null files early return happens before
workflowScanRepository.deleteByRepoId. Move the deleteByRepoId(repo.getId())
call so it runs before the files empty check in the scan workflow path, ensuring
old scans are cleared even when no workflows remain; keep the existing logic in
findByRepoIdOrderByScannedAtDesc unaffected.
- Line 60: Handle absolute download_url values separately in
WorkflowScanService: GitHubClient.get currently treats every input as a path on
api.github.com, so the contents API download_url is being built incorrectly.
Update the call site around the yamlContent fetch to use a request path/URI flow
that preserves absolute URLs, or add a separate GitHubClient method for absolute
URLs and use that here while keeping the bearer token attachment intact.

In
`@backend/libs/scanning/src/test/java/dev/cleat/scanning/WorkflowParserTest.java`:
- Around line 77-82: The test name in WorkflowParserTest is misleading because
emptyYamlReturnsZeroScore asserts a risk score of 20, not 0. Rename the test
method to reflect the actual behavior, such as emptyYamlScoresOnlyMissingOidc,
and keep the existing assertions in parser.parse and the WorkflowAnalysis
riskScore check unchanged.

---

Nitpick comments:
In
`@backend/libs/persistence/src/main/java/dev/cleat/persistence/repo/WorkflowScanRepository.java`:
- Line 12: The deleteByRepoId method in WorkflowScanRepository currently relies
on entity-by-entity removal, which can cause unnecessary reads for large
repositories. Update this repository method to use a bulk delete approach with a
custom `@Modifying` query so matching WorkflowScanEntity rows are deleted directly
by repoId. Keep the method name as the entry point, but change the repository
contract/annotations so the delete executes as a single bulk operation rather
than loading entities first.

In
`@backend/libs/persistence/src/main/resources/db/migration/V2__create_workflow_scan_table.sql`:
- Line 17: The current workflow_scan index only covers repo_id, but the main
read path in the repository query findByRepoIdOrderByScannedAtDesc also sorts by
scanned_at DESC. Update the migration to use a composite index on workflow_scan
that matches the filter and ordering, referencing the existing
idx_workflow_scan_repo_id definition so Postgres can satisfy both without an
extra sort.

In
`@backend/libs/persistence/src/test/java/dev/cleat/persistence/WorkflowScanRepositoryTest.java`:
- Around line 54-82: The test in WorkflowScanRepositoryTest for
savesAndFindsWorkflowScanByRepoId does not actually verify the
OrderByScannedAtDesc behavior because it only persists one WorkflowScanEntity.
Update this test to save at least two scans for the same RepoEntity with
different scannedAt values, then assert that
workflowScanRepository.findByRepoIdOrderByScannedAtDesc returns the newest
WorkflowScanEntity first. Keep the existing repository method name and add
assertions on the first and second results to cover the intended ordering
contract.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 915914f2-ad1f-4a0b-8769-0468385c27fa

📥 Commits

Reviewing files that changed from the base of the PR and between 066f931 and db5a09b.

📒 Files selected for processing (24)
  • backend/apps/api/src/main/java/dev/cleat/api/CleatApiApplication.java
  • backend/apps/api/src/main/java/dev/cleat/api/controller/WorkflowScanController.java
  • backend/apps/api/src/test/java/dev/cleat/api/AbstractIntegrationTest.java
  • backend/apps/api/src/test/java/dev/cleat/api/WorkflowScanControllerTest.java
  • backend/apps/worker/src/main/java/dev/cleat/worker/CleatWorkerApplication.java
  • backend/apps/worker/src/main/java/dev/cleat/worker/WorkflowScanJob.java
  • backend/apps/worker/src/main/resources/application.yml
  • backend/libs/persistence/src/main/java/dev/cleat/persistence/entity/AccountEntity.java
  • backend/libs/persistence/src/main/java/dev/cleat/persistence/entity/RepoEntity.java
  • backend/libs/persistence/src/main/java/dev/cleat/persistence/entity/WorkflowScanEntity.java
  • backend/libs/persistence/src/main/java/dev/cleat/persistence/enums/AccountType.java
  • backend/libs/persistence/src/main/java/dev/cleat/persistence/enums/Plan.java
  • backend/libs/persistence/src/main/java/dev/cleat/persistence/enums/Visibility.java
  • backend/libs/persistence/src/main/java/dev/cleat/persistence/repo/AccountRepository.java
  • backend/libs/persistence/src/main/java/dev/cleat/persistence/repo/RepoRepository.java
  • backend/libs/persistence/src/main/java/dev/cleat/persistence/repo/WorkflowScanRepository.java
  • backend/libs/persistence/src/main/resources/db/migration/V2__create_workflow_scan_table.sql
  • backend/libs/persistence/src/test/java/dev/cleat/persistence/AccountRepositoryTest.java
  • backend/libs/persistence/src/test/java/dev/cleat/persistence/WorkflowScanRepositoryTest.java
  • backend/libs/scanning/build.gradle.kts
  • backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowAnalysis.java
  • backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowParser.java
  • backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowScanService.java
  • backend/libs/scanning/src/test/java/dev/cleat/scanning/WorkflowParserTest.java

Comment thread backend/apps/api/src/main/java/dev/cleat/api/CleatApiApplication.java Outdated
Comment thread backend/apps/worker/src/main/java/dev/cleat/worker/CleatWorkerApplication.java Outdated
Comment on lines +20 to +21
Yaml yaml = new Yaml();
Map<String, Object> workflow = yaml.load(yamlContent);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major

🧩 Analysis chain

🌐 Web query:

What is the default tag/type instantiation behavior of SnakeYAML 2.x new Yaml().load and is it safe for untrusted input?

💡 Result:

In SnakeYAML 2.x, the default behavior of new Yaml().load() is "secure by default," which represents a significant change from 1.x versions [1][2][3]. By default, global tags (which allow the instantiation of arbitrary Java classes) are disabled [2][4][3]. Key technical aspects include: 1. Default Constructor: In 2.x, new Yaml() uses a constructor that restricts class instantiation [1]. It defaults to using an UnTrustedTagInspector that rejects all global tags, preventing the arbitrary class instantiation that previously led to remote code execution (RCE) vulnerabilities [4][3]. 2. Safety for Untrusted Input: Because global tags are disabled by default, new Yaml().load() is considered safe for parsing untrusted input insofar as it will not instantiate arbitrary, potentially malicious Java objects defined by those tags [2][3]. 3. Allowed Types: The default behavior allows only standard, safe YAML types (e.g., scalars, sequences, and mappings), which are handled by SafeConstructor [5][6][1]. 4. Security Best Practices: While the default configuration is now secure against common tag-based injection, developers should still exercise caution with untrusted input [7][3]. This includes limiting the size of the input to prevent denial-of-service (DoS) attacks via memory exhaustion [7][3]. If specific custom classes must be deserialized, developers should explicitly configure a TagInspector to whitelist only trusted packages or classes rather than globally enabling all tags [4][3]. In summary, new Yaml().load() in SnakeYAML 2.x is designed to prevent the deserialization vulnerabilities associated with earlier versions by default, making it safe to use for untrusted input provided that no additional, insecure configuration is explicitly enabled [1][2][3].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== WorkflowParser =="
sed -n '1,220p' backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowParser.java

echo
echo "== Relevant build files =="
git ls-files 'backend/**/build.gradle*' 'backend/**/settings.gradle*' 'backend/**/gradle.properties' 'backend/**/libs/**/build.gradle*' | sed -n '1,200p'

echo
echo "== Search for snakeyaml / spring boot versions =="
rg -n --hidden --glob '!**/build/**' --glob '!**/target/**' --glob '!**/.gradle/**' 'snakeyaml|spring-boot|Boot BOM|org.yaml' backend

Repository: Devlaner/cleat

Length of output: 6546


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Root backend build files =="
git ls-files 'backend/build.gradle*' 'backend/settings.gradle*' 'backend/gradle.properties' 'backend/gradle/libs.versions.toml' | sed -n '1,200p'

echo
for f in backend/build.gradle.kts backend/gradle/libs.versions.toml backend/settings.gradle.kts backend/gradle.properties; do
  if [ -f "$f" ]; then
    echo "== $f =="
    sed -n '1,220p' "$f"
    echo
  fi
done

Repository: Devlaner/cleat

Length of output: 2151


🌐 Web query:

Spring Boot 3.5.3 managed SnakeYAML version

💡 Result:

Spring Boot 3.5.3 manages the SnakeYAML dependency version as 1.33 [1]. Consistent with Spring Boot's policy of maintaining stability within minor versions, Spring Boot 3.5.x continues to utilize the 1.33 release of SnakeYAML, rather than upgrading to the 2.x line by default [2][3][1]. If your application requires a different version of SnakeYAML (for example, to use features from the 2.x release), you can override the managed version using the snakeyaml.version property in your build configuration [4][1]. For Maven projects, you can add this to your pom.xml: <snakeyaml.version>2.x</snakeyaml.version> For Gradle projects, you can add this to your build.gradle: ext['snakeyaml.version'] = '2.x' Spring Boot 3.5.3 was released on June 20, 2025 [5][6]. While newer versions of SnakeYAML exist, Spring Boot manages this dependency to ensure compatibility with its internal YAML processing, and it is recommended to override this property only if you have a specific requirement to use a different version [2][1].

Citations:


🌐 Web query:

Spring Boot 3.5.3 managed SnakeYAML version

💡 Result:

Spring Boot 3.5.3 manages SnakeYAML at version 1.33 [1]. While Spring Boot 3.5.x versions are compatible with SnakeYAML 2.x, they continue to default to the 1.33 version to ensure stability and backward compatibility [1][2]. If you require features from SnakeYAML 2.x, you can override the managed version by defining the snakeyaml.version property in your project's build configuration [3][1]. For Maven projects, add the following to your pom.xml: <snakeyaml.version>2.x</snakeyaml.version> For Gradle projects, add this to your build.gradle: ext['snakeyaml.version'] = '2.x' This will instruct Spring Boot to use your specified version instead of the default 1.33 [3][1].

Citations:


Use SnakeYAML’s safe constructor here.

Spring Boot 3.5.3 still manages SnakeYAML 1.33, so new Yaml().load(...) here is not safe for untrusted workflow files from external repositories. Switch to a safe loader / SafeConstructor instead.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowParser.java`
around lines 20 - 21, The YAML parsing in WorkflowParser currently uses the
default Yaml constructor and load path, which is not safe for untrusted workflow
files. Update WorkflowParser’s YAML creation to use SnakeYAML’s safe constructor
or safe loader configuration, and keep the parsing logic in the same place where
yaml.load(yamlContent) is called so external repository content is handled with
the restricted type set.

Comment thread backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowParser.java Outdated
Comment thread backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowScanService.java Outdated
Comment thread backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowScanService.java Outdated
Comment thread backend/libs/scanning/src/test/java/dev/cleat/scanning/WorkflowParserTest.java Outdated
- Added installationId field (String) to AccountEntity
- Updated V1 migration script to include installation_id column
- Fixed compilation and consistency issues across modules
- Fix JUnit Platform launcher dependency in :libs:scanning
- Resolve NoUniqueBeanDefinitionException in :apps:worker test context
- Update API integration tests to match updated AccountEntity structure
…UniqueBeanDefinitionException
@nakberli841-bot nakberli841-bot marked this pull request as ready for review June 28, 2026 16:00

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/apps/worker/src/test/resources/application.yml`:
- Around line 5-7: Disable scheduling in the worker test configuration so
`WorkflowScanJob` does not run during test startup. Update the test
`application.yml` used by `CleatWorkerApplication` to set
`spring.task.scheduling.enabled` to false while keeping the existing
`cleat.workflow-scan.interval-ms` settings as-is. This change should be made in
the test resource config that defines the worker application properties.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 0e6f105d-544c-43f1-9716-bfe4d2ac22e6

📥 Commits

Reviewing files that changed from the base of the PR and between db5a09b and 035e1e2.

📒 Files selected for processing (18)
  • backend/apps/api/src/main/java/dev/cleat/api/controller/WorkflowScanController.java
  • backend/apps/api/src/test/java/dev/cleat/api/WorkflowScanControllerTest.java
  • backend/apps/api/src/test/resources/application.yml
  • backend/apps/worker/src/main/java/dev/cleat/worker/CleatWorkerApplication.java
  • backend/apps/worker/src/main/java/dev/cleat/worker/WorkflowScanJob.java
  • backend/apps/worker/src/test/java/dev/cleat/worker/AbstractIntegrationTest.java
  • backend/apps/worker/src/test/resources/application.yml
  • backend/libs/github-client/src/main/java/dev/cleat/githubclient/config/GitHubConfig.java
  • backend/libs/persistence/src/main/java/dev/cleat/persistence/entity/AccountEntity.java
  • backend/libs/persistence/src/main/java/dev/cleat/persistence/entity/RepoEntity.java
  • backend/libs/persistence/src/main/java/dev/cleat/persistence/repository/WorkflowScanRepository.java
  • backend/libs/persistence/src/main/resources/db/migration/V1__create_account_and_repo_tables.sql
  • backend/libs/persistence/src/main/resources/db/migration/V3__update_repo_and_add_tables.sql
  • backend/libs/persistence/src/test/java/dev/cleat/persistence/WorkflowScanRepositoryTest.java
  • backend/libs/scanning/build.gradle.kts
  • backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowParser.java
  • backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowScanService.java
  • backend/libs/scanning/src/test/java/dev/cleat/scanning/WorkflowParserTest.java
💤 Files with no reviewable changes (1)
  • backend/apps/worker/src/test/java/dev/cleat/worker/AbstractIntegrationTest.java
✅ Files skipped from review due to trivial changes (2)
  • backend/libs/persistence/src/main/java/dev/cleat/persistence/repository/WorkflowScanRepository.java
  • backend/libs/persistence/src/main/resources/db/migration/V3__update_repo_and_add_tables.sql
🚧 Files skipped from review as they are similar to previous changes (10)
  • backend/apps/api/src/main/java/dev/cleat/api/controller/WorkflowScanController.java
  • backend/apps/api/src/test/java/dev/cleat/api/WorkflowScanControllerTest.java
  • backend/libs/scanning/build.gradle.kts
  • backend/libs/persistence/src/test/java/dev/cleat/persistence/WorkflowScanRepositoryTest.java
  • backend/apps/worker/src/main/java/dev/cleat/worker/CleatWorkerApplication.java
  • backend/apps/worker/src/main/java/dev/cleat/worker/WorkflowScanJob.java
  • backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowScanService.java
  • backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowParser.java
  • backend/libs/scanning/src/test/java/dev/cleat/scanning/WorkflowParserTest.java
  • backend/libs/persistence/src/main/java/dev/cleat/persistence/entity/RepoEntity.java

Comment on lines +5 to +7
cleat:
workflow-scan:
interval-ms: 3600000 No newline at end of file

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

fd 'CleatWorkerApplication\.java$' backend/apps/worker/src/main/java -x sh -c '
  echo "== $1 ==";
  ast-grep outline "$1";
  sed -n "1,160p" "$1"
' sh {}

fd 'AbstractIntegrationTest\.java$' backend/apps/worker/src/test/java -x sh -c '
  echo "== $1 ==";
  ast-grep outline "$1";
  sed -n "1,220p" "$1"
' sh {}

rg -n -C2 '`@EnableScheduling`|`@Scheduled`|spring\.task\.scheduling\.enabled|workflow-scan\.interval-ms' backend/apps/worker

Repository: Devlaner/cleat

Length of output: 3224


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== test resources =="
git ls-files 'backend/apps/worker/src/test/resources/**' | sort

echo
echo "== scheduling-related config in worker tests =="
rg -n -C2 'spring\.task\.scheduling\.enabled|EnableScheduling|`@Scheduled`|workflow-scan\.interval-ms' \
  backend/apps/worker/src/test backend/apps/worker/src/main

echo
echo "== worker test classes =="
fd -e java backend/apps/worker/src/test/java -x sh -c 'echo "--- $1 ---"; sed -n "1,220p" "$1"' sh {}

Repository: Devlaner/cleat

Length of output: 2311


Disable the scheduler in worker tests
CleatWorkerApplication enables scheduling globally, and this test config leaves cleat.workflow-scan.interval-ms active, so WorkflowScanJob can run when the test context starts. Add spring.task.scheduling.enabled: false here to keep integration tests deterministic.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/apps/worker/src/test/resources/application.yml` around lines 5 - 7,
Disable scheduling in the worker test configuration so `WorkflowScanJob` does
not run during test startup. Update the test `application.yml` used by
`CleatWorkerApplication` to set `spring.task.scheduling.enabled` to false while
keeping the existing `cleat.workflow-scan.interval-ms` settings as-is. This
change should be made in the test resource config that defines the worker
application properties.

@martian56 martian56 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through the whole diff and every CodeRabbit thread, and checked each one against the current code rather than trusting the resolved flags. Short version: the valid CodeRabbit points are still open. Nine threads, two resolved, seven still active, and six of those seven are real and unaddressed. Two of them are not minor, they are bugs that only surface when the scheduled job actually runs, which is exactly why CI stays green.

The foundation is good. The parser structure, the risk scoring, the entity and migration, the endpoint, and the parser unit tests are all reasonable. But the end to end scan path does not work yet, and that is the part the tests do not exercise.

The two that block:

  1. The scheduled job will throw on a lazy load (WorkflowScanJob, inline). run() is not transactional, so accountRepository.findAll() hands back detached entities, and account.getRepos() then trips a LazyInitializationException before scanRepo is ever reached. repo.getAccount().getLogin() inside scanRepo has the same problem since the repo arrives detached. The parser tests and the controller test never go through this path, so it passes CI and fails the moment the hourly job fires. CodeRabbit flagged this as critical and it is right. Easiest fixes: make run() @transactional, or load accounts with their repos via @EntityGraph or a JOIN FETCH query, or pass repo ids into a transactional service method and reload inside.

  2. The workflow content fetch hits the wrong host (WorkflowScanService, inline). download_url from the contents API is an absolute raw.githubusercontent.com URL, but GitHubClient.get builds the request with .uri(builder -> builder.path(uri).build()) against the https://api.github.com base, so the absolute URL gets treated as a path under api.github.com. So even once the lazy load is fixed, no YAML actually comes back. Either fetch the content from the contents API response directly (it returns a base64 content field you can decode, no second call needed), or give GitHubClient a way to GET an absolute URL with a plain WebClient. Also worth noting these two together mean the feature has not run successfully end to end yet, so please verify against the compose stack with a seeded account before the next round.

The valid smaller ones, also still open:

  1. Permission detection has gaps (WorkflowParser, inline). A top-level scalar permissions: write-all sets broadPermissions but leaves missingOidc true, so it gets both the broad penalty and a false missing-OIDC penalty even though write-all includes id-token: write. Job-level write-all / read-all is never counted as broad either. While you are in there, the map heuristic on line 36 (anyMatch value == write and size > 3) is arbitrary and will miss real broad grants, that is worth rethinking, not just patching.

  2. Stale scans are never cleared when a repo loses all its workflows. The empty-files early return happens before deleteByRepoId, so old rows linger and keep showing up in the latest-scan query. Move the delete above the empty check.

  3. WorkflowParserTest.emptyYamlReturnsZeroScore asserts a score of 20, not zero. Rename it so the name matches what it checks, something like emptyYamlScoresOnlyMissingOidc.

  4. The worker test config leaves scheduling on, so WorkflowScanJob can fire during the integration test context. Add spring.task.scheduling.enabled: false to the worker test application.yml to keep tests deterministic.

The one I would push back on:

  1. CodeRabbit calls new Yaml().load() a major security hole on the grounds that Spring Boot 3.5.3 ships SnakeYAML 1.33. That premise is wrong. Boot 3.5.3 manages SnakeYAML 2.x, and 2.x is safe by default: the default loader rejects global tags, so arbitrary class instantiation from untrusted YAML is already blocked. So this is not a blocker and not for the reason given. I would still do two small things as plain hygiene: construct with a SafeConstructor so the intent is explicit and you are protected if the version ever moves, and wrap load() so a malformed workflow file throws something you handle rather than bubbling up. Neither is required to merge.

So to answer the actual question: no, the valid CodeRabbit comments are not fixed. Items 1 through 4, 6 and 7 all still stand in the code. Get the two runtime bugs sorted and the four cleanups in, and this is close. Happy to look again right after.

Comment thread backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowScanService.java Outdated
Comment thread backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowParser.java Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: backend Backend service or library

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants