feat(scanning): audit GitHub Actions workflows across repositories#33
feat(scanning): audit GitHub Actions workflows across repositories#33nakberli841-bot wants to merge 22 commits into
Conversation
- Add V2 Flyway migration for workflow_scan table with FK to repo
- Add WorkflowScanEntity and WorkflowScanRepository
- Add RepoRepository
- Implement WorkflowParser to flag unpinned actions, broad permissions
and missing OIDC with 0-100 risk scoring
- Implement WorkflowScanService to fetch workflow YAMLs from GitHub
and persist scan results
- Add WorkflowScanJob scheduled worker to scan all repos hourly
- Expose GET /api/repos/{repoId}/workflow-scans endpoint
- Add unit tests for WorkflowParser
- Add integration tests for WorkflowScanRepository and WorkflowScanController
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds workflow scan storage, parsing, scheduled execution, and an API endpoint to read stored scans for a repository. It also updates persistence mappings, migrations, worker wiring, and test configuration to support the new flow. ChangesWorkflow scan feature
Sequence Diagram(s)sequenceDiagram
participant WorkflowScanJob
participant AccountRepository
participant WorkflowScanService
participant GitHubClient
participant WorkflowParser
participant WorkflowScanRepository
WorkflowScanJob->>AccountRepository: findAll()
loop each account repo
WorkflowScanJob->>WorkflowScanService: scanRepo(repo, installationId)
WorkflowScanService->>GitHubClient: fetch .github/workflows listing
WorkflowScanService->>WorkflowScanRepository: deleteByRepoId(repoId)
loop each .yml/.yaml file
WorkflowScanService->>GitHubClient: download raw YAML
WorkflowScanService->>WorkflowParser: parse(workflowPath, yamlContent)
WorkflowScanService->>WorkflowScanRepository: save(WorkflowScanEntity)
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (3)
backend/libs/persistence/src/main/resources/db/migration/V2__create_workflow_scan_table.sql (1)
17-17: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick winConsider a composite index aligned with the read query.
The primary read path
findByRepoIdOrderByScannedAtDescfilters onrepo_idand orders byscanned_at DESC. A composite(repo_id, scanned_at DESC)index lets Postgres satisfy both the filter and the sort without a separate sort step, which the currentrepo_id-only index does not.⚡ Suggested index
-CREATE INDEX idx_workflow_scan_repo_id ON workflow_scan(repo_id); +CREATE INDEX idx_workflow_scan_repo_id_scanned_at + ON workflow_scan(repo_id, scanned_at DESC);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/libs/persistence/src/main/resources/db/migration/V2__create_workflow_scan_table.sql` at line 17, The current workflow_scan index only covers repo_id, but the main read path in the repository query findByRepoIdOrderByScannedAtDesc also sorts by scanned_at DESC. Update the migration to use a composite index on workflow_scan that matches the filter and ordering, referencing the existing idx_workflow_scan_repo_id definition so Postgres can satisfy both without an extra sort.backend/libs/persistence/src/test/java/dev/cleat/persistence/WorkflowScanRepositoryTest.java (1)
54-82: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winOrdering is not actually asserted.
savesAndFindsWorkflowScanByRepoIdpersists a single scan, sofindByRepoIdOrderByScannedAtDescwould pass regardless of sort direction. Consider inserting at least two scans with distinctscannedAtvalues and asserting the newest comes first to cover theOrderByScannedAtDesccontract that the API endpoint relies on.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/libs/persistence/src/test/java/dev/cleat/persistence/WorkflowScanRepositoryTest.java` around lines 54 - 82, The test in WorkflowScanRepositoryTest for savesAndFindsWorkflowScanByRepoId does not actually verify the OrderByScannedAtDesc behavior because it only persists one WorkflowScanEntity. Update this test to save at least two scans for the same RepoEntity with different scannedAt values, then assert that workflowScanRepository.findByRepoIdOrderByScannedAtDesc returns the newest WorkflowScanEntity first. Keep the existing repository method name and add assertions on the first and second results to cover the intended ordering contract.backend/libs/persistence/src/main/java/dev/cleat/persistence/repo/WorkflowScanRepository.java (1)
12-12: 🩺 Stability & Availability | 🔵 TrivialConsider a bulk delete here
deleteByRepoIdloads matchingWorkflowScanEntityrows and removes them one by one. A custom@Modifying@query("delete ...")would avoid the extra reads if this repo can grow large.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/libs/persistence/src/main/java/dev/cleat/persistence/repo/WorkflowScanRepository.java` at line 12, The deleteByRepoId method in WorkflowScanRepository currently relies on entity-by-entity removal, which can cause unnecessary reads for large repositories. Update this repository method to use a bulk delete approach with a custom `@Modifying` query so matching WorkflowScanEntity rows are deleted directly by repoId. Keep the method name as the entry point, but change the repository contract/annotations so the delete executes as a single bulk operation rather than loading entities first.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/apps/api/src/main/java/dev/cleat/api/CleatApiApplication.java`:
- Around line 7-8: The API application only enables repository scanning via
CleatApiApplication, so JPA entities under dev.cleat.persistence.entity are not
being discovered. Add entity scanning alongside the existing
`@EnableJpaRepositories` on CleatApiApplication, using `@EntityScan` with the
dev.cleat.persistence base package (matching CleatWorkerApplication) or
equivalent JPA entity scan configuration.
In
`@backend/apps/worker/src/main/java/dev/cleat/worker/CleatWorkerApplication.java`:
- Around line 12-14: The worker application scan setup is missing the
github-client module, so `WorkflowScanService` cannot resolve
`dev.cleat.githubclient.service.GitHubClient` at startup. Update
`CleatWorkerApplication` so its component scanning includes the
`dev.cleat.githubclient` package (or imports the github-client configuration
explicitly), alongside the existing `dev.cleat.persistence` and
`dev.cleat.scanning` setup, to ensure `WorkflowScanJob` can wire its
dependencies.
In `@backend/apps/worker/src/main/java/dev/cleat/worker/WorkflowScanJob.java`:
- Around line 25-38: `WorkflowScanJob.run()` is iterating detached
`AccountEntity`/`RepoEntity` instances, so the lazy `account.getRepos()` and
later `repo.getAccount().getLogin()` access can fail outside an active
persistence context. Fix this by loading the account/repo graph eagerly in
`accountRepository.findAll()` (for example with a `JOIN FETCH` or
`@EntityGraph`) or by moving the entire scan loop in `run()` into a
transactional boundary so `workflowScanService.scanRepo(...)` receives managed
entities.
In `@backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowParser.java`:
- Around line 36-37: The permissions analysis in WorkflowParser has two coverage
gaps: job-level broad permission presets are not detected, and top-level scalar
write-all/read-all is treated as missing OIDC. Update the logic around the
permissions parsing and broadPermissions/missingOidc computation so both the
top-level permissions block and each job’s permissions are checked for
write-all/read-all, and treat write-all as implying id-token: write when setting
missingOidc. Use the existing WorkflowParser permissions handling and the
job-level inspection code to centralize this detection so the same rules apply
consistently.
- Around line 20-21: The YAML parsing in WorkflowParser currently uses the
default Yaml constructor and load path, which is not safe for untrusted workflow
files. Update WorkflowParser’s YAML creation to use SnakeYAML’s safe constructor
or safe loader configuration, and keep the parsing logic in the same place where
yaml.load(yamlContent) is called so external repository content is handled with
the restricted type set.
In
`@backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowScanService.java`:
- Around line 42-48: In WorkflowScanService, stale scan rows are left behind
because the empty/null files early return happens before
workflowScanRepository.deleteByRepoId. Move the deleteByRepoId(repo.getId())
call so it runs before the files empty check in the scan workflow path, ensuring
old scans are cleared even when no workflows remain; keep the existing logic in
findByRepoIdOrderByScannedAtDesc unaffected.
- Line 60: Handle absolute download_url values separately in
WorkflowScanService: GitHubClient.get currently treats every input as a path on
api.github.com, so the contents API download_url is being built incorrectly.
Update the call site around the yamlContent fetch to use a request path/URI flow
that preserves absolute URLs, or add a separate GitHubClient method for absolute
URLs and use that here while keeping the bearer token attachment intact.
In
`@backend/libs/scanning/src/test/java/dev/cleat/scanning/WorkflowParserTest.java`:
- Around line 77-82: The test name in WorkflowParserTest is misleading because
emptyYamlReturnsZeroScore asserts a risk score of 20, not 0. Rename the test
method to reflect the actual behavior, such as emptyYamlScoresOnlyMissingOidc,
and keep the existing assertions in parser.parse and the WorkflowAnalysis
riskScore check unchanged.
---
Nitpick comments:
In
`@backend/libs/persistence/src/main/java/dev/cleat/persistence/repo/WorkflowScanRepository.java`:
- Line 12: The deleteByRepoId method in WorkflowScanRepository currently relies
on entity-by-entity removal, which can cause unnecessary reads for large
repositories. Update this repository method to use a bulk delete approach with a
custom `@Modifying` query so matching WorkflowScanEntity rows are deleted directly
by repoId. Keep the method name as the entry point, but change the repository
contract/annotations so the delete executes as a single bulk operation rather
than loading entities first.
In
`@backend/libs/persistence/src/main/resources/db/migration/V2__create_workflow_scan_table.sql`:
- Line 17: The current workflow_scan index only covers repo_id, but the main
read path in the repository query findByRepoIdOrderByScannedAtDesc also sorts by
scanned_at DESC. Update the migration to use a composite index on workflow_scan
that matches the filter and ordering, referencing the existing
idx_workflow_scan_repo_id definition so Postgres can satisfy both without an
extra sort.
In
`@backend/libs/persistence/src/test/java/dev/cleat/persistence/WorkflowScanRepositoryTest.java`:
- Around line 54-82: The test in WorkflowScanRepositoryTest for
savesAndFindsWorkflowScanByRepoId does not actually verify the
OrderByScannedAtDesc behavior because it only persists one WorkflowScanEntity.
Update this test to save at least two scans for the same RepoEntity with
different scannedAt values, then assert that
workflowScanRepository.findByRepoIdOrderByScannedAtDesc returns the newest
WorkflowScanEntity first. Keep the existing repository method name and add
assertions on the first and second results to cover the intended ordering
contract.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 915914f2-ad1f-4a0b-8769-0468385c27fa
📒 Files selected for processing (24)
backend/apps/api/src/main/java/dev/cleat/api/CleatApiApplication.javabackend/apps/api/src/main/java/dev/cleat/api/controller/WorkflowScanController.javabackend/apps/api/src/test/java/dev/cleat/api/AbstractIntegrationTest.javabackend/apps/api/src/test/java/dev/cleat/api/WorkflowScanControllerTest.javabackend/apps/worker/src/main/java/dev/cleat/worker/CleatWorkerApplication.javabackend/apps/worker/src/main/java/dev/cleat/worker/WorkflowScanJob.javabackend/apps/worker/src/main/resources/application.ymlbackend/libs/persistence/src/main/java/dev/cleat/persistence/entity/AccountEntity.javabackend/libs/persistence/src/main/java/dev/cleat/persistence/entity/RepoEntity.javabackend/libs/persistence/src/main/java/dev/cleat/persistence/entity/WorkflowScanEntity.javabackend/libs/persistence/src/main/java/dev/cleat/persistence/enums/AccountType.javabackend/libs/persistence/src/main/java/dev/cleat/persistence/enums/Plan.javabackend/libs/persistence/src/main/java/dev/cleat/persistence/enums/Visibility.javabackend/libs/persistence/src/main/java/dev/cleat/persistence/repo/AccountRepository.javabackend/libs/persistence/src/main/java/dev/cleat/persistence/repo/RepoRepository.javabackend/libs/persistence/src/main/java/dev/cleat/persistence/repo/WorkflowScanRepository.javabackend/libs/persistence/src/main/resources/db/migration/V2__create_workflow_scan_table.sqlbackend/libs/persistence/src/test/java/dev/cleat/persistence/AccountRepositoryTest.javabackend/libs/persistence/src/test/java/dev/cleat/persistence/WorkflowScanRepositoryTest.javabackend/libs/scanning/build.gradle.ktsbackend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowAnalysis.javabackend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowParser.javabackend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowScanService.javabackend/libs/scanning/src/test/java/dev/cleat/scanning/WorkflowParserTest.java
| Yaml yaml = new Yaml(); | ||
| Map<String, Object> workflow = yaml.load(yamlContent); |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major
🧩 Analysis chain
🌐 Web query:
What is the default tag/type instantiation behavior of SnakeYAML 2.x new Yaml().load and is it safe for untrusted input?
💡 Result:
In SnakeYAML 2.x, the default behavior of new Yaml().load() is "secure by default," which represents a significant change from 1.x versions [1][2][3]. By default, global tags (which allow the instantiation of arbitrary Java classes) are disabled [2][4][3]. Key technical aspects include: 1. Default Constructor: In 2.x, new Yaml() uses a constructor that restricts class instantiation [1]. It defaults to using an UnTrustedTagInspector that rejects all global tags, preventing the arbitrary class instantiation that previously led to remote code execution (RCE) vulnerabilities [4][3]. 2. Safety for Untrusted Input: Because global tags are disabled by default, new Yaml().load() is considered safe for parsing untrusted input insofar as it will not instantiate arbitrary, potentially malicious Java objects defined by those tags [2][3]. 3. Allowed Types: The default behavior allows only standard, safe YAML types (e.g., scalars, sequences, and mappings), which are handled by SafeConstructor [5][6][1]. 4. Security Best Practices: While the default configuration is now secure against common tag-based injection, developers should still exercise caution with untrusted input [7][3]. This includes limiting the size of the input to prevent denial-of-service (DoS) attacks via memory exhaustion [7][3]. If specific custom classes must be deserialized, developers should explicitly configure a TagInspector to whitelist only trusted packages or classes rather than globally enabling all tags [4][3]. In summary, new Yaml().load() in SnakeYAML 2.x is designed to prevent the deserialization vulnerabilities associated with earlier versions by default, making it safe to use for untrusted input provided that no additional, insecure configuration is explicitly enabled [1][2][3].
Citations:
- 1: https://snyk.io/blog/snakeyaml-unsafe-deserialization-vulnerability/
- 2: https://docs.herodevs.com/snakeyaml/cve-2022-1471
- 3: https://deepwiki.com/snakeyaml/snakeyaml/7-security
- 4: https://deepwiki.com/snakeyaml/snakeyaml/7.3-type-safety-and-class-loading
- 5: https://deepwiki.com/snakeyaml/snakeyaml/3-loading-(deserialization)-pipeline
- 6: https://deepwiki.com/snakeyaml/snakeyaml/3.5-construction-and-object-creation
- 7: https://codeberg.org/snakeyaml/snakeyaml/wiki/CVE_N_NIST
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== WorkflowParser =="
sed -n '1,220p' backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowParser.java
echo
echo "== Relevant build files =="
git ls-files 'backend/**/build.gradle*' 'backend/**/settings.gradle*' 'backend/**/gradle.properties' 'backend/**/libs/**/build.gradle*' | sed -n '1,200p'
echo
echo "== Search for snakeyaml / spring boot versions =="
rg -n --hidden --glob '!**/build/**' --glob '!**/target/**' --glob '!**/.gradle/**' 'snakeyaml|spring-boot|Boot BOM|org.yaml' backendRepository: Devlaner/cleat
Length of output: 6546
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Root backend build files =="
git ls-files 'backend/build.gradle*' 'backend/settings.gradle*' 'backend/gradle.properties' 'backend/gradle/libs.versions.toml' | sed -n '1,200p'
echo
for f in backend/build.gradle.kts backend/gradle/libs.versions.toml backend/settings.gradle.kts backend/gradle.properties; do
if [ -f "$f" ]; then
echo "== $f =="
sed -n '1,220p' "$f"
echo
fi
doneRepository: Devlaner/cleat
Length of output: 2151
🌐 Web query:
Spring Boot 3.5.3 managed SnakeYAML version
💡 Result:
Spring Boot 3.5.3 manages the SnakeYAML dependency version as 1.33 [1]. Consistent with Spring Boot's policy of maintaining stability within minor versions, Spring Boot 3.5.x continues to utilize the 1.33 release of SnakeYAML, rather than upgrading to the 2.x line by default [2][3][1]. If your application requires a different version of SnakeYAML (for example, to use features from the 2.x release), you can override the managed version using the snakeyaml.version property in your build configuration [4][1]. For Maven projects, you can add this to your pom.xml: <snakeyaml.version>2.x</snakeyaml.version> For Gradle projects, you can add this to your build.gradle: ext['snakeyaml.version'] = '2.x' Spring Boot 3.5.3 was released on June 20, 2025 [5][6]. While newer versions of SnakeYAML exist, Spring Boot manages this dependency to ensure compatibility with its internal YAML processing, and it is recommended to override this property only if you have a specific requirement to use a different version [2][1].
Citations:
- 1: https://www.tutorialpedia.org/blog/is-snakeyaml-2-0-added-in-the-new-spring-boot-versions/
- 2: CVE-2022-25857 - Upgrade to SnakeYAML 1.31 spring-projects/spring-boot#32221
- 3: Spring Boot 2.7.x: Upgrade org.yaml/snakeyaml from 1.30 to 1.33 spring-projects/spring-boot#35401
- 4: https://docs.spring.io/spring-boot/3.5/appendix/dependency-versions/properties.html
- 5: https://github.com/spring-projects/spring-boot/releases/tag/v3.5.3
- 6: https://spring.io/blog/2025/06/19/spring-boot-3-5-3-available-now
🌐 Web query:
Spring Boot 3.5.3 managed SnakeYAML version
💡 Result:
Spring Boot 3.5.3 manages SnakeYAML at version 1.33 [1]. While Spring Boot 3.5.x versions are compatible with SnakeYAML 2.x, they continue to default to the 1.33 version to ensure stability and backward compatibility [1][2]. If you require features from SnakeYAML 2.x, you can override the managed version by defining the snakeyaml.version property in your project's build configuration [3][1]. For Maven projects, add the following to your pom.xml: <snakeyaml.version>2.x</snakeyaml.version> For Gradle projects, add this to your build.gradle: ext['snakeyaml.version'] = '2.x' This will instruct Spring Boot to use your specified version instead of the default 1.33 [3][1].
Citations:
- 1: https://www.tutorialpedia.org/blog/is-snakeyaml-2-0-added-in-the-new-spring-boot-versions/
- 2: Snakeyaml 2.0 is not compatible with Spring boot 3.0.5 spring-projects/spring-boot#35064
- 3: CVE-2022-25857 - Upgrade to SnakeYAML 1.31 spring-projects/spring-boot#32221
Use SnakeYAML’s safe constructor here.
Spring Boot 3.5.3 still manages SnakeYAML 1.33, so new Yaml().load(...) here is not safe for untrusted workflow files from external repositories. Switch to a safe loader / SafeConstructor instead.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowParser.java`
around lines 20 - 21, The YAML parsing in WorkflowParser currently uses the
default Yaml constructor and load path, which is not safe for untrusted workflow
files. Update WorkflowParser’s YAML creation to use SnakeYAML’s safe constructor
or safe loader configuration, and keep the parsing logic in the same place where
yaml.load(yamlContent) is called so external repository content is handled with
the restricted type set.
- Added installationId field (String) to AccountEntity - Updated V1 migration script to include installation_id column - Fixed compilation and consistency issues across modules
- Fix JUnit Platform launcher dependency in :libs:scanning - Resolve NoUniqueBeanDefinitionException in :apps:worker test context - Update API integration tests to match updated AccountEntity structure
…UniqueBeanDefinitionException
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/apps/worker/src/test/resources/application.yml`:
- Around line 5-7: Disable scheduling in the worker test configuration so
`WorkflowScanJob` does not run during test startup. Update the test
`application.yml` used by `CleatWorkerApplication` to set
`spring.task.scheduling.enabled` to false while keeping the existing
`cleat.workflow-scan.interval-ms` settings as-is. This change should be made in
the test resource config that defines the worker application properties.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 0e6f105d-544c-43f1-9716-bfe4d2ac22e6
📒 Files selected for processing (18)
backend/apps/api/src/main/java/dev/cleat/api/controller/WorkflowScanController.javabackend/apps/api/src/test/java/dev/cleat/api/WorkflowScanControllerTest.javabackend/apps/api/src/test/resources/application.ymlbackend/apps/worker/src/main/java/dev/cleat/worker/CleatWorkerApplication.javabackend/apps/worker/src/main/java/dev/cleat/worker/WorkflowScanJob.javabackend/apps/worker/src/test/java/dev/cleat/worker/AbstractIntegrationTest.javabackend/apps/worker/src/test/resources/application.ymlbackend/libs/github-client/src/main/java/dev/cleat/githubclient/config/GitHubConfig.javabackend/libs/persistence/src/main/java/dev/cleat/persistence/entity/AccountEntity.javabackend/libs/persistence/src/main/java/dev/cleat/persistence/entity/RepoEntity.javabackend/libs/persistence/src/main/java/dev/cleat/persistence/repository/WorkflowScanRepository.javabackend/libs/persistence/src/main/resources/db/migration/V1__create_account_and_repo_tables.sqlbackend/libs/persistence/src/main/resources/db/migration/V3__update_repo_and_add_tables.sqlbackend/libs/persistence/src/test/java/dev/cleat/persistence/WorkflowScanRepositoryTest.javabackend/libs/scanning/build.gradle.ktsbackend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowParser.javabackend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowScanService.javabackend/libs/scanning/src/test/java/dev/cleat/scanning/WorkflowParserTest.java
💤 Files with no reviewable changes (1)
- backend/apps/worker/src/test/java/dev/cleat/worker/AbstractIntegrationTest.java
✅ Files skipped from review due to trivial changes (2)
- backend/libs/persistence/src/main/java/dev/cleat/persistence/repository/WorkflowScanRepository.java
- backend/libs/persistence/src/main/resources/db/migration/V3__update_repo_and_add_tables.sql
🚧 Files skipped from review as they are similar to previous changes (10)
- backend/apps/api/src/main/java/dev/cleat/api/controller/WorkflowScanController.java
- backend/apps/api/src/test/java/dev/cleat/api/WorkflowScanControllerTest.java
- backend/libs/scanning/build.gradle.kts
- backend/libs/persistence/src/test/java/dev/cleat/persistence/WorkflowScanRepositoryTest.java
- backend/apps/worker/src/main/java/dev/cleat/worker/CleatWorkerApplication.java
- backend/apps/worker/src/main/java/dev/cleat/worker/WorkflowScanJob.java
- backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowScanService.java
- backend/libs/scanning/src/main/java/dev/cleat/scanning/WorkflowParser.java
- backend/libs/scanning/src/test/java/dev/cleat/scanning/WorkflowParserTest.java
- backend/libs/persistence/src/main/java/dev/cleat/persistence/entity/RepoEntity.java
| cleat: | ||
| workflow-scan: | ||
| interval-ms: 3600000 No newline at end of file |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
fd 'CleatWorkerApplication\.java$' backend/apps/worker/src/main/java -x sh -c '
echo "== $1 ==";
ast-grep outline "$1";
sed -n "1,160p" "$1"
' sh {}
fd 'AbstractIntegrationTest\.java$' backend/apps/worker/src/test/java -x sh -c '
echo "== $1 ==";
ast-grep outline "$1";
sed -n "1,220p" "$1"
' sh {}
rg -n -C2 '`@EnableScheduling`|`@Scheduled`|spring\.task\.scheduling\.enabled|workflow-scan\.interval-ms' backend/apps/workerRepository: Devlaner/cleat
Length of output: 3224
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== test resources =="
git ls-files 'backend/apps/worker/src/test/resources/**' | sort
echo
echo "== scheduling-related config in worker tests =="
rg -n -C2 'spring\.task\.scheduling\.enabled|EnableScheduling|`@Scheduled`|workflow-scan\.interval-ms' \
backend/apps/worker/src/test backend/apps/worker/src/main
echo
echo "== worker test classes =="
fd -e java backend/apps/worker/src/test/java -x sh -c 'echo "--- $1 ---"; sed -n "1,220p" "$1"' sh {}Repository: Devlaner/cleat
Length of output: 2311
Disable the scheduler in worker tests
CleatWorkerApplication enables scheduling globally, and this test config leaves cleat.workflow-scan.interval-ms active, so WorkflowScanJob can run when the test context starts. Add spring.task.scheduling.enabled: false here to keep integration tests deterministic.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/apps/worker/src/test/resources/application.yml` around lines 5 - 7,
Disable scheduling in the worker test configuration so `WorkflowScanJob` does
not run during test startup. Update the test `application.yml` used by
`CleatWorkerApplication` to set `spring.task.scheduling.enabled` to false while
keeping the existing `cleat.workflow-scan.interval-ms` settings as-is. This
change should be made in the test resource config that defines the worker
application properties.
martian56
left a comment
There was a problem hiding this comment.
I went through the whole diff and every CodeRabbit thread, and checked each one against the current code rather than trusting the resolved flags. Short version: the valid CodeRabbit points are still open. Nine threads, two resolved, seven still active, and six of those seven are real and unaddressed. Two of them are not minor, they are bugs that only surface when the scheduled job actually runs, which is exactly why CI stays green.
The foundation is good. The parser structure, the risk scoring, the entity and migration, the endpoint, and the parser unit tests are all reasonable. But the end to end scan path does not work yet, and that is the part the tests do not exercise.
The two that block:
-
The scheduled job will throw on a lazy load (WorkflowScanJob, inline). run() is not transactional, so accountRepository.findAll() hands back detached entities, and account.getRepos() then trips a LazyInitializationException before scanRepo is ever reached. repo.getAccount().getLogin() inside scanRepo has the same problem since the repo arrives detached. The parser tests and the controller test never go through this path, so it passes CI and fails the moment the hourly job fires. CodeRabbit flagged this as critical and it is right. Easiest fixes: make run() @transactional, or load accounts with their repos via @EntityGraph or a JOIN FETCH query, or pass repo ids into a transactional service method and reload inside.
-
The workflow content fetch hits the wrong host (WorkflowScanService, inline). download_url from the contents API is an absolute raw.githubusercontent.com URL, but GitHubClient.get builds the request with .uri(builder -> builder.path(uri).build()) against the https://api.github.com base, so the absolute URL gets treated as a path under api.github.com. So even once the lazy load is fixed, no YAML actually comes back. Either fetch the content from the contents API response directly (it returns a base64 content field you can decode, no second call needed), or give GitHubClient a way to GET an absolute URL with a plain WebClient. Also worth noting these two together mean the feature has not run successfully end to end yet, so please verify against the compose stack with a seeded account before the next round.
The valid smaller ones, also still open:
-
Permission detection has gaps (WorkflowParser, inline). A top-level scalar permissions: write-all sets broadPermissions but leaves missingOidc true, so it gets both the broad penalty and a false missing-OIDC penalty even though write-all includes id-token: write. Job-level write-all / read-all is never counted as broad either. While you are in there, the map heuristic on line 36 (anyMatch value == write and size > 3) is arbitrary and will miss real broad grants, that is worth rethinking, not just patching.
-
Stale scans are never cleared when a repo loses all its workflows. The empty-files early return happens before deleteByRepoId, so old rows linger and keep showing up in the latest-scan query. Move the delete above the empty check.
-
WorkflowParserTest.emptyYamlReturnsZeroScore asserts a score of 20, not zero. Rename it so the name matches what it checks, something like emptyYamlScoresOnlyMissingOidc.
-
The worker test config leaves scheduling on, so WorkflowScanJob can fire during the integration test context. Add spring.task.scheduling.enabled: false to the worker test application.yml to keep tests deterministic.
The one I would push back on:
- CodeRabbit calls new Yaml().load() a major security hole on the grounds that Spring Boot 3.5.3 ships SnakeYAML 1.33. That premise is wrong. Boot 3.5.3 manages SnakeYAML 2.x, and 2.x is safe by default: the default loader rejects global tags, so arbitrary class instantiation from untrusted YAML is already blocked. So this is not a blocker and not for the reason given. I would still do two small things as plain hygiene: construct with a SafeConstructor so the intent is explicit and you are protected if the version ever moves, and wrap load() so a malformed workflow file throws something you handle rather than bubbling up. Neither is required to merge.
So to answer the actual question: no, the valid CodeRabbit comments are not fixed. Items 1 through 4, 6 and 7 all still stand in the code. Get the two runtime bugs sorted and the four cleanups in, and this is close. Happy to look again right after.
…or WorkflowScanJob
- replace arbitrary map heuristic with explicit PermissionStatus model - fix write-all scalar permission to correctly grant id-token: write - extend broad permission and OIDC checks to include job-level scopes
What does this change?
Description
Adds automated auditing of GitHub Actions workflows across all connected repositories.
What was done
workflow_scantable via Flyway V2 migration with FK torepotableWorkflowScanEntityandWorkflowScanRepositoryto persist scan resultsRepoRepositoryfor repository-level DB accessWorkflowParserinlibs/scanningto parse workflow YAML files and flag:write-all/read-all)id-token: write)WorkflowScanServiceto fetch workflow YAMLs from GitHub API and persist resultsWorkflowScanJobinapps/workeras a scheduled job that scans all repos hourlyGET /api/repos/{repoId}/workflow-scansendpoint inapps/apiTests
WorkflowParserTest— unit tests covering unpinned actions, broad permissions, missing OIDC and clean workflow scenariosWorkflowScanRepositoryTest— integration tests with Testcontainers verifying DB read/write and delete operationsWorkflowScanControllerTest— full stack integration tests verifying endpoint response for populated and empty scan resultsRelated modules
libs/persistencelibs/scanningapps/workerapps/apiRelated issue
Closes # 10
Area
Screenshots
If this touches the UI, drop before and after screenshots here.
Checklist
bun run typecheck && bun run buildinapps/web(for frontend changes)Summary by CodeRabbit