refactor: split codex into a cargo workspace of layered crates #25

Open
AshDevFr wants to merge 12 commits into main from split-workspace

@AshDevFr
Owner

Summary

Restructures the Rust backend from a single 200k-LOC library crate into a Cargo workspace of layered crates. Runtime behavior, the shipped binary, and the HTTP/OPDS/Komga APIs are unchanged. The win is faster incremental builds, per-subsystem build and test commands, and an enforced dependency direction across subsystems.

Motivation

Editing any single backend file forced the entire library to re-typecheck and the binary to relink, even when no downstream consumer was affected. After the easier build-perf levers were already in place (Spotlight exclusion, sccache, test-binary consolidation), the remaining lever was structural: scoping the compilation cache at the crate level. A dependency audit done alongside this work also surfaced wrong-direction imports that the single-crate setup hid; those are corrected as part of the same change.

Changes

  • Source layout: backend code now lives under crates/codex-* as layered workspace members. Shared models, utilities, config, and events sit at the bottom; parsers, database, services, scanner, tasks, scheduler, and search sit in the middle; the HTTP/API layer sits above them; a thin top-level binary wires everything together. The migration crate continues to live as its own sibling.
  • Build and test workflows: editing a non-API subsystem no longer triggers a recompile of the API layer or its peers. Each subsystem can now be built, tested, or linted in isolation with cargo build -p <crate> and cargo test -p <crate>.
  • Dependency direction enforced: imports that formed cycles between subsystems have been removed and the cross-subsystem graph is now a strict downward DAG. Types that previously leaked across layers have been hoisted into a small shared crate.
  • Feature flags: the rar feature is declared on the parser crate and forwarded at the workspace level; the observability feature continues to be a workspace-wide toggle.
  • Tooling and docs: Makefile targets, the cargo-dist release configuration, OpenAPI generation, and the Docker dev workflow are updated for the workspace layout. Developer documentation and the project-structure guide are updated to describe the workspace and per-crate commands. Integration tests continue to run against the assembled binary.

Notes

  • The shipped binary, configuration surface, on-disk database schema, and external APIs are unchanged; operators and end users do not need to take any action.
  • Cold builds may be marginally slower due to per-crate metadata overhead; warm incremental rebuilds, which dominate the developer inner loop, are the optimization target and improve materially.

… cycles

Eliminate the wrong-direction imports identified in the dep audit so the
src/ subdirectories form a clean DAG. No behavior change; the codebase
still compiles as a single crate.

Convert src/models.rs into a src/models/ directory module and move types
that both upper and lower layers need to speak:

- models/permissions.rs <- api/permissions.rs  (UserRole, Permission)
- models/sort.rs       <- dto::{series,book}   (SortDirection, sort fields)
- models/filter.rs     <- dto/filter.rs        (operator + condition enums)
- models/task.rs       <- tasks/types.rs       (TaskType, TaskResult, ...)
- models/release.rs    <- services/release/*   (NumericSpan, OwnedReleaseKeys)
- models/plugin.rs     <- services/plugin/protocol.rs (PluginManifest, PluginScope,
                                                       capabilities, OAuth, credentials)
- models/preprocessing.rs <- services/metadata/preprocessing/types.rs

Move two service-shaped utilities to their natural homes:

- ContentFilter:        api/extractors -> services/content_filter
- CredentialEncryption: services/plugin/encryption -> utils/credential_encryption

Break services -> scheduler by introducing services::scheduler_handle::
SchedulerReconciler (boxed-future trait, object-safe). The plugin manager
and releases handler hold Arc<dyn SchedulerReconciler>; scheduler::
LockedSchedulerReconciler adapts the concrete Scheduler behind a Mutex.
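
A minimal sketch of the trait-object inversion described above, using only the standard library. The method name `reconcile`, the `String` error type, the `Scheduler` fields, the `PluginManager::on_plugin_changed` method, and the tiny `block_on` executor are all assumptions made for illustration; the real code presumably runs on tokio and may use different signatures.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Returning a boxed future (instead of declaring an `async fn`) keeps the trait object-safe.
pub type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

/// Services side: the only view of the scheduler that services depend on.
pub trait SchedulerReconciler: Send + Sync {
    fn reconcile(&self) -> BoxFuture<'_, Result<(), String>>;
}

/// Scheduler side: stand-in for the concrete scheduler (hypothetical field).
#[derive(Default)]
pub struct Scheduler {
    pub reconcile_count: u32,
}

impl Scheduler {
    pub fn reconcile_jobs(&mut self) -> Result<(), String> {
        self.reconcile_count += 1;
        Ok(())
    }
}

/// Scheduler side: adapts the concrete Scheduler behind a Mutex.
pub struct LockedSchedulerReconciler {
    inner: Arc<Mutex<Scheduler>>,
}

impl LockedSchedulerReconciler {
    pub fn new(inner: Arc<Mutex<Scheduler>>) -> Self {
        Self { inner }
    }
}

impl SchedulerReconciler for LockedSchedulerReconciler {
    fn reconcile(&self) -> BoxFuture<'_, Result<(), String>> {
        Box::pin(async move {
            // The guard is never held across an `.await`, so the future stays `Send`.
            let mut guard = self.inner.lock().map_err(|e| e.to_string())?;
            guard.reconcile_jobs()
        })
    }
}

/// Services side: holds the trait object and never names the scheduler type.
pub struct PluginManager {
    scheduler: Arc<dyn SchedulerReconciler>,
}

impl PluginManager {
    pub fn new(scheduler: Arc<dyn SchedulerReconciler>) -> Self {
        Self { scheduler }
    }

    pub async fn on_plugin_changed(&self) -> Result<(), String> {
        self.scheduler.reconcile().await
    }
}

/// Busy-polling executor so the sketch runs without an async runtime.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

The point of the shape is that the dependency arrow flips: the scheduler module implements a trait owned by services, and only the binary that wires the two together needs to see both.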

Old paths (codex::api::permissions::*, services::CredentialEncryption,
tasks::types::*, services::plugin::protocol::PluginManifest, etc.) all
remain accessible via pub-use shims so the integration tests and any
external code that imports them keep compiling.
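
The shim pattern can be sketched in a single file as follows. The module names mirror the paths listed above, but the `UserRole` variants and the `is_admin` helper are hypothetical; only the re-export mechanism is the point.

```rust
/// New home for the shared type.
pub mod models {
    pub mod permissions {
        // Variants are illustrative placeholders, not the project's real roles.
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        pub enum UserRole {
            Admin,
            Reader,
        }
    }
}

/// Old location: keeps the historic path alive with a `pub use` shim.
pub mod api {
    pub mod permissions {
        pub use super::super::models::permissions::UserRole;
    }
}

/// A caller written against the old path keeps compiling, and both paths
/// name the same type, so values can be compared across them.
pub fn is_admin(role: api::permissions::UserRole) -> bool {
    role == models::permissions::UserRole::Admin
}
```

Because a re-export introduces no new type, old and new paths can be mixed freely while call sites are migrated.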

Validated with cargo fmt, cargo clippy --workspace --all-targets --
-D warnings, and make test-fast.
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented May 23, 2026

Deploying codex with Cloudflare Pages

Latest commit: 8042095
Status: ✅  Deploy successful!
Preview URL: https://b982bfd1.codex-asm.pages.dev
Branch Preview URL: https://split-workspace.codex-asm.pages.dev

AshDevFr added 8 commits May 23, 2026 14:21
…+ codex-events

Convert the single-crate codex project into a Cargo workspace and split off
the first two leaf crates. crates/codex-config now owns the config types,
loader, and env-override plumbing; crates/codex-events owns the entity-change
event types, broadcaster, and task-context plumbing. Neither new crate
depends on any other Codex-internal crate.

The root Cargo.toml gains a [workspace] table with a members list and a
[workspace.dependencies] table seeded with the deps these crates share with
the root (serde,
serde_yaml, anyhow, chrono, tokio, tracing, utoipa, uuid, plus tempfile and
serial_test for dev). The root [dependencies] inherit them via
{ workspace = true }; everything else stays inline until cross-crate usage
forces it. Profiles remain on the root manifest (Cargo treats them as
workspace-wide already).

src/lib.rs drops `pub mod config; pub mod events;` and re-exports the new
crates with `pub use codex_config as config; pub use codex_events as events;`
so external integration tests under tests/ continue to compile unchanged.
Inside src/, every `crate::config::` / `crate::events::` reference was
rewritten to the explicit `codex_config::` / `codex_events::` paths to make
the new dep edges visible at every callsite.

One latent events→db coupling surfaced and was cleared as part of the
extraction: EntityChangeEvent::release_announced used to take a
&release_ledger::Model, which would have dragged codex-db into codex-events.
Refactored to take primitive fields (ledger_id, series_id, ...); the two
callers destructure the Model at the boundary.
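
A rough single-file illustration of that decoupling. The `u64` ids, the `title` field, and the `announce` helper are assumptions; the real field list is elided in the message above and is not reproduced here.

```rust
/// Stand-in for the database layer's entity (hypothetical fields).
pub mod db {
    pub mod release_ledger {
        #[derive(Debug, Clone)]
        pub struct Model {
            pub id: u64,
            pub series_id: u64,
            pub title: String,
        }
    }
}

/// Stand-in for the events crate: it knows nothing about `db`.
pub mod events {
    #[derive(Debug, Clone, PartialEq)]
    pub enum EntityChangeEvent {
        ReleaseAnnounced { ledger_id: u64, series_id: u64 },
    }

    impl EntityChangeEvent {
        /// Takes primitives, so no database type appears in the signature.
        pub fn release_announced(ledger_id: u64, series_id: u64) -> Self {
            EntityChangeEvent::ReleaseAnnounced { ledger_id, series_id }
        }
    }
}

/// Caller at the boundary: destructures the Model and passes only the fields.
pub fn announce(model: &db::release_ledger::Model) -> events::EntityChangeEvent {
    let db::release_ledger::Model { id, series_id, .. } = model;
    events::EntityChangeEvent::release_announced(*id, *series_id)
}
```

The cost is a slightly longer call site; the benefit is that the events module compiles without the database layer.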

cargo build/clippy/fmt clean across the workspace. make test-fast passes;
both new crates build in isolation (cargo build -p codex-config /
-p codex-events). cargo-dist plan still targets only the codex binary.
… crates

Splits three more leaf subsystems out of the monolithic codex crate into
sibling workspace members alongside codex-config and codex-events.

- codex-models: pure-leaf shared types (permissions, sort, filter, plugin,
  preprocessing, release, strategies, task). No internal deps.
- codex-utils: hashing, password, jwt, cron, deadline, json, natural sort,
  search, serde adapters, credential encryption. Depends on codex-models
  (jwt -> UserRole).
- codex-parsers: CBZ/CBR/EPUB/PDF parsers, ComicInfo/OPF/series.json
  metadata, image utilities. Depends on codex-utils for CodexError and
  hash_file. Owns the `rar` feature; the root crate's `rar` feature now
  forwards to codex-parsers/rar.

Workspace-internal crates are now declared in [workspace.dependencies] so
members reference each other via { workspace = true } from one source of
truth instead of inline path = "...".

The root codex crate keeps `pub use codex_{models,utils,parsers} as
{models,utils,parsers}` so integration tests that import via codex::*
continue to resolve without changes. EpubParser::find_root_file and
parse_opf were promoted from pub(crate) to pub since they're called from
the still-root Komga manifest handler.

Cold and warm rebuild times stay flat-to-slightly-improved; the workspace
mechanics work but the dominant compile cost is still the root crate
which holds api, db, services, scanner, tasks, scheduler, and search.

Moves the SeaORM entities, repositories, connection pool, and test helpers
out of the monolithic codex crate into a new sibling workspace member.

- crates/codex-db: depends on codex-config, codex-events, codex-models,
  codex-utils, plus sea-orm, sea-orm-migration, and the migration crate.
  Root keeps sea-orm (direct call sites in services/api/scanner) and the
  migration crate (oidc handler runs Migrator::up) but drops
  sea-orm-migration, which is now only used transitively through codex-db.
- New test-utils feature on codex-db gates test_helpers behind
  cfg(any(test, feature = "test-utils")) so downstream crates can opt in
  via a dev-dependency feature flag without dragging tempfile or SQLite
  fixture plumbing into release builds. Root crate's dev-deps enable the
  feature.
- The observability::repo module (db_system_str helper that maps a SeaORM
  backend to the OTel db.system attribute) moves into codex-db::trace
  along with its tracing-subscriber tests. The function only consumes
  SeaORM types, so the observability home was historical accident; the
  move breaks the otherwise circular db -> observability edge.
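
For readers unfamiliar with the helper, a hedged sketch of what a backend-to-`db.system` mapping can look like. The local `DatabaseBackend` enum is a stand-in for the SeaORM type so the example compiles on its own, and the signature is a guess; the string values follow the OpenTelemetry semantic conventions for `db.system`.

```rust
/// Local stand-in for SeaORM's backend enum (assumption: the real helper
/// takes the SeaORM type directly).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DatabaseBackend {
    MySql,
    Postgres,
    Sqlite,
}

/// Maps a backend to the value used for the OTel `db.system` attribute.
pub fn db_system_str(backend: DatabaseBackend) -> &'static str {
    match backend {
        DatabaseBackend::MySql => "mysql",
        DatabaseBackend::Postgres => "postgresql",
        DatabaseBackend::Sqlite => "sqlite",
    }
}
```

Since the function depends only on the backend type, it can sit next to the database code without pulling in any observability module.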

Consumers (api, services, tasks, scanner, scheduler, search, commands,
observability) now reference codex_db::* instead of crate::db::*. The
root codex crate keeps `pub use codex_db as db` so integration tests
that import via codex::db::* continue to resolve.

cargo nextest run defaults to the current package, which was silently
skipping leaf-crate tests added by previous extractions; the Makefile's
test* targets now pass --workspace to cover every crate's suite.

Cold build time drops ~5%, and warm rebuild time after touching an API handler drops
~20% vs the previous workspace shape. Editing src/api/ recompiles only
the root crate; codex-db and the rest of the leaf crates stay cached.

Moves the entire services layer (auth, plugin runtime, metadata pipeline,
release tracking, thumbnail/PDF caches, file cleanup, OIDC, email, etc.)
out of the root crate into a new sibling workspace member.

Two cycles surfaced during extraction and got resolved before the move:

- services -> observability::metrics. The metrics module (OTel meter
  instruments for plugin/task lifecycle events) takes only primitives and
  is conceptually a service concern. Moved into codex-services as
  `services::metrics`, gated behind a new `observability` feature on the
  crate that forwards to the opentelemetry deps. The root crate's
  observability module re-exports `pub use codex_services::metrics` so
  existing `crate::observability::metrics::*` call paths in tasks and
  the API HTTP middleware keep resolving with no churn at those sites.

- services -> tasks::handlers::poll_release_source::lookup_series_title.
  The function was a pure series-title DB lookup that the services-side
  reverse-RPC release handler called into. Moved to
  `services::release::announce::lookup_series_title`; the tasks-side
  caller now uses the codex-services path. Removes the only real upward
  edge from services to tasks.

Also folds in a small layering drift fix: scanner imported
`crate::tasks::types::TaskType`, but `TaskType` has lived in
`codex-models::task` since the earlier layering cleanup, so the import
is re-pointed there. Lets scanner sit cleanly below tasks once it's
extracted.

Root crate keeps `pub use codex_services as services` so integration
tests using `codex::services::*` paths resolve unchanged. Plugin/task
metric tests that need `opentelemetry_sdk` and `tracing_subscriber`
move into codex-services dev-dependencies.

Moves the in-memory fuzzy search index (nucleo-matcher backed) out of
the root crate into a new sibling workspace member.

codex-search has no peer deps among the business-layer crates; it sits
as a pure leaf alongside codex-services. Its only inputs are codex-db
(entity reads for the index builder) and codex-events (the subscriber
that patches the index on series/book changes). codex-utils is pulled
in for the normalize_for_search helper.

Root crate keeps `pub use codex_search as search` so callers using
codex::search::* continue to resolve. Sed-swept crate::search::* sites
in the api handlers to codex_search::*.

Moves the library scanner (directory traversal, format detection,
strategy resolution, book/series/page repository writes) out of the
root crate into a new sibling workspace member.

Depends on codex-db, codex-events, codex-models, codex-parsers,
codex-services, codex-utils. The services dep is real: the scanner
reaches for PdfHandleCache and the metadata preprocessing pipeline
during analysis. The cyclic edge from scanner to tasks::types::TaskType
was already retired before the services extraction by re-pointing the
import at codex_models::task::TaskType, where TaskType has lived since
the earlier layering cleanup.

Owns its own rar feature that forwards to codex-parsers/rar so CBR
files participate in the scan; the root rar feature now forwards to
both codex-parsers and codex-scanner.

Root crate keeps `pub use codex_scanner as scanner` so callers via
codex::scanner::* continue to resolve.

Moves the task queue worker and all task handlers (scan_library,
analyze_book, generate_thumbnails, refresh_library_metadata,
poll_release_source, user_plugin_sync, etc.) out of the root crate
into a new sibling workspace member.

Depends on codex-services (PluginManager, ThumbnailService,
SettingsService, OAuthStateManager, release tracking helpers) and
codex-scanner (scan_library, analyze_book entry points). Pulls in
codex-parsers for the thumbnail generator that opens PDFs directly.

Owns its own rar feature that forwards to codex-scanner/rar so
library-scan tasks include CBR files. Root rar feature now forwards
to all four crates that participate in the CBR path: parsers,
scanner, services, and tasks.

worker.rs's metric calls (task_in_flight_inc/dec) move from the old
crate::observability::metrics path to codex_services::metrics, since
the metrics module relocated to codex-services in an earlier commit.

Root crate keeps `pub use codex_tasks as tasks` so callers via
codex::tasks::* continue to resolve.

Moves the cron-driven scheduler (tokio-cron-scheduler integration,
library scan job dispatch, release-source poll scheduling) out of the
root crate into the final business-layer sibling workspace member.

Top of the business-layer stack: depends on codex-services (settings
service, release schedule resolver, library job parser), codex-scanner
(ScanMode, ScanningConfig for the scan job factory), codex-tasks
(TaskType enqueuing), plus codex-db and codex-models for repository
access and shared types. The SharedSchedulerReconciler trait the plugin
manager uses lives in codex-services::scheduler_handle; the scheduler
crate provides the concrete impl that the binary wires up at serve
time, so the cycle stays broken by trait.

Root crate keeps `pub use codex_scheduler as scheduler` so callers via
codex::scheduler::* continue to resolve.

With this extraction, the root crate now contains only api,
observability, web, commands, main, and the re-export facade.

AshDevFr added 3 commits May 23, 2026 21:05
Move src/api/, src/web.rs, and src/observability/ into a new
crates/codex-api/ workspace member. The root codex crate now contains
only main.rs, commands/ (CLI orchestration), and lib.rs re-exports that
keep the historic codex::api / codex::observability / codex::web paths
working for integration tests.

Root [dependencies] drops from ~50 inline crates to ~14: workspace
members + the few helpers commands/ actually uses (clap, sea-orm,
rand, tabled, tracing-subscriber, tracing-appender, axum::serve,
walkdir). The `rar`, `observability`, and `embed-frontend` features
now cascade through codex-api.

Two version-propagation issues surfaced and are fixed here:
- `info::get_app_info` now reads app_name/app_version from AppState
  (env!("CARGO_PKG_VERSION") inside codex-api resolves to 0.0.0). The
  binary populates these from its own env vars; tests do the same.
- The OpenAPI spec version is wired via a crates/codex-api/build.rs
  that reads the root Cargo.toml and emits CODEX_BIN_VERSION as a
  build-time env var picked up by the utoipa::OpenApi derive.
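
One way the build-script half of this could look, as a sketch only. The function names are hypothetical and the line-based parsing is deliberately naive: it takes the first unindented `version = "..."` line of the manifest text rather than using a TOML parser, which may not match what the actual build.rs does.

```rust
/// Returns the first top-level `version = "..."` value found in the manifest text.
/// Lines such as `version.workspace = true` are skipped because they carry no
/// quoted value directly after the `=`.
pub fn extract_version(manifest: &str) -> Option<String> {
    manifest.lines().find_map(|line| {
        let rest = line.strip_prefix("version")?.trim_start();
        let rest = rest.strip_prefix('=')?.trim();
        let rest = rest.strip_prefix('"')?;
        let end = rest.find('"')?;
        Some(rest[..end].to_string())
    })
}

/// Builds the directive a build script would print so that
/// `env!("CODEX_BIN_VERSION")` resolves when the crate is compiled.
pub fn version_directive(manifest: &str) -> Option<String> {
    extract_version(manifest)
        .map(|version| format!("cargo:rustc-env=CODEX_BIN_VERSION={}", version))
}

// In a build.rs, `main` would read the root Cargo.toml (for example via
// std::fs::read_to_string on a path derived from CARGO_MANIFEST_DIR),
// pass the text to `version_directive`, and `println!` the result.
```

Keeping the parsing in a pure function makes the fragile part (locating the version line) easy to unit test apart from the filesystem walk.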

Touching an api handler now recompiles only codex-api and the root
binary; touching src/commands/*.rs recompiles only the binary. Cold
build and warm-rebuild times both drop materially against the
pre-split baseline.

Workspace builds clean, clippy --workspace --all-targets is warning-
free, cargo dist plan unchanged, make openapi produces the correct
1.29.0 spec, and the full test suite passes.

Two follow-ups to the recent workspace extraction:

- Add [workspace.package] to the root Cargo.toml as the single source
  of truth for version/edition and switch all 13 members to
  workspace inheritance. Cleans up the 0.0.0 vs 1.29.0 mismatch in
  cargo build output without touching release-prepare (its sed still
  matches the one ^version = "..." line, now under [workspace.package]).
- Update Dockerfile and Dockerfile.dev so every workspace member's
  Cargo.toml is copied into the build context and stub src/lib.rs
  files are created per crate, restoring the dependency-cache layer
for both the chef planner and the dev image.

The codex-api extraction moved web.rs from the root crate into
crates/codex-api/, so `#[folder = "web/dist"]` started resolving
against this crate's CARGO_MANIFEST_DIR (crates/codex-api/web/dist)
instead of the workspace root. Docker builds with embed-frontend
failed at the RustEmbed derive, cascading into a wall of
"StaticAssets::get not found" errors in CI. Default-feature local
builds passed because embed-frontend was off.

Emit the absolute dist path from build.rs as CODEX_WEB_DIST,
alongside the existing CODEX_BIN_VERSION (both fragile
walk-up-to-workspace-root assumptions now live in one file), consume
it via `$CODEX_WEB_DIST` in web.rs, and enable rust-embed's
interpolate-folder-path feature so the variable expands.
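
A sketch of the path computation such a build script needs, assuming the crate sits exactly two levels below the workspace root (crates/codex-api). The function names are hypothetical; the directive format is Cargo's standard `cargo:rustc-env=` output.

```rust
use std::path::{Path, PathBuf};

/// Walks up from the crate's manifest directory (crates/codex-api) to the
/// workspace root and appends web/dist. Returns None if the directory is
/// not nested deeply enough, which is the fragile assumption noted above.
pub fn web_dist_dir(manifest_dir: &Path) -> Option<PathBuf> {
    let workspace_root = manifest_dir.parent()?.parent()?;
    Some(workspace_root.join("web").join("dist"))
}

/// Builds the directive a build script would print to expose the path
/// to the crate as the CODEX_WEB_DIST compile-time variable.
pub fn web_dist_directive(manifest_dir: &Path) -> Option<String> {
    web_dist_dir(manifest_dir)
        .map(|dir| format!("cargo:rustc-env=CODEX_WEB_DIST={}", dir.display()))
}

// On the consuming side, web.rs would then carry
//     #[folder = "$CODEX_WEB_DIST"]
// on the RustEmbed derive, with rust-embed's interpolate-folder-path
// feature enabled so the variable is expanded at compile time.
```

In a real build.rs the input would come from the CARGO_MANIFEST_DIR environment variable that Cargo sets for build scripts.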