feat: out-of-tree handler plugin system (maru.handler_plugins) #58

Open
hyunyul-XCENA wants to merge 5 commits into main from feat/oot-plugin-hooks

@hyunyul-XCENA hyunyul-XCENA commented Jul 3, 2026

Collaborator

Summary

Adds an out-of-tree (OOT) handler plugin system so vendor-/hardware-specific behaviour can live in separate packages instead of Maru core. Maru stays vendor-neutral: a plugin package registers a maru.handler_plugins entry point and MaruHandler discovers it at construction and calls it at defined seams — core never imports it. Loading is soft-fail (vLLM-style), so a missing or broken plugin never breaks KV-cache operation.

Key Changes

  • maru_handler/plugin.py (new): MaruHandlerPlugin protocol (4 optional hooks) + load_handler_plugins() soft-fail loader — filters by MARU_PLUGINS, dedups duplicate names (first wins), and warns on a None factory return / typo'd allowlist name.
  • maru_handler/handler.py: dispatch at four seams (on_init / on_batch_retrieve / on_close / contribute_stats) via _dispatch_plugins; the empty-plugin case is guarded so the retrieval hot path stays free. on_batch_retrieve fires after the region-mapping loop (a plugin acts on regions mapped in the current batch); plugin stats are namespaced under stats["plugins"][<PluginClassName>].
  • Stable accessor surface (the whole coupling contract): MaruHandler.is_region_mapped() and get_region_dax_path() (backed by new mapper/client get_dax_path). A contract test fails CI if the accessor signatures, hook names, or entry-point group drift.
  • Docs: docs/source/api_reference/plugins.md.
  • Deferred follow-ups (not regressions): defense-in-depth bounds check on server-supplied kv_offset + kv_length before device ops; access-dumper flush/logging tuning.
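The guarded, soft-fail dispatch and the namespaced stats described above can be sketched roughly as follows. This is a minimal illustration, not the code in maru_handler/handler.py: `HandlerSketch`, `GoodPlugin`, and `BrokenPlugin` are hypothetical stand-ins, and the hook arguments (`on_init(handler)`, a dict-returning `contribute_stats()`) are assumptions, since the PR text does not spell out the signatures.

```python
import logging
from typing import Any

logger = logging.getLogger("maru_handler.plugin_sketch")


class HandlerSketch:
    """Hypothetical stand-in for MaruHandler; only the dispatch seam is modelled."""

    def __init__(self, plugins: list[Any]) -> None:
        self._plugins = list(plugins)
        self._dispatch_plugins("on_init", self)  # assumed hook argument

    def _dispatch_plugins(self, hook: str, *args: Any) -> None:
        # Guard: with no plugins loaded, the caller pays only for this check.
        if not self._plugins:
            return
        for plugin in self._plugins:
            fn = getattr(plugin, hook, None)  # every hook is optional
            if fn is None:
                continue
            try:
                fn(*args)
            except Exception:
                # Soft-fail: log and continue so one plugin cannot break the rest.
                logger.exception("plugin %s failed in %s", type(plugin).__name__, hook)

    def get_stats(self) -> dict[str, Any]:
        stats: dict[str, Any] = {"plugins": {}}
        for plugin in self._plugins:
            fn = getattr(plugin, "contribute_stats", None)
            if fn is None:
                continue
            try:
                # Namespaced by plugin class name, as in stats["plugins"][<PluginClassName>].
                stats["plugins"][type(plugin).__name__] = fn()
            except Exception:
                logger.exception("plugin %s failed in contribute_stats", type(plugin).__name__)
        return stats


class GoodPlugin:
    def __init__(self) -> None:
        self.calls: list[str] = []

    def on_init(self, handler: Any) -> None:
        self.calls.append("on_init")

    def contribute_stats(self) -> dict[str, int]:
        return {"hits": 1}


class BrokenPlugin:
    def on_init(self, handler: Any) -> None:
        raise RuntimeError("simulated plugin failure")
```

In this sketch a `BrokenPlugin` placed ahead of a `GoodPlugin` is logged and skipped, and the good plugin still receives `on_init`.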

Test Plan

  • Unit tests added/updated — tests/unit/test_plugin_loader.py (loader soft-fail / allowlist / dedup / None-return / dispatch isolation / API contract): 16 passed, ruff clean
  • Existing tests pass (pytest -v) — note: the full unit suite hits a pre-existing, environment-specific native crash in the mocked connect() coverage tests on CUDA-enabled hosts (reproduces on main without this branch; unrelated to this change)
  • E2E tests — validated on real CXL/DAX hardware with a real plugin (cross-instance producer/consumer): entry-point autoload + real device prefetch / pin / release

Related Issues

Let vendor-/hardware-specific behaviour live in a separate package instead of in
maru core. MaruHandler discovers plugins registered under the
`maru.handler_plugins` entry-point group at construction and calls them at
four seams: on_init, on_batch_retrieve, on_close, contribute_stats.

Design follows vLLM's soft-fail plugin model (not PyTorch's hard-fail
autoload): a Maru plugin is an optional optimization, so a load/hook failure
is logged and skipped, never raised. MARU_PLUGINS filters by name.

Plugins couple to core only through a small stable accessor surface:
- MaruHandler.is_region_mapped(region_id)
- MaruHandler.get_region_dax_path(region_id)  (via mapper/client get_dax_path)

Everything a plugin needs about a batch arrives via the on_batch_retrieve
hook args. Empty-plugin hot path stays free (guarded dispatch).
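
As an illustration of a plugin that touches core only through the two accessors, here is a minimal sketch. The class names are hypothetical, and so are the hook arguments: the commit message says batch information arrives via the on_batch_retrieve args but does not list them, so a plain sequence of region ids is assumed. The `str`-or-`None` return of `get_region_dax_path` is likewise an assumption.

```python
from typing import Optional, Sequence


class PrefetchPluginSketch:
    """Hypothetical plugin; hook arguments are assumed, not taken from the PR."""

    def on_init(self, handler) -> None:
        self._handler = handler
        self.seen_paths: set[str] = set()

    def on_batch_retrieve(self, region_ids: Sequence[int]) -> None:
        for region_id in region_ids:
            # Only the stable accessor surface is used; no core internals.
            if not self._handler.is_region_mapped(region_id):
                continue
            path = self._handler.get_region_dax_path(region_id)
            if path is not None:
                self.seen_paths.add(path)

    def contribute_stats(self) -> dict[str, int]:
        return {"dax_paths_seen": len(self.seen_paths)}


class FakeHandler:
    """Test double exposing only the two accessors named above."""

    def __init__(self, mapped: dict[int, str]) -> None:
        self._mapped = mapped

    def is_region_mapped(self, region_id: int) -> bool:
        return region_id in self._mapped

    def get_region_dax_path(self, region_id: int) -> Optional[str]:
        return self._mapped.get(region_id)
```

Because the plugin depends on nothing but these two methods, it can be unit-tested against a small double like `FakeHandler` without a Maru installation.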

Adds maru_handler/plugin.py (MaruHandlerPlugin protocol + soft-fail loader),
tests/unit/test_plugin_loader.py, and a README Plugins section.

Mark MaruHandler.is_region_mapped / get_region_dax_path and the four
MaruHandlerPlugin hook signatures as the public, stable plugin contract that
out-of-tree plugins depend on across independent release cycles.

- docs/source/api_reference/plugins.md: entry-point group, hook table with
  timing, stable accessor surface, soft-fail + version-skew warning, MARU_PLUGINS.
- handler.py: strengthen accessor docstrings/comment to flag them as stable API.
- test_plugin_loader.py: contract test pinning the accessor signatures, hook
  names, and entry-point group name so CI fails if the surface drifts.
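
One way such a contract test can pin accessor signatures is with `inspect.signature`. The sketch below is an assumption about the approach, not the test in test_plugin_loader.py; `MaruHandlerStandIn` and `accessor_drift` are hypothetical names, and only the `region_id` parameter is taken from the commit message above.

```python
import inspect


class MaruHandlerStandIn:
    """Stand-in so the sketch runs without Maru installed."""

    def is_region_mapped(self, region_id): ...

    def get_region_dax_path(self, region_id): ...


# The pinned surface: accessor name -> exact parameter names, in order.
EXPECTED_ACCESSORS = {
    "is_region_mapped": ["self", "region_id"],
    "get_region_dax_path": ["self", "region_id"],
}


def accessor_drift(handler_cls) -> list[str]:
    """Return a list of human-readable problems; empty means no drift."""
    problems = []
    for name, expected in EXPECTED_ACCESSORS.items():
        fn = getattr(handler_cls, name, None)
        if fn is None:
            problems.append(f"missing accessor: {name}")
            continue
        actual = list(inspect.signature(fn).parameters)
        if actual != expected:
            problems.append(f"{name}: expected {expected}, got {actual}")
    return problems


def test_accessor_contract():
    assert accessor_drift(MaruHandlerStandIn) == []
```

Hook names and the entry-point group string could be pinned with similar equality checks against fixed expected values.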
…allowlist

Review-driven robustness for configuration mistakes (soft-fail already covers
runtime exceptions):
- a factory returning None is skipped (was retained as a dead "plugin")
- duplicate entry-point names: first wins + warn (running both would double
  every hook — matches vLLM's loader)
- an allowlist (MARU_PLUGINS) name matching no installed plugin now warns
  instead of silently loading nothing

Adds unit tests for all three (16 passed).
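
The three rules above, together with soft-fail loading, can be sketched as follows. This is a rough illustration rather than the actual load_handler_plugins(): `load_plugins_sketch` and `FakeEP` are hypothetical, the allowlist is passed in as a set because the format of MARU_PLUGINS is not stated here, and `entry_points(group=...)` assumes Python 3.10 or newer.

```python
import logging
from importlib.metadata import entry_points
from typing import Any, Iterable, Optional

logger = logging.getLogger("maru_handler.plugin_sketch")
GROUP = "maru.handler_plugins"


def load_plugins_sketch(
    eps: Optional[Iterable[Any]] = None,
    allowlist: Optional[set[str]] = None,
) -> list[Any]:
    if eps is None:
        eps = entry_points(group=GROUP)  # real discovery; Python 3.10+
    plugins: list[Any] = []
    seen: set[str] = set()
    for ep in eps:
        if allowlist is not None and ep.name not in allowlist:
            continue
        if ep.name in seen:
            # First wins: running both would double every hook.
            logger.warning("duplicate plugin name %r; keeping the first", ep.name)
            continue
        seen.add(ep.name)
        try:
            plugin = ep.load()()  # load the factory, then call it
        except Exception:
            logger.exception("plugin %r failed to load; skipping", ep.name)
            continue
        if plugin is None:
            logger.warning("plugin %r factory returned None; skipping", ep.name)
            continue
        plugins.append(plugin)
    if allowlist is not None:
        for name in sorted(allowlist - seen):
            logger.warning("allowlist names %r but no such plugin is installed", name)
    return plugins


class FakeEP:
    """Minimal entry-point double: a name plus a load() returning the factory."""

    def __init__(self, name: str, factory: Any) -> None:
        self.name = name
        self._factory = factory

    def load(self) -> Any:
        return self._factory
```

In this sketch a name counts as taken as soon as its first entry point is seen, even if that entry point then fails to load; whether the real loader behaves the same way is not stated in the PR.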
Main's get_stats now returns stats.stats_manager; update the RPC stub in the
plugin get_stats test so it no longer raises AttributeError after the rebase.
hyunyul-XCENA changed the title from "feat: out-of-tree handler plugin system (retire maru-private fork)" to "feat: out-of-tree handler plugin system (maru.handler_plugins)" Jul 3, 2026
hyunyul-XCENA force-pushed the feat/oot-plugin-hooks branch from ea88784 to c68c29c July 3, 2026 08:31
github-actions Bot commented Jul 3, 2026


Coverage

Coverage Report
| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| __init__.py | 6 | 0 | 100% | |
| __main__.py | 3 | 3 | 0% | 5, 7–8 |
| allocation_manager.py | 113 | 9 | 92% | 30–31, 44–45, 52, 237–240 |
| client.py | 186 | 9 | 95% | 88, 130, 147–148, 311–313, 319, 363 |
| config.py | 46 | 1 | 97% | 72 |
| constants.py | 9 | 0 | 100% | |
| device_scanner.py | 94 | 31 | 67% | 25, 100, 102–104, 106, 114–123, 125, 127, 129, 134–143, 145, 147 |
| handler.py | 557 | 86 | 84% | 153, 168, 179–186, 194–196, 211, 217–219, 227, 238–243, 248, 273, 300–301, 305, 345, 349–350, 355, 377, 384–385, 389–395, 398, 402–404, 420, 422–423, 432, 486, 507, 517–518, 624–626, 633, 755–758, 761, 772–775, 781, 866, 970, 1133–1137, 1143, 1154–1158, 1164, 1243, 1248, 1290 |
| ipc.py | 275 | 2 | 99% | 365, 441 |
| kv_manager.py | 108 | 0 | 100% | |
| logging_setup.py | 19 | 0 | 100% | |
| plugin.py | 54 | 3 | 94% | 136–138 |
| protocol.py | 231 | 0 | 100% | |
| resource_manager_installer.py | 103 | 13 | 87% | 80–86, 167, 169–172, 187 |
| rpc_async_client.py | 190 | 0 | 100% | |
| rpc_async_server.py | 111 | 0 | 100% | |
| rpc_client.py | 66 | 0 | 100% | |
| rpc_client_base.py | 106 | 10 | 90% | 185, 221–222, 233–234, 306–307, 311–312, 373 |
| rpc_handler_mixin.py | 106 | 19 | 82% | 154–156, 159–161, 219–222, 227, 231–235, 256–257, 263 |
| rpc_server.py | 64 | 0 | 100% | |
| serializer.py | 81 | 0 | 100% | |
| server.py | 165 | 20 | 87% | 44, 54–59, 64–65, 73, 168, 172, 243, 247, 265–266, 334–336, 421 |
| stats_manager.py | 95 | 0 | 100% | |
| types.py | 60 | 1 | 98% | 145 |
| uds_helpers.py | 13 | 0 | 100% | |
| memory/__init__.py | 5 | 0 | 100% | |
| memory/allocator.py | 55 | 0 | 100% | |
| memory/mapper.py | 130 | 3 | 97% | 229, 251, 300 |
| memory/owned_region_manager.py | 101 | 1 | 99% | 212 |
| memory/types.py | 62 | 0 | 100% | |
| TOTAL | 3235 | 211 | 93% | |

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 701 | 4 💤 | 0 ❌ | 0 🔥 | 7.595s ⏱️ |

Public maru docs shouldn't name a specific vendor plugin. Replace the named
example with a generic 'writing a plugin' description and genericize the
region-mapping note.