
feat(rl.php): batch decide action for Thompson Sampling lookups #44

Closed
jjroelofs wants to merge 1 commit into 1.x from jur/1.x/decide-action


@jjroelofs
Contributor

Closes #43

Summary

  • Adds a third action, decide, to rl.php for batch Thompson Sampling lookups over a list of experiment IDs.
  • Module-agnostic: caller passes pre-registered IDs + parallel arm counts, receives {"decisions":{eid:{armId:"vN"}}} back.
  • Unknown / unregistered / <2-arm experiments are omitted from the response so callers fall back to arm 0.
  • Reuses the existing rl.experiment_registry + rl.experiment_manager service path, joining after the same minimal kernel boot as turn/reward beacons.
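
As a rough illustration of the request/response shape described above, a caller might build the form body and read the decisions map like this. This is a sketch under assumptions, not the module's client code: the comma-separated encoding of `experiment_ids` and `arm_counts`, the helper names `buildDecideBody` and `resolveArm`, and the use of `"v0"` as the arm-0 fallback id are all hypothetical.

```javascript
// Build the form-encoded body for action=decide.
// Assumption: ids and arm counts are comma-separated lists in the same order.
function buildDecideBody(experiments) {
  const params = new URLSearchParams();
  params.set("action", "decide");
  params.set("experiment_ids", experiments.map((e) => e.id).join(","));
  params.set("arm_counts", experiments.map((e) => e.armCount).join(","));
  return params.toString();
}

// Read one experiment's arm from a parsed {"decisions":{eid:{armId:"vN"}}}
// response. Experiments omitted by the server fall back to arm 0.
// Assumption: arm 0 is identified as "v0".
function resolveArm(response, experimentId, fallbackArm = "v0") {
  const decisions = (response && response.decisions) || {};
  const entry = decisions[experimentId];
  return entry && typeof entry.armId === "string" ? entry.armId : fallbackArm;
}
```

The fallback lives in `resolveArm` so that a missing entry and a failed request can be handled by the same code path on the caller's side.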

Why

Integration modules that resolve experiments client-side (DXPR Builder's rl_dxpr_variant runtime, etc.) need a cheap decision endpoint that skips Drupal's routing middleware. The existing rl.php bare-file pattern already does this for turn/reward beacons; this extends it to decisions so the runtime can fetch winners over a single POST without the per-request controller overhead.

Test plan

  • PHP syntax: php -l rl.php is clean
  • Unit check: curl -X POST rl.php -d 'action=decide&experiment_ids=<id>&arm_counts=2' returns 200 JSON with decisions object
  • End-to-end via DXPR Builder rl_dxpr_variant runtime: anonymous page load fetches decisions, applies winners, records turn/reward beacons via the same file. Verified on a 2-arm az_button variant (cold-start Thompson distribution 17/13 across 30 requests; adapts to 30/0 after 40 seed rewards).
  • Covered by existing rl test suite (turn/reward tests are unchanged)
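
For readers unfamiliar with why a cold-start experiment splits traffic roughly evenly and then concentrates on one arm once rewards arrive, here is a minimal Thompson Sampling sketch. It assumes a Beta-Bernoulli model with a Beta(1, 1) prior per arm, which this PR does not confirm as the module's actual scoring; `mulberry32`, `sampleBeta`, and `thompsonPick` are illustrative names, not part of rl.php.

```javascript
// Small seeded PRNG so runs are repeatable; Math.random could be used instead.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Draw from Beta(alpha, beta) for positive integer parameters: the alpha-th
// smallest of (alpha + beta - 1) uniform draws has exactly that distribution.
function sampleBeta(alpha, beta, rng) {
  const draws = [];
  for (let i = 0; i < alpha + beta - 1; i++) draws.push(rng());
  draws.sort((x, y) => x - y);
  return draws[alpha - 1];
}

// Sample one score per arm and return the id of the arm with the highest score.
// Assumption: each arm carries integer `turns` and `rewards` counts.
function thompsonPick(arms, rng) {
  let bestId = null;
  let bestScore = -1;
  for (const arm of arms) {
    const score = sampleBeta(1 + arm.rewards, 1 + arm.turns - arm.rewards, rng);
    if (score > bestScore) {
      bestId = arm.id;
      bestScore = score;
    }
  }
  return bestId;
}
```

With no data, both arms draw from the same uniform prior, so picks split close to evenly; once one arm has many rewards, its samples cluster near 1 and it wins almost every draw.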

Notes for review

  • Existing turn/reward validation path is untouched; the decide branch forks from filter_input before falling into the kernel boot, then forks again inside try { ... } after the container is available.
  • filter_var(FILTER_SANITIZE_FULL_SPECIAL_CHARS) + regex ^[a-zA-Z0-9_-]+$ gate each id before it reaches the registry.
  • Response is explicitly Cache-Control: no-store, private, max-age=0 so Varnish / Fastly don't serve stale decisions.
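
A caller can apply the same id pattern before sending a request so that ids the server would reject never leave the client. The sketch below only mirrors the regex quoted above; `filterExperimentIds` is a hypothetical helper, and the server-side PHP gate remains the authority.

```javascript
// Same character class as the server-side gate: letters, digits, "_" and "-".
const ID_PATTERN = /^[a-zA-Z0-9_-]+$/;

// Keep only ids that are non-empty strings matching the pattern.
function filterExperimentIds(ids) {
  return ids.filter((id) => typeof id === "string" && ID_PATTERN.test(id));
}
```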

Adds a `decide` action to rl.php that resolves a batch of experiment
IDs to their winning arms via the existing ThompsonScores path.
Module-agnostic: callers pass `experiment_ids` and a parallel
`arm_counts` list; response is `{"decisions":{eid:{armId:"vN"}, ...}}`.

Lets modules like dxpr_builder's rl_dxpr_variant runtime avoid the
per-request overhead of a Drupal-routed decision endpoint. The
existing turn/reward hot path is unchanged and is joined by this
decide branch after the same minimal kernel boot. Unknown or zero-arm
experiments are omitted from the response so callers can fall back
to arm 0.
jjroelofs pushed a commit that referenced this pull request Apr 15, 2026
Absorbs PR #44 into the Drupal.rl batch transport so client-decide
consumers (DXPR Builder and similar full-page-cache builders) share
a single round trip with the turn/reward tracking already batched by
Drupal.rl.

- js/rl.js gains Drupal.rl.decide(experimentId, armIds) returning a
  Promise<armId>. Decides share the same 500 ms queue and the same
  POST as turns and rewards, so a page with a variant block plus
  other RL tracking ends up making one request instead of two.
  Fallback on server failure or missing decision resolves to
  armIds[0] so callers never need a .catch() for the common path.
- rl.php handle_batch_request processes an optional decides section,
  calls ExperimentManager::getThompsonScores to seed cold-start
  priors, and returns a decisions map keyed by experiment id. The
  batch response becomes {"ok":true,"decisions":{...}}.
- Documentation adds a "decide discipline" section: callers must
  read arm ids from a DOM attribute that the server-side renderer
  emitted, never hardcode. This mirrors ai_sorting's PHP pattern of
  recomputing arm ids from the current view query on every render
  and keeps JS drift-free without forcing the rl core to store arm
  lists. The convention (v0..vN, UUIDs, node ids) is whatever the
  builder emits; rl core is arm-agnostic.
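
The batching and fallback behaviour described in this commit message can be sketched roughly as follows. This is not the js/rl.js implementation: `createDecideQueue`, the injected `post` transport, the `experimentId` / `armIds` payload field names, and the assumption that each decisions entry is a plain arm id string are all hypothetical.

```javascript
// Queue decide() calls and send them in one request per window.
// `post` is a caller-supplied function that takes the payload and returns a
// Promise of the parsed JSON response (assumed shape: {decisions: {eid: armId}}).
function createDecideQueue(post, windowMs = 500, schedule = (fn, ms) => setTimeout(fn, ms)) {
  let pending = [];
  let timer = null;

  function flush() {
    timer = null;
    const batch = pending;
    pending = [];
    if (batch.length === 0) return;
    const decides = batch.map(({ experimentId, armIds }) => ({ experimentId, armIds }));
    let request;
    try {
      request = Promise.resolve(post({ decides }));
    } catch (err) {
      request = Promise.reject(err);
    }
    request
      .then(
        (response) => (response && response.decisions) || {},
        () => ({}) // transport failure: every caller falls back below
      )
      .then((decisions) => {
        for (const { experimentId, armIds, resolve } of batch) {
          const armId = decisions[experimentId];
          // Unknown or missing decisions resolve to the first arm.
          resolve(armIds.includes(armId) ? armId : armIds[0]);
        }
      });
  }

  function decide(experimentId, armIds) {
    return new Promise((resolve) => {
      pending.push({ experimentId, armIds, resolve });
      if (timer === null) timer = schedule(flush, windowMs);
    });
  }

  return { decide };
}
```

Because the returned promise only ever resolves, callers can `await decide(...)` without a `.catch()`, which matches the fallback behaviour stated above.
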
jjroelofs added a commit that referenced this pull request Apr 15, 2026
* feat: Drupal.rl thin JS API with request batching (#42)

Introduces a shared Drupal.rl transport layer so multiple RL consumers on
the same page produce ~2 requests instead of one per experiment:

- js/rl.js exposes Drupal.rl.decide / turn / reward / flush. Decides
  flush on the next tick (catching every module that registers in
  Drupal.behaviors.attach); turns and rewards flush in a 500 ms window
  and via sendBeacon on visibilitychange / pagehide.
- rl.php collapses the legacy turn/turns/reward/decide form handlers
  into a single action=batch JSON endpoint. ping is preserved for the
  hook_requirements() health check.
- rl_page_attachments() publishes drupalSettings.rl.endpointUrl so
  consumer modules no longer have to compute and attach the URL
  themselves.
- All four consumer modules (rl_example, rl_example_frontend,
  rl_menu_link, rl_page_title) migrated to the Drupal.rl API. The
  broken action=scores call in rl_example_frontend is replaced with
  Drupal.rl.decide().
- README and docs updated to describe the new JS API.
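
To make the flush behaviour above concrete, here is a simplified sketch of a turn/reward queue that flushes after a 500 ms window or when the page is hidden. It is not the js/rl.js code: `createTracker`, the `{turns, rewards}` payload shape, and the injected `env` object are assumptions, and for brevity the sketch uses `navigator.sendBeacon` for every flush rather than only on page hide.

```javascript
// `env` defaults to the global object; in a browser that is `window`.
function createTracker(endpointUrl, env = globalThis, windowMs = 500) {
  let turns = [];
  let rewards = [];
  let timer = null;

  // Send everything queued so far in one request; returns false if idle.
  function flush() {
    if (timer !== null) {
      env.clearTimeout(timer);
      timer = null;
    }
    if (turns.length === 0 && rewards.length === 0) return false;
    const body = JSON.stringify({ turns, rewards });
    turns = [];
    rewards = [];
    return env.navigator.sendBeacon(endpointUrl, body);
  }

  function enqueue(list, experimentId, armId) {
    list.push({ experimentId, armId });
    if (timer === null) timer = env.setTimeout(flush, windowMs);
  }

  // Flush early when the page is going away so queued events are not lost.
  env.addEventListener("pagehide", flush);
  env.document.addEventListener("visibilitychange", () => {
    if (env.document.visibilityState === "hidden") flush();
  });

  return {
    turn: (experimentId, armId) => enqueue(turns, experimentId, armId),
    reward: (experimentId, armId) => enqueue(rewards, experimentId, armId),
    flush,
  };
}
```

Passing `env` explicitly keeps the sketch usable outside a browser, for example with a stub object in a test.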

* docs: document rl.php HTTP endpoint for non-browser callers

Drupal.rl is only one consumer of the batch endpoint. Native mobile
apps, server-side workers, other CMSes, and edge functions can POST to
rl.php directly using the same JSON protocol. Document the wire format,
validation rules, error responses, and curl examples in both README.md
and the project description HTML so those callers do not have to read
rl.php to integrate.

* refactor: make rl.php additive and drop JS decide API

Responds to the arm-ownership / ai_sorting feedback on #45. Deciding
which variant to show belongs in PHP at render time where the consumer
already owns the arm list (see ai_sorting's Views sort plugin and
VariantSelectorBase in this module). Client-side decides would force
runtime JS to know the current arm set, which drifts out of sync when
experiment managers add or remove variants.

- rl.php is now strictly additive: action=ping, action=turn,
  action=turns, and action=reward legacy form handlers are restored
  unchanged so ai_sorting and any other production consumer keeps
  working. The new action=batch sits alongside them as a JSON endpoint
  that Drupal.rl speaks.
- action=batch carries only turns and rewards. The decides section is
  gone, and the response is now {"ok":true} instead of a decisions map.
- js/rl.js drops Drupal.rl.decide() entirely and all of its promise /
  queue machinery. The API is now just turn(), reward(), and flush()
  plus pagehide sendBeacon.
- rl_example_frontend is converted to the canonical pattern: the block
  decides the winning variant in PHP inside build() using
  ExperimentManager::getThompsonScores(), server-renders the winning
  button text, and exposes the arm id to Drupal.rl.turn() / .reward()
  via drupalSettings. Block picks up a 60-second cache max-age so the
  server-chosen variant can rotate as scores evolve.
- README and HTML docs are rewritten around this split: "deciding"
  section for the PHP pattern, "JS API" section for tracking, "HTTP
  API" section documenting all five actions (ping / turn / turns /
  reward / batch) as peers.

* fix(lint): satisfy drupal-lint on rl.php

- Add missing @param descriptions for $registry and $storage in
  handle_batch_request().
- Replace direct $_GET access with filter_input(INPUT_GET, ...) so the
  Drupal coding standards sniff for super globals stays happy.
  request_stack is not usable here because action dispatch has to
  happen before the Drupal kernel boots.

* feat(rl): add Drupal.rl.decide with DOM-read arm list

Absorbs PR #44 into the Drupal.rl batch transport so client-decide
consumers (DXPR Builder and similar full-page-cache builders) share
a single round trip with the turn/reward tracking already batched by
Drupal.rl.

- js/rl.js gains Drupal.rl.decide(experimentId, armIds) returning a
  Promise<armId>. Decides share the same 500 ms queue and the same
  POST as turns and rewards, so a page with a variant block plus
  other RL tracking ends up making one request instead of two.
  Fallback on server failure or missing decision resolves to
  armIds[0] so callers never need a .catch() for the common path.
- rl.php handle_batch_request processes an optional decides section,
  calls ExperimentManager::getThompsonScores to seed cold-start
  priors, and returns a decisions map keyed by experiment id. The
  batch response becomes {"ok":true,"decisions":{...}}.
- Documentation adds a "decide discipline" section: callers must
  read arm ids from a DOM attribute that the server-side renderer
  emitted, never hardcode. This mirrors ai_sorting's PHP pattern of
  recomputing arm ids from the current view query on every render
  and keeps JS drift-free without forcing the rl core to store arm
  lists. The convention (v0..vN, UUIDs, node ids) is whatever the
  builder emits; rl core is arm-agnostic.
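
The "decide discipline" rule above amounts to reading the arm list from markup the server rendered instead of keeping a copy in JS. A minimal sketch, assuming a hypothetical `data-rl-arms` attribute that holds a comma-separated list (the actual attribute name and encoding are whatever the builder emits):

```javascript
// Read arm ids from an attribute the server-side renderer emitted.
// Works with any object exposing getAttribute(), such as a DOM Element.
function readArmIds(element, attributeName = "data-rl-arms") {
  const raw = element.getAttribute(attributeName);
  if (!raw) return [];
  return raw
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id.length > 0);
}
```

An empty result signals that the renderer emitted no arms, in which case the caller should skip the decide call rather than guess at ids.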

* docs: trim project description decide section to onboarding tone

---------

Co-authored-by: Jurriaan Roelofs <jur@dxpr.com>
@jjroelofs
Contributor Author

Absorbed into #45 (merged as a634024). The decide endpoint now lives inside action=batch and is exposed through Drupal.rl.decide(experimentId, armIds), which shares the same 500 ms batch window as turn() and reward() so a page with a DXPR Builder variant block plus other RL tracking rides one POST instead of two.

@jjroelofs jjroelofs closed this Apr 15, 2026

Development

Successfully merging this pull request may close these issues.

Add batch decide action to rl.php for high-throughput Thompson Sampling lookups
