feat(rl.php): batch decide action for Thompson Sampling lookups by jjroelofs · Pull Request #44 · dxpr/rl

jjroelofs · 2026-04-15T12:05:14Z

Closes #43

Summary

Adds a third action, decide, to rl.php for batch Thompson Sampling lookups over a list of experiment IDs.
Module-agnostic: caller passes pre-registered IDs + parallel arm counts, receives {"decisions":{eid:{armId:"vN"}}} back.
Unknown / unregistered / <2-arm experiments are omitted from the response so callers fall back to arm 0.
Reuses the existing rl.experiment_registry + rl.experiment_manager service path, joining after the same minimal kernel boot as turn/reward beacons.
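
A minimal caller-side sketch of the request and response shape described above. Only `action`, `experiment_ids`, `arm_counts`, and the `decisions` key come from this PR. The helper names, the comma separator, the literal `armId` key, and the `v0` fallback label are assumptions made for illustration.

```javascript
// Hypothetical helpers, not part of rl.php or js/rl.js.

// Build the form-encoded body for a decide request.
// Assumption: both lists are comma-separated and kept in the same order.
function buildDecideBody(experiments) {
  const params = new URLSearchParams();
  params.set('action', 'decide');
  params.set('experiment_ids', experiments.map((e) => e.id).join(','));
  params.set('arm_counts', experiments.map((e) => e.armCount).join(','));
  return params.toString();
}

// Read one experiment's decision from a parsed response.
// Experiments the server omitted fall back to arm 0 (assumed to be "v0").
function resolveArm(response, experimentId) {
  const decisions = (response && response.decisions) || {};
  const entry = decisions[experimentId];
  if (entry && typeof entry.armId === 'string') {
    return entry.armId;
  }
  return 'v0';
}
```

A caller would POST the result of `buildDecideBody()` to rl.php and pass the parsed JSON to `resolveArm()` once per experiment, so an omitted experiment needs no special error handling.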

Why

Integration modules that resolve experiments client-side (DXPR Builder's rl_dxpr_variant runtime, etc.) need a cheap decision endpoint that skips Drupal's routing middleware. The existing rl.php bare-file pattern already does this for turn/reward beacons; this extends it to decisions so the runtime can fetch winners over a single POST without the per-request controller overhead.

Test plan

Syntax check: php -l rl.php reports no errors
Unit check: curl -X POST rl.php -d 'action=decide&experiment_ids=<id>&arm_counts=2' returns 200 JSON with decisions object
End-to-end via DXPR Builder rl_dxpr_variant runtime: anonymous page load fetches decisions, applies winners, records turn/reward beacons via the same file. Verified on a 2-arm az_button variant (cold-start Thompson distribution 17/13 across 30 requests; adapts to 30/0 after 40 seed rewards).
Covered by existing rl test suite (turn/reward tests are unchanged)
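
To make the cold-start and post-seed behaviour in the end-to-end check easier to reason about, here is a rough simulation of Thompson Sampling over two arms. It is not the module's implementation. It assumes a Beta(1, 1) prior, integer turn and reward counts, and fixed statistics during the run (decide lookups only). All function names are hypothetical.

```javascript
// Small seeded PRNG (mulberry32) so a run can be repeated.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Sample Beta(alpha, beta) for positive integers: the alpha-th smallest of
// (alpha + beta - 1) uniform draws follows that distribution.
function sampleBeta(alpha, beta, rand) {
  const n = alpha + beta - 1;
  const draws = Array.from({ length: n }, () => rand());
  draws.sort((x, y) => x - y);
  return draws[alpha - 1];
}

// One Thompson Sampling decision: sample a score per arm, keep the highest.
function chooseArm(arms, rand) {
  let best = 0;
  let bestScore = -1;
  arms.forEach((arm, index) => {
    const score = sampleBeta(1 + arm.rewards, 1 + arm.turns - arm.rewards, rand);
    if (score > bestScore) {
      bestScore = score;
      best = index;
    }
  });
  return best;
}

// Count how often each arm wins over a number of decide requests.
function simulate(arms, requests, seed) {
  const rand = mulberry32(seed);
  const counts = arms.map(() => 0);
  for (let i = 0; i < requests; i += 1) {
    counts[chooseArm(arms, rand)] += 1;
  }
  return counts;
}
```

Under these assumptions, `simulate([{ turns: 0, rewards: 0 }, { turns: 0, rewards: 0 }], 30, 1)` models the cold start, where both arms are equally likely, and `simulate([{ turns: 40, rewards: 40 }, { turns: 0, rewards: 0 }], 30, 1)` models the state after 40 seed rewards, where arm 0 should win nearly every request.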

Notes for review

Existing turn/reward validation path is untouched; the decide branch forks from filter_input before falling into the kernel boot, then forks again inside try { ... } after the container is available.
filter_var(FILTER_SANITIZE_FULL_SPECIAL_CHARS) + regex ^[a-zA-Z0-9_-]+$ gate each id before it reaches the registry.
Response is explicitly Cache-Control: no-store, private, max-age=0 so Varnish / Fastly don't serve stale decisions.
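
The gating rules in these notes can be sketched as follows. This JavaScript mirror of the PHP logic is illustrative only. The regex and the Cache-Control value are taken from the notes above. The comma separator, the function name, and the Content-Type header are assumptions, and the registry lookup for unregistered ids is left out.

```javascript
// Same character class as the regex quoted in the review notes.
const ID_PATTERN = /^[a-zA-Z0-9_-]+$/;

// Hypothetical parser: pairs each id with its arm count and drops anything
// that fails the regex gate or has fewer than two arms.
function parseDecideRequest(experimentIds, armCounts) {
  const ids = String(experimentIds).split(',');
  const counts = String(armCounts)
    .split(',')
    .map((value) => Number.parseInt(value, 10));
  const accepted = [];
  ids.forEach((id, index) => {
    const count = counts[index];
    if (!ID_PATTERN.test(id)) return; // unsafe or empty id
    if (!Number.isInteger(count) || count < 2) return; // missing or <2 arms
    accepted.push({ id, armCount: count });
  });
  return accepted;
}

// Response headers for a decide reply. Content-Type is an assumption.
const DECIDE_HEADERS = {
  'Content-Type': 'application/json',
  'Cache-Control': 'no-store, private, max-age=0',
};
```

Dropping rejected entries silently, rather than failing the whole request, matches the summary's rule that omitted experiments fall back to arm 0 on the caller side.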

Adds a `decide` action to rl.php that resolves a batch of experiment IDs to their winning arms via the existing ThompsonScores path. Module-agnostic: callers pass `experiment_ids` and a parallel `arm_counts` list; response is `{"decisions":{eid:{armId:"vN"}, ...}}`. Lets modules like dxpr_builder's rl_dxpr_variant runtime avoid the per-request overhead of a Drupal-routed decision endpoint. The existing turn/reward hot path is unchanged and is joined by this decide branch after the same minimal kernel boot. Unknown or zero-arm experiments are omitted from the response so callers can fall back to arm 0.

Absorbs PR #44 into the Drupal.rl batch transport so client-decide consumers (DXPR Builder and similar full-page-cache builders) share a single round trip with the turn/reward tracking already batched by Drupal.rl.

- js/rl.js gains Drupal.rl.decide(experimentId, armIds) returning a Promise<armId>. Decides share the same 500 ms queue and the same POST as turns and rewards, so a page with a variant block plus other RL tracking ends up making one request instead of two. Fallback on server failure or missing decision resolves to armIds[0] so callers never need a .catch() for the common path.
- rl.php handle_batch_request processes an optional decides section, calls ExperimentManager::getThompsonScores to seed cold-start priors, and returns a decisions map keyed by experiment id. The batch response becomes {"ok":true,"decisions":{...}}.
- Documentation adds a "decide discipline" section: callers must read arm ids from a DOM attribute that the server-side renderer emitted, never hardcode. This mirrors ai_sorting's PHP pattern of recomputing arm ids from the current view query on every render and keeps JS drift-free without forcing the rl core to store arm lists. The convention (v0..vN, UUIDs, node ids) is whatever the builder emits; rl core is arm-agnostic.

* feat: Drupal.rl thin JS API with request batching (#42)

Introduces a shared Drupal.rl transport layer so multiple RL consumers on the same page produce ~2 requests instead of one per experiment:

- js/rl.js exposes Drupal.rl.decide / turn / reward / flush. Decides flush on the next tick (catching every module that registers in Drupal.behaviors.attach); turns and rewards flush in a 500 ms window and via sendBeacon on visibilitychange / pagehide.
- rl.php collapses the legacy turn/turns/reward/decide form handlers into a single action=batch JSON endpoint. ping is preserved for the hook_requirements() health check.
- rl_page_attachments() publishes drupalSettings.rl.endpointUrl so consumer modules no longer have to compute and attach the URL themselves.
- All four consumer modules (rl_example, rl_example_frontend, rl_menu_link, rl_page_title) migrated to the Drupal.rl API. The broken action=scores call in rl_example_frontend is replaced with Drupal.rl.decide().
- README and docs updated to describe the new JS API.

* docs: document rl.php HTTP endpoint for non-browser callers

Drupal.rl is only one consumer of the batch endpoint. Native mobile apps, server-side workers, other CMSes, and edge functions can POST to rl.php directly using the same JSON protocol. Document the wire format, validation rules, error responses, and curl examples in both README.md and the project description HTML so those callers do not have to read rl.php to integrate.

* refactor: make rl.php additive and drop JS decide API

Responds to the arm-ownership / ai_sorting feedback on #45. Deciding which variant to show belongs in PHP at render time where the consumer already owns the arm list (see ai_sorting's Views sort plugin and VariantSelectorBase in this module). Client-side decides would force runtime JS to know the current arm set, which drifts out of sync when experiment managers add or remove variants.

- rl.php is now strictly additive: action=ping, action=turn, action=turns, and action=reward legacy form handlers are restored unchanged so ai_sorting and any other production consumer keeps working. The new action=batch sits alongside them as a JSON endpoint that Drupal.rl speaks.
- action=batch carries only turns and rewards. The decides section is gone, and the response is now {"ok":true} instead of a decisions map.
- js/rl.js drops Drupal.rl.decide() entirely and all of its promise / queue machinery. The API is now just turn(), reward(), and flush() plus pagehide sendBeacon.
- rl_example_frontend is converted to the canonical pattern: the block decides the winning variant in PHP inside build() using ExperimentManager::getThompsonScores(), server-renders the winning button text, and exposes the arm id to Drupal.rl.turn() / .reward() via drupalSettings. Block picks up a 60-second cache max-age so the server-chosen variant can rotate as scores evolve.
- README and HTML docs are rewritten around this split: "deciding" section for the PHP pattern, "JS API" section for tracking, "HTTP API" section documenting all five actions (ping / turn / turns / reward / batch) as peers.

* fix(lint): satisfy drupal-lint on rl.php

- Add missing @param descriptions for $registry and $storage in handle_batch_request().
- Replace direct $_GET access with filter_input(INPUT_GET, ...) so the Drupal coding standards sniff for super globals stays happy. request_stack is not usable here because action dispatch has to happen before the Drupal kernel boots.

* feat(rl): add Drupal.rl.decide with DOM-read arm list

Absorbs PR #44 into the Drupal.rl batch transport so client-decide consumers (DXPR Builder and similar full-page-cache builders) share a single round trip with the turn/reward tracking already batched by Drupal.rl.

- js/rl.js gains Drupal.rl.decide(experimentId, armIds) returning a Promise<armId>. Decides share the same 500 ms queue and the same POST as turns and rewards, so a page with a variant block plus other RL tracking ends up making one request instead of two. Fallback on server failure or missing decision resolves to armIds[0] so callers never need a .catch() for the common path.
- rl.php handle_batch_request processes an optional decides section, calls ExperimentManager::getThompsonScores to seed cold-start priors, and returns a decisions map keyed by experiment id. The batch response becomes {"ok":true,"decisions":{...}}.
- Documentation adds a "decide discipline" section: callers must read arm ids from a DOM attribute that the server-side renderer emitted, never hardcode. This mirrors ai_sorting's PHP pattern of recomputing arm ids from the current view query on every render and keeps JS drift-free without forcing the rl core to store arm lists. The convention (v0..vN, UUIDs, node ids) is whatever the builder emits; rl core is arm-agnostic.

* docs: trim project description decide section to onboarding tone

---------

Co-authored-by: Jurriaan Roelofs <jur@dxpr.com>

jjroelofs · 2026-04-15T13:26:44Z

Absorbed into #45 (merged as a634024). The decide endpoint now lives inside action=batch and is exposed through Drupal.rl.decide(experimentId, armIds), which shares the same 500 ms batch window as turn() and reward() so a page with a DXPR Builder variant block plus other RL tracking rides one POST instead of two.
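
The batching behaviour described here can be sketched roughly as below. This is not the code in js/rl.js. The factory name, the injected `send` function, the payload field names inside `decides`, and the assumption that each `decisions` value is the chosen arm id are all hypothetical. Turns and rewards are left out to keep the sketch short.

```javascript
// Hypothetical queue: collects decide() calls for one window, sends them in a
// single request, and resolves every caller, falling back to armIds[0].
function createDecideQueue(send, windowMs = 500) {
  let pending = [];
  let timer = null;

  function flush() {
    if (timer !== null) {
      clearTimeout(timer);
      timer = null;
    }
    const batch = pending;
    pending = [];
    if (batch.length === 0) return Promise.resolve();
    const payload = {
      decides: batch.map((item) => ({
        experimentId: item.experimentId,
        armIds: item.armIds,
      })),
    };
    return Promise.resolve()
      .then(() => send(payload)) // one POST for the whole batch
      .then((response) => (response && response.decisions) || {})
      .catch(() => ({})) // server failure: treat every decision as missing
      .then((decisions) => {
        batch.forEach((item) => {
          const chosen = decisions[item.experimentId];
          // Only accept an arm the caller actually offered.
          item.resolve(item.armIds.includes(chosen) ? chosen : item.armIds[0]);
        });
      });
  }

  function decide(experimentId, armIds) {
    return new Promise((resolve) => {
      pending.push({ experimentId, armIds, resolve });
      if (timer === null) timer = setTimeout(flush, windowMs);
    });
  }

  return { decide, flush };
}
```

Because the returned promise never rejects, a caller can write `queue.decide(id, armIds).then(render)` without a `.catch()`. In a browser, `armIds` would be read from an attribute the server-side renderer emitted rather than hardcoded, in line with the "decide discipline" noted in the commit message.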

jjroelofs closed this Apr 15, 2026

jjroelofs wants to merge 1 commit into 1.x from jur/1.x/decide-action
