Move InvokerMemoryTest services to a Dockerized Rust container #428
I chose to implement the test service in Rust to move in the direction of rustify, thereby making the e2e test infrastructure more accessible to the runtime team.
muhamadazmy left a comment:
The changes look good to me, but I think @slinkydeveloper can judge the Kotlin changes much better than I can.
Let's stall on this for the time being. I'd prefer we continue to cover and investigate issues in an SDK our users actually use rather than Rust. Will revisit post-1.7.
I would like to get this in because the current setup causes test instability on CI, and every time there is such a failure someone has to go and look at it. I am happy to change it to something else later, but stalling it is not the right call at this point imo. Unrelated side note: whether the Java SDK is used more frequently than the Rust SDK is not so clear to me. The big benefit is that the runtime team can now work with these tests even if they are not using the Java SDK.
Background: InvokerMemoryTest previously bound MemoryPressureService and
StatefulObject in-process in the JVM via Endpoint.bind(...), with the
Restate container reaching the SDK through the testcontainers
host-port relay (host.testcontainers.internal). Under sustained
sequential load, that relay throttled service-runtime traffic to a
fraction of normal throughput. We were not able to pinpoint the exact
mechanism (single-threaded socat hop, lack of HTTP/2 flow-control
awareness, and connection-tracking on the bridge network are all
candidates), but the slowdown reproduces only on the relay path —
sibling-container deployments on the same bridge network are
unaffected. Switching the test off the relay was the most actionable
mitigation.
Implementation:
- New aggregate Rust binary at e2e-tests/services/rust/ packaged as
ghcr.io/restatedev/e2e-test-services-rs. The binary reads the
SERVICES env var (already injected by ServiceSpec.withServices(...))
and conditionally binds the requested services, matching the
sdk-tests convention. Adding a new Rust-side e2e service is now a
module + match-arm change — no new image.
- MemoryPressureService and StatefulObject implemented under
src/invoker_memory.rs with #[restate_sdk::service] /
#[restate_sdk::object] and explicit #[name = "..."] mappings so the
camelCase Restate names line up with the Kotlin contracts.
- Kotlin-side contracts promoted to top-level interfaces under
e2e-tests/.../contracts/, matching the sdk-tests layout.
- infra: new RestateDeployer.Builder.withServiceDeploymentConfig(...)
so a test can pin its own image without going through the global
--service-container-image CLI flag.
- InvokerMemoryTest swaps withEndpoint(Endpoint.bind(...)) for
withServiceDeploymentConfig + withServiceSpec. The runtime now
reaches the service over the standard bridge network at
http://invoker-memory:9080/ — no SSH relay involved.
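The SERVICES-driven dispatch described above can be sketched with the standard library alone. This is a minimal sketch, assuming SERVICES is a comma-separated list of service names (as in the sdk-tests convention) and that the Restate service names are `MemoryPressureService` and `StatefulObject`; the `Service` enum, `lookup`, and `parse_services` are hypothetical stand-ins, and the real binary would bind each selected `restate_sdk` service onto its endpoint instead of printing it.

```rust
use std::env;
use std::process;

/// Hypothetical stand-ins for the services in src/invoker_memory.rs.
/// In the real binary each variant would correspond to a restate_sdk
/// service bound onto the endpoint; here we only record the selection.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Service {
    MemoryPressureService,
    StatefulObject,
}

/// Map a service name (as it appears in SERVICES) to a known service.
/// Adding a new Rust-side e2e service means adding one arm here.
fn lookup(name: &str) -> Option<Service> {
    match name {
        "MemoryPressureService" => Some(Service::MemoryPressureService),
        "StatefulObject" => Some(Service::StatefulObject),
        _ => None,
    }
}

/// Parse an assumed comma-separated SERVICES value, trimming whitespace
/// and skipping empty entries. The first unknown name yields an error.
fn parse_services(raw: &str) -> Result<Vec<Service>, String> {
    raw.split(',')
        .map(str::trim)
        .filter(|name| !name.is_empty())
        .map(|name| lookup(name).ok_or_else(|| format!("unknown service: {name}")))
        .collect()
}

fn main() {
    // A missing SERVICES variable is treated as "bind nothing".
    let raw = env::var("SERVICES").unwrap_or_default();
    match parse_services(&raw) {
        Ok(services) => {
            for service in &services {
                // Real binary (assumption): bind the matching restate_sdk service here.
                println!("binding {service:?}");
            }
        }
        Err(err) => {
            eprintln!("{err}");
            process::exit(1);
        }
    }
}
```

Rejecting unknown names, rather than silently skipping them, is a design choice in this sketch: a typo in a test's `withServices(...)` call then fails at container startup instead of surfacing later as a missing-service error during registration.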
The image must be published manually once before this branch can pass CI:
  docker buildx build --platform linux/amd64,linux/arm64 \
    -t ghcr.io/restatedev/e2e-test-services-rs:0.1.0 \
    --push e2e-tests/services/rust
See e2e-tests/services/rust/README.md for the full publish flow.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>