Summary
The gateway runs about 50 threads at idle (about 53 on a Nav2 stack). Most of them are two pools that size themselves to the host CPU count, not to the work.
- The rclcpp executor is created as MultiThreadedExecutor executor; in src/ros2_medkit_gateway/src/main.cpp (around line 77), with no thread count. The default uses std::thread::hardware_concurrency().
- The vendored cpp-httplib server uses CPPHTTPLIB_THREAD_POOL_COUNT = max(8, hardware_concurrency - 1). HttpServerManager does not override new_task_queue (src/ros2_medkit_gateway/src/http/http_server.cpp), so the pool stays at that default.
On a 16-core host this is about 39 threads across the two pools. On a 64-core host it would be much larger for the same workload.
Under HTTP load the request p95 latency stays low (about 2.3 ms at 32 concurrent clients), so the cost here is thread and scheduling overhead, not a latency problem. These numbers were measured with a benchmark harness in selfpatch_demos (load lane, thread census).
Proposed solution
- Add a parameter for the executor thread count and pass it to the MultiThreadedExecutor constructor as its number_of_threads argument, alongside rclcpp::ExecutorOptions. Default to a small bounded value instead of all cores.
- Bound the HTTP pool by setting server->new_task_queue to a fixed-size ThreadPool, also sized from a parameter.
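A minimal sketch of how both pool sizes could be resolved from a parameter, assuming a convention where a value of 0 or less means "not set". The helper name resolve_thread_count, the parameter names executor_threads and http_threads, and the cap of 4 are hypothetical and not part of the gateway today. The wiring is shown only in comments because it depends on rclcpp and cpp-httplib.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>

// Hypothetical helper: turn a user-supplied thread-count parameter into a
// bounded pool size.
//   requested > 0  -> use the requested value as-is
//   requested <= 0 -> fall back to min(hardware_threads, default_cap), at least 1
inline std::size_t resolve_thread_count(long requested,
                                        std::size_t hardware_threads,
                                        std::size_t default_cap) {
  if (requested > 0) {
    return static_cast<std::size_t>(requested);
  }
  // std::thread::hardware_concurrency() may return 0 when the value is unknown.
  const std::size_t hw = hardware_threads > 0 ? hardware_threads : 1;
  return std::max<std::size_t>(1, std::min(hw, default_cap));
}

// Possible wiring (assumed parameter names, not compiled here):
//
//   const auto hw = std::thread::hardware_concurrency();
//   const auto exec_threads = resolve_thread_count(
//       node->declare_parameter<int64_t>("executor_threads", 0), hw, 4);
//   rclcpp::executors::MultiThreadedExecutor executor(
//       rclcpp::ExecutorOptions(), exec_threads);
//
//   const auto http_threads = resolve_thread_count(
//       node->declare_parameter<int64_t>("http_threads", 0), hw, 4);
//   server->new_task_queue = [http_threads] {
//     return new httplib::ThreadPool(http_threads);
//   };
```

With this convention, an unset parameter on a 64-core host resolves to the cap rather than to the core count, while an explicit value is always honored.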
Additional context
Neither pool has a parameter today. The thread count grows with the host core count, so the footprint is bigger than needed on many-core machines.