Old session artifacts may persist after Session kill #252

# Summary
I have run into a situation where a phantom graph/session persists in a GraphServer after cleanup and collides with a newly launched pipeline, specifically on component keys [here](https://github.com/ezmsg-org/ezmsg/blob/adb84e8ab01bb03d242eb37d9f3f3e63d1422c60/src/ezmsg/core/graphcontext.py#L386).

This needs more troubleshooting, but I have looked into the session cleanup code and found mechanisms by which cleanup can fail to finish. The analysis below was done in collaboration with an AI agent.

## Areas of (possible) concern

1. The biggest one: *backend.py* (line 866) sets `_cleanup_done = True` before cleanup actually succeeds. If `_cleanup()` is interrupted or raises while waiting for `GraphContext.__aexit__()`, a later cleanup attempt will return early and skip the graph close path. That could leave the session alive if the Python process/interpreter remains alive.
2. Another: *graphcontext.py* (line 746) runs cleanup as:

```python
await self.revert()
await self._close_session()
await self._shutdown_servers()
```
but not in a try/finally. If `revert()` hangs or raises before `_close_session()` runs, the session socket may never close, so `GraphServer` has no reason to drop the session metadata. Note: `revert()` closes publishers/subscribers and waits for them without a timeout before sending `SESSION_CLEAR` (*graphcontext.py*, line 765). If a single client close stalls, cleanup may never reach `SESSION_CLEAR` or `_close_session()`.
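
To make item 1 concrete, here is a minimal, self-contained sketch of the failure mode. It is not the ezmsg code: the classes `EarlyFlagCleanup` and `LateFlagCleanup`, the `_close_graph()` helper, and the simulated one-time failure are all hypothetical stand-ins. Only the `_cleanup_done` flag name comes from the code discussed above. The sketch contrasts setting the flag before the work with setting it after the work succeeds.

```python
import asyncio


class EarlyFlagCleanup:
    """Hypothetical stand-in: the flag is set before the close path runs."""

    def __init__(self) -> None:
        self._cleanup_done = False
        self.session_closed = False
        self._fail_once = True  # simulate one interrupted attempt

    async def _close_graph(self) -> None:
        # Assumed stand-in for waiting on the graph context to exit.
        if self._fail_once:
            self._fail_once = False
            raise RuntimeError("interrupted during graph close")
        self.session_closed = True

    async def cleanup(self) -> None:
        if self._cleanup_done:
            return  # a retry lands here and skips the close path
        self._cleanup_done = True
        await self._close_graph()


class LateFlagCleanup(EarlyFlagCleanup):
    """Same work, but the flag is set only after the close path succeeds."""

    async def cleanup(self) -> None:
        if self._cleanup_done:
            return
        await self._close_graph()
        self._cleanup_done = True


async def attempt_twice(obj: EarlyFlagCleanup) -> bool:
    # Call cleanup twice, as a retry after an interrupted attempt would.
    for _ in range(2):
        try:
            await obj.cleanup()
        except RuntimeError:
            pass
    return obj.session_closed


early_result = asyncio.run(attempt_twice(EarlyFlagCleanup()))
late_result = asyncio.run(attempt_twice(LateFlagCleanup()))
print(early_result, late_result)
```

With the early flag, the second attempt returns immediately and the simulated session is never closed. With the late flag, the retry completes the close.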

## Areas of no discernible concern

The server-side cleanup itself looks conceptually right: `_handle_session()` calls `_drop_session()` in `finally`, and `_drop_session()` removes edges, metadata, and settings (*graphserver.py*, line 632). But that only happens when the session task exits, which depends on the TCP connection closing. There is no session heartbeat/TTL to reap a stale-but-open connection.
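
For illustration, a heartbeat/TTL mechanism could look roughly like the sketch below. Everything in it is hypothetical (`SessionLeases`, `heartbeat()`, `reap()`, the injected clock) and nothing here exists in GraphServer today. It only shows the bookkeeping: each session refreshes a timestamp, and a periodic reaper drops sessions whose last heartbeat is older than the TTL.

```python
import time
from typing import Callable, Dict, List


class SessionLeases:
    """Hypothetical lease table: session id -> time of last heartbeat."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so the example runs without sleeping
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, session_id: str) -> None:
        # Called whenever the session proves it is still alive.
        self._last_seen[session_id] = self.clock()

    def reap(self) -> List[str]:
        # Remove and return every session whose lease has expired.
        now = self.clock()
        expired = [sid for sid, seen in self._last_seen.items() if now - seen > self.ttl]
        for sid in expired:
            del self._last_seen[sid]
        return expired


# Usage with a fake clock so the timings are deterministic.
fake_now = [0.0]
leases = SessionLeases(ttl=5.0, clock=lambda: fake_now[0])
leases.heartbeat("a")
leases.heartbeat("b")
fake_now[0] = 4.0
leases.heartbeat("b")      # "b" stays fresh, "a" goes quiet
fake_now[0] = 6.0
first_reap = leases.reap()
second_reap = leases.reap()
print(first_reap, second_reap)
```

In a real server, the reaper would presumably call the existing drop path for each expired session, so a stale-but-open connection no longer pins its metadata.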

## Potential fixes
Cleanup is too dependent on best-effort graceful teardown. The most promising fixes would be:
- Move `_cleanup_done = True` to the end of successful cleanup, or track “cleanup in progress” separately.
- Make `GraphContext.__aexit__()` close the session in a `finally`.
- Consider timeouts around client close/revert.
- (Optionally) add a session heartbeat/lease expiry in GraphServer for long-lived external servers.
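
The second and third bullets could be combined roughly as follows. This is a sketch under stated assumptions, not a patch: `SketchContext` is a hypothetical stand-in for `GraphContext`, the method bodies are placeholders, and the `revert_timeout` parameter and its default are invented for illustration. Only the method names `revert()`, `_close_session()`, and `_shutdown_servers()` come from the snippet above.

```python
import asyncio
from typing import List


class SketchContext:
    """Hypothetical stand-in for GraphContext; bodies are placeholders."""

    def __init__(self, revert_delay: float) -> None:
        self.revert_delay = revert_delay
        self.calls: List[str] = []

    async def revert(self) -> None:
        # Placeholder for closing clients; may stall for revert_delay seconds.
        await asyncio.sleep(self.revert_delay)
        self.calls.append("revert")

    async def _close_session(self) -> None:
        self.calls.append("close_session")

    async def _shutdown_servers(self) -> None:
        self.calls.append("shutdown_servers")

    async def aexit(self, revert_timeout: float = 0.05) -> None:
        try:
            # Bound the wait so a stalled client close cannot block teardown.
            await asyncio.wait_for(self.revert(), timeout=revert_timeout)
        except asyncio.TimeoutError:
            self.calls.append("revert_timed_out")
        finally:
            # The session is closed even if revert() raised or timed out.
            try:
                await self._close_session()
            finally:
                await self._shutdown_servers()


stalled = SketchContext(revert_delay=10.0)
asyncio.run(stalled.aexit())
healthy = SketchContext(revert_delay=0.0)
asyncio.run(healthy.aexit())
print(stalled.calls)
print(healthy.calls)
```

One design question this leaves open is whether a timed-out `revert()` should be swallowed, as in the sketch, or logged and re-raised after the session is closed.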

Planning to write a first draft of such a fix soon. 
