Add GetCrashedReplicaMessages and SetReplica to IMessageStore #166

Merged: stidsborg merged 6 commits into main from add-crashed-replica-message-store-methods on Jun 14, 2026

@stidsborg

Owner

Summary

Adds two methods to IMessageStore to support re-distributing messages stranded by crashed replicas:

Task<List<Tuple<StoredId, long>>> GetCrashedReplicaMessages(IEnumerable<ReplicaId> liveReplicas);
Task SetReplica(IEnumerable<long> positions, ReplicaId newReplica, ReplicaId expectedReplica);
  • GetCrashedReplicaMessages returns the (StoredId, position) identifiers of undelivered messages whose owning replica is not in the supplied live-replica set — i.e. messages stranded by a replica that is no longer alive. It returns just the identifiers (not full StoredMessage content), since the caller only needs them to hand off to SetReplica.
  • SetReplica bulk-reassigns the messages at the given positions to newReplica, guarded by expectedReplica (optimistic concurrency) so a concurrent takeover/delivery is not clobbered. position is the global identity PK in every store, so keying on positions alone is unambiguous.
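
The two methods described above can be illustrated with a minimal in-memory sketch. It is not the PR's InMemoryFunctionStore code: the ReplicaId and StoredId definitions, the Append and OwnerOf helpers, and the class name are hypothetical stand-ins, the methods are synchronous rather than Task-returning, and the "undelivered" filter is left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the library's types; the real definitions may differ.
public readonly record struct ReplicaId(Guid Value);
public readonly record struct StoredId(Guid Value);

public sealed class InMemoryMessagesSketch
{
    // position -> (flow id, owning replica); position acts as the global key.
    private readonly Dictionary<long, (StoredId Id, ReplicaId Replica)> _messages = new();
    private readonly object _sync = new();
    private long _nextPosition;

    // Hypothetical helper: stores a message owned by the given replica.
    public long Append(StoredId id, ReplicaId owner)
    {
        lock (_sync)
        {
            var position = _nextPosition++;
            _messages[position] = (id, owner);
            return position;
        }
    }

    // Returns identifiers of messages whose owner is absent from the live set.
    public List<Tuple<StoredId, long>> GetCrashedReplicaMessages(IEnumerable<ReplicaId> liveReplicas)
    {
        var live = liveReplicas.ToHashSet();
        lock (_sync)
        {
            return _messages
                .Where(kv => !live.Contains(kv.Value.Replica))
                .Select(kv => Tuple.Create(kv.Value.Id, kv.Key))
                .ToList();
        }
    }

    // Reassigns only positions still owned by expectedReplica (the optimistic guard).
    public void SetReplica(IEnumerable<long> positions, ReplicaId newReplica, ReplicaId expectedReplica)
    {
        lock (_sync)
        {
            foreach (var position in positions)
            {
                if (_messages.TryGetValue(position, out var m) && m.Replica == expectedReplica)
                    _messages[position] = (m.Id, newReplica);
            }
        }
    }

    // Hypothetical helper for inspecting ownership in examples and tests.
    public ReplicaId OwnerOf(long position)
    {
        lock (_sync) return _messages[position].Replica;
    }
}
```

A caller would typically pass the positions returned by GetCrashedReplicaMessages to SetReplica, one call per crashed owner. Positions whose owner has changed in the meantime are simply skipped by the guard.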

Changes

  • Interface (Core/.../Messaging/IMessageStore.cs)
  • Implementations: InMemoryFunctionStore, PostgreSqlMessageStore, MariaDbMessageStore, SqlServerMessageStore (+ each store's SqlGenerator). Each GetCrashedReplicaMessages selects only id, position via a lightweight ReadStoredIdAndPositions reader.
  • Tests added to the shared MessageStoreTests template (run against all four stores):
    • CrashedReplicaMessagesAreFetched — only messages owned by crashed replicas are returned.
    • MessageReplicaCanBeReassigned — positions owned by the expected replica are reassigned; one owned by a different replica is left untouched (verifies the expectedReplica guard).

Testing

GetCrashedReplicaMessages / SetReplica tests pass on all stores:

  • In-memory: 2/2
  • PostgreSQL: 2/2
  • SqlServer: 2/2
  • MariaDB: 2/2

GetCrashedReplicaMessages returns the (flow, position) identifiers of
undelivered messages owned by a replica that is no longer alive (not in
the supplied live-replica set). SetReplica re-assigns the messages at the
given positions to a new replica, guarded by an expected-replica check so
a concurrent takeover is not clobbered.

Implemented across the in-memory, PostgreSQL, MariaDB and SqlServer
stores, with shared message-store tests covering crashed-replica fetching
and guarded re-assignment.
Introduces a named StoredIdAndPosition record (in Storage/Types.cs) for
the GetCrashedReplicaMessages result instead of the anonymous
Tuple<StoredId, long>, giving the (flow, position) pair meaningful member
names.
…ssages

The message replica is always populated (AppendMessage COALESCEs to the
publisher replica, never null), so the column is now declared NOT NULL
across all three stores. With non-null replicas the GetCrashedReplicaMessages
query drops the redundant IS NOT NULL guard, the unneeded ORDER BY, and the
empty-live-set conditional (the caller is always among the live replicas),
leaving a plain WHERE replica NOT IN (...) / != ALL.

A set is the natural type for the membership test and matches how live
replicas are already represented elsewhere (ReplicaWatchdog builds a
HashSet). The in-memory store now uses the set's Contains directly instead
of copying into a local HashSet.
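
The simplified filter and the set-based membership test described above might look roughly like the following. This is a sketch under assumptions, not the stores' actual SqlGenerator output: the messages table name, the placeholder style, the class and method names, and the use of IReadOnlySet are all illustrative.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CrashedReplicaFilterSketch
{
    // PostgreSQL-style form: the live replicas are bound as one array parameter.
    // The table name is an illustrative assumption.
    public const string PostgresSql =
        "SELECT id, position FROM messages WHERE replica != ALL(@replicas)";

    // MariaDB-style form: one placeholder per live replica.
    // liveReplicaCount must be at least 1, since an empty NOT IN () list is invalid SQL.
    public static string MariaDbSql(int liveReplicaCount)
    {
        var placeholders = string.Join(", ", Enumerable.Repeat("?", liveReplicaCount));
        return $"SELECT id, position FROM messages WHERE replica NOT IN ({placeholders})";
    }

    // In-memory equivalent: membership is tested against the set directly,
    // with no copy into a local HashSet.
    public static bool IsStranded<T>(IReadOnlySet<T> liveReplicas, T owner)
        => !liveReplicas.Contains(owner);
}
```

Because the replica column is NOT NULL, neither form needs an extra null check. With a nullable column, NOT IN would evaluate to NULL rather than true for rows with a NULL replica, so those rows would never be returned.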
… filter

Collapses the generated @replica0, @replica1, ... parameters into a single
comma-separated @replicas parameter split server-side, matching the
STRING_SPLIT convention already used for Id IN (...) across the store.

Replaces the generated @position0, @position1, ... parameters with a single
comma-separated @positions parameter split server-side (CAST AS BIGINT),
matching the STRING_SPLIT convention used elsewhere in the store. The
message-store wrapper already short-circuits on an empty position list, so
STRING_SPLIT never receives an empty string.
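
As a rough illustration of the single-parameter approach on SQL Server, the statements could take a shape like the one below. The table and column names, the class name, and the ToCsvParameter helper are assumptions made for this sketch, not the store's actual schema or code.

```csharp
using System.Collections.Generic;

public static class StringSplitSketch
{
    // One comma-separated @replicas parameter, split server-side.
    public const string GetCrashedSql = @"
SELECT Id, Position
FROM Messages
WHERE Replica NOT IN (SELECT value FROM STRING_SPLIT(@replicas, ','))";

    // One comma-separated @positions parameter, cast back to BIGINT server-side.
    // The expected-replica predicate is the optimistic-concurrency guard.
    public const string SetReplicaSql = @"
UPDATE Messages
SET Replica = @newReplica
WHERE Replica = @expectedReplica
  AND Position IN (SELECT CAST(value AS BIGINT) FROM STRING_SPLIT(@positions, ','))";

    // Joins the values into the single parameter string.
    // Callers are expected to skip the query entirely when the input is empty.
    public static string ToCsvParameter<T>(IEnumerable<T> values)
        => string.Join(",", values);
}
```

The statement text stays constant regardless of how many replicas or positions are passed, unlike a generated list of numbered parameters.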
@stidsborg stidsborg merged commit 348d8d1 into main Jun 14, 2026
11 of 12 checks passed
@stidsborg stidsborg deleted the add-crashed-replica-message-store-methods branch June 14, 2026 09:07