NATS and JetStream
Use durable events to propagate control-plane truth. Do not confuse the event bus with the source of truth.
Commit first
Control-plane/database records decide route ownership and operation state.
Publish second
NATS/JetStream tells edges and workers that truth changed.
Replay carefully
Snapshots, sequence continuity, dedupe, and epochs make recovery honest.
Fail closed
A replay gap means rebuild from a verified snapshot before claiming current truth.
Subject
A hierarchical, dot-separated name for events, such as route.owner.updated or allocation.ready.
Correct use
Use subjects to separate streams and consumers by domain.
Failure trap
Over-broad subject filters (for example, route.> when a consumer only needs route.owner.*) create noisy consumers and accidental coupling.
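A minimal sketch of how a subject filter selects events, assuming the standard NATS subject rules: dot-separated tokens, * matches exactly one token, and > matches one or more trailing tokens. The function subject_matches and the route.owner.revoked subject are hypothetical, written only to show why a narrow filter keeps a consumer out of unrelated domains; in a real deployment the NATS server does the matching.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if `subject` matches a NATS-style `pattern`.

    '*' matches exactly one token; '>' matches one or more trailing tokens
    and is only valid as the last token of the pattern.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be last and must consume at least one remaining token.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


events = ["route.owner.updated", "route.owner.revoked", "allocation.ready"]

# Narrow filter: only route ownership changes.
narrow = [s for s in events if subject_matches("route.owner.*", s)]
# Over-broad filter: everything, including unrelated allocation events.
broad = [s for s in events if subject_matches(">", s)]

print(narrow)  # ['route.owner.updated', 'route.owner.revoked']
print(broad)   # ['route.owner.updated', 'route.owner.revoked', 'allocation.ready']
```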
Stream
Durable, policy-governed storage for the messages published on a set of subjects.
Correct use
Keep retention, dedupe, and replay windows aligned to recovery needs.
Failure trap
If retention is shorter than an edge's longest downtime, the events it missed are discarded and it cannot catch up by replay alone.
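A sketch of the recovery check this implies, assuming the edge records the last stream sequence it applied. StreamState is a hypothetical stand-in for the oldest and newest sequence numbers a stream still holds; if the next sequence the edge needs has already aged out, replay cannot restore it and the edge has to start from a snapshot.

```python
from dataclasses import dataclass


@dataclass
class StreamState:
    first_seq: int  # oldest sequence the stream still retains
    last_seq: int   # newest sequence in the stream


def can_catch_up(last_applied_seq: int, stream: StreamState) -> bool:
    """True if every event after last_applied_seq is still retained."""
    return last_applied_seq + 1 >= stream.first_seq


def backlog(last_applied_seq: int, stream: StreamState) -> int:
    """Number of events the edge would have to replay to reach the head."""
    return max(0, stream.last_seq - last_applied_seq)


stream = StreamState(first_seq=5_000, last_seq=9_000)
print(can_catch_up(4_999, stream))  # True: 5000 onward is still retained
print(backlog(4_999, stream))       # 4001
print(can_catch_up(3_200, stream))  # False: 3201..4999 aged out; rebuild from a snapshot
```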
Durable consumer
A named reader with a remembered cursor.
Correct use
Use for edge workers, schedulers, and read-model builders that must recover.
Failure trap
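Ephemeral consumers keep no cursor across restarts, so events published while the reader was offline are skipped unless it explicitly replays from a known sequence.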
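A toy in-memory model of the difference, not the JetStream API: InMemoryStream, fetch_durable, and fetch_ephemeral are hypothetical names. The durable reader's cursor is stored with the stream, so it resumes where it left off; the ephemeral reader gets only what it explicitly asks for, and "start from now" skips everything it missed.

```python
class InMemoryStream:
    """Toy model of a stream with stored cursors for named (durable) consumers."""

    def __init__(self) -> None:
        self.events: list[str] = []
        self.cursors: dict[str, int] = {}  # durable name -> index of next event to deliver

    def publish(self, event: str) -> None:
        self.events.append(event)

    def fetch_durable(self, name: str) -> list[str]:
        # Resume from the remembered cursor. Simplification: the cursor advances
        # on fetch here; a real durable consumer advances it based on acks.
        start = self.cursors.get(name, 0)
        self.cursors[name] = len(self.events)
        return self.events[start:]

    def fetch_ephemeral(self, start: int) -> list[str]:
        # No remembered cursor: the reader gets only the range it asks for.
        return self.events[start:]


s = InMemoryStream()
s.publish("e1")
first = s.fetch_durable("edge-a")         # ['e1']
s.publish("e2")                           # edge-a is offline for these two
s.publish("e3")
resumed = s.fetch_durable("edge-a")       # ['e2', 'e3']
fresh = s.fetch_ephemeral(len(s.events))  # []: "start from now" skips e2 and e3
print(first, resumed, fresh)
```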
Ack
A consumer's signal to the server that an event was processed and should not be redelivered.
Correct use
Ack only after the event is validated and, where the consumer keeps local state, durably applied.
Failure trap
Acking before applying creates invisible data loss: if the consumer crashes in between, the event is never redelivered.
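A sketch of validate, apply, then ack, assuming a consumer that keeps its route table in a local JSON file. Message and handle are hypothetical, and ack() only flips a flag here, standing in for the broker acknowledgement. The ordering is the point: a crash anywhere before msg.ack() leaves the event unacked, so it is redelivered rather than lost.

```python
import json
import os
import tempfile


class Message:
    """Hypothetical stand-in for a delivered message; ack() marks it done."""

    def __init__(self, data: dict) -> None:
        self.data = data
        self.acked = False

    def ack(self) -> None:
        self.acked = True


def handle(msg: Message, state_path: str) -> None:
    # 1. Validate before touching local state.
    if "route" not in msg.data or "owner" not in msg.data:
        raise ValueError(f"malformed event: {msg.data}")
    # 2. Apply durably: write a temp file, fsync it, then atomically replace.
    state = {}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    state[msg.data["route"]] = msg.data["owner"]
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, state_path)
    # 3. Ack last: a crash before this line causes redelivery, not silent loss.
    msg.ack()


path = os.path.join(tempfile.mkdtemp(), "routes.json")
good = Message({"route": "r1", "owner": "edge-a"})
handle(good, path)
bad = Message({"route": "r2"})  # missing owner
try:
    handle(bad, path)
except ValueError:
    pass  # left unacked; a real consumer needs a policy for poison messages
with open(path) as f:
    print(json.load(f), good.acked, bad.acked)  # {'r1': 'edge-a'} True False
```

Once acks come last, the same event can arrive twice, so the apply step has to tolerate repeats; that is where deduplication comes in.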
Replay
Reading past events from a remembered or chosen sequence.
Correct use
Use replay to catch up from a snapshot's sequence to the head of the stream.
Failure trap
Replay without stale epoch checks can resurrect old route owners.
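A minimal sketch of an epoch guard during replay, assuming each ownership event carries a per-route epoch that the control plane increments on every ownership change. OwnerEvent and apply_owner_events are hypothetical. An event is applied only if its epoch is strictly newer than the one already held, so a late or redelivered event for an old owner is dropped instead of resurrecting it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OwnerEvent:
    route: str
    owner: str
    epoch: int  # assumed to increase on every ownership change for this route


def apply_owner_events(events: list[OwnerEvent]) -> dict[str, tuple[str, int]]:
    """Replay ownership events, ignoring any whose epoch is not newer than what is held."""
    owners: dict[str, tuple[str, int]] = {}
    for ev in events:
        current = owners.get(ev.route)
        if current is not None and ev.epoch <= current[1]:
            continue  # stale or duplicate: never let an old owner win
        owners[ev.route] = (ev.owner, ev.epoch)
    return owners


replayed = [
    OwnerEvent("r1", "edge-a", 1),
    OwnerEvent("r1", "edge-b", 2),
    OwnerEvent("r1", "edge-a", 1),  # old event redelivered late
]
result = apply_owner_events(replayed)
print(result)  # {'r1': ('edge-b', 2)}
```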
Snapshot
A compact baseline of current truth, tagged with the sequence it reflects and a checksum.
Correct use
Load snapshot first, then apply later events with continuity proof.
Failure trap
Applying events across an unknown gap produces state that looks current but may be missing changes.
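A sketch of snapshot-then-replay that fails closed, assuming sequence numbers are contiguous for this reader (for example, a stream dedicated to these events; with a filtered view of a shared stream, stream sequences legitimately skip and continuity has to be checked another way). Snapshot, checksum_of, recover, and GapError are hypothetical names, and the checksum here is SHA-256 over sorted-key JSON.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Snapshot:
    seq: int       # last sequence folded into this snapshot
    state: dict    # route -> owner
    checksum: str  # sha256 of the canonical JSON encoding of `state`


def checksum_of(state: dict) -> str:
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


class GapError(Exception):
    pass


def recover(snapshot: Snapshot, events: list[tuple[int, str, str]]) -> dict:
    """Load a verified snapshot, then apply (seq, route, owner) events with continuity proof."""
    if checksum_of(snapshot.state) != snapshot.checksum:
        raise ValueError("snapshot checksum mismatch; fetch another snapshot")
    state = dict(snapshot.state)
    expected = snapshot.seq + 1
    for seq, route, owner in events:
        if seq < expected:
            continue  # already covered by the snapshot
        if seq != expected:
            # Fail closed: do not claim current truth across an unknown gap.
            raise GapError(f"expected seq {expected}, got {seq}")
        state[route] = owner
        expected += 1
    return state


base = {"r1": "edge-a"}
snap = Snapshot(seq=10, state=base, checksum=checksum_of(base))
current = recover(snap, [(10, "r1", "old"), (11, "r2", "edge-c"), (12, "r1", "edge-b")])
print(current)  # {'r1': 'edge-b', 'r2': 'edge-c'}
try:
    recover(snap, [(11, "r2", "edge-c"), (13, "r1", "edge-b")])
except GapError as err:
    print("rebuild required:", err)  # rebuild required: expected seq 12, got 13
```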
Deduplication
Rejecting duplicate publishes and making duplicate deliveries harmless to apply.
Correct use
Use event ids, operation ids, and route epochs.
Failure trap
Duplicate promotion events can look like conflicting truth without idempotent handling.
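A sketch of consumer-side idempotency keyed by operation id. IdempotentApplier is hypothetical, and its in-memory set of seen ids is a simplification: in practice the seen ids must be stored atomically with the state they protect, and bounded. Publish-side dedupe is a separate layer, for example JetStream rejecting a repeated Nats-Msg-Id within the stream's duplicate window.

```python
class IdempotentApplier:
    """Applies each promotion at most once, keyed by a hypothetical operation id."""

    def __init__(self) -> None:
        self.seen_ops: set[str] = set()
        self.owners: dict[str, str] = {}
        self.applied: list[str] = []  # op ids in the order they took effect

    def on_promotion(self, op_id: str, route: str, owner: str) -> bool:
        if op_id in self.seen_ops:
            return False  # duplicate publish or redelivery: effect already applied
        self.seen_ops.add(op_id)
        self.owners[route] = owner
        self.applied.append(op_id)
        return True


a = IdempotentApplier()
first = a.on_promotion("op-7", "r1", "edge-b")
again = a.on_promotion("op-7", "r1", "edge-b")  # same event delivered twice
print(first, again, a.applied)  # True False ['op-7']
```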
Database authority
The event bus propagates facts; it does not decide facts.
Correct use
Commit DB/control-plane state first, then publish the event as a change notification.
Failure trap
Treating JetStream as the database weakens repair and audit, because there is no authoritative record to rebuild from or compare against.
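A sketch of commit-then-publish, using an in-memory SQLite table as the authoritative store and a hypothetical Bus class in place of a NATS publisher. set_owner commits the owner and bumps the route epoch first, then publishes a change notification; if the publish fails, the database still holds the truth and a separate reconciler (assumed, not shown) can re-announce it. A transactional outbox is a common way to make that re-announcement reliable.

```python
import sqlite3


class Bus:
    """Hypothetical stand-in for a NATS publisher; `fail` simulates an outage."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []
        self.fail = False

    def publish(self, subject: str, payload: str) -> None:
        if self.fail:
            raise ConnectionError("bus unavailable")
        self.sent.append((subject, payload))


def set_owner(db: sqlite3.Connection, bus: Bus, route: str, owner: str) -> bool:
    # 1. Commit the fact to the authoritative store first (bumping the epoch).
    with db:
        db.execute(
            "INSERT INTO routes(route, owner, epoch) VALUES (?, ?, 1) "
            "ON CONFLICT(route) DO UPDATE SET owner = excluded.owner, epoch = epoch + 1",
            (route, owner),
        )
    # 2. Then notify. A failed publish loses the notification, not the truth.
    try:
        bus.publish("route.owner.updated", route)
        return True
    except ConnectionError:
        return False


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE routes(route TEXT PRIMARY KEY, owner TEXT NOT NULL, epoch INTEGER NOT NULL)")
bus = Bus()
ok1 = set_owner(db, bus, "r1", "edge-a")
bus.fail = True
ok2 = set_owner(db, bus, "r1", "edge-b")  # publish fails after the commit
print(db.execute("SELECT owner, epoch FROM routes WHERE route = 'r1'").fetchone())  # ('edge-b', 2)
print(ok1, ok2, bus.sent)  # True False [('route.owner.updated', 'r1')]
```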