Real Life Is Uncertain. Consensus Should Be Too!
Published in Workshop on Hot Topics in Operating Systems (HotOS '25), 2025
Modern distributed systems rely on consensus protocols to provide a fault-tolerant core on which applications are built. Consensus protocols are correct under a specific failure model, in which up to $f$ machines can fail. We argue that this $f$-threshold failure model oversimplifies the real world and limits opportunities to optimize for cost or performance. We argue instead for a probabilistic failure…
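To make the contrast concrete, here is a minimal sketch, not the paper's model (the abstract above is truncated before it is described). It assumes crash faults with majority quorums, so $n \ge 2f + 1$, and it assumes every machine fails independently with the same probability $p$. Both function names and the value of $p$ are hypothetical, chosen only for illustration.

```python
from math import comb


def crash_fault_threshold(n: int) -> int:
    """Largest f tolerated by a majority-quorum protocol, from n >= 2f + 1."""
    return (n - 1) // 2


def prob_threshold_exceeded(n: int, f: int, p: float) -> float:
    """Probability that more than f of n machines fail.

    Assumption (not from the source): failures are independent and each
    machine fails with the same probability p, so the count of failed
    machines is binomial.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(f + 1, n + 1)
    )


if __name__ == "__main__":
    p = 0.01  # hypothetical per-machine failure probability
    for n in (3, 5, 7):
        f = crash_fault_threshold(n)
        risk = prob_threshold_exceeded(n, f, p)
        print(f"n={n} f={f} P(more than f failures)={risk:.3e}")
```

Under these assumptions, the threshold view reports only that a cluster "tolerates $f$ failures", while the probabilistic view assigns each configuration a numeric risk that can be weighed against its cost.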