TL;DR — Most production bugs aren't logic errors. They're configurations nobody designed. Here's why state space grows exponentially, why you can't test your way out of it, and the small ladder of tools — from types to model checkers — that actually shrink it.
Most bugs are configurations nobody designed
Open your feature flag dashboard. Count the flags older than six months that you "meant to clean up." Now multiply.
A feature flag is a toggle. It can be on or off. With one flag your system has two possible configurations: on, off. With two flags you have four: on/on, on/off, off/on, off/off. The pattern is exponential — each new flag doubles the number of combinations the rest of the system has to handle.
Stop and look at that for a second:
| Flags | Possible configurations |
|---|---|
| 1 | 2 |
| 5 | 32 |
| 10 | 1,024 |
| 20 | 1,048,576 |
| 30 | 1,073,741,824 |
In 2012, Knight Capital lost $460 million in 45 minutes because of this exact pattern. One feature flag. Eight servers. Seven got the new deployment; one didn't. When the flag was enabled, server number eight ran eight-year-old dead code that was never cleaned up. Forty-five minutes was enough to bankrupt the company. The total state space was sixteen combinations. They had tested one.
Strictly, the proximate cause was a deploy mismatch — server 8 didn't get the new build. But the deploy bug bit because a reused flag re-enabled dead code nobody had pruned. State-space discipline (don't reuse flag names, don't ship dead branches) would have made the release survivable. Sixteen combinations, one tested.
Twenty flags isn't a lot — most teams I've worked on have more. And nobody tests a million configurations. We test the combinations we explicitly enable for cohorts — a few dozen out of the million. The rest exist in production, never seen by QA, waiting for the right user to load the right page in the right order.
The bug, when it comes, doesn't look like a bug. It looks like "customer X is seeing the wrong price." It looks like "this user's dashboard is rendering nothing." It looks like a database race condition until somebody pulls the flag history and notices that this particular three-flag combination has never occurred in the system's lifetime, will never occur again after Friday's release, and is breaking exactly one customer's invoice right now.
That's the canonical version of the bug this article is about. Nobody designed the bad state. Each flag did what it said on its toggle. The combination was a configuration no human had imagined.
And it's not a feature-flag-specific problem. It's the shape every configurable system has: features that compose by multiplication instead of by addition. Permission systems. Form state. Order workflows. Notification preferences. Background-job retry policies. Anything you make a "yes/no" or "pick one of N" decision about, your users will eventually take every combination you didn't plan for.
The state space is the territory
Here's the thing nobody told me in school. Most bugs aren't logic errors. They're states you forgot existed.
You write a function. It handles the three cases you thought of. It works in development. It passes your tests. Then it hits production, where it encounters state #4 — a combination of inputs you never considered — and it does something "creative."
The shape repeats everywhere, not just in feature flags:
- A function with 3 boolean parameters has 8 possible states. You probably tested three of them.
- A form with 15 optional fields has 32,768 combinations of presence and absence. You tested the happy path and maybe one error case.
- A user model with
isAdmin,isVerified,isBanned,isDeletedhas 16 states. About four of them make sense; the other twelve are "what does a banned-and-deleted admin even mean?"
This is what "state space" means: the set of every configuration your code can technically be in, given how the data is shaped. It is almost always much, much larger than the configurations your code actually handles.
The gap between the two is where bugs live.
And this isn't a testing problem. You can't test your way out of it. The math is against you. Always has been.
Knight Capital was the cheap end. The full postmortem is worth reading. The expensive end is harder to walk away from.
Toyota's throttle control firmware had 11,000 global variables. Sixty-seven functions scored above 50 on cyclomatic complexity. The state space was so large it was literally impossible to test, verify, or even reason about. An expert witness demonstrated that a single bit flip in memory could suspend the safety-critical task that prevented unintended acceleration. People died.
Microservices do this at the architecture level. Ten services, each with two versions (current and previous). That's 1,024 deployment configurations. A hundred services with ten versions each? 10^100 combinations — more than atoms in the observable universe. Nobody tests that. Nobody can.
These aren't exotic edge cases. This is the default state of software. The state space grows exponentially. Your ability to verify it grows linearly at best. The gap widens every time someone adds a feature.
Same shape, four domains
Feature flags are the easiest example because the math is bare — N booleans, 2^N states, nobody argues about what the configurations are. But the same shape shows up in every part of every app you've shipped. Four worked vignettes from places where engineers usually underestimate the state space:
React data fetching. Walk into any React codebase and you'll find
isLoading, isError, data, and maybe isStale flying around as
separate booleans. Four booleans, sixteen combinations. Roughly four
of them are valid: idle, loading, error, success. The other twelve
are impossible-but-representable — isLoading && data && !isError
(loading from cache?), !isLoading && !data && !isError (idle, but
how?). React Query v3 redesigned its API around exactly this:
status: 'idle' | 'loading' | 'error' | 'success' as a single
discriminated union, with data only available on success. The
community spent years rediscovering that this was always a tagged
union.
Subscription billing. A SaaS subscription typically tracks
status (trial / active / past_due / canceled), payment_method_id
(present or null), cancel_at_period_end, grace_period_ends_at,
trial_ends_at. Roughly 64 representable combinations, most of them
nonsense — what does canceled with cancel_at_period_end=true
mean? Worse, Stripe's webhooks arrive out of order:
invoice.paid can land before subscription.created. Your code
has to handle "active subscription with no customer record" because
the universe doesn't promise sequential delivery.
Background jobs. Sidekiq retries by default. A job runs, makes
three of four side effects, then crashes. Sidekiq retries. You've
sent two emails, charged the card twice, and the database row says
processing. The state space is (row state) × (side effects state) × (retry count) × (idempotency key state). Half your production
bugs are someone's job retrying through a partial write nobody
designed for.
Distributed systems. Every message that crosses a network can be
delivered once, delivered twice, delivered out of order, or — when
something goes wrong enough — silently lost. "At-least-once" is the
realistic guarantee from any queue, any webhook, any service RPC.
The consumer's state space is its own state crossed with every prior
message in {duplicate, reorder, drop}. Most systems handle one of
these axes and silently corrupt on the others. Idempotency keys,
dedup windows, sequence numbers, sagas with compensations — every
distributed-systems mechanism you've heard of is a way of collapsing
this combinatorial mess into a state machine a human can reason
about.
Feature flags are this same problem with the math made obvious. The fix is the same in every domain: shrink the representable state space to match the valid one.
Your job: shrink the state space
Every approach I've found boils down to the same thing: reduce the number of possible states. Not test more states. Not document the states. Eliminate them. Ruthlessly.
There are two big ideas here, both from the functional programming world, both older than most of us:
Make illegal states unrepresentable — structure your types so that invalid combinations cannot be expressed. If a state shouldn't exist, the compiler shouldn't let you construct it.
Parse, don't validate — instead of checking data and hoping you remember to check everywhere, transform it into a type that proves it's valid. Do it once, at the boundary. After that, invalid data can't exist.
Same principle, different angles. And both of them are shockingly practical in modern languages — including the dynamic ones.
Kill the booleans
The classic example. You have an order with three boolean fields:
interface Order {
isDraft: boolean;
isPublished: boolean;
isArchived: boolean;
}
Three booleans. Eight possible states. But only three are valid: draft,
published, archived. The other five — isDraft && isPublished,
isPublished && isArchived && isDraft, etc. — are nonsense. But your
code has to deal with them because the type system says they exist.
The fix is a discriminated union — a type that says "this value is exactly one of these shapes, never a mix." Some languages call it a "tagged union," "sum type," or "variant." Same idea everywhere:
// TypeScript
type Order =
| { status: "draft"; content: string }
| { status: "published"; content: string; publishedAt: Date }
| { status: "archived"; content: string; archivedAt: Date };
# Python
@dataclass(frozen=True)
class DraftOrder:
status: Literal["draft"]
content: str
@dataclass(frozen=True)
class PublishedOrder:
status: Literal["published"]
content: str
published_at: datetime
@dataclass(frozen=True)
class ArchivedOrder:
status: Literal["archived"]
content: str
archived_at: datetime
Order = DraftOrder | PublishedOrder | ArchivedOrder
# Ruby (Sorbet)
module Order
extend T::Helpers
sealed!
class Draft < T::Struct
include Order
const :content, String
end
class Published < T::Struct
include Order
const :content, String
const :published_at, Time
end
class Archived < T::Struct
include Order
const :content, String
const :archived_at, Time
end
end
Eight states → three. publishedAt only exists when the order is published.
archivedAt only exists when it's archived. You can't construct a
published-and-draft order because there's no type for that. Not
"it's not allowed at runtime" — it does not exist as a type. The bad
states aren't rejected; they were never representable to begin with.
And when you switch on this, the compiler makes you handle every case:
// TypeScript — miss a case and it won't compile
function render(order: Order) {
switch (order.status) {
case "draft": return renderDraft(order.content);
case "published": return renderPublished(order.content, order.publishedAt);
case "archived": return renderArchived(order.content, order.archivedAt);
}
}
# Python — assert_never catches missing cases at type-check time
def render(order: Order) -> str:
match order:
case DraftOrder(content=c):
return render_draft(c)
case PublishedOrder(content=c, published_at=t):
return render_published(c, t)
case ArchivedOrder(content=c, archived_at=t):
return render_archived(c, t)
case _ as unreachable:
assert_never(unreachable)
# Ruby (Sorbet) — T.absurd errors at typecheck AND runtime
sig {params(order: Order).returns(String)}
def render(order)
case order
when Order::Draft then render_draft(order.content)
when Order::Published then render_published(order.content, order.published_at)
when Order::Archived then render_archived(order.content, order.archived_at)
else T.absurd(order)
end
end
And the same idea works at the database level. SQL is weaker than a type system, but it's enforced on every write, by every client, forever:
-- PostgreSQL — the database refuses to store illegal states
CREATE TABLE orders (
id UUID PRIMARY KEY,
status TEXT NOT NULL CHECK (status IN ('draft', 'published', 'archived')),
content TEXT NOT NULL,
published_at TIMESTAMPTZ,
archived_at TIMESTAMPTZ,
-- Same invariant as the discriminated union, at the persistence layer.
-- Note the archived branch drops `published_at` — matching the TS
-- variant where archived doesn't carry publishedAt. If you want to
-- preserve publish history through archive, model it explicitly.
CHECK (
(status = 'draft' AND published_at IS NULL AND archived_at IS NULL) OR
(status = 'published' AND published_at IS NOT NULL AND archived_at IS NULL) OR
(status = 'archived' AND published_at IS NULL AND archived_at IS NOT NULL)
)
);
Less elegant. Less expressive. But your TypeScript types stop at the network
boundary, and your database doesn't. Even if some service in another language
forgets the invariant, the CHECK constraint catches it. We'll come back to
this.
Add a fourth status? The compiler tells you every place that needs updating. Not "please remember to check everywhere." The code won't build until you do.
Same trick, applied to the running example
The same idea works on whatever multiplicative thing you're modeling. To close the loop on the running example: instead of N independent feature flag booleans, define the valid configurations as a union.
// Before: 4 booleans = 16 possible states
const newUi = client.boolVariation("new-ui", false);
const newPayments = client.boolVariation("new-payments", false);
const fullRedesign = client.boolVariation("full-redesign", false);
const layout = client.stringVariation("layout", "single-page");
// What does newPayments=true + newUi=false even mean?
// What about fullRedesign=true + newUi=true?
// Nobody knows. Nobody tested it.
// After: 4 valid configurations
type CheckoutConfig =
| { variant: "control" }
| { variant: "new-ui"; layout: "single-page" | "multi-step" }
| { variant: "new-ui-with-payments";
layout: "single-page" | "multi-step";
provider: "stripe" | "square" }
| { variant: "full-redesign" };
Sixteen states → four. The provider field only exists on the variant that
uses it. layout doesn't exist on control. You can't accidentally check
for config.provider when the user is in the control group — TypeScript
won't let you.
I built a small library around this.
One resolve function maps raw flag values into a typed config. One place.
One conversion. After that, the type system does the rest.
const checkout = createFlagGroup<CheckoutConfig>({
resolve: (raw) => {
if (flag(raw, "full-redesign")) return { variant: "full-redesign" };
if (flag(raw, "new-ui") && flag(raw, "new-payments")) {
return {
variant: "new-ui-with-payments",
layout: flagStr(raw, "checkout-layout", "single-page"),
provider: flagStr(raw, "payment-provider", "stripe"),
};
}
if (flag(raw, "new-ui")) {
return { variant: "new-ui", layout: flagStr(raw, "checkout-layout", "single-page") };
}
return { variant: "control" };
},
default: { variant: "control" },
});
// Exhaustive — won't compile if you miss a variant
const html = checkout.match(rawFlags, {
control: () => renderLegacy(),
"new-ui": ({ layout }) => renderNewUi(layout),
"new-ui-with-payments": ({ layout, provider }) => renderFull(layout, provider),
"full-redesign": () => renderRedesign(),
});
Parse, don't validate. At the boundary. Once.
But types are not enough on their own. The class of things types can catch is large — but not infinite. So.
When types aren't enough
Not everything fits in a type. Sometimes the state space is genuinely large and you can't shrink it further; sometimes the business won't let you. There's a small ladder of tools to climb when you've hit the limit of what types alone can do.
State machines. A state machine is exactly what it sounds like: a list
of named states and the transitions allowed between them. Instead of
implicit transitions ("set isPublished to true and isDraft to false and
hope nobody forgets to also clear isArchived"), you declare the valid
transitions explicitly. A draft can become published. A published order can
be archived. An archived order cannot become a draft.
# Ruby with AASM
class Order < ApplicationRecord
include AASM
aasm do
state :draft, initial: true
state :published, :archived
event(:publish) { transitions from: :draft, to: :published }
event(:archive) { transitions from: [:draft, :published], to: :archived }
# Note: no transition back from :archived. Once archived, terminal.
end
end
order.publish! # OK
order.archive! # OK from draft or published
archived_order.publish! # AASM::InvalidTransition — refused at runtime
// TypeScript with XState
import { setup } from "xstate";
const orderMachine = setup({}).createMachine({
initial: "draft",
states: {
draft: { on: { PUBLISH: "published", ARCHIVE: "archived" } },
published: { on: { ARCHIVE: "archived" } },
archived: { type: "final" },
},
});
The machine becomes the documentation and the enforcement. The set of reachable states is finite, declared in one place, and any transition not in the table is refused. The graph itself is the type.
When your machine outgrows a flat list of states, the formalism extends. David Harel's statecharts (1987) — the model XState's nested and parallel states descend from — handle hierarchy and concurrency without the cartesian-product blow-up of naive state machines.
Property-based testing. Property-based testing flips the usual testing script. Instead of writing "given input X, expect output Y," you declare a property — a rule that should always hold no matter the input — and let the framework generate hundreds of random inputs trying to break it. Libraries like fast-check (TypeScript), Hypothesis (Python), and PropCheck (Ruby) do this. You're not verifying the full state space — you're sampling it intelligently, which is still infinitely better than testing three hand-picked cases and calling it a day.
Model checking. This is the heaviest tier. A model checker is a tool that takes a precise description of your system — every state it can be in, every transition that can happen — and exhaustively explores all of them. Not random samples. Every reachable state, by construction. If your invariant ever fails, the checker hands you back the exact sequence of steps that broke it.
The big names are TLA+ (with its checker TLC), Alloy, and Spin. The vocabulary sounds academic until you notice who uses it. AWS has been running TLA+ on S3 and DynamoDB since 2011. Azure verifies Cosmos DB the same way. MongoDB does it for replication. seL4 — a microkernel proved correct down to the assembly — runs in military UAVs. The crypto in your Firefox browser is verified by Microsoft Research's Project Everest, which ships proofs alongside C code. None of these are research prototypes. They're production systems, running right now, that you depend on.
The entry barrier is one Java JAR. (And as you'll see in Companion
tools below, for the FSM-shaped slice of this — terminal-state
preservation reduced to one-step inductive safety — even that JAR
becomes optional: Z3, which TLAPS uses as one of its SMT backends
alongside Zenon and Isabelle, ships inside an npm / gem
install. The educational arc stays TLA+; the plumbing for that
specific check is different.)
Here's our Todo state machine — same example we've been using — as a TLA+ spec. Four pieces of vocabulary if you haven't seen TLA+ before:
/\means and.- A prime mark like
status'means the value ofstatusin the next state. \inmeans element of.- The shape
[][P]_vmeans "always ([]) — in every step of every behavior — the inner conditionPholds, considering only steps that may changev." Think of it as a temporalforall: "for every step, this is true." The_statuspart is the variable the property is about.
The rest reads like predicate logic.
------------------------------ MODULE Todo ------------------------------
VARIABLES status
States == {"active", "done", "archived", "deleted"}
Init == status = "active"
Complete == status = "active" /\ status' = "done"
Archive == status \in {"active", "done"} /\ status' = "archived"
Delete == status' = "deleted" \* delete works from any state
Next == Complete \/ Archive \/ Delete
\* The property we want: once deleted, always deleted. This is a
\* temporal property (about behaviors over time), not a state
\* invariant — so it goes under PROPERTY in the .cfg, not INVARIANT.
DeletedIsTerminal == [][status = "deleted" => status' = "deleted"]_status
Spec == Init /\ [][Next]_status
=========================================================================
with a one-line Todo.cfg:
SPECIFICATION Spec
PROPERTY DeletedIsTerminal
Run TLC:
Model checking completed. No error has been found.
8 states generated, 4 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 2.
Four reachable states. The exact four. The model checker walked every
path from active and confirmed DeletedIsTerminal holds across all
of them. Not "I tested this." Checked over the full reachable state
space. (For specs this small, "checked" and "proved" coincide — TLC
explored every reachable state. For larger specs you'd bound the
constants, and "checked" means "checked up to that bound." Worth saying
out loud since the word proved gets thrown around loosely in this
space.)
Now imagine a teammate adds an action to support "undo delete":
Restore == status' = "active"
Next == Complete \/ Archive \/ Delete \/ Restore \* <-- bug introduced
Looks innocent. Compiles fine in any language. A property test might miss it if the random generator doesn't happen to delete-then-restore in sequence. TLC catches it instantly — here's the actual output:
Error: Action property DeletedIsTerminal is violated.
Error: The behavior up to this point is:
State 1: <Initial predicate>
status = "active"
State 2: <Delete line 16, col 13 to line 16, col 31 of module TodoBuggy>
status = "deleted"
State 3: <Restore line 20, col 13 to line 20, col 30 of module TodoBuggy>
status = "active"
Three states. The exact minimal sequence. The exact line and column of the action that broke it. Model checkers don't return "test failed" — they return a movie of how the system breaks.
For a Todo app, this is overkill. Discriminated unions already prevent 99% of what a model checker would catch in your CRUD code. But for a payment retry protocol, a state machine spanning microservices, a replication algorithm, a permissioning model — this is the tool running the systems your bank account depends on.
The one catch: somebody has to write the TLA+ spec. For a brand-new
distributed protocol that's fine; you'd be designing the model anyway.
For an existing codebase, it's a chore. We come back to this in
Companion tools below — but with a narrower scope than the TLC
demo here. The tools extract a finite-state transition graph from
your code, reduce "once X, always X" to an inductive single-step
safety check, and dispatch that to Z3. They don't refute liveness
formulas; for unbounded []<> properties you'd still want TLC or
TLAPS via --export-tla. The trade is honest: a smaller class of
properties for a one-step install.
Invariants, invariants, invariants
Zoom out for a second. What are we actually doing?
Every technique in this article — discriminated unions, state machines, property tests, even the typed feature flag library — is a way of enforcing invariants. Things that must be true. Conditions your data, your state, your system cannot violate.
That's the whole game. Bugs are violated invariants. Crashes are violated invariants. Security holes are violated invariants. Whenever something goes wrong, you can describe it as "this thing was supposed to be true and wasn't."
The question isn't whether to enforce invariants. You're enforcing them either way — implicitly through code that assumes them, or explicitly through some mechanism. The question is where on the spectrum your enforcement lives. From weakest to strongest:
| Mechanism | When checked | What happens on violation |
|---|---|---|
| Comments / docs | Never | Nothing |
| Runtime assert | At call site, sometimes | Crash, hopefully in dev |
| Tests | At CI time | Build fails (for cases you wrote) |
| Linters | At lint time | PR fails (for patterns you encoded) |
| TypeScript types | At compile time, erased at runtime | Build fails, but can be bypassed with as any and stops at the network boundary |
| Strong types (Rust, Haskell) | At compile time, harder to bypass | Build fails earlier and unsafe is opt-in rather than the escape hatch |
| Schema validation (Zod, Pydantic) — a library that checks incoming data against a declared shape | At system boundary, at runtime | Reject input; on success, the type carries the invariant forward |
| Database constraints | At every write, by every client, forever | INSERT/UPDATE rejected — the only layer no application can bypass |
The thing nobody emphasizes enough: the database is one of your strongest invariant-enforcement layers, and most teams underuse it.
NOT NULL eliminates a state. UNIQUE eliminates a class of duplicate
states. FOREIGN KEY eliminates orphan-reference states. CHECK (status IN ('active', 'done', 'archived')) turns an unbounded text field
into a 3-state enum at the persistence layer. These are runtime checks,
sure, but they're checked on every single write, by every single client
that talks to your database, forever, regardless of which application
forgot.
(One Postgres footnote on UNIQUE: by default it treats NULL as
distinct, so UNIQUE (email) still lets you store any number of rows
with NULL email — easy to miss when you're using UNIQUE to enforce
"at most one of these." Postgres 15+ added UNIQUE NULLS NOT DISTINCT
to make NULL collide with itself. Worth knowing exists.)
Two Postgres features in particular deserve more than a passing mention, because they're the ones most application teams skip.
A partial unique index is the idiomatic way to enforce state-machine cardinality at the persistence layer:
CREATE UNIQUE INDEX one_active_subscription_per_customer
ON subscriptions (customer_id)
WHERE status = 'active';
After this, the database itself refuses to ever store two simultaneously active subscriptions for the same customer. No race condition, no application bug, no service in another language can produce that state.
An EXCLUDE constraint generalizes the same idea to ranges and
overlaps. Imagine room bookings:
ALTER TABLE bookings ADD CONSTRAINT no_overlap
EXCLUDE USING gist (
room_id WITH =,
during WITH &&
);
Every booking system reimplements "no double-booked rooms" in application code, badly, with race conditions the database just refused to permit.
That's stronger than your TypeScript types in one specific way: TypeScript
gives up the moment data crosses a boundary. Your Order type guarantees
nothing about the JSON the API just received, the row the database just
loaded, the message that just arrived from Kafka. Schema validation at the
boundary helps. Database constraints help even more — they're enforced even
when your validation is wrong.
Yes, DB constraints are weaker than a type system in expressiveness. You
can't encode "this field's value depends on another field's type" in SQL
the way you can in a TypeScript discriminated union. But they're stronger
in guarantee. The type system can be bypassed with any or as unknown as. The database constraint cannot.
The point isn't to pick one mechanism. It's to stack them:
- Types catch the cases you can express in the type system, instantly, at compile time
- Schema validation catches what crosses the boundary
- Database constraints catch what gets persisted
- Tests catch what you remembered to write
- Linters catch the patterns you've codified
- Property tests catch the cases you didn't think of
Every layer reduces the state space a bit. Together they pin the system down to something close to "only valid states are reachable."
Parse, don't validate: the bridge between runtime and compile time
There's one pattern that ties all of this together, and it deserves its own mention. Alexis King wrote about it in 2019, and once you see it, you can't unsee it.
The naive approach to invariants is to check them everywhere. The function that creates an order validates. The function that updates an order re-validates. The function that renders an order checks again, just to be safe. Every layer of your code carries the suspicion that the data might be malformed, and every layer pays the runtime cost of confirming it isn't. This is what validation looks like.
Parsing flips it. You check the invariant once, at the boundary where data enters your system. On success, you don't return a boolean — you return a type that the invariant is built into. Everything downstream gets the guaranteed-valid value, with the type system as proof.
// Validation: check, hope, repeat everywhere
function isValidOrder(input: unknown): boolean { /* ... */ }
function processOrder(input: unknown) {
if (!isValidOrder(input)) throw new Error("bad");
// input is still `unknown`. You'll check again. And again.
}
// Parsing: check once, type carries it forever
const result = OrderSchema.safeParse(rawJson);
if (result.success) {
const order: Order = result.data; // ← invariant now lives in the type
processOrder(order); // No checks downstream. Compiler enforces.
}
Zod, Pydantic, io-ts, Valibot, JSON Schema with code generation — all of these implement the same trick. Runtime invariant goes in. Type-level invariant comes out. The bridge from Strategy 2 to Strategy 1.
This is the most underrated pattern in modern application architecture. It's not just "good hygiene" — it's a fundamental conversion. You're moving the invariant from "code that happens to remember to check" to "type system that can't forget." Pay the runtime cost once. Get compile-time guarantees for the rest of the program's life.
If you've ever wondered why TypeScript shops obsess over Zod, this is why.
Every team I've worked with has a version of this story: a database
migration written years after the table was created, adding the
constraints that "should have been there on day one." A CHECK here.
A foreign key there. Conditions that turn out to be "obviously true"
except for the few thousand rows where they aren't, which you clean
up before the constraint goes on. A few weeks of work. Months of bugs
that suddenly stop happening.
Pick your favorite invariant. Encode it somewhere stronger than a code comment. Repeat until your weekends stop being interrupted.
A practical algorithm
You're designing a new type, a new schema, a new endpoint. You want to be deliberate about where its invariants live. Here's the order I work through now:
1. Can I tighten the type itself until illegal states don't exist?
Discriminated union? Literal union instead of string? Branded type instead
of bare number? If yes, do it. This is the strongest, cheapest, fastest
enforcement available. The compiler does the work, runtime cost is zero,
and you can't forget. Most invariants you've been validating in code
belong here.
2. What invariants enter from outside? API request bodies, database rows, message queue payloads, user form input — anything from beyond your trust boundary. These need parsing, not validation. Schema at the door. After it, the type carries the invariant. Zod, Pydantic, io-ts, or whatever your language offers. One parse, no re-checks downstream.
3. What invariants does persistence need to enforce?
The database is the longest-lived part of your system. It outlives services,
languages, and entire teams. NOT NULL, UNIQUE, FOREIGN KEY,
CHECK, EXCLUSION — every constraint here is a guarantee that survives
every code change forever. If an invariant is important enough to enforce,
ask whether it belongs in SQL.
4. What invariants are genuinely runtime? Authorization rules, aggregate properties, time-dependent state, anything that depends on data your type system can't see. Encode as assertions, contract checks, or service-layer validation. Yes, this is "regular code." But now it's the small remaining surface — not your whole codebase.
5. What invariants can't even be checked statically? Property-based tests. Generate a thousand random inputs and check that your invariants hold across all of them. You won't enumerate the full state space, but you'll find edge cases you would never have written by hand.
Each step catches a different class of bug. Skip a step and the cases it would have caught either leak into production or pile up at a later layer that's more expensive to repair. Stack them in this order and you build software that genuinely refuses to be wrong.
A note on TypeScript
If you happen to be a TypeScript developer, this whole article is the
case for one specific feature of the language. The discriminated union
pattern I've been using is first-class in TypeScript. The compiler
narrows types automatically in if blocks, switch statements,
ternaries — everywhere. You write three lines per variant and the
compiler does the rest.
Python can do this with dataclass + Literal + assert_never and
it works, but it's more ceremony. Ruby with Sorbet uses sealed! and
T.absurd and that's roughly three times the boilerplate. Java and
C# need runtime checks. (OCaml, Haskell, Rust, F# — the ML-family
languages — invented this pattern; if you're already there you've
been using it since school. The pitch is for the mainstream languages
that grew it later.) Among those, TypeScript is the one where this
pattern is cheapest.
If you're already writing TypeScript and you're not using discriminated unions everywhere, you're leaving the most powerful feature of the language on the table.
The political problem
I've given you types, state machines, property tests, a feature flag library. All useful. None of them solve the actual problem.
The actual problem is that someone in a meeting says "what if we let users decide?" and nobody in the room does the math on what that sentence costs in state space. The actual problem is that the PM wants "just one more option" and doesn't realize that one option doubles the configurations QA needs to verify. The actual problem is every system whose flexibility was its selling point and ended up being its downfall.
Every optional feature multiplies states. Every configuration toggle doubles them. Every "let the user decide" is an invitation for combinatorial explosion.
Your job as a programmer — the part they don't teach you, the part that isn't writing code — is to push back. Not because you're lazy. Not because you don't want to build things. Because you're the only person in the room who understands that complexity isn't linear. Two features don't make a system twice as complex. They make it four times as complex. Three features? Eight times. And nobody notices until it's too late, because each feature looks simple on its own.
This isn't unique to software. Every profession that's been around long enough has developed the same immune system against requested complexity, and the people good at it get paid for the no as much as the yes.
Editors. Writers turn in manuscripts twice as long as they should be. A good editor cuts a third of it — usually the third the writer is most proud of. Kill your darlings, the rule goes. Not because the darling sentences are bad. Because they were expensive: each one made the rest of the book harder to read. The editor isn't a writer who can't write. The editor is a writer with the discipline to remove.
Doctors. A patient walks in asking for the antibiotic they Googled. A good doctor refuses most of those requests, because each prescription carries its own state explosion — side effects, drug interactions, downstream interventions, false positives if it's a test. The good ones explain why. The bad ones acquiesce.
In both cases the professional knows something the client doesn't: every addition has invisible costs, and the cost compounds. Programmers do the same job in a different vocabulary. PMs, designers, customers, sales — everyone around you has a structural incentive to add. Yours is to cut, and to back it up with the math.
The pattern even shows up in product strategy. Steve Jobs came back to Apple in 1997, killed twelve-plus product lines down to a 2×2 grid of four (consumer × pro × desktop × portable), and the company stopped dying. He didn't add capability. He removed states. You can do the equivalent with a discriminated union and save a codebase instead of a company.
Be the person who does the math. Be the person who says "we can support this, but here's what it costs in state space." Be the person who proposes a simpler model that covers 90% of use cases without the combinatorial tax.
The data migration is cheaper than what comes after. Cheaper than the bugs you can't reproduce because you can't reconstruct the state. Cheaper than the customer escalations from configurations nobody anticipated. Cheaper than the team you eventually staff to answer the question "what is this customer's setup even supposed to look like?"
Simple systems are not a luxury. They're a survival strategy.
One more reframe (skip if you just want the tools)
Pull back for a second. Everything in this article is the same move.
Discriminated unions remove illegal states from the type. Parse-don't-validate checks at the boundary and carries the proof. DB constraints refuse to persist anything that doesn't fit. Even Jobs' 2×2 grid: figure out which configurations are valid, design so only those can exist.
There's a name for that gap: entropy. In information theory, the
entropy of a random variable X with possible outcomes occurring with
probabilities p₁, p₂, …, pₙ is
H(X) = − Σ pᵢ log₂ pᵢ
— the average number of bits needed to encode one outcome.
For our case X is the state of a single record. The honest worst-case
assumption is uniform probability across the representable state space;
real distributions are skewed (illegal states are usually rarer than
legal ones in practice), which only makes actual entropy smaller — so
uniform is a conservative upper bound. Under uniform pᵢ = 1/R the
formula collapses to log₂ R, where R is the cardinality of the type.
The entropy you'd want — what you'd have if only valid states could
exist — is log₂ V. The gap between them is the redundancy of the
representation:
log₂(R) − log₂(V) = log₂(R / V) bits per record
For the user model from earlier — isAdmin, isVerified, isBanned,
isDeleted — that's R = 16 representable states and roughly V = 4
meaningful ones, giving log₂(16 / 4) = 2 bits per record of
redundancy. Across a million users, your data shape carries about 2
million extra bits of capacity beyond what the domain actually wants.
One careful framing, because the word entropy gets thrown around
loosely. log₂(R / V) measures redundancy — extra representational
room — not "bug surface." The probability of actually exercising a bad
state depends on your traffic, your tests, your luck. What we can say
precisely is this: every technique in this article shrinks R toward V
without touching V. Each variant you remove from a discriminated union,
each CHECK constraint you add, each state your machine refuses to
enter — all of them eliminate representable configurations while
leaving the valid ones untouched. The redundancy shrinks. The system
carries less "wrong" inside it.
You don't need any of this to ship better code. But it's worth naming because once you see it you see it everywhere. Programming is an uphill battle against entropy. Ordered systems don't stay ordered by themselves. State spaces don't shrink unless you shrink them. Codebases don't simplify unless someone does the work. Every technique in this article is a way of pushing back against the slope — one bit at a time.
Companion tools
I built three small things while writing this article. All are MVPs (say so loudly), all on GitHub, all came directly out of the patterns above. The honest scope of the verification ones is narrow: they reduce "once X, always X" to a one-step inductive safety check — "given any transition out of state X, prove the successor is still X" — and dispatch that to Z3, the SMT solver Microsoft Research maintains. That's a small slice of what TLA+ as a whole can do (no liveness, no refinement, no unbounded counters), but it's the slice most production state machines actually need.
The install plumbing is what changed: Z3 ships inside the
package. For the Ruby gem that's a vendored libz3.so / .dylib
/ .dll (~30 MB) inside a platform-specific precompiled gem,
the same pattern nokogiri and sqlite3 use. For the npm package
it's the official z3-solver WebAssembly build (~5 MB), one
artifact for every platform. No Java, no tla2tools.jar, no
external shell-out. The full TLA+ path is still there for the
hard cases (liveness, refinement, unbounded N), behind a
--export-tla flag that emits the classic .tla / .cfg for
TLC or TLAPS.
-
verify_states — Ruby gem. Extracts a backend-neutral
Definitionfrom an existing state machine declaration and dispatches each inferred invariant to Z3 via FFI. Supported backends today:aasm(97.7M cumulative downloads) andstate_machines— together about 99% of verifiable Ruby state-machine usage by downloads.stateful_enum(1.5M downloads) is on the roadmap. Bare-enum models with transitions scattered across service objects are out of scope by design — no transition graph in the source means no structural invariants to extract. Verified-or- counterexample, in process, no shell-out. The classic bug class:$ verify-states OrderBuggy --require app/models/order_buggy.rb \ --assert-terminal refunded Definition: OrderBuggy states: pending, paid, shipped, delivered, refunded invariants: 1 (1 asserted) ✗ RefundedIsTerminal — Asserted: once refunded, always refunded. counterexample: kind = "refunded" kind_next = "pending" 1 violation(s), 0 unknown(s), 0 error(s).Exit code 2 — CI catches the bug the day someone adds a
reactivateevent that re-opens refunded orders. -
verified-states — TypeScript / React companion. Same idea, different frontend: lift a React component's props (annotated with the
ComponentProps<TStates, TInitial, TTransitions>generic) into aDefinition, infer invariants, dispatch to Z3 via WebAssembly. Also reads XStatecreateMachine({...})directly (flat / hierarchical / parallel), and composes multiple components into a joint spec when their state shares context. Same--assert-terminaland--export-tlaflags as the gem; the two halves produce identical verdicts for identical Definitions. -
eslint-plugin-verified-states — the structural-consistency check for
ComponentProps<...>annotations. Opt-in by design: silent on components that don't use the generic. When you do, it catches three classes the type checker itself can't see — transition targets that aren't in the state union, initial states that aren't either, declared states with no entry in the transition table (the "forgotten branch after a refactor" case). Composes with@typescript-eslint's strict preset; we don't re-implementno-explicit-anypatterns the upstream rules already cover.
The pattern generalizes — same Definition IR, new extractor per
frontend. django-fsm, Go state-machine libraries, Robot3, Zustand —
all shaped the same way. The cost of supporting a new source is one
extractor file plus tests, ~300 LOC.
All three are intentionally small. The point was never the perfect tool — it's to make these ideas land in your editor before you write the next boolean.
Because every boolean you write today is a Friday night you won't get back.
References
- Alexis King, Parse, Don't Validate (2019)
- Yaron Minsky, Effective ML (Jane Street, Effective ML talks at CUFP mid-2000s and the "Effective ML Revisited" blog post) — the OCaml-community version of "make illegal states unrepresentable," predating the F# write-up by a decade
- Scott Wlaschin, Making Illegal States Unrepresentable (F# for Fun and Profit) — the popular F#/TypeScript-era restatement
- Chris Krycho, Making Illegal States Unrepresentable in TypeScript
- David Harel, Statecharts: A Visual Formalism for Complex Systems (Science of Computer Programming, 1987) — the source XState's hierarchical and parallel states descend from
- Doug Seven, Knight Capital — A DevOps Cautionary Tale
- Michael Barr, Toyota's Killer Firmware
- Unleash, When Feature Flags Interact
- Microservices: Combinatorial Explosion of Versions, 5 Years Later
- Hillel Wayne, Learn TLA+ — the modern on-ramp
- Chris Newcombe et al., How Amazon Web Services Uses Formal Methods (CACM 2015)
- Leslie Lamport, The TLA+ Home Page
- DeepMind, AlphaProof: AI solves IMO problems at silver medal level (2024)
- Claude E. Shannon, A Mathematical Theory of Communication (Bell System Technical Journal, 1948) — the origin of the entropy framing
- Andrew Hunt & David Thomas, The Pragmatic Programmer (1999) — software rot and the broken windows metaphor
- James A. Whittaker, The Plague of Entropy (Google Testing Blog, 2009)
- Kumari et al., Prioritization of Software Bugs Using Entropy-Based Measures (Journal of Software: Evolution and Process, 2025) — recent peer-reviewed use of entropy as a software-engineering metric