Reading a causal graph: what your model knows
How directed acyclic graphs encode what causes what — and how the system uses them to answer "what if" questions correctly.
Simmis answers a question in three moves: model the domain, simulate scenarios on it, and learn from what happens. This note is the model step — how a model encodes cause, not just correlation. We follow one running decision throughout: should we hire three engineers in Q3 instead of Q4?
Two things can be correlated without either causing the other. Ice cream sales and drowning deaths both rise in summer — not because ice cream is dangerous, but because both respond to a third factor: heat. If you tried to reduce drownings by banning ice cream, you’d fail, because you’d be targeting a correlation, not a cause.
This matters because every “what if” question is really an intervention. You’re not asking “what tends to happen when hiring goes up?” — you’re asking “what happens to velocity if I decide to hire three engineers in Q3?” The difference is causal, and getting it wrong leads to confidently wrong decisions.
Causal graphs are how Simmis’s models encode the difference.
What a causal graph is
A directed acyclic graph (DAG) is a collection of nodes (variables) connected by directed edges (arrows) with no cycles. An arrow from A to B means: A causally influences B. Not just correlates — causes.
Take a small example with three nodes: season → price, season → revenue, and price → revenue. Reading this graph: if you observe that high prices coincide with high revenue, you can’t conclude that price drove revenue, because season could have caused both. But if you set price to a specific value (intervene on it), you cut the arrow from season to price and can read off the causal effect of price on revenue directly.
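A DAG needs very little machinery to represent. The sketch below is in Python rather than the Clojure used later in this note, and it is not how Simmis stores its models. It encodes the three-node season/price/revenue graph as a map from each node to its direct effects, lists a node’s parents, and uses Kahn’s algorithm to confirm there are no cycles. `GRAPH`, `parents` and `topological_order` are illustrative names.

```python
from collections import deque

# Hypothetical edge set for the season/price/revenue example:
# each key maps a node to the nodes it directly causes.
GRAPH = {
    "season": ["price", "revenue"],
    "price": ["revenue"],
    "revenue": [],
}


def parents(graph, node):
    """Return the direct causes of `node` (nodes with an arrow into it)."""
    return sorted(src for src, children in graph.items() if node in children)


def topological_order(graph):
    """Kahn's algorithm: return nodes causes-first, or raise if there is a cycle."""
    in_degree = {node: 0 for node in graph}
    for children in graph.values():
        for child in children:
            in_degree[child] += 1
    ready = deque(sorted(n for n, d in in_degree.items() if d == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in graph[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                ready.append(child)
    if len(order) != len(graph):
        raise ValueError("graph has a cycle, so it is not a DAG")
    return order


print(topological_order(GRAPH))  # ['season', 'price', 'revenue']
print(parents(GRAPH, "revenue"))  # ['price', 'season']
```

A causes-first order like this is also the order in which a simulator can compute each node from its parents.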
Confounders: why correlation misleads
A confounder is a variable that causally influences both an apparent cause and an apparent effect. It creates a spurious correlation: X and Y move together, but not because X causes Y — they’re both downstream of Z.
This is ubiquitous in business data. “Teams that do more retrospectives ship faster”: is that causal, or do both correlate with team maturity? “Higher marketing spend correlates with revenue”: does spend drive revenue, or does revenue pay for the spend? Causal graphs force you to state your assumptions about which direction each arrow runs.
Intervention: the do-operator
Judea Pearl introduced the do-operator to formalize the difference between observing and intervening (formal treatment in Pearl, Causality, 2nd ed., 2009; popularized in The Book of Why, 2018). P(Y | X=x) is the probability of Y given that we observe X equals x. P(Y | do(X=x)) is the probability of Y given that we set X to x, regardless of whatever would otherwise have caused X.
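The two quantities can be computed exactly for a small binary model. This sketch assumes a hypothetical Z → X, Z → Y, X → Y model with made-up probabilities.

- The observational quantity reweights Z by P(Z | X=x).
- The interventional one uses the adjustment formula P(Y | do(X=x)) = Σ_z P(Y | X=x, Z=z) P(Z=z). This is valid here because Z is the only confounder (the back-door criterion).

In this model X raises P(Y=1) by exactly 0.1.

```python
from fractions import Fraction as F

# Hypothetical binary model: Z -> X, Z -> Y, X -> Y (Z is a confounder).
P_Z1 = F(1, 2)                           # P(Z=1)
P_X1_GIVEN_Z = {1: F(4, 5), 0: F(1, 5)}  # P(X=1 | Z=z)


def p_y1(x, z):
    """P(Y=1 | X=x, Z=z): X adds 0.1 and Z adds 0.6 on top of a 0.1 baseline."""
    return F(1, 10) + F(1, 10) * x + F(3, 5) * z


def p_z(z):
    return P_Z1 if z == 1 else 1 - P_Z1


def p_x_given_z(x, z):
    return P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]


def observe(x):
    """P(Y=1 | X=x): weight each z by P(z | X=x), via Bayes' rule."""
    joint = {z: p_z(z) * p_x_given_z(x, z) for z in (0, 1)}
    total = sum(joint.values())
    return sum(p_y1(x, z) * joint[z] / total for z in (0, 1))


def do(x):
    """P(Y=1 | do(X=x)): X no longer listens to Z, so weight z by P(z)."""
    return sum(p_y1(x, z) * p_z(z) for z in (0, 1))


print(observe(1) - observe(0))  # 23/50, the observed gap (0.46)
print(do(1) - do(0))            # 1/10, the causal effect (0.10)
```

Conditioning credits X with most of Z's influence. The do() version holds the distribution of Z fixed and recovers the 0.1 written into `p_y1`.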
Intervening on X is equivalent to cutting all incoming arrows to X in the graph, then setting its value. This removes the influence of confounders.
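On the graph itself, intervention is plain surgery. This minimal Python sketch assumes a dict-of-children representation for the season/price/revenue example, and the function names are illustrative. `mutilate` drops every arrow into the intervened node. Afterwards season is no longer an ancestor of price, though it still reaches revenue directly.

```python
def mutilate(graph, node):
    """Copy `graph` with every arrow INTO `node` removed (graph surgery for do(node))."""
    return {src: [c for c in children if c != node] for src, children in graph.items()}


def ancestors(graph, node):
    """Return every node that has a directed path into `node`."""
    found, frontier = set(), [node]
    while frontier:
        current = frontier.pop()
        for src, children in graph.items():
            if current in children and src not in found:
                found.add(src)
                frontier.append(src)
    return found


# Hypothetical edge set: season -> price, season -> revenue, price -> revenue.
GRAPH = {"season": ["price", "revenue"], "price": ["revenue"], "revenue": []}

print(sorted(ancestors(GRAPH, "price")))                     # ['season']
print(sorted(ancestors(mutilate(GRAPH, "price"), "price")))  # []
```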
The do-operator is the formal difference between prediction and decision. When a system predicts “revenue will be high when we observe high prices”, it’s conditioning on an observation. When it says “revenue will increase if we raise prices”, it’s computing an interventional distribution — which requires a causal model, not just correlational data.
The do-operator, in code
In Spindel the do-operator is a first-class effect, intervene!, which pins a node and severs its incoming edges, exactly as Pearl describes:
```clojure
(require '[org.replikativ.spindel.inference.effects :refer [observe intervene!]])

;; Two different questions about the same model:
;; observe: "given the claim succeeded, was the agreement on file?"
;; do(...): "if we FILE the agreement, does the claim succeed?"
(spin
  (intervene! [:evidence :loan-agreement] true) ; do(file the agreement)
  (adjudicate debt-test record))                ; outcome under the action
```
This is the engine behind the legal dispute simulator: the “produce the signed loan agreement” button is a do() on the evidence, and the win-probability you watch move is the interventional distribution — not a correlation.
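Spindel’s internals aren’t shown in this note, so the following is not its implementation. It is a toy Python analog of the behaviour described above:

- Each node is computed from its parents unless it is pinned.
- A pinned node ignores its mechanism, which is what severing incoming edges amounts to in a generative model.

The claim model is made up for illustration, including its hypothetical “diligent lender” confounder, node names and probabilities. Filtering samples where the agreement happened to be on file answers the observational question. Pinning it in every sample answers the do() question.

```python
import random


def run(rng, interventions=None):
    """Sample one world from a toy claim model; `interventions` pins nodes (a do())."""
    pinned = interventions or {}

    def node(name, mechanism):
        # A pinned node ignores its mechanism, i.e. its incoming arrows are cut.
        return pinned[name] if name in pinned else mechanism()

    diligent = node("diligent", lambda: rng.random() < 0.5)
    on_file = node("on_file", lambda: rng.random() < (0.9 if diligent else 0.3))
    win = node("win", lambda: rng.random() < 0.2 + 0.3 * on_file + 0.4 * diligent)
    return {"diligent": diligent, "on_file": on_file, "win": win}


rng = random.Random(7)
N = 100_000

# Observational: keep only worlds where the agreement happened to be on file.
observed = [w for w in (run(rng) for _ in range(N)) if w["on_file"]]
p_obs = sum(w["win"] for w in observed) / len(observed)

# Interventional: pin on_file=True in every world, i.e. do(file the agreement).
p_do = sum(run(rng, {"on_file": True})["win"] for _ in range(N)) / N

print(f"P(win | on_file) ~ {p_obs:.2f}; P(win | do(on_file)) ~ {p_do:.2f}")
# Exact values for this toy model: 0.80 and 0.70.
```

The exact values follow from the model.

- Observational: P(diligent | on file) = 0.45 / 0.60 = 0.75, so P(win | on file) = 0.5 + 0.4 × 0.75 = 0.8.
- Interventional: under do(on file), diligence keeps its prior of 0.5, so P(win) = 0.5 + 0.4 × 0.5 = 0.7.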
What your model actually encodes
When agents in Simmis build a model from your domain, they’re building a DAG. Nodes are the quantities you care about: headcount, velocity, burn rate, release dates, compliance review windows. Edges are causal claims derived from domain knowledge, historical data, and patterns in how your organization actually works.
This graph is why the system can tell you: “The bottleneck is the compliance review, not engineering capacity.” That answer comes from reading the causal graph: release date depends on both engineering velocity and the compliance review, and hire timing affects only the first. Hiring earlier speeds up engineering, but the release still waits for compliance, so the compliance review is a parallel bottleneck.
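One way to encode a parallel bottleneck is with structural equations in which the release date is the later of its two parents. Every number below is hypothetical: quarter start weeks, work remaining, team size, ramp-up time and the compliance week. The Python is a sketch of the idea, not how Simmis represents the model. Passing `compliance_done_week` by hand is an intervention on that node.

```python
# Hypothetical structural equations for the running decision; all numbers are made up.
QUARTER_START_WEEK = {"Q3": 27, "Q4": 40}


def eng_done_week(hire_timing, work=120, team=5, hires=3, ramp_weeks=4):
    """hire timing -> headcount -> velocity -> week engineering finishes."""
    productive_from = QUARTER_START_WEEK[hire_timing] + ramp_weeks
    week, remaining = QUARTER_START_WEEK["Q3"], work  # work starts at the top of Q3
    while True:
        headcount = team + (hires if week >= productive_from else 0)
        remaining -= headcount  # assume 1 unit of work per engineer per week
        if remaining <= 0:
            return week
        week += 1


def release_week(hire_timing, compliance_done_week=49):
    """Release waits on two parallel parents: engineering and compliance review."""
    return max(eng_done_week(hire_timing), compliance_done_week)


for q in ("Q3", "Q4"):
    print(q, eng_done_week(q), release_week(q))
# Q3 43 49
# Q4 48 49

# do(compliance_done_week = 45): now hire timing moves the release date.
print(release_week("Q3", 45), release_week("Q4", 45))  # 45 48
```

With compliance closing in week 49, moving the hires from Q4 to Q3 pulls engineering in by five weeks and the release date by zero. Hire timing shows up in the release date only after an intervention that shortens the review.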
Why this changes “what if”
Most forecasting tools operate on observational data: they find patterns in what happened and project forward. This works for prediction (“what will happen if things continue as they are?”) but breaks for intervention (“what will happen if I change X?”).
Answering interventional questions requires knowing the causal structure. Without it, you can predict but not decide. With it, you can compute the expected consequences of any action you’re considering — which is exactly what a decision-support system should do.
When you ask Simmis “what if we hire in Q3?”, the system is computing P(release date | do(hire timing = Q3)) — not P(release date | hire timing = Q3). The former is what you actually need to make the call.
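The gap between the two quantities appears as soon as something drives both hire timing and release date. This sketch assumes a hypothetical confounder, project urgency: urgent projects usually hire in Q3, and they also get a fast-tracked compliance review. All probabilities and week numbers are invented for illustration. Conditioning on Q3 hiring also selects for urgent projects, so it overstates how much earlier a Q3 hire makes the release.

```python
from fractions import Fraction as F

# Hypothetical model for the running decision (weeks of the year; numbers made up):
#   urgent -> hire timing   (urgent projects usually hire in Q3)
#   urgent -> compliance    (urgent projects get a fast-tracked review)
#   hire timing -> engineering done; release = max(engineering, compliance)
P_URGENT = F(1, 2)
P_Q3_GIVEN = {True: F(4, 5), False: F(1, 5)}  # P(hire in Q3 | urgent)


def release(urgent, hire):
    eng = 43 if hire == "Q3" else 48
    compliance = 44 if urgent else 50
    return max(eng, compliance)


def p_urgent(u):
    return P_URGENT if u else 1 - P_URGENT


def p_hire(hire, u):
    return P_Q3_GIVEN[u] if hire == "Q3" else 1 - P_Q3_GIVEN[u]


def expected_given(hire):
    """E[release | hire timing = hire]: observational, urgency reweighted by Bayes' rule."""
    weights = {u: p_urgent(u) * p_hire(hire, u) for u in (True, False)}
    total = sum(weights.values())
    return sum(w / total * release(u, hire) for u, w in weights.items())


def expected_do(hire):
    """E[release | do(hire timing = hire)]: the arrow urgent -> hire is cut."""
    return sum(p_urgent(u) * release(u, hire) for u in (True, False))


print(float(expected_given("Q4") - expected_given("Q3")))  # 4.4 weeks, observed gap
print(float(expected_do("Q4") - expected_do("Q3")))        # 2.0 weeks, causal effect
```

Observationally, Q3 hires release 4.4 weeks earlier on average. Deciding to hire in Q3 buys only 2 weeks; the rest of the apparent gain came from urgency, not from hiring.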
The difference between those two quantities is causality. And causality is what the graph encodes.
Further reading
- Pearl, Causality: Models, Reasoning, and Inference (2nd ed., 2009) — the formal treatment.
- Pearl & Mackenzie, The Book of Why (2018) — the accessible introduction.
- The do-operator as a runtime effect: intervene! in Spindel (inference/effects.cljc).
Simmis is built on these ideas. We're in early access — come think it through with us.