Reading a causal graph: what your model knows

How directed acyclic graphs encode what causes what — and how the system uses them to answer "what if" questions correctly.

Two things can be correlated without either causing the other. Ice cream sales and drowning deaths both rise in summer — not because ice cream is dangerous, but because both respond to a third factor: heat. If you tried to reduce drownings by banning ice cream, you’d fail, because you’d be targeting a correlation, not a cause.

This matters because every “what if” question is really an intervention. You’re not asking “what tends to happen when hiring goes up?” — you’re asking “what happens to velocity if I decide to hire three engineers in Q3?” The difference is causal, and getting it wrong leads to confidently wrong decisions.

Causal graphs are how Simmis’s models encode the difference.

What a causal graph is

A directed acyclic graph (DAG) is a collection of nodes (variables) connected by directed edges (arrows) with no cycles. An arrow from A to B means: A causally influences B. Not just correlates — causes.

[Figure: a four-node DAG. Edges: season → price, season → demand, price → revenue, demand → revenue.]
A simple DAG. Season causes both price and demand (it’s a confounder). Price and demand both affect revenue. Arrows are directed — they encode causal direction, not just correlation.
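In code, a graph like this is nothing more than a map from each node to the nodes it influences. Here is a minimal Python sketch (node names taken from the figure; an illustration, not Simmis’s internal representation) that also verifies the “acyclic” part using the standard library:

```python
from graphlib import TopologicalSorter

# Each node maps to the nodes it causally influences.
edges = {
    "season":  ["price", "demand"],  # the confounder
    "price":   ["revenue"],
    "demand":  ["revenue"],
    "revenue": [],                   # the outcome
}

# graphlib expects predecessors (direct causes), so invert the edges.
causes = {node: set() for node in edges}
for src, targets in edges.items():
    for dst in targets:
        causes[dst].add(src)

# static_order() raises CycleError if the graph has a cycle, so a
# successful sort certifies acyclicity; causes always precede effects.
print(list(TopologicalSorter(causes).static_order()))
# e.g. ['season', 'price', 'demand', 'revenue']
```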

Reading this graph: if you observe high revenue, you can’t immediately infer that price was high, because a strong season could have driven demand (and price) up together. But if you set price to a specific value (intervene on it), you break the path from season to price and can read off price’s causal effect directly.

Confounders: why correlation misleads

A confounder is a variable that causally influences both an apparent cause and an apparent effect. It creates a spurious correlation: X and Y move together, but not because X causes Y — they’re both downstream of Z.

[Figure: heat → ice cream sales, heat → drowning rate; a dashed line marks the spurious, non-causal correlation between sales and drownings.]
Heat causes both ice cream sales and drowning rates. The correlation between the two is real but not causal — intervening on ice cream sales would have no effect on drownings.
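You can watch this happen in a few lines of simulation. The equations and coefficients below are invented for illustration; the only structural fact that matters is that the drowning equation contains no ice-cream term:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

heat      = rng.normal(size=n)               # the confounder
ice_cream = 2.0 * heat + rng.normal(size=n)  # heat -> sales
drowning  = 1.5 * heat + rng.normal(size=n)  # heat -> drownings (no sales term)

# The observational correlation is strong and real:
print(np.corrcoef(ice_cream, drowning)[0, 1])  # ~0.74

# Conditioning on high sales selects hot days, so drownings look elevated:
print(drowning[ice_cream > 2.0].mean())        # well above 0

# But do(ice_cream = 0) overwrites sales without touching heat, and
# drowning depends on heat alone, so banning ice cream changes nothing:
print(drowning.mean())                         # ~0 either way
```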

This is ubiquitous in business data. “Teams that do more retrospectives ship faster” — but is that causal, or do both correlate with team maturity? “Higher marketing spend correlates with revenue” — but causally, or does revenue allow marketing spend? Causal graphs force you to specify your assumptions about which direction the arrows run.

Intervention: the do-operator

Judea Pearl introduced the do-operator to formalize the difference between observing and intervening. P(Y | X=x) is the probability of Y given that we observe X equals x. P(Y | do(X=x)) is the probability of Y given that we set X to x — regardless of whatever would have caused X.

Intervening on X is equivalent to cutting all incoming arrows to X in the graph, then setting its value. This removes the influence of confounders.

[Figure, two panels. Left, observing price: season → price, season → demand, price → demand. Right, do(price = high): the season → price edge is cut and price is fixed to high, so season no longer confounds the price → demand relationship.]
Left: observing price still allows season to confound the relationship with demand. Right: setting price via do(price=high) severs the season→price edge, isolating the true causal effect.
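Graph surgery is mechanical enough to sketch directly. In the toy model below (illustrative equations, not anything Simmis ships), an intervention simply replaces a node’s equation with a fixed value, and the observational and interventional answers come apart, even disagreeing in sign:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n, do=None):
    """Sample the season/price/demand model. `do` maps a node to a fixed
    value, replacing its equation: the edge-cutting surgery in the figure."""
    do = do or {}
    season = rng.normal(size=n)
    if "price" in do:
        price = np.full(n, do["price"], dtype=float)  # incoming arrows cut
    else:
        price = season + rng.normal(size=n)           # season -> price
    demand = -0.8 * price + 2.0 * season + rng.normal(size=n)
    return price, demand

# Observational: select samples where price happened to be high.
price, demand = simulate(200_000)
print(demand[price > 1.5].mean())   # positive: high price rides on hot seasons

# Interventional: force price high everywhere, leaving season alone.
_, demand_do = simulate(200_000, do={"price": 1.5})
print(demand_do.mean())             # negative: raising price lowers demand
```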

The do-operator is the formal difference between prediction and decision. When a system predicts “revenue will be high when we observe high prices”, it’s conditioning on an observation. When it says “revenue will increase if we raise prices”, it’s computing an interventional distribution — which requires a causal model, not just correlational data.

What your model actually encodes

When agents in Simmis build a model from your domain, they’re building a DAG. Nodes are the quantities you care about: headcount, velocity, burn rate, release dates, compliance review windows. Edges are causal claims derived from domain knowledge, historical data, and patterns in how your organization actually works.

[Figure: hire timing (intervention: do(Q3)) → onboarding load and eng velocity; eng velocity → release date; compliance review → release date (the outcome).]
A domain model for the hiring question. Hire timing causally affects both onboarding load and velocity, but the release date is also a function of compliance review — a path that hire timing doesn’t touch.

This graph is why the system can tell you: “The bottleneck is the compliance review, not engineering capacity.” That answer comes from reading the graph: every causal path from hire timing to release date runs through engineering velocity, and none reaches the compliance review. Engineering velocity is relevant, but compliance review is a parallel constraint on the release date that no hiring decision can move.
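That reading is a pure graph computation: enumerate the directed paths and see which nodes they pass through. A sketch, with an edge list reconstructed from the figure (the onboarding-load → velocity edge is my assumption):

```python
graph = {
    "hire_timing":       ["onboarding_load", "eng_velocity"],
    "onboarding_load":   ["eng_velocity"],   # assumed edge, for illustration
    "eng_velocity":      ["release_date"],
    "compliance_review": ["release_date"],
    "release_date":      [],
}

def paths(g, src, dst, prefix=()):
    """Yield every directed path from src to dst (DFS; fine for small DAGs)."""
    prefix = prefix + (src,)
    if src == dst:
        yield prefix
        return
    for nxt in g[src]:
        yield from paths(g, nxt, dst, prefix)

for p in paths(graph, "hire_timing", "release_date"):
    print(" -> ".join(p))
# hire_timing -> onboarding_load -> eng_velocity -> release_date
# hire_timing -> eng_velocity -> release_date

# Every path runs through eng_velocity and none reaches compliance_review,
# so no hiring decision can move the compliance constraint.
```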

What the model doesn’t know

A causal model can only encode the causal structure you give it — or that the system can infer from data and domain knowledge. Missing edges mean missing causal paths. The system makes its assumptions inspectable: you can see the graph, challenge edges, add domain knowledge, and watch how the inference changes. This is why visibility matters. A black-box model that ignores causation may give confident wrong answers. An explicit causal graph shows you exactly where it might be wrong.

Why this changes “what if”

Most forecasting tools operate on observational data: they find patterns in what happened and project forward. This works for prediction (“what will happen if things continue as they are?”) but breaks for intervention (“what will happen if I change X?”).

Answering interventional questions requires knowing the causal structure. Without it, you can predict but not decide. With it, you can compute the expected consequences of any action you’re considering — which is exactly what a decision-support system should do.

When you ask Simmis “what if we hire in Q3?”, the system is computing P(release date | do(hire timing = Q3)) — not P(release date | hire timing = Q3). The former is what you actually need to make the call.
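To make that contrast concrete, here is a deliberately toy structural model of the hiring question. Every equation is a made-up stand-in, but the gap between the two printed numbers is exactly the gap between the two queries:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Confounder: roadmap pressure drives both the decision to hire in Q3
# and how hard the team pushes the release.
pressure = rng.normal(size=n)
hired_q3 = pressure + rng.normal(size=n) > 0

def release_weeks(hired):
    velocity   = 1.0 + 0.5 * hired + rng.normal(scale=0.1, size=n)
    compliance = rng.normal(loc=4.0, scale=0.5, size=n)  # review length, weeks
    # Release waits on the slower of engineering and compliance; pressure
    # pulls the date in regardless of either.
    return np.maximum(10.0 / velocity, compliance) - pressure

# P(release | hire timing = Q3): conditioning also selects high-pressure worlds.
print(release_weeks(hired_q3)[hired_q3].mean())      # ~6.1 weeks: flattering

# P(release | do(hire timing = Q3)): set the decision, leave pressure alone.
print(release_weeks(np.ones(n, dtype=bool)).mean())  # ~6.7 weeks: the real answer
```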

The difference between those two quantities is causality. And causality is what the graph encodes.

Simmis is built on these ideas. We're in early access — come think it through with us.
