How a CausalTrader report is built.
Every report is the output of a deterministic five-step pipeline: collect signals, test causality among them, attribute news effects, classify regime, and write the markets note. This page covers the math.
PCMCI+ — finding what causes what
PCMCI+ extends PCMCI (Runge et al., 2019), a constraint-based causal discovery algorithm built for time series. We feed it a daily panel of signals (price-based technical indicators, news flow features, sentiment scores, and macro proxies) and ask it which variables causally influence which other variables at lags up to five trading days. For every candidate edge, PCMCI+ first runs a PC-style selection step that prunes the conditioning set down to a small group of likely parents, then applies a momentary conditional independence (MCI) test that conditions on the selected parents of both variables. This controls false-positive rates without conditioning on every variable at every lag, which would drain statistical power.
The output is a directed graph of statistically significant causal relationships at a chosen significance level (default α = 0.05). Edges that survive the filter are the candidate causal relationships we report on. The absence of an edge is informative too: it means that, after conditioning on the other relevant variables, the test found no significant dependence at that level. That is a lack of detected evidence, not proof that no relationship exists.
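The significance filter described above can be sketched in a few lines. This is a minimal illustration only: the `Edge` record, the variable names, and the p-values are hypothetical, and the sketch assumes the conditional-independence tests have already been run elsewhere. It treats p-values at or below α as significant, which is an assumption about the boundary case.

```python
from typing import NamedTuple

class Edge(NamedTuple):
    source: str
    target: str
    lag: int        # in trading days; 0 means contemporaneous
    p_value: float  # from a conditional-independence test run elsewhere

def significant_graph(edges, alpha=0.05):
    """Keep edges with p_value <= alpha, grouped by the variable they point to."""
    graph = {}
    for e in edges:
        if e.p_value <= alpha:
            graph.setdefault(e.target, []).append((e.source, e.lag))
    return graph

# Hypothetical candidate edges, for illustration only.
candidates = [
    Edge("sentiment", "abnormal_return", 1, 0.012),
    Edge("rsi_14", "abnormal_return", 2, 0.300),
    Edge("headline_count", "sentiment", 0, 0.049),
]
graph = significant_graph(candidates)
```

Here `rsi_14` drops out of the graph because its p-value is above the threshold, which only says the test did not detect a dependence at that level.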
Dual-window architecture — short news, long causality
PCMCI+ needs a few hundred rows of aligned data to test conditional independence reliably; a 30-day customer window only yields ~21 trading days, which is too thin for the causal layer to resolve stable structure. We decouple the two halves of the report. News, sentiment, technicals, and the front-page statistic strip stay scoped to your selected analysis window because those are short-horizon phenomena and the reader is asking about them. The causal-discovery layer runs on a fixed 180-day context window that ends on the same trading day, giving PCMCI+ roughly 125 trading days of panel data to work with. Newly listed tickers without 180 days of history fall back to the available bars; the graph caption discloses the truncation when this happens. The two windows are reported side-by-side on the graph page and in the About box so the reader sees exactly which panel each part of the analysis came from.
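The window arithmetic can be checked with a short sketch. It counts weekdays only, so it gives an upper bound on trading days: exchange holidays are ignored, which is why the figures quoted above are slightly lower. The function names, the example end date, and the bar counts are assumptions made for illustration.

```python
from datetime import date, timedelta

def weekday_count(end: date, calendar_days: int) -> int:
    """Count Mon-Fri dates among the `calendar_days` days ending on `end`, inclusive."""
    days = (end - timedelta(days=i) for i in range(calendar_days))
    return sum(1 for d in days if d.weekday() < 5)

def causal_context_rows(available_bars: int, target_bars: int):
    """Return (rows used, truncated flag) for the causal window.

    Falls back to whatever history exists when a ticker has fewer bars than
    the target; the flag is what a caption could use to disclose truncation.
    """
    used = min(available_bars, target_bars)
    return used, used < target_bars

end = date(2024, 6, 28)                   # arbitrary example end date
analysis_days = weekday_count(end, 30)    # short analysis window
context_days = weekday_count(end, 180)    # fixed causal context window
rows, truncated = causal_context_rows(available_bars=60, target_bars=context_days)
```

Both windows end on the same date, so the short window is always a suffix of the long one.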
VARLiNGAM — attributing news to returns
VARLiNGAM (Hyvärinen et al., 2010) extends the LiNGAM family to vector autoregressive systems. It assumes linear relationships among variables, non-Gaussian residuals, and an acyclic contemporaneous causal structure, and it identifies signed causal effects in addition to a full topological ordering of the variables. We fit it to a daily panel of news-derived features (sentiment, magnitude, source, headline counts) and market-model abnormal returns.
From the fitted VAR coefficient matrix we attribute to each headline a portion of its day's estimated effect in basis points. These are attributions under the estimated daily DAG, not independent per-headline causal effects.
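The text does not specify how a day's effect is divided among its headlines, so the sketch below is purely an assumption: it splits the day's estimated effect in proportion to a hypothetical per-headline weight (for example, absolute sentiment magnitude). Its only purpose is to show why the per-headline numbers are shares of one daily estimate rather than independent effects.

```python
def attribute_day_effect(day_effect_bps, weights):
    """Split one day's estimated effect (in bps) across headlines.

    ASSUMPTION: a proportional split by absolute weight. The actual
    allocation rule is not described in the text above.
    """
    if not weights:
        return []
    total = sum(abs(w) for w in weights)
    if total == 0:
        # No usable weights: fall back to an equal split.
        return [day_effect_bps / len(weights)] * len(weights)
    return [day_effect_bps * abs(w) / total for w in weights]

# Hypothetical day: a 30 bps estimated effect and three headlines.
shares = attribute_day_effect(30.0, [1.0, 2.0, 3.0])
```

By construction the shares sum back to the day's total, so changing one headline's weight changes every other headline's attribution.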
Abnormal returns — stripping out the index
Before VARLiNGAM, we adjust raw returns to remove the part explained by the broad market. We fit a standard market model against SPY on a rolling window: r_t = α + β · r_{SPY,t} + ε_t, where the abnormal return is the residual ε_t. When the window is too short for a stable market-model fit, we fall back to raw returns and flag that in the report.
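A minimal sketch of the market-model adjustment, assuming ordinary least squares with a single regressor. It fits one window rather than rolling, and the `min_obs` cutoff is a hypothetical stand-in for whatever "too short" means in the pipeline. The return series are made up for illustration.

```python
def market_model_residuals(asset, market, min_obs=60):
    """Return (abnormal_returns, used_fallback) for one estimation window.

    Fits asset = alpha + beta * market + eps by OLS and returns eps.
    Falls back to raw returns when the window is shorter than `min_obs`
    (a hypothetical threshold) or the market series has no variance.
    """
    n = len(asset)
    if n != len(market):
        raise ValueError("asset and market must be the same length")
    if n < min_obs:
        return list(asset), True
    mean_a = sum(asset) / n
    mean_m = sum(market) / n
    var_m = sum((m - mean_m) ** 2 for m in market)
    if var_m == 0:
        return list(asset), True
    beta = sum((m - mean_m) * (a - mean_a) for a, m in zip(asset, market)) / var_m
    alpha = mean_a - beta * mean_m
    return [a - (alpha + beta * m) for a, m in zip(asset, market)], False

# Made-up daily returns, for illustration only.
spy = [0.010, -0.004, 0.006, -0.012, 0.003, 0.008]
stock = [0.018, -0.003, 0.007, -0.020, 0.009, 0.011]
abnormal, used_fallback = market_model_residuals(stock, spy, min_obs=5)
```

Because the fit includes an intercept, the residuals sum to zero over the estimation window; the abnormal return is what remains after the market's contribution is removed.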
Cluster-bootstrap confidence intervals
VARLiNGAM coefficients and per-headline effects are reported with 95% confidence intervals derived from a cluster bootstrap. We resample whole weeks (not individual days) so that each resample preserves short-horizon autocorrelation and clustered news arrival. The default is 500 bootstrap resamples. CIs that exclude zero are visually flagged in the report.
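The resampling scheme can be sketched as follows, assuming a percentile interval and a generic `statistic` callable in place of the VARLiNGAM refit. The function name, the `(week_id, value)` input layout, and the sample data are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean

def weekly_cluster_bootstrap_ci(observations, statistic, n_boot=500, level=0.95, seed=0):
    """Percentile CI from resampling whole weeks with replacement.

    observations: list of (week_id, value) pairs. All days in a drawn week
    enter the resample together, which keeps within-week dependence intact.
    """
    by_week = defaultdict(list)
    for week, value in observations:
        by_week[week].append(value)
    weeks = list(by_week)
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = []
        for w in rng.choices(weeks, k=len(weeks)):
            sample.extend(by_week[w])
        stats.append(statistic(sample))
    stats.sort()
    tail = (1 - level) / 2
    lo = stats[int(tail * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - tail) * n_boot))]
    return lo, hi

# Made-up daily effects in bps, tagged by week number.
obs = [(1, 4.0), (1, 6.0), (2, -1.0), (2, 3.0), (3, 5.0), (3, 7.0), (4, 2.0), (4, 8.0)]
lo, hi = weekly_cluster_bootstrap_ci(obs, mean)
excludes_zero = lo > 0 or hi < 0  # the condition a report could flag visually
```

Resampling days instead of weeks would treat each day as independent and would typically produce intervals that are too narrow when news arrives in clusters.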
Benjamini-Hochberg FDR control
Running many significance tests at once inflates the expected number of false positives. We apply the Benjamini-Hochberg procedure to the VARLiNGAM coefficient p-values to control the false discovery rate at a configurable level (default q = 0.10). Effects that clear the threshold are marked as having passed FDR.
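For reference, the Benjamini-Hochberg step-up rule is short enough to write out. This is a plain sketch of the standard procedure, not the pipeline's own code; the function name and the example p-values are illustrative.

```python
def benjamini_hochberg(p_values, q=0.10):
    """Return a list of booleans: True where the test passes FDR at level q.

    Step-up rule: sort the m p-values, find the largest rank k with
    p_(k) <= k * q / m, and pass every test ranked k or better.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff_rank = rank
    passed = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            passed[i] = True
    return passed

# Illustrative p-values for four coefficients.
flags = benjamini_hochberg([0.03, 0.9, 0.06, 0.04], q=0.10)
```

In this example the smallest p-value (0.03) misses its own rank-1 threshold of 0.025 but still passes, because the procedure accepts everything up to the largest rank that clears its threshold (rank 3 here).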
Regime classification
We segment the analysis window into bull, bear, and sideways regimes using a 21-day rolling assessment of trend, volatility, and drawdown. The technicals page shades the price panel by regime, and the front-page statistic strip reports the percentage of the period spent in each state.
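The classification rules themselves are not given here, so the sketch below is a deliberately simplified assumption: it labels each day from the trailing 21-day return alone, using a hypothetical ±5% threshold, and leaves out the volatility and drawdown inputs. The second function shows how the per-regime percentages for a statistic strip could be derived from any list of labels.

```python
from collections import Counter

def classify_regimes(closes, window=21, threshold=0.05):
    """Label each day after the first `window` bars as bull, bear, or sideways.

    ASSUMPTION: trend-only rule on the trailing `window`-day return with a
    hypothetical threshold; volatility and drawdown are not modeled.
    """
    labels = []
    for t in range(window, len(closes)):
        trailing = closes[t] / closes[t - window] - 1.0
        if trailing > threshold:
            labels.append("bull")
        elif trailing < -threshold:
            labels.append("bear")
        else:
            labels.append("sideways")
    return labels

def regime_shares(labels):
    """Percentage of labeled days spent in each regime."""
    if not labels:
        return {}
    counts = Counter(labels)
    return {r: 100.0 * counts[r] / len(labels) for r in ("bull", "bear", "sideways")}

# Made-up price path rising 1% per day, for illustration only.
closes = [100.0 * 1.01 ** i for i in range(30)]
shares = regime_shares(classify_regimes(closes))
```

A production classifier would need rules for the other two inputs and for how to break ties between them.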
What the report does not do
CausalTrader does not issue buy / hold / sell recommendations. It does not size positions. Causal relationships discovered in one window can break in the next — regime changes, earnings surprises, Fed announcements, or geopolitical events can scramble any historical pattern overnight. The full disclaimer spells out the model and data caveats.
References
- Runge, J., et al. (2019). Detecting and quantifying causal associations in large nonlinear time series datasets. Science Advances 5(11).
- Hyvärinen, A., Zhang, K., Shimizu, S., & Hoyer, P. O. (2010). Estimation of a structural vector autoregression model using non-Gaussianity. Journal of Machine Learning Research 11.
- Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B 57(1).
Questions
For methodology questions or audit requests: dennis@mercurial-ai.com.