Theory Without (Proper) Measurement
In empirical macroeconomics, there is a well-established way of doing things. One first establishes a reduced-form fact using VARs, local projections, difference-in-differences, or related methods. This delivers a point estimate: an impulse response, a multiplier, a treatment effect. The estimate is then subjected to the usual battery of robustness checks: alternative controls, lag structures, samples, sometimes estimators. Once the number survives this ritual, the conversation shifts.
The next step is interpretation.
Most often, interpretation means turning directly to theory. A structural model—frequently a New Keynesian DSGE—is written down, and we ask whether it can reproduce the reduced-form estimate. If it can, the mechanism embedded in the model is taken as a plausible explanation for the empirical fact. This workflow has deep roots in macroeconomics, going back at least to early RBC models that sought to match moments.
What is rarely made explicit is how much this approach relies on an implicit assumption: that the reduced-form estimate reflects a recurring economic mechanism rather than being driven by a small number of historically specific episodes.
Koopmans famously criticized measurement without theory. The reverse problem has received far less attention.
Interpretation Without Jumping Straight to Theory
A VAR impulse response or a state-dependent local projection coefficient is often treated as an average law of motion, something the economy produces repeatedly across time and states of the world. But mechanically, the estimate is an aggregation of historical observations, filtered through a particular estimator.
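To fix ideas with a textbook special case (the notation here is illustrative): for an OLS local projection of the horizon-$h$ outcome on a single identified shock and a set of controls, the Frisch–Waugh–Lovell theorem implies

$$\hat{\beta}_h = \sum_t w_t \, y_{t+h}, \qquad w_t = \frac{\tilde{s}_t}{\sum_\tau \tilde{s}_\tau^{\,2}},$$

where $\tilde{s}_t$ is the shock residualized on the controls. The weights depend only on the regressors, so each period's vote in the final estimate can, in principle, be read off directly.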
In other areas of econometrics and machine learning, interpretation is understood more broadly. It includes understanding how an estimate is constructed, which data points carry weight, and how different pieces of information are aggregated. In macro, by contrast, we often bypass this step and move straight from reduced-form evidence to structural explanation.
This gap matters, because what we ask theory to explain depends on what the reduced-form estimate actually represents.
Opening the Black Box of Local Projections
Local projections are a useful setting in which to make this issue concrete. Despite their popularity, they remain opaque objects. Once controls, lags, and normalizations are layered in, it becomes difficult to see how an estimate is assembled from the data. Which historical episodes matter? Which ones do not? Is the estimate supported by many moderate pieces of evidence, or by a handful of extreme ones?
In Opening the Black Box of Local Projections, we propose a simple decomposition: any local projection estimate can be written as a weighted sum of realized outcomes, with weights determined entirely by the estimator. Plotting the cumulative sum of these contributions over time yields what we call an evidence curve. By construction, this curve converges to the estimated coefficient. The path it takes to get there is informative.
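Under the weighted-sum representation above, the contribution of period $t$ is $c_t = w_t\, y_{t+h}$, and the evidence curve is its running sum. Below is a minimal sketch of how such a decomposition can be computed, assuming a standard OLS local projection with a single shock of interest; the function name, the simulated data, and the plotting choices are illustrative, not the paper's replication code.

```python
# Illustrative sketch (not the authors' code): decompose a local projection
# coefficient into per-period contributions and plot the "evidence curve".
import numpy as np
import matplotlib.pyplot as plt

def evidence_curve(y, shock, controls):
    """
    y        : (T,) outcome at horizon h, y_{t+h}
    shock    : (T,) identified shock s_t
    controls : (T, k) control matrix, including a constant

    Returns the OLS coefficient on the shock and the per-period contributions
    whose cumulative sum converges to that coefficient (Frisch-Waugh-Lovell).
    """
    # Residualize the shock on the controls
    gamma, *_ = np.linalg.lstsq(controls, shock, rcond=None)
    s_tilde = shock - controls @ gamma

    # OLS weights: each observation's share of the estimate
    weights = s_tilde / np.sum(s_tilde**2)
    contributions = weights * y          # c_t = w_t * y_{t+h}

    beta_hat = contributions.sum()       # equals the usual LP estimate of beta_h
    return beta_hat, contributions

# Example with simulated data (purely for illustration)
T = 400
rng = np.random.default_rng(0)
shock = rng.normal(size=T)
controls = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = 0.8 * shock + controls @ np.array([0.1, 0.3, -0.2]) + rng.normal(size=T)

beta_hat, contrib = evidence_curve(y, shock, controls)
plt.plot(np.cumsum(contrib))            # the evidence curve
plt.axhline(beta_hat, linestyle="--")   # converges to the estimate by construction
plt.show()
```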
When the curve evolves smoothly, drawing strength from many periods, the estimate looks like a broadly supported empirical regularity. When it moves in sharp jumps concentrated around a few dates, the estimate rests on a handful of episodes, and that is a red flag for treating it as a general law of motion.
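One can also summarize concentration in a single number. The measure below (the share of observations needed to account for 80 percent of the total absolute contribution) is our own illustrative choice, not necessarily the statistic used in the paper.

```python
# Illustrative concentration diagnostic: how small a fraction of observations
# accounts for 80% of the total absolute contribution to the estimate?
import numpy as np

def concentration_share(contributions, coverage=0.8):
    abs_c = np.abs(np.asarray(contributions))
    order = np.argsort(abs_c)[::-1]                   # largest contributions first
    cum_share = np.cumsum(abs_c[order]) / abs_c.sum()
    n_needed = int(np.searchsorted(cum_share, coverage) + 1)
    return n_needed / len(abs_c)

# A value close to 1 suggests a broadly supported estimate; a very small value
# suggests that a handful of episodes do most of the work.
```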
Fiscal Multipliers and Historical Concentration
A clear illustration comes from the literature on fiscal multipliers identified using Ramey–Zubairy–style military spending shocks. This work—importantly, not the entire fiscal multiplier literature—often finds that government spending is substantially more effective in recessions than in expansions, a result that has been influential in both policy and theory.
When we decompose these estimates, a striking pattern emerges. In the recession case, the fiscal multiplier is overwhelmingly driven by a single historical episode: World War II. A very small fraction of observations accounts for the vast majority of the total contribution. In the expansion case, the estimate is only slightly less concentrated, drawing mainly on the later stages of World War II and the Korean War.
The estimates are mechanically correct, but their content is more specific than they are often taken to be. Rather than reflecting a pattern that recurs over a century of data, they summarize the macroeconomic consequences of a handful of extraordinary events. I’m not sure whether any DSGE out there can (or should) replicate that.
Evidence Curves as a Missing Middle Step, and Some AI Opportunities
This suggests a missing step between reduced-form estimation and theory. Before treating an estimate as a stylized fact, it is useful to understand how concentrated it is and which historical episodes are doing the work. Evidence curves provide a way of doing exactly that.
Once we know which episodes matter, the next question is what happened during them. Here, large language models and AI tools can help. Given the historical weights underlying an estimate, they can systematically interrogate the specific months or quarters that drive the result: searching the web, policy archives, and historical records for events directly related to the research question. This adds narrative context at a scale that is infeasible manually, helping to determine whether the estimate reflects the mechanism of interest or the influence of confounding episodes and missing variables.
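As a hypothetical sketch of what such a pipeline might look like: rank the dates by their absolute contribution, then hand each top date, together with the research question, to an LLM-backed search tool. The `query_llm` function below is a placeholder for whatever model or retrieval service one actually uses, and the prompt wording is purely illustrative.

```python
# Hypothetical sketch: interrogate the dates that drive an estimate.
import numpy as np
import pandas as pd

def query_llm(prompt):
    """Placeholder: wire this up to an actual LLM / web-search tool."""
    raise NotImplementedError

def top_driving_dates(dates, contributions, k=10):
    """Return the k dates with the largest absolute contribution to the estimate."""
    df = pd.DataFrame({"date": dates, "contribution": contributions})
    order = df["contribution"].abs().sort_values(ascending=False).index
    return df.reindex(order).head(k)

def narrative_context(dates, contributions, question, k=10):
    """Ask for narrative context on each of the k most influential periods."""
    notes = {}
    for _, row in top_driving_dates(dates, contributions, k).iterrows():
        prompt = (
            f"Research question: {question}\n"
            f"Period: {row['date']} (contribution to the estimate: {row['contribution']:.3f}).\n"
            "Summarize fiscal and macroeconomic events in this period that could "
            "drive the result, citing historical sources where possible."
        )
        notes[row["date"]] = query_llm(prompt)   # placeholder call, not a real API
    return notes
```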