Careful With That Axe: ML, Recursive Forecasting and IRFs
Most ML macro forecasting papers use direct forecasting because it is more natural and less troublesome. But there can be good reasons to consider iterated forecasts from a multivariate system, whose dynamics can then be summarized by its impulse response functions (IRFs). Efficiency gains can also be a big deal.
Let’s assume we are using modern nonlinear nonparametric methods (e.g., boosted trees, deep neural nets (NNs), and random forests (RF)). Here are three main pitfalls that need to be avoided to conduct decent forecasting and report reliable IRFs. Some are obvious and well known from the nonlinear time series literature; one is new and specific to ML-based approaches.
1. Iterated forecasts need to be obtained by simulation; otherwise they will be biased. On how to do it, see this paper for classical methods, and this paper for multivariate ML-based iterated forecasts. A code sketch of the simulation approach appears after the list.
2. IRFs need to be simulated too – unless one wants to go the local projection route (which has its own issues in an ML context). One way is to start from (1) and compute two conditional expectations of Y_{t+h}: one where the impact shock is one standard deviation, the other where it is zero. However, in a nonlinear environment, shocks may have different effects depending on current history (call it Y_{t}). A more suitable approach is to fully simulate the estimated system, throw 1 and 0 shocks in random places, keep track of those locations, and compute the difference between the two resulting means h periods after impact (a second sketch follows the list). The remaining shocks should be sampled from those of the training sample, which brings us to the least known pitfall.
3. For most ML models, one should absolutely not use in-sample residuals to obtain the variance-covariance matrix. If not convinced, see my paper on RF being self-pruning (but not in-sample), this one on Boosting, and this very popular piece for deep learning. Conveniently, through its use of Bagging, RF provides a natural substitute for in-sample residuals, namely out-of-bag (OOB) ones. Those are fair approximations of what one would get out-of-sample and can be used to compute the variance-covariance matrix. Problem: such OOB residuals are not readily available for Boosting or NNs because, well, there is no Bagging involved. This can be remedied by bootstrapping those procedures (resampling observations, not residuals). As usual, blocks may be more appropriate since we have serial dependence. The last sketch below illustrates both routes.
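To make point 1 concrete, here is a minimal sketch of simulation-based iterated forecasting. Everything is illustrative and not tied to any particular package: `model` stands for any fitted estimator whose `.predict()` maps the last p lags of the M series (flattened into one row) to the next observation of all M series, and `residuals` are the one-step-ahead errors to resample from (ideally out-of-bag or out-of-sample ones, as point 3 explains).

```python
# A minimal sketch of simulation-based iterated forecasting for a nonlinear
# multivariate model. `model`, `history` and `residuals` are assumptions, not
# objects from any specific paper or library.
import numpy as np

def iterated_forecast(model, history, residuals, horizon, n_sims=1000, rng=None):
    """Average over simulated paths instead of plugging point forecasts back in.

    history   : (p, M) array with the last p observations of the M series
    residuals : (T, M) array of one-step-ahead errors to bootstrap from
    """
    rng = np.random.default_rng(rng)
    p, M = history.shape
    paths = np.empty((n_sims, horizon, M))
    for s in range(n_sims):
        lags = history.copy()
        for h in range(horizon):
            x = lags.ravel()[None, :]                        # flatten the lag window into one feature row
            shock = residuals[rng.integers(len(residuals))]  # resample a shock vector
            y_next = model.predict(x)[0] + shock             # simulate; don't just iterate the conditional mean
            paths[s, h] = y_next
            lags = np.vstack([lags[1:], y_next])             # roll the lag window forward
    # the h-step point forecast is the mean across simulated paths
    return paths.mean(axis=0), paths
```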
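For point 2, here is one hypothetical way to implement simulated (generalized) IRFs with the same ingredients. Rather than literally scattering shocks inside one long simulation, this version draws random starting histories from the training sample, which achieves the same averaging over current history Y_{t}; `shock_vec` is assumed to be the identified one-standard-deviation shock of interest, so any identification step has already happened.

```python
# A sketch of simulation-based generalized IRFs: shocked vs. baseline paths,
# identical bootstrapped future shocks on both, averaged over random histories.
import numpy as np

def simulated_irf(model, data, residuals, shock_vec, horizon, p, n_sims=2000, rng=None):
    """Average difference between shocked and baseline simulated paths.

    data      : (T, M) training sample, used to draw random starting histories
    shock_vec : (M,) impact shock (e.g. one standard deviation of the shock of interest)
    p         : number of lags the model was trained on
    """
    rng = np.random.default_rng(rng)
    T, M = data.shape
    diff = np.zeros((n_sims, horizon + 1, M))
    for s in range(n_sims):
        t0 = rng.integers(p, T)                       # random "current history" Y_t
        path_shock = data[t0 - p:t0].copy()
        path_base = path_shock.copy()
        shock_idx = rng.integers(len(residuals), size=horizon + 1)  # same future shocks on both paths
        for h in range(horizon + 1):
            eps = residuals[shock_idx[h]]
            extra = shock_vec if h == 0 else 0.0      # one-sd shock on impact vs. nothing
            y_s = model.predict(path_shock.ravel()[None, :])[0] + eps + extra
            y_b = model.predict(path_base.ravel()[None, :])[0] + eps
            diff[s, h] = y_s - y_b
            path_shock = np.vstack([path_shock[1:], y_s])
            path_base = np.vstack([path_base[1:], y_b])
    return diff.mean(axis=0)                          # (horizon+1, M) array of responses
```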
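Finally, for point 3, a sketch of obtaining residuals that are not in-sample ones. The first function uses scikit-learn's out-of-bag predictions for RF, fitting each equation of the system separately; the second is an illustrative moving-block bootstrap wrapper producing pseudo-OOB residuals for learners without built-in Bagging (boosted trees, NNs). Function and argument names are mine, not from any particular package.

```python
# OOB residuals for RF, and a block-bootstrap substitute for Boosting/NNs.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def oob_residuals_and_cov(X, Y, **rf_kwargs):
    """OOB residuals for each equation of the system and their covariance matrix."""
    T, M = Y.shape
    resid = np.empty((T, M))
    for m in range(M):
        rf = RandomForestRegressor(oob_score=True, bootstrap=True, **rf_kwargs)
        rf.fit(X, Y[:, m])
        resid[:, m] = Y[:, m] - rf.oob_prediction_    # out-of-bag, not in-sample
    return resid, np.cov(resid, rowvar=False)

def bagged_oob_residuals(X, y, fit_fn, n_boot=100, block=8, rng=None):
    """Pseudo-OOB residuals for learners without Bagging (boosting, NNs):
    refit on moving-block bootstrap samples (observations, not residuals) and
    predict each observation only from replications that did not use it."""
    rng = np.random.default_rng(rng)
    T = len(y)
    preds, counts = np.zeros(T), np.zeros(T)
    for _ in range(n_boot):
        starts = rng.integers(0, T - block, size=int(np.ceil(T / block)))
        idx = np.concatenate([np.arange(s, s + block) for s in starts])[:T]  # block bootstrap of rows
        model = fit_fn(X[idx], y[idx])                # e.g. refit a boosted-tree or NN model
        left_out = np.setdiff1d(np.arange(T), idx)
        preds[left_out] += model.predict(X[left_out])
        counts[left_out] += 1
    oob_pred = np.where(counts > 0, preds / np.maximum(counts, 1), np.nan)
    return y - oob_pred                               # NaN for (rare) never-left-out points
```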
And we’re not even talking about inference just yet.
PS: There is some discussion of point 2 and the local projection approach in this recent paper.