On the Convincing Evaluation of Macro Forecasting Models
Here are some afterthoughts on a topic that was raised at the ECB Macro Forecasting conference.
It quickly becomes obvious to anybody doing macro forecasting research that results are very hard to compare across studies. Not very helpful for the advancement of science. There are many reasons for this. The most obvious one is that, unlike most Kaggle competitions (which are extremely useful to distill great ML offerings from the mass), things must be done recursively to closely mimic what a forecaster is actually faced with. Hence, we cannot simply fix the training and test sets (data-wise and variables-wise). Many design choices ensue that can easily make studies incomparable. Then there is the question of real-time data. And that of the evaluation window.
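To fix ideas, here is a minimal sketch of what "doing things recursively" looks like: an expanding-window pseudo-out-of-sample loop where the model is re-estimated at every forecast origin. The toy series, the AR(1) "model", the first forecast origin, and the horizon are all made-up placeholders, and that is precisely the point: each of them is a design choice that can differ across studies.

```python
# A minimal sketch of a recursive (expanding-window) pseudo-out-of-sample
# evaluation. Everything here is illustrative, not any particular study's setup.
import numpy as np

rng = np.random.default_rng(0)

# Toy "inflation" series: an AR(1) with noise (purely illustrative).
T = 200
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.7 * y[t - 1] + rng.normal(scale=0.5)

first_forecast_origin = 120   # where the evaluation window starts (a design choice)
horizon = 1                   # forecast horizon (another design choice)

errors = []
for origin in range(first_forecast_origin, T - horizon):
    train = y[: origin + 1]               # only data available at the forecast origin
    # Re-estimate the model at every origin (here, a simple AR(1) via OLS).
    X, target = train[:-1], train[1:]
    phi = (X @ target) / (X @ X)
    forecast = phi * train[-1]            # 1-step-ahead prediction
    errors.append(y[origin + horizon] - forecast)

rmse = np.sqrt(np.mean(np.square(errors)))
print(f"Recursive RMSE over {len(errors)} forecast origins: {rmse:.3f}")
```

Swap in real-time vintages instead of final data, move the first forecast origin, or change the horizon, and the resulting numbers are no longer comparable across papers.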
There is a need for a centralized real-time forecast gathering operation. Not a bunch of forecasts scattered around Twitter.
Yet things need not be complicated. Here is an example.
Every year, the Sea Ice Outlook (SIO) is carried out with contributions from many research institutes. Forecasts for September Arctic sea ice must be submitted each year by specific dates (there are four of them), so no "cheating", and the research design is fixed to be… reality. And it works with a Google submission form. Of course, inflation is not ice, but an extension of the concept is easily imaginable. In the end, this could very well look like a Survey of Model-Based Forecasters.
The SIO's final product is a post-season report, which is always very informative. Check out Figure 7. Wouldn't it be nice to have this for DSGEs, ML methods, BVARs, or MS-TVP-SV-MF-FAVARMAs?