Econometricians Should Know About Double Descent

Double descent (DD) is ubiquitous in modern deep learning, and I just came across a great set of slides that bring it to the attention of the econometrics world.

Basically, DD is the astonishing observation that, past the point where you have enough parameters to fit your training data perfectly, adding even more parameters can in fact improve out-of-sample performance. To re-emphasize: we are fitting the training data perfectly, and there is no explicit regularization involved, such as early stopping. At first sight, this looks like a slap in the face of the bias-variance trade-off. But there is more to it, and the aforementioned slides go through, in a digestible way, two key papers on the matter.
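
For readers who want to see the phenomenon with their own eyes, here is a minimal toy simulation (my own illustrative sketch, not from the slides): minimum-norm least squares on random ReLU features, with the number of features pushed past the interpolation threshold and no explicit regularization anywhere. All the numbers below (sample sizes, feature counts) are arbitrary choices.

```python
# Toy double descent: ridgeless regression on random ReLU features.
# Sample sizes and feature counts are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 1000, 20

def make_data(n):
    X = rng.normal(size=(n, d))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=n)
    return X, y

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)

def relu_features(X, W):
    # random ReLU features: phi(x) = max(0, x'W)
    return np.maximum(X @ W, 0.0)

for p in [10, 50, 90, 100, 110, 200, 500, 2000]:  # number of random features
    W = rng.normal(size=(d, p)) / np.sqrt(d)
    Phi_tr, Phi_te = relu_features(X_tr, W), relu_features(X_te, W)
    # pinv gives the least-squares fit when p < n_train and the
    # minimum-norm interpolating fit once p >= n_train
    beta = np.linalg.pinv(Phi_tr) @ y_tr
    mse_tr = np.mean((Phi_tr @ beta - y_tr) ** 2)
    mse_te = np.mean((Phi_te @ beta - y_te) ** 2)
    print(f"p = {p:5d}   train MSE = {mse_tr:8.4f}   test MSE = {mse_te:8.4f}")
```

Running it, you should typically see the test MSE blow up around p ≈ n_train and then come back down as p keeps growing, even though the train MSE is essentially zero from the interpolation point onward.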

Here are some applied econometrics implications, in order of strangeness. First, in a paper of mine, I fit a fully connected NN with 10 layers of 100 neurons to quarterly inflation data and get the best out-of-sample performance. And yet, the ratio of parameters to observations is literally ridiculous.
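
To get a sense of just how ridiculous, here is a back-of-the-envelope parameter count for a fully connected net of that shape. The architecture comes from the paragraph above; the number of predictors and of quarterly observations are my own placeholder assumptions, not the paper's actual figures.

```python
# Rough parameter count for a fully connected net with 10 hidden layers
# of 100 neurons. Input size and sample size are placeholder assumptions.
n_inputs = 20      # assumed number of predictors
n_obs = 250        # assumed sample size, roughly 60 years of quarterly data
layers = [n_inputs] + [100] * 10 + [1]  # input, 10 hidden layers, scalar output

# weights plus biases, layer by layer
n_params = sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(layers[:-1], layers[1:]))
print(f"parameters: {n_params:,}   observations: {n_obs}   "
      f"ratio: {n_params / n_obs:.0f} to 1")
```

With those placeholder numbers you land somewhere around 90,000 parameters for roughly 250 observations, a ratio of several hundred to one.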

Second, economists are typically interested in understanding the model, which usually means a partial derivative (of varying complexity) with respect to some key variable. An ML way to proceed is to tune the model to perform well on some test set and then analyse the optimally tuned predictive function. But with DD, the best predictive function on the test set is one that we know for sure is overfitting the training set, which makes it hard to trust for more “structural” analyses. Adopting an out-of-bag approach (as in Random Forests) can deal with this to an appreciable extent, but the computations can be daunting.
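
For concreteness, here is a minimal sketch of that “analyse the tuned predictive function” step: estimate the average partial effect of one key variable by finite differences on the fitted function. The model class (scikit-learn's MLPRegressor), the simulated data, and every tuning choice below are illustrative assumptions, not the setup of any particular paper.

```python
# Sketch: average partial effect of one variable, computed by finite
# differences on a fitted predictive function. All choices are illustrative.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=300)

model = MLPRegressor(hidden_layer_sizes=(100,) * 3, max_iter=2000,
                     random_state=0).fit(X, y)

def average_partial_effect(model, X, j, eps=1e-3):
    """Finite-difference d f_hat / d x_j, averaged over the sample."""
    X_plus, X_minus = X.copy(), X.copy()
    X_plus[:, j] += eps
    X_minus[:, j] -= eps
    return np.mean((model.predict(X_plus) - model.predict(X_minus)) / (2 * eps))

print("average effect of x_0:", average_partial_effect(model, X, 0))  # around 2
```

The worry raised above is precisely that this derivative is taken on a function tuned to interpolate the training data, so it inherits whatever overfitting that interpolation brings along.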

Anyhow, these are but two of the many strange questions awaiting econometricians venturing into the realm of deep learning.
