Working Papers

To Bag is to Prune 

Random Forest is a key algorithm in the modern ML canon, and perhaps the only one that completely overfits the training sample with no consequence out-of-sample. To resolve this apparent paradox, I argue that Random Forest is miraculously self-regularized on hold-out samples, and show that this is a general feature of randomized greedy algorithms. Naturally, I adjust other ML algorithms so that they inherit this desirable property.

[Arxiv] [Slides]
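The paradox is easy to reproduce. A minimal sketch (assuming scikit-learn, with synthetic data that is mine, not the paper's): a forest of fully grown, unpruned trees fits the training sample almost perfectly, yet still predicts well on held-out data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression problem with a noisy linear signal.
X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fully grown trees: no depth limit, no pruning, no explicit regularization.
rf = RandomForestRegressor(n_estimators=100, max_depth=None, random_state=0)
rf.fit(X_tr, y_tr)

print(rf.score(X_tr, y_tr))  # in-sample R^2: close to 1 (near-complete overfit)
print(rf.score(X_te, y_te))  # out-of-sample R^2: still strong
```

The point is that the in-sample fit is essentially useless as a diagnostic, while the out-of-sample performance survives the apparent overfitting.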

Slow-Growing Trees

Random Forest’s performance can be matched by a single tree, provided the latter is grown with a small learning rate.

[Arxiv]

From Reactive to Proactive Volatility Modeling with Hemisphere Neural Networks

Hemisphere Neural Network (HNN) provides proactive volatility forecasts based on leading indicators when it can, and reactive volatility based on the magnitude of previous prediction errors when it must.

with Mikael Frenette and Karin Klieber.

[ SSRN ]

A Neural Phillips Curve and a Deep Output Gap

The level of economic slack can be estimated through a special form of deep neural network. As it turns out, the output gap as of 2022 is likely wide open, unlike what one would obtain from most standard filtering methods. Accordingly, the non-transitory inflation of 2021 appears less surprising through the lens of an interpretable deep learning model estimated using data through 2019Q4.

(R&R, J of Business and Economic Statistics)

[SSRN] [Slides]

[Slides TimeWorld 2022] (for a general non-econ audience)

[Slides SUERF/OeNB 2022] (lightning talk)

The Anatomy of Out-of-Sample Forecasting Accuracy

We introduce Performance-based Shapley Values, which tell us exactly how each individual predictor increased or decreased the final RMSE, thereby anatomizing out-of-sample forecasting accuracy.

with Daniel Borup, Dave Rapach, Erik Christian Montes Schütte, and Sander Schwenk-Nebbe

[ SSRN ] [ Blog (by Sander) ] [ Python ]
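A toy illustration of the idea (my own sketch, not the authors' implementation): with p = 3 predictors, each predictor's Shapley contribution to the out-of-sample RMSE can be computed by exhaustive enumeration of predictor subsets.

```python
import itertools
import math

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n, p = 400, 3
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(size=n)   # X[:, 2] is pure noise
X_tr, X_te, y_tr, y_te = X[:300], X[300:], y[:300], y[300:]

def rmse(subset):
    """Test RMSE of an OLS model using only the predictors in `subset`."""
    if not subset:  # empty model: predict the training mean
        return float(np.sqrt(mean_squared_error(y_te, np.full(len(y_te), y_tr.mean()))))
    cols = list(subset)
    fit = LinearRegression().fit(X_tr[:, cols], y_tr)
    return float(np.sqrt(mean_squared_error(y_te, fit.predict(X_te[:, cols]))))

def shapley(j):
    """Shapley value of predictor j for the reduction in test RMSE."""
    others = [k for k in range(p) if k != j]
    value = 0.0
    for r in range(p):
        for S in itertools.combinations(others, r):
            weight = math.factorial(r) * math.factorial(p - r - 1) / math.factorial(p)
            value += weight * (rmse(S) - rmse(S + (j,)))   # RMSE drop from adding j
    return value

contrib = [shapley(j) for j in range(p)]
# Informative predictors receive large positive contributions, the noise column
# gets roughly zero, and the contributions sum exactly to the total RMSE
# improvement of the full model over the unconditional mean.
```

Exhaustive enumeration is only feasible for small p; for realistic predictor counts, the Shapley sum is approximated by sampling subsets.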

Time-Varying Parameters as Ridge Regressions

Yes, one can trade the filtering machinery for a penalized regression that is second only to OLS in simplicity. This has several implications, all explored in the paper. I consider an application to large local projections in the context of Canadian monetary policy.

[Arxiv]
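A minimal sketch of the equivalence (my illustration, with simulated data and a hand-picked penalty, not the paper's implementation): a regression with random-walk time-varying coefficients can be estimated as one big ridge regression on the coefficient increments.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, lam = 200, 2, 50.0
X = rng.normal(size=(T, K))
# True coefficients follow random walks around an initial level of 1.
beta_true = 1.0 + np.cumsum(0.05 * rng.normal(size=(T, K)), axis=0)
y = np.sum(X * beta_true, axis=1) + 0.1 * rng.normal(size=T)

# Basis expansion: the column for (s, k) is x_{t,k} * 1{s <= t}; its coefficient
# is the increment u_{s,k} of beta_k at time s (the first increment doubles as
# the initial level).
lower = np.tril(np.ones((T, T)))                                   # 1{s <= t}
Z = np.concatenate([lower * X[:, [k]] for k in range(K)], axis=1)  # T x (T*K)

# One ridge regression on the increments <-> smoothed random-walk coefficients.
u = np.linalg.solve(Z.T @ Z + lam * np.eye(T * K), Z.T @ y)
beta_hat = np.cumsum(u.reshape(K, T).T, axis=0)   # recovered coefficient paths
```

Shrinking the increments toward zero with the penalty lam plays the same role as the state-innovation variance in the filtering formulation: larger lam means smoother coefficient paths.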

The Anatomy of Machine Learning-Based Portfolio Performance

We introduce the Shapley-based Portfolio Performance Contribution (SPPC). This tool directly estimates how individual predictors (or groups of them) enhance the performance of a portfolio built by modern predictive models.

with Dave Rapach, Erik Christian Montes Schütte, and Sander Schwenk-Nebbe

[ SSRN ]

Maximally Machine-Learnable Portfolios

We develop a collaborative machine learning algorithm that optimizes portfolio weights so that the resulting synthetic security is maximally predictable. Precisely, we introduce MACE, a multivariate extension of Alternating Conditional Expectations that achieves this goal by wielding a Random Forest (RF) on one side of the equation, and a constrained Ridge Regression on the other.

with Maximilian Göbel (Bocconi)

[ SSRN ]
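A stylized sketch of the alternation (entirely my own simplification, assuming scikit-learn; in particular, the use of out-of-bag forest predictions to keep the conditional-mean side honest is my choice, not necessarily the paper's exact scheme):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
T, n_assets = 300, 5
X = rng.normal(size=(T, 4))                # predictors
R = 0.5 * rng.normal(size=(T, n_assets))   # asset returns
R[:, 0] += np.tanh(X[:, 0])                # asset 0 is (nonlinearly) predictable

w = np.ones(n_assets) / np.sqrt(n_assets)  # start from equal weights
lam = 1.0
for _ in range(5):
    y = R @ w                              # current synthetic portfolio return
    rf = RandomForestRegressor(n_estimators=200, oob_score=True,
                               random_state=0).fit(X, y)
    g = rf.oob_prediction_                 # RF side, evaluated out-of-bag
    # Ridge step: realign the weights with the forest's fitted values.
    w = np.linalg.solve(R.T @ R + lam * np.eye(n_assets), R.T @ g)
    w /= np.linalg.norm(w)                 # unit-norm constraint

# The weights concentrate on the predictable asset.
```

Out-of-bag predictions matter here: an in-sample forest nearly interpolates y, which would leave the ridge step with nothing to learn from.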

Publications/Accepted

The Macroeconomy as a Random Forest

Everybody likes small linear macroeconomic equations. However, they are often unstable through time. Why? We don’t know, and this can jeopardize their ability to predict and accurately depict the economy. I propose to cast the many forms of time variation proposed over the years within a Random Forest and let the data really decide.

[SSRN] [Slides] [Sofie Seminar Video] [General Audience Penn Talk] (Journal of Applied Econometrics)

Assessing and Comparing Fixed-Target Forecasts of Arctic Sea Ice: Glide Charts for Feature-Engineered Linear Regression and Machine Learning Models

Using glide charts, we assess and compare fixed-target forecasts of Arctic sea ice from feature-engineered linear regressions and machine learning models.

with Frank Diebold (Penn) and Maximilian Göbel (Bocconi)

[Arxiv] [Web App for September 2022 forecasts] (Energy Economics)

Arctic Amplification of Anthropogenic Forcing: A Vector Autoregressive Analysis 

The Arctic system is characterized by feedback loops likely amplifying the effect of CO2 on melting sea ice extent. This may explain why summer sea ice is vanishing much faster than previously thought. We show how the VARCTIC can help in sorting things out, and thus constitute a complementary tool to climate models.

with Maximilian Göbel (Bocconi)

[Arxiv] (Journal of Climate)

When Will Arctic Sea Ice Disappear? Projections of Area, Extent, Thickness, and Volume

We model jointly key indicators of arctic sea ice and suggest a way to constrain multivariate forecasts such that they hit zero simultaneously. We discuss the benefits of such regularization for long-run sea ice projections.

with Frank Diebold (Penn), Maximilian Göbel (Bocconi), Glenn Rudebusch (Brookings) and Boyuan Zhang (Penn).

[Arxiv] (Journal of Econometrics)

Can Machine Learning Catch the COVID-19 Recession?

Will ML forecasts go crazy if economic data goes crazy, or will they be overly conservative?

with Massimiliano Marcellino and Dalibor Stevanovic.

[Arxiv] (National Institute Economic Review)

Optimal Combination of Arctic Sea Ice Extent Measures: A Dynamic Factor Modeling Approach

Arctic sea ice extent measures — all obtained from a combination of satellite imagery and algorithmic post-processing — contain undesirable noise. We propose to extract the “true” sea ice extent with methods inspired from time series econometrics.

with Frank Diebold (Penn), Maximilian Göbel (Bocconi), Glenn Rudebusch (FRB-SF) and Boyuan Zhang (Penn).

[Arxiv] (International Journal of Forecasting, Special Issue)

Macroeconomic Data Transformations Matter

Rotating the feature matrix entering a ML algorithm alters its explicit or implicit regularization. Some regularization schemes are better suited for macroeconomic time series than others. We examine the benefits of classic data transformations and propose new ones.

with Maxime Leroux, Dalibor Stevanovic and Stéphane Surprenant (all UQÀM)

[Arxiv] [Slides (by S. Surprenant)] [Poster (by M. Leroux)] (International Journal of Forecasting, Special Issue)
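A quick demonstration of the premise (my example, not from the paper): ridge's l2 penalty is invariant to orthogonal rotations of the feature matrix, while the lasso's l1 penalty is not, so the same rotation leaves one fit unchanged and alters the other.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, 0] + 0.5 * rng.normal(size=200)        # sparse truth: one relevant feature
Q, _ = np.linalg.qr(rng.normal(size=(10, 10)))  # random orthogonal rotation

ridge_a = Ridge(alpha=1.0, fit_intercept=False).fit(X, y).predict(X)
ridge_b = Ridge(alpha=1.0, fit_intercept=False).fit(X @ Q, y).predict(X @ Q)
lasso_a = Lasso(alpha=0.1, fit_intercept=False).fit(X, y).predict(X)
lasso_b = Lasso(alpha=0.1, fit_intercept=False).fit(X @ Q, y).predict(X @ Q)

# Ridge predictions are identical before and after rotation;
# lasso predictions are not, because the rotation destroys sparsity.
```

This is the sense in which a data transformation can change a method's implicit regularization without changing the information content of the features.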

How is Machine Learning Useful for Macroeconomic Forecasting?

We map ML algorithms in their feature space and evaluate the treatment effects of those features on predictive accuracy. The answer to the above question is: nonparametric nonlinearity.

with Maxime Leroux, Dalibor Stevanovic and Stéphane Surprenant (all UQÀM)

[Arxiv] (Journal of Applied Econometrics)

On Spurious Causality, CO2, and Global Temperature

We show that the increasingly popular information flows framework extracts spurious causality for most data generating processes. We propose an alternative route and revisit evidence on the linkage between CO2 and global temperature.

with Maximilian Göbel (U of Lisbon)

[Arxiv] (Econometrics, Special Issue on “Econometric Analysis of Climate Change”)

 

Policy Documents

Prévision de l’activité économique au Québec et au Canada à l’aide des méthodes Machine Learning (Forecasting Economic Activity in Québec and Canada with Machine Learning Methods)

We apply traditional ML methods to forecast key economic indicators for Québec and Canada. Nonlinearities are shown to be quite helpful. Furthermore, the algorithms developed in the Macro RF paper above provide sizable predictive accuracy gains for many targets, corroborating the Macroeconomic Random Forest's applicability and success beyond US data.

with Maxime Leroux, Dalibor Stevanovic and Stéphane Surprenant (all UQÀM)

[CIRANO WP]