Spearman Neural Networks: A Bridge Between Trees and Deep Learning
In machine learning for tabular data, it is well known that tree-based models—especially random forests and gradient boosting—tend to outperform neural networks. This has been reported in many benchmark studies and real-world applications.
The advantage of tree-based models lies not only in performance but also in usability. Random forests, for example, require very little hyperparameter tuning and are remarkably robust out-of-the-box. Boosting may demand more care, but even then, tuning remains relatively manageable. Compare this to neural networks, which often need substantial computational resources and careful hyperparameter tuning to deliver competitive results on structured data.
Much has been written about why tree-based models do so well. I’ve previously argued that random forests are self-tuning, in the sense that the ensemble mechanism acts as an implicit form of pruning. Others have pointed out how the greedy construction of decision trees makes them naturally adaptive to sharp discontinuities as well as smooth functional relationships—a flexibility that we explored in our paper on adaptive moving averages in macroeconomic time series.
But here’s a perspective I haven’t seen explored much: Tree-based models rely entirely on rank information.
Think about it. When a decision tree splits data based on a feature, the precise value of the feature is irrelevant—what matters is the ordering of observations. You could reformulate the entire tree-building process using quantiles or ranks: instead of splitting at “x = 42.3,” you could just as well split at the 75th percentile of x, and obtain a functionally equivalent partition. In essence, tree models work in the world of ordinal information—much like Spearman correlation, not Pearson.
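To make this concrete, here is a minimal sketch (using numpy, scipy, and scikit-learn; the synthetic data and tree settings are illustrative assumptions on my part) showing that a tree fit on ranks induces the same training partition as one fit on raw values:

```python
# Sketch: a decision tree fit on raw feature values and one fit on their ranks
# split the training samples the same way, so their fitted values coincide.
import numpy as np
from scipy.stats import rankdata
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

# Rank-transform each column: the transform is strictly monotone,
# so the ordering of observations within each feature is preserved.
X_rank = np.column_stack([rankdata(X[:, j]) for j in range(X.shape[1])])

tree_raw = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
tree_rank = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_rank, y)

# The split thresholds differ, but which samples go left/right at each node
# is the same, so training predictions should match (up to ties in the data).
print(np.allclose(tree_raw.predict(X), tree_rank.predict(X_rank)))
```

The thresholds the two trees store are different numbers, but the partitions of the data they encode are the same, which is the point of the rank-only view.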
This observation leads to an intriguing idea:
What if we transformed input variables in a neural network to use only their ranks, rather than their raw values?
In practical terms, this means replacing each value of a feature with its rank (from 1 to N) among the values of that feature in the dataset. This transformation would enforce the same kind of “value-agnostic” structure that tree models exploit. A rank-transformed neural network might then behave more like a tree ensemble in how it generalizes and reacts to outliers or boundary conditions.
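Here is one way the transform could look as a preprocessing step, sketched with numpy only; the rescaling to [0, 1] and the use of the training set's empirical CDF to rank unseen values are my assumptions, not prescribed by the idea above:

```python
import numpy as np

class RankTransformer:
    """Replace each feature value with its (normalized) rank in the training set."""

    def fit(self, X):
        # Store the sorted training values of every column; these define
        # the empirical CDF used to rank unseen values at prediction time.
        self.sorted_cols_ = [np.sort(col) for col in np.asarray(X, dtype=float).T]
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        out = np.empty_like(X)
        for j, ref in enumerate(self.sorted_cols_):
            # Rank of each value among the training values (0..N),
            # then rescale to [0, 1] so the network sees a bounded input.
            ranks = np.searchsorted(ref, X[:, j], side="right")
            out[:, j] = ranks / len(ref)
        return out

# Usage: fit on the training split only, then apply to both splits.
rng = np.random.default_rng(0)
X_train, X_test = rng.lognormal(size=(100, 2)), rng.lognormal(size=(20, 2))
rt = RankTransformer().fit(X_train)
X_train_r, X_test_r = rt.transform(X_train), rt.transform(X_test)
```

In practice, scikit-learn's QuantileTransformer with a uniform output does essentially the same job (its output is a normalized rank), so it can be dropped in front of any network without writing a custom class.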
How could this help?
It could reduce sensitivity to scale and outliers, a known strength of tree-based methods.
It would impose a form of robustness, which might help stabilize neural net training on messy datasets.
It offers a low-cost experiment for bringing some of the benefits of trees into neural architectures, which is especially useful when training tree ensembles is not feasible (see the sketch below).
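As a sketch of what such a low-cost experiment could look like: the same small MLP, with and without a rank transform in front. The dataset, architecture, and evaluation setup here are arbitrary choices on my part; only the comparison itself is the point.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer, StandardScaler

X, y = fetch_california_housing(return_X_y=True)

# Baseline: standard scaling in front of the network.
raw = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0),
)
# Rank-based variant: the uniform quantile transform is a normalized rank.
ranked = make_pipeline(
    QuantileTransformer(output_distribution="uniform"),
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0),
)

for name, model in [("raw", raw), ("rank", ranked)]:
    scores = cross_val_score(model, X, y, cv=3, scoring="r2")
    print(name, scores.mean())
```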
This “rank-based neural network” wouldn’t fully replicate tree models, of course. But it may bridge part of the gap in tabular data applications.