FIGS: Fast Interpretable Greedy-Tree Sums
Yan Shuo Tan*, Chandan Singh*, Keyan Nasseri, Abhineet Agarwal, Bin Yu
Modern machine learning has achieved impressive prediction performance, but often sacrifices interpretability, a critical consideration in many problems. Here, we propose Fast Interpretable Greedy-Tree Sums (FIGS), an algorithm for fitting concise rule-based models. Specifically, FIGS generalizes the CART algorithm to work on sums of trees, growing a flexible number of them simultaneously. The total number of splits across all the trees is restricted by a pre-specified threshold, which ensures that FIGS remains interpretable. Extensive experiments show that FIGS achieves state-of-the-art performance across a wide array of real-world datasets when restricted to very few splits (e.g. fewer than 20). Theoretical and simulation results suggest that FIGS overcomes a key weakness of single-tree models by disentangling additive components of additive generative models, thereby significantly improving convergence rates for $\ell_2$ generalization error. We further characterize the success of FIGS by quantifying how it reduces repeated splits, which can lead to redundancy in single-tree models such as CART. All code and models are released in a full-fledged package available on GitHub.
An example of interpretable modeling
Here, we examine the Diabetes classification dataset, in which eight risk factors were collected and used to predict the onset of diabetes within five years. Fitting several models, we find that a model with very few rules can achieve excellent test performance.
For example, Fig 2 shows a model fitted using the FIGS algorithm which achieves a test AUC of 0.820 despite being extremely simple. In this model, each feature contributes independently of the others: the risk contributions from each of three key features are summed to yield an overall risk of diabetes onset (a higher score indicates higher risk). As opposed to a black-box model, this model is easy to interpret, fast to compute with, and allows us to vet the features being used for decision-making.
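The additive structure described above can be illustrated with a short self-contained sketch. This is a simplified, boosting-style stand-in for the tree-sum idea, not the actual FIGS algorithm (which grows all trees jointly and may extend an existing tree rather than start a new one): each one-split tree is fit to the residual of the running sum, under a total split budget, and the final prediction is the sum of the trees' contributions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic additive data: the response is a sum of effects of two features,
# mimicking the kind of generative model FIGS is suited to.
X = rng.uniform(-1, 1, size=(500, 3))
y = (X[:, 0] > 0).astype(float) + 0.5 * (X[:, 1] > 0.5) + rng.normal(0, 0.1, 500)

# Simplified sum-of-trees fit: repeatedly fit a one-split tree (stump) to
# the current residual, up to a total budget of splits.
budget = 5
stumps = []
residual = y.copy()
for _ in range(budget):
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    stumps.append(stump)
    residual -= stump.predict(X)

# The prediction is the sum of the stumps' outputs, so each component can be
# read off independently, as in the additive model of Fig 2.
pred = sum(s.predict(X) for s in stumps)
mse = float(np.mean((y - pred) ** 2))
print(f"training MSE of the tree sum: {mse:.3f}")
```

Because each component tree is tiny, the fitted sum stays readable even as it captures several independent effects; the real FIGS algorithm additionally decides at each step whether to deepen an existing tree or add a new one.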
Fig 2. Simple model learned by FIGS for diabetes risk prediction.