xgbse: XGBoost Survival Embeddings
"There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown."
- Leo Breiman, Statistical Modeling: The Two Cultures
Survival Analysis is a powerful statistical technique with a wide range of applications such as predictive maintenance, customer churn, credit risk, asset liquidity risk, and others.
However, it has not yet seen widespread adoption in industry, with most implementations embracing one of two cultures:
- models with sound statistical properties, but lacking in expressiveness and computational efficiency
- highly efficient and expressive models, but lacking in statistical rigor
xgbse aims to unite the two cultures in a single package, adding a layer of statistical rigor to the highly expressive and computationally efficient xgboost survival analysis implementation.
The package offers:
- calibrated and unbiased survival curves with confidence intervals (instead of point predictions)
- great predictive power, competitive with vanilla xgboost
- efficient, easy to use implementation
- explainability through prototypes
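To make the first item above concrete, here is a minimal sketch of what "a survival curve with confidence intervals" means, as opposed to a single point prediction. It uses a plain Kaplan-Meier estimator with Greenwood's variance formula, written with the Python standard library only. This is an illustration on made-up toy data, not xgbse's implementation or API; the function name `kaplan_meier` and the sample values are hypothetical.

```python
import math


def kaplan_meier(durations, events, z=1.96):
    """Kaplan-Meier survival curve with normal-approximation confidence bands.

    durations: observed times (event time or censoring time)
    events:    1 if the event was observed, 0 if the record was censored
    Returns a list of (time, survival, lower, upper), one per distinct event time.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    greenwood_sum = 0.0  # running sum for Greenwood's variance formula
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all records that share the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            # Skip the Greenwood term when everyone at risk has the event:
            # survival is 0 there and the term would divide by zero.
            if n_at_risk > deaths:
                greenwood_sum += deaths / (n_at_risk * (n_at_risk - deaths))
            std_err = survival * math.sqrt(greenwood_sum)
            lower = max(0.0, survival - z * std_err)
            upper = min(1.0, survival + z * std_err)
            curve.append((t, survival, lower, upper))
        # Censored records leave the risk set without changing the curve.
        n_at_risk -= removed
    return curve


# Toy data (made up): time until the event, and whether it was observed.
durations = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]

curve = kaplan_meier(durations, events)
for t, s, lo, hi in curve:
    print(f"t={t:>2}  S(t)={s:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

Each row is an estimate of the probability of surviving past time `t`, together with an interval that widens as fewer records remain at risk. A point prediction, by contrast, would collapse all of this into a single number per sample.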
This is a research project by the Loft Data Science Team; however, we invite the community to contribute. Please help by trying it out, reporting bugs, and letting us know what you think!