
xgbse._stacked_weibull.XGBSEStackedWeibull

Stacks an XGBoost survival model with a Weibull AFT parametric model. The XGBoost model is fit to the data and then predicts a value that is interpreted as a risk metric. This risk metric is fed to the Weibull regression, which uses it as its only independent variable.

Thus, we get the discriminative power of XGBoost together with the statistical rigor of the Weibull AFT model (e.g. calibrated survival curves).
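The second stage can be pictured with a minimal sketch of a one-covariate Weibull AFT survival function, S(t | x) = exp(-(t / lambda(x))^rho) with lambda(x) = exp(intercept + coef * x), which is the parameterization described in the Lifelines documentation. The coefficient values and the name `xgb_score` below are hypothetical and chosen only for illustration; in practice they would come from the fitted models.

```python
import math


def weibull_aft_survival(t, xgb_score, intercept, coef, rho):
    """Survival probability at time t for a single covariate (the XGBoost output).

    scale = exp(intercept + coef * xgb_score); S(t) = exp(-(t / scale) ** rho)
    """
    scale = math.exp(intercept + coef * xgb_score)
    return math.exp(-((t / scale) ** rho))


# Hypothetical fitted coefficients (assumption, not taken from a real model).
INTERCEPT, COEF, RHO = 1.0, 0.8, 1.5

# The curve is defined for any t >= 0, including times beyond the observed range.
time_bins = [0.0, 1.0, 5.0, 10.0, 50.0]

curve_low_score = [
    weibull_aft_survival(t, 0.5, INTERCEPT, COEF, RHO) for t in time_bins
]
curve_high_score = [
    weibull_aft_survival(t, 2.0, INTERCEPT, COEF, RHO) for t in time_bins
]
```

With a positive coefficient, a larger first-stage output stretches the time scale and therefore yields higher survival at every positive time. The single shape parameter `rho` is shared by all samples, which is the source of the stronger assumptions about curve shape.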

Note

  • As we're stacking XGBoost with a single, one-variable parametric model (as opposed to XGBSEDebiasedBCE), the model can be much faster (especially in training).
  • Extrapolation is also better, since the parametric curve avoids the cure fraction problem seen in XGBSEKaplanNeighbors and XGBSEKaplanTree.
  • However, we also have stronger assumptions about the shape of the survival curve.

Read more in How XGBSE works.

__init__(self, xgb_params=None, weibull_params=None)

Parameters:

Name Type Description Default
xgb_params Dict, None

Parameters for XGBoost model. If not passed, the following default parameters will be used:

DEFAULT_PARAMS = {
    "objective": "survival:aft",
    "eval_metric": "aft-nloglik",
    "aft_loss_distribution": "normal",
    "aft_loss_distribution_scale": 1,
    "tree_method": "hist",
    "learning_rate": 5e-2,
    "max_depth": 8,
    "booster": "dart",
    "subsample": 0.5,
    "min_child_weight": 50,
    "colsample_bynode": 0.5,
}

Check https://xgboost.readthedocs.io/en/latest/parameter.html for more options.

None
weibull_params Dict

Parameters for the Weibull Regression model. If not passed, the default parameters shown in the Lifelines documentation will be used.

Check https://lifelines.readthedocs.io/en/latest/fitters/regression/WeibullAFTFitter.html for more options.

None
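When only a few XGBoost settings need to change, one option is to copy the defaults listed above and override selected keys. This is a minimal sketch that assumes a user-supplied `xgb_params` dict is used as given rather than merged with the defaults, which is worth checking against the installed version. The helper `with_overrides` is hypothetical and not part of the library.

```python
import copy

# Defaults as listed in the parameter description above.
DEFAULT_PARAMS = {
    "objective": "survival:aft",
    "eval_metric": "aft-nloglik",
    "aft_loss_distribution": "normal",
    "aft_loss_distribution_scale": 1,
    "tree_method": "hist",
    "learning_rate": 5e-2,
    "max_depth": 8,
    "booster": "dart",
    "subsample": 0.5,
    "min_child_weight": 50,
    "colsample_bynode": 0.5,
}


def with_overrides(defaults, **overrides):
    """Return a new dict with the defaults plus the given overrides (hypothetical helper)."""
    params = copy.deepcopy(defaults)
    params.update(overrides)
    return params


# Shallower trees and a larger step size; every other key keeps its default.
custom_params = with_overrides(DEFAULT_PARAMS, max_depth=4, learning_rate=0.1)
```

The resulting `custom_params` dict could then be passed as `xgb_params`, leaving `DEFAULT_PARAMS` itself unchanged.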

fit(self, X, y, num_boost_round=1000, validation_data=None, early_stopping_rounds=None, verbose_eval=0, persist_train=False, index_id=None, time_bins=None)

Fit the XGBoost model to predict a value that is interpreted as a risk metric, then fit the Weibull Regression model using that risk metric as its only independent variable.

Parameters:

Name Type Description Default
X [pd.DataFrame, np.array]

Features to be used while fitting XGBoost model

required
y structured array(numpy.bool_, numpy.number)

Binary event indicator as first field, and time of event or time of censoring as second field.

required
num_boost_round Int

Number of boosting iterations.

1000
validation_data Tuple

Validation data as a list of tuples [(X, y)], to be provided when early stopping is desired.

None
early_stopping_rounds Int

Activates early stopping. Validation metric needs to improve at least once in every early_stopping_rounds round(s) to continue training. See xgboost.train documentation.

None
verbose_eval [Bool, Int]

Level of verbosity. See xgboost.train documentation.

0
persist_train Bool

Whether to persist the training data so that explainability through prototypes can be used.

False
index_id pd.Index

User-defined index, for use with explainability through prototypes.

None
time_bins np.array

Specified time windows to use when making survival predictions

None
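The `y` format described above can be built directly with NumPy. This is a small sketch on made-up data; the field names `event` and `time` are illustrative assumptions, and only the field order (event indicator first, time second) comes from the description.

```python
import numpy as np

# Made-up example data: True means the event was observed, False means censored.
events = [True, False, True, True]
times = [5.0, 12.0, 3.5, 8.0]

# Structured array with a boolean field followed by a numeric field.
y = np.array(
    list(zip(events, times)),
    dtype=[("event", np.bool_), ("time", np.float64)],
)
```

Each row of `y` pairs an event indicator with either the event time or the censoring time, and must line up with the corresponding row of `X`.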

Returns:

Type Description
XGBSEStackedWeibull

Trained XGBSEStackedWeibull instance
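The stopping rule behind `early_stopping_rounds` can be sketched in plain Python. This is an illustration of the rule as described above for a lower-is-better metric such as `aft-nloglik`, not the XGBoost implementation; the function name and the metric values are hypothetical.

```python
def simulate_early_stopping(metric_per_round, early_stopping_rounds):
    """Return (last_round_run, best_round) for a lower-is-better validation metric.

    Training stops once the metric has failed to improve for
    `early_stopping_rounds` consecutive rounds.
    """
    best_value = float("inf")
    best_round = -1
    for round_idx, value in enumerate(metric_per_round):
        if value < best_value:
            best_value, best_round = value, round_idx
        elif round_idx - best_round >= early_stopping_rounds:
            return round_idx, best_round
    return len(metric_per_round) - 1, best_round


# Made-up validation losses: the best value so far is at round 2.
losses = [1.00, 0.90, 0.85, 0.86, 0.87, 0.88, 0.80]
stopped_at, best_round = simulate_early_stopping(losses, early_stopping_rounds=3)
```

In this made-up run, training halts at round 5 after three rounds without improvement, so the lower loss at round 6 is never reached. A larger `early_stopping_rounds` tolerates longer plateaus at the cost of more boosting rounds.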

predict(self, X, return_interval_probs=False)

Predicts survival probabilities using the XGBoost + Weibull AFT stacking pipeline.

Parameters:

Name Type Description Default
X pd.DataFrame

Dataframe of features to be used as input for the XGBoost model.

required
return_interval_probs Bool

Boolean indicating if interval probabilities are supposed to be returned. If False the cumulative survival is returned. Default is False.

False

Returns:

Type Description
pd.DataFrame

A dataframe of survival probabilities for all times (columns), from a time_bins array, for all samples of X (rows). If return_interval_probs is True, the interval probabilities are returned instead of the cumulative survival probabilities.
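The relationship between the two output modes can be illustrated on a single row. This sketch assumes interval probabilities are the drop in cumulative survival between consecutive time bins; that convention is an assumption about the library's behavior and the survival values are made up.

```python
def interval_probs_from_survival(survival):
    """Probability of the event falling between each pair of consecutive time bins."""
    return [prev - curr for prev, curr in zip(survival, survival[1:])]


# Made-up cumulative survival probabilities at four increasing time bins.
survival_curve = [0.95, 0.80, 0.60, 0.45]

interval_probs = interval_probs_from_survival(survival_curve)
```

Under this convention there is one fewer interval than time bins, and the interval probabilities sum to the total drop in survival between the first and last bin.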