
xgbse._kaplan_neighbors.XGBSEKaplanTree

Single-tree implementation, a simplification of XGBSEKaplanNeighbors. Instead of running nearest-neighbor searches, it fits a single tree via xgboost and calculates a Kaplan-Meier (KM) curve at each of its leaves.

Note

  • It is by far the most efficient implementation and scales easily to millions of examples. At fit time, the tree is built and all KM curves are pre-calculated, so at scoring time a simple lookup is enough to get the model's estimates.
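The precompute-then-lookup idea in the note can be sketched in plain Python. This is an illustration only, not xgbse's actual code: the helper names `kaplan_meier` and `fit_leaf_curves` are hypothetical, and the leaf assignments are made up as if an already-fitted tree had produced them.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator)."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Everyone observed at t (event or censored) leaves the risk set.
        at_risk -= leaving
    return curve


def fit_leaf_curves(leaf_ids, times, events):
    """Pre-compute one KM curve per leaf (the 'fit time' step)."""
    groups = defaultdict(lambda: ([], []))
    for leaf, t, e in zip(leaf_ids, times, events):
        groups[leaf][0].append(t)
        groups[leaf][1].append(e)
    return {leaf: kaplan_meier(ts, es) for leaf, (ts, es) in groups.items()}


# Hypothetical leaf assignments, as if produced by an already-fitted tree.
leaf_ids = [0, 0, 0, 1, 1, 1]
times = [2, 3, 5, 1, 4, 4]
events = [True, False, True, True, True, False]

leaf_curves = fit_leaf_curves(leaf_ids, times, events)

# Scoring: a new sample routed to leaf 1 just looks up the stored curve.
curve_for_new_sample = leaf_curves[1]
```

Because the expensive work happens once per leaf rather than once per sample, scoring cost in this sketch does not depend on the size of the training set.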

Read more in How XGBSE works.

__init__(self, xgb_params=None)

Parameters:

Name Type Description Default
xgb_params Dict

Parameters for the XGBoost model. If not passed, the following default parameters are used:

DEFAULT_PARAMS_TREE = {
    "objective": "survival:cox",
    "eval_metric": "cox-nloglik",
    "tree_method": "exact",
    "max_depth": 100,
    "booster": "dart",
    "subsample": 1.0,
    "min_child_weight": 30,
    "colsample_bynode": 1.0,
}

Check https://xgboost.readthedocs.io/en/latest/parameter.html for more options.

None
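Since the defaults are described as applying only when xgb_params is not passed, it seems safest to assume that a partial dictionary is not merged with them. A minimal sketch of building a full parameter set from the documented defaults follows. The override value is hypothetical, and the dictionary is copied here by hand rather than imported.

```python
# Defaults copied from the documentation above.
DEFAULT_PARAMS_TREE = {
    "objective": "survival:cox",
    "eval_metric": "cox-nloglik",
    "tree_method": "exact",
    "max_depth": 100,
    "booster": "dart",
    "subsample": 1.0,
    "min_child_weight": 30,
    "colsample_bynode": 1.0,
}

# Hypothetical override: keep every default and change only the tree depth.
# The unpacking creates a new dict, so the defaults stay untouched.
custom_params = {**DEFAULT_PARAMS_TREE, "max_depth": 10}
```

The resulting dictionary could then be passed as `XGBSEKaplanTree(xgb_params=custom_params)`, following the signature shown above.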

fit(self, X, y, persist_train=True, index_id=None, time_bins=None, ci_width=0.683, **xgb_kwargs)

Fit a single decision tree using xgboost. For each leaf in the tree, build a Kaplan-Meier estimator.

Note

  • Unlike XGBSEKaplanNeighbors, XGBSEKaplanTree requires the width of the confidence interval (ci_width) to be specified at fit time.

Parameters:

Name Type Description Default
X [pd.DataFrame, np.array]

Design matrix used to fit the XGBoost model

required
y structured array(numpy.bool_, numpy.number)

Binary event indicator as first field, and time of event or time of censoring as second field.

required
persist_train Bool

Whether to persist the training data, which is needed for explainability through prototypes

True
index_id pd.Index

User-defined index, needed only if you intend to use explainability through prototypes

None
time_bins np.array

Specified time windows to use when making survival predictions

None
ci_width Float

Width of confidence interval

0.683
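The expected shape of y and time_bins can be illustrated with plain NumPy. This is a sketch under assumptions: the field names `event` and `time` are illustrative (the description above only fixes the order and types of the two fields), and the time grid is an arbitrary choice.

```python
import numpy as np

# Toy data: event indicator (True = event observed, False = censored)
# and the time of the event or of censoring.
events = [True, False, True, True]
times = [5.0, 12.0, 7.5, 3.0]

# Structured array with the boolean indicator as the first field and the
# numeric time as the second. Field names are hypothetical.
y = np.array(
    list(zip(events, times)),
    dtype=[("event", np.bool_), ("time", np.float64)],
)

# Hypothetical evenly spaced grid of times at which to report survival.
time_bins = np.linspace(1.0, 10.0, 4)
```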

Returns:

Type Description
XGBSEKaplanTree

Trained instance of XGBSEKaplanTree

predict(self, X, return_ci=False, return_interval_probs=False)

Run samples through the tree until they reach terminal nodes. For each sample, return the Kaplan-Meier estimate associated with the leaf node it ends up in.

Parameters:

Name Type Description Default
X pd.DataFrame

Data frame with the samples to generate predictions for

required
return_ci Bool

Whether to return confidence intervals via the Exponential Greenwood formula

False
return_interval_probs Bool

Whether to return interval probabilities. If False, the cumulative survival is returned.

False
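For reference, the textbook exponential Greenwood (log-log) interval can be sketched as follows. This is the standard formula, not necessarily xgbse's exact implementation. The function name `exp_greenwood_ci` is hypothetical, and mapping ci_width to a two-sided normal quantile is an assumption.

```python
import math
from statistics import NormalDist


def exp_greenwood_ci(survival, greenwood_sum, ci_width=0.683):
    """Log-log (exponential Greenwood) interval for one survival value.

    survival: KM estimate S(t), strictly between 0 and 1.
    greenwood_sum: sum of d_i / (n_i * (n_i - d_i)) over event times <= t,
        where d_i is the number of events and n_i the number at risk.
    """
    # Two-sided normal quantile for the requested coverage (assumption).
    z = NormalDist().inv_cdf(0.5 + ci_width / 2.0)
    # Standard error of log(-log S(t)).
    se = math.sqrt(greenwood_sum) / abs(math.log(survival))
    # Because S(t) < 1, the larger exponent gives the lower bound.
    lower = survival ** math.exp(z * se)
    upper = survival ** math.exp(-z * se)
    return lower, upper


# Toy numbers: 10 samples at risk, 2 events at the first event time.
s = 1 - 2 / 10
g = 2 / (10 * (10 - 2))
lower, upper = exp_greenwood_ci(s, g)
```

One property of this transform is that both bounds stay inside (0, 1), which a plain symmetric interval around S(t) does not guarantee.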

Returns:

Type Description
preds_df (pd.DataFrame)

A dataframe of survival probabilities with one column per time in the time_bins array and one row per sample of X. If return_interval_probs is True, the interval probabilities are returned instead of the cumulative survival probabilities.

upper_ci (np.array): Upper confidence interval for the survival probability values

lower_ci (np.array): Lower confidence interval for the survival probability values
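The relationship between the two output modes can be sketched as below, assuming "interval probability" means the probability that the event falls within each time bin. That reading of the description is an assumption, and the helper `interval_probs` is hypothetical.

```python
def interval_probs(cumulative_survival):
    """Turn a cumulative survival curve into per-interval event probabilities.

    Each entry is the drop in survival over one time bin, with survival
    taken to be 1.0 before the first bin.
    """
    previous = 1.0
    out = []
    for s in cumulative_survival:
        out.append(previous - s)
        previous = s
    return out


# Toy cumulative survival at three consecutive time bins.
surv = [0.9, 0.7, 0.4]
probs = interval_probs(surv)
```

Under this reading, the interval probabilities sum to one minus the survival at the last time bin.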