xgbse._kaplan_neighbors.XGBSEKaplanTree
Single-tree implementation, a simplification of XGBSEKaplanNeighbors.
Instead of performing nearest-neighbor searches, it fits a single tree via xgboost
and calculates KM curves at each of its leaves.
Note
- It is by far the most efficient implementation and scales easily to millions of examples. At fit time, the tree is built and all KM curves are pre-calculated, so at scoring time a simple lookup suffices to retrieve the model's estimates.
Read more in How XGBSE works.
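The fit-then-lookup idea described above can be sketched in plain Python. This is a minimal illustration, not the library's code: the leaf indices are hypothetical stand-ins for what the fitted xgboost tree would assign, and the function names (kaplan_meier, fit_leaf_curves, predict_from_leaves) are invented for this example.

```python
from collections import defaultdict


def kaplan_meier(times, events, time_bins):
    """Return the KM survival estimate evaluated at each value in time_bins."""
    # Distinct times at which at least one event was observed, in increasing order
    event_times = sorted({t for t, e in zip(times, events) if e})
    curve = []
    for b in time_bins:
        surv = 1.0
        for t in event_times:
            if t > b:
                break
            at_risk = sum(1 for u in times if u >= t)
            deaths = sum(1 for u, e in zip(times, events) if u == t and e)
            surv *= 1.0 - deaths / at_risk
        curve.append(surv)
    return curve


def fit_leaf_curves(leaf_ids, times, events, time_bins):
    """Pre-calculate one KM curve per leaf (the fit-time step)."""
    groups = defaultdict(lambda: ([], []))
    for leaf, t, e in zip(leaf_ids, times, events):
        groups[leaf][0].append(t)
        groups[leaf][1].append(e)
    return {leaf: kaplan_meier(ts, es, time_bins) for leaf, (ts, es) in groups.items()}


def predict_from_leaves(leaf_curves, leaf_ids):
    """Scoring is a dictionary lookup keyed by the leaf each sample lands in."""
    return [leaf_curves[leaf] for leaf in leaf_ids]


# Hypothetical toy data: leaf index per training sample, time, event flag
train_leaves = [0, 0, 0, 1, 1, 1]
train_times = [2, 4, 6, 1, 3, 5]
train_events = [True, False, True, True, True, False]
time_bins = [1, 3, 5]

leaf_curves = fit_leaf_curves(train_leaves, train_times, train_events, time_bins)
preds = predict_from_leaves(leaf_curves, [1, 0])  # two new samples, landing in leaves 1 and 0
```

Because every curve is computed once per leaf, the cost of scoring does not depend on the size of the training set, which is the property the note above relies on.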
__init__(self, xgb_params=None)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
xgb_params | Dict | Parameters for the XGBoost model. If not passed, a set of default parameters is used. Check https://xgboost.readthedocs.io/en/latest/parameter.html for more options. | None |
fit(self, X, y, persist_train=True, index_id=None, time_bins=None, ci_width=0.683, **xgb_kwargs)
Fit a single decision tree using xgboost. For each leaf in the tree, build a Kaplan-Meier estimator.
Note
- Unlike XGBSEKaplanNeighbors, in XGBSEKaplanTree the width of the confidence interval (ci_width) must be specified at fit time.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | [pd.DataFrame, np.array] | Design matrix to fit the XGBoost model | required |
y | structured array(numpy.bool_, numpy.number) | Binary event indicator as first field, and time of event or time of censoring as second field. | required |
persist_train | Bool | Whether to persist training data so that explainability through prototypes can be used | True |
index_id | pd.Index | User-defined index, if explainability through prototypes is intended | None |
time_bins | np.array | Specified time windows to use when making survival predictions | None |
ci_width | Float | Width of confidence interval | 0.683 |
Returns:
Type | Description |
---|---|
XGBSEKaplanTree | Trained instance of XGBSEKaplanTree |
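To give a feel for what ci_width controls, the sketch below converts a width into a normal quantile and applies the exponential Greenwood formula (the one named under return_ci in predict) to a single Kaplan-Meier curve. It assumes a two-sided normal-approximation interval, under which the default of 0.683 is roughly one standard error on each side. Whether the library computes its intervals in exactly this way is an assumption, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def z_from_ci_width(ci_width):
    """Two-sided normal quantile for a central interval of the given width."""
    return NormalDist().inv_cdf(0.5 + ci_width / 2)


def exp_greenwood_ci(at_risk, deaths, ci_width=0.683):
    """KM estimates with exponential Greenwood bounds at each event time.

    at_risk[i] and deaths[i] are the number at risk and the number of events
    at the i-th distinct event time. Returns a list of (survival, lower, upper).
    """
    z = z_from_ci_width(ci_width)
    surv, greenwood_sum, out = 1.0, 0.0, []
    for n, d in zip(at_risk, deaths):
        surv *= 1.0 - d / n
        if 0.0 < surv < 1.0:
            greenwood_sum += d / (n * (n - d))
            # Standard error of log(-log(S)), the scale on which the interval is built
            se = math.sqrt(greenwood_sum) / abs(math.log(surv))
            lower = surv ** math.exp(z * se)
            upper = surv ** math.exp(-z * se)
        else:
            # The bounds are undefined when S is exactly 0 or 1; this sketch collapses them
            lower = upper = surv
        out.append((surv, lower, upper))
    return out


# Hypothetical counts at three distinct event times
narrow = exp_greenwood_ci(at_risk=[10, 8, 5], deaths=[1, 2, 1], ci_width=0.683)
wide = exp_greenwood_ci(at_risk=[10, 8, 5], deaths=[1, 2, 1], ci_width=0.95)
```

Since the bounds depend on the chosen width, pre-calculating them per leaf requires the width to be known when the curves are built, which is consistent with ci_width being a fit argument here.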
predict(self, X, return_ci=False, return_interval_probs=False)
Run samples through the tree until they reach terminal nodes. For each sample, return the Kaplan-Meier estimate associated with the leaf node it ends up in.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | pd.DataFrame | Data frame with samples to generate predictions | required |
return_ci | Bool | Whether to return confidence intervals via the Exponential Greenwood formula | False |
return_interval_probs | Bool | Whether to return interval probabilities. If False, the cumulative survival is returned. | False |
Returns:
Type | Description |
---|---|
preds_df (pd.DataFrame) | A dataframe of survival probabilities for all times (columns), from a time_bins array, for all samples of X (rows). If return_interval_probs is True, the interval probabilities are returned instead of the cumulative survival probabilities. |
upper_ci (np.array) | Upper confidence interval for the survival probability values |
lower_ci (np.array) | Lower confidence interval for the survival probability values |
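The difference between cumulative survival and interval probabilities can be illustrated with a short sketch. It assumes that an interval probability is the drop in survival between consecutive time bins, with survival taken as 1 before the first bin. This is a common convention and may not match the library's exact definition; interval_probs is a hypothetical helper, not part of the xgbse API.

```python
def interval_probs(survival_row):
    """Probability of the event falling in each interval between time bins.

    survival_row holds cumulative survival S(t_0), S(t_1), ... for one sample.
    Entry i is S(t_{i-1}) - S(t_i), taking S = 1 before the first bin.
    """
    probs, previous = [], 1.0
    for s in survival_row:
        probs.append(previous - s)
        previous = s
    return probs


# Hypothetical cumulative survival for one sample over three time bins
row = [0.9, 0.7, 0.4]
probs = interval_probs(row)  # approximately [0.1, 0.2, 0.3]
```

Under this convention the interval probabilities plus the final survival value sum to 1, so either representation can be recovered from the other.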