# xgbse._debiased_bce.XGBSEDebiasedBCE

Train a set of logistic regressions on top of the leaf embedding produced by XGBoost, each predicting survival at a different user-defined discrete time window. Individuals are removed from a classifier's training set once they are censored before its window, and targets are indicators of surviving past each window.

Note

- Training and scoring of the logistic regression models are performed in parallel through joblib, so the model can scale to hundreds of thousands or millions of samples.
- However, if many time windows are used on large data, training the logistic regression models may become a bottleneck, taking more time than training the underlying XGBoost model.

Read more in How XGBSE works.
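The idea can be sketched outside xgbse with scikit-learn's gradient boosting standing in for XGBoost. This is not xgbse's implementation; all names, parameters, and the synthetic data below are illustrative only:

```python
# Sketch of the leaf-embedding + per-window logistic regression idea.
# scikit-learn's GradientBoostingClassifier stands in for XGBoost here.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 5))
time = rng.exponential(scale=10, size=n)   # observed time (event or censoring)
event = rng.random(n) < 0.7                # True = event observed, False = censored

time_bins = np.array([2.0, 5.0, 10.0])     # user-defined discrete time windows

# 1) Fit a boosted model and extract leaf indices as an embedding.
booster = GradientBoostingClassifier(n_estimators=50, max_depth=3)
booster.fit(X, event)                      # any supervised target works for the sketch
leaves = booster.apply(X)[:, :, 0]         # (n_samples, n_estimators) leaf ids
embedding = OneHotEncoder(handle_unknown="ignore").fit_transform(leaves)

# 2) One logistic regression per time window; individuals censored before
#    a window are removed from that window's training set.
models = {}
for t in time_bins:
    at_risk = ~((time < t) & ~event)       # drop those censored before t
    target = (time > t).astype(int)        # 1 = survived past window t
    lr = LogisticRegression(max_iter=500)
    lr.fit(embedding[at_risk], target[at_risk])
    models[t] = lr

# Survival curve for the first sample: P(T > t) at each window.
surv = np.array([models[t].predict_proba(embedding[0])[0, 1] for t in time_bins])
```

xgbse additionally debiases the per-window targets; this sketch only shows the censoring-removal and leaf-embedding structure described above.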

### `__init__(self, xgb_params=None, lr_params=None, n_jobs=-1)`

**Parameters:**

Name | Type | Description | Default |
---|---|---|---|
`xgb_params` | `Dict, None` | Parameters for the XGBoost model. If not passed, default parameters are used. See https://xgboost.readthedocs.io/en/latest/parameter.html for more options. | `None` |
`lr_params` | `Dict, None` | Parameters for the logistic regression models. If not passed, default parameters are used. See https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html for more options. | `None` |
`n_jobs` | `Int` | Number of CPU cores used to fit the logistic regressions via joblib. | `-1` |

### `fit(self, X, y, num_boost_round=1000, validation_data=None, early_stopping_rounds=None, verbose_eval=0, persist_train=False, index_id=None, time_bins=None)`

Transform the feature space by fitting an XGBoost model and extracting its leaf indices. The leaf indices are treated as dummy variables and used to fit one logistic regression model per evaluated time bin.

**Parameters:**

Name | Type | Description | Default |
---|---|---|---|
`X` | `[pd.DataFrame, np.array]` | Features to be used while fitting the XGBoost model. | required |
`y` | `structured array (numpy.bool_, numpy.number)` | Binary event indicator as first field, and time of event or time of censoring as second field. | required |
`num_boost_round` | `Int` | Number of boosting iterations. | `1000` |
`validation_data` | `Tuple` | Validation data in the format of a list of tuples `[(X, y)]`, if the user desires to use early stopping. | `None` |
`early_stopping_rounds` | `Int` | Activates early stopping. The validation metric needs to improve at least once every `early_stopping_rounds` round(s) to continue training. | `None` |
`verbose_eval` | `[Bool, Int]` | Level of verbosity. See the `xgboost.train` documentation. | `0` |
`persist_train` | `Bool` | Whether or not to persist training data, to enable explainability through prototypes. | `False` |
`index_id` | `pd.Index` | User-defined index, if explainability through prototypes is intended. | `None` |
`time_bins` | `np.array` | Specified time windows to use when making survival predictions. | `None` |

**Returns:**

Type | Description |
---|---|
`XGBSEDebiasedBCE` | Trained XGBSEDebiasedBCE instance. |
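The structured `y` expected by `fit` can be built directly with NumPy, as illustrated below. The field names here are illustrative; xgbse also ships a convenience converter, `xgbse.converters.convert_to_structured`, for this purpose:

```python
# Build the structured array `fit` expects: a boolean event indicator as the
# first field and time of event/censoring as the second field.
import numpy as np

event = np.array([True, False, True])   # True = event observed, False = censored
time = np.array([12.0, 30.5, 7.2])      # time of event or time of censoring

y = np.empty(len(event), dtype=[("event", np.bool_), ("time", np.float64)])
y["event"] = event
y["time"] = time
```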

### `predict(self, X, return_interval_probs=False)`

Predicts survival probabilities using the XGBoost + logistic regression pipeline.

**Parameters:**

Name | Type | Description | Default |
---|---|---|---|
`X` | `pd.DataFrame` | Dataframe of features to be used as input for the XGBoost model. | required |
`return_interval_probs` | `Bool` | Whether to return interval probabilities. If `False`, cumulative survival probabilities are returned. | `False` |

**Returns:**

Type | Description |
---|---|
`pd.DataFrame` | Survival probabilities for all time bins (columns), from a `time_bins` array, for all samples of `X` (rows). If `return_interval_probs` is `True`, interval probabilities are returned instead of cumulative survival probabilities. |
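Assuming interval probabilities correspond to the drop in cumulative survival between consecutive windows (check xgbse's converters for the exact definition), the relationship between the two outputs can be sketched with NumPy:

```python
# Sketch of the assumed relationship between cumulative survival probabilities
# and per-interval probabilities. The values below are made up for illustration.
import numpy as np

surv = np.array([0.95, 0.80, 0.60, 0.45])  # cumulative S(t) at each time bin
interval = -np.diff(surv, prepend=1.0)     # probability of failing within each interval

# Interval probabilities plus the final survival probability account for all mass.
assert np.isclose(interval.sum() + surv[-1], 1.0)
```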