# xgbse.metrics

### approx_brier_score(y_true, survival, aggregate='mean')

Estimate the Brier score for each survival time window. The per-window scores can be aggregated into an approximate integrated Brier score estimate.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y_true | structured array(numpy.bool_, numpy.number) | Binary event indicator as first field, and time of event or time of censoring as second field. | required |
| survival | [pd.DataFrame, np.array] | A dataframe of survival probabilities for all times (columns), from a time_bins array, for all samples of X (rows). If risk_strategy is 'precomputed', it is instead an array of risks, one per sample. | required |
| aggregate | [string, None] | How to aggregate Brier scores from different time windows: 'mean' takes the simple average; None returns the full list of Brier scores, one per time window. | 'mean' |

Returns:

| Type | Description |
| --- | --- |
| [Float, np.array] | Single value if aggregate is 'mean'; np.array if aggregate is None. |
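
To make the per-window scores and the `aggregate` argument concrete, here is a minimal pure-Python sketch. It is not the library's implementation: it assumes every sample has an observed event (no censoring adjustment), and the name `simple_brier_scores` and the toy inputs are hypothetical.

```python
from statistics import mean


def simple_brier_scores(event_times, survival_rows, time_bins, aggregate="mean"):
    """Censoring-free Brier score per time window (illustrative only)."""
    scores = []
    for j, t in enumerate(time_bins):
        # Observed status at time t: 1.0 if the sample is still event-free, else 0.0.
        squared_errors = [
            ((1.0 if event_time > t else 0.0) - row[j]) ** 2
            for event_time, row in zip(event_times, survival_rows)
        ]
        scores.append(mean(squared_errors))
    if aggregate == "mean":
        return mean(scores)      # one number summarising all windows
    if aggregate is None:
        return scores            # one score per time window
    raise ValueError(f"unsupported aggregate: {aggregate!r}")


time_bins = [1, 2, 3]
event_times = [2.5, 0.5]         # assumption: all events observed
survival_rows = [
    [1.0, 1.0, 0.0],             # perfect prediction for the event at 2.5
    [0.5, 0.5, 0.5],             # uninformative prediction
]
per_window = simple_brier_scores(event_times, survival_rows, time_bins, aggregate=None)
overall = simple_brier_scores(event_times, survival_rows, time_bins)
```

Lower values indicate better predictions; the uninformative second sample contributes a squared error of 0.25 in every window.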

### concordance_index(y_true, survival, risk_strategy='mean', which_window=None)

Compute the C-index for a structured array of ground truth times and events and a predicted survival curve using different strategies for estimating risk from it.

Note

• Computation of the C-index is $$\mathcal{O}(n^2)$$.
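
The quadratic cost comes from comparing every pair of samples. The following sketch shows a simplified Harrell-style C-index with an explicit double loop; it is an illustration only, not the library's code, and the function name `pairwise_c_index` and the toy data are hypothetical. Ties in time are simply treated as not comparable.

```python
def pairwise_c_index(events, times, risks):
    """Harrell-style C-index via an explicit double loop (illustrative only)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when sample i has an observed event
            # strictly before sample j's event or censoring time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier event received the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied risks count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


events = [True, True, False, True]
times = [2.0, 4.0, 5.0, 7.0]
risks = [0.9, 0.4, 0.6, 0.1]
c_index = pairwise_c_index(events, times, risks)
```

In this toy data, five pairs are comparable and four of them are ordered correctly, so the sketch returns 0.8.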

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y_true | structured array(numpy.bool_, numpy.number) | Binary event indicator as first field, and time of event or time of censoring as second field. | required |
| survival | [pd.DataFrame, np.array] | A dataframe of survival probabilities for all times (columns), from a time_bins array, for all samples of X (rows). If risk_strategy is 'precomputed', it is instead an array of risks, one per sample. | required |
| risk_strategy | string | Strategy to compute risks from the survival curve. For a given sample: 'mean' averages probabilities across all times; 'window' lets the user choose one of the available time windows (via the which_window argument) and uses the probabilities of that window; 'midpoint' selects the most central window, at index int(survival.columns.shape[0]/2), and uses the probabilities of that window; 'precomputed' assumes the user has already calculated risk, so the survival argument is assumed to contain an array of risks instead. | 'mean' |
| which_window | object | Which window to use when risk_strategy is 'window'. Should be one of the columns of the dataframe. Raises ValueError if the column is not present. | None |

Returns:

| Type | Description |
| --- | --- |
| Float | Concordance index for y_true and survival |
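
The four `risk_strategy` options can be sketched in plain Python as follows. This is a hedged illustration rather than the library's implementation: the survival dataframe is replaced by a list of column labels plus a list of rows, the helper name `risk_from_survival` is hypothetical, and the conversion `risk = 1 - survival probability` is an assumption that is not stated in this reference.

```python
def risk_from_survival(columns, rows, risk_strategy="mean", which_window=None):
    """Turn survival curves into one risk value per sample (illustrative only)."""
    if risk_strategy == "precomputed":
        return list(rows)                 # rows already hold one risk per sample
    if risk_strategy == "mean":
        summary = [sum(row) / len(row) for row in rows]
    elif risk_strategy == "window":
        if which_window not in columns:
            raise ValueError(f"{which_window!r} is not one of the time windows")
        idx = columns.index(which_window)
        summary = [row[idx] for row in rows]
    elif risk_strategy == "midpoint":
        idx = int(len(columns) / 2)       # most central window
        summary = [row[idx] for row in rows]
    else:
        raise ValueError(f"unknown risk_strategy: {risk_strategy!r}")
    # Assumption: a lower survival probability means a higher risk.
    return [1.0 - s for s in summary]


columns = [10, 20, 30, 40]                # time windows (dataframe columns)
rows = [
    [0.9, 0.8, 0.6, 0.5],                 # sample with a slowly decaying curve
    [0.7, 0.4, 0.2, 0.1],                 # sample with a quickly decaying curve
]
mean_risk = risk_from_survival(columns, rows)
window_risk = risk_from_survival(columns, rows, "window", which_window=20)
midpoint_risk = risk_from_survival(columns, rows, "midpoint")
```

Whatever the strategy, the second sample ends up with the higher risk here, which is what a concordance computation would then compare against the observed times.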

### dist_calibration_score(y_true, survival, n_bins=10, returns='pval')

Estimate D-Calibration for the survival predictions.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y_true | structured array(numpy.bool_, numpy.number) | Binary event indicator as first field, and time of event or time of censoring as second field. | required |
| survival | [pd.DataFrame, np.array] | A dataframe of survival probabilities for all times (columns), from a time_bins array, for all samples of X (rows). If risk_strategy is 'precomputed', it is instead an array of risks, one per sample. | required |
| n_bins | Int | Number of equal-width bins into which the [0, 1] interval is divided. | 10 |
| returns | string | What information to return from the function: 'statistic' returns the chi-squared test statistic; 'pval' returns the chi-squared test p-value; 'max_deviation' returns the maximum percentage deviation from the expected value, calculated as abs(expected_percentage - real_percentage), where expected_percentage = 1.0/n_bins; 'histogram' returns the full calibration histogram per bin; 'all' returns all of the above in a dictionary. | 'pval' |

Returns:

| Type | Description |
| --- | --- |
| [Float, np.array, Dict] | Single value if returns is in ['statistic', 'pval', 'max_deviation']; np.array if returns is 'histogram'; dict if returns is 'all'. |
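
The histogram, `max_deviation` and chi-squared statistic described above can be illustrated with a small pure-Python sketch. It makes simplifying assumptions: only uncensored samples are considered, the input is each sample's predicted survival probability at its own event time, and the p-value is omitted because it needs a chi-squared distribution function. The name `simple_d_calibration` and the toy values are hypothetical, and this is not the library's implementation.

```python
def simple_d_calibration(probs_at_event, n_bins=10):
    """Histogram-based calibration summary for uncensored samples (illustrative only)."""
    counts = [0] * n_bins
    for p in probs_at_event:
        # Map p in [0, 1] to a bin index; p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    n = len(probs_at_event)
    expected_percentage = 1.0 / n_bins
    histogram = [c / n for c in counts]   # real_percentage per bin
    max_deviation = max(abs(expected_percentage - real) for real in histogram)
    expected_count = n * expected_percentage
    # Pearson chi-squared statistic against a uniform histogram.
    statistic = sum((c - expected_count) ** 2 / expected_count for c in counts)
    return {
        "statistic": statistic,
        "max_deviation": max_deviation,
        "histogram": histogram,
    }


probs_at_event = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8, 0.9, 0.95]
result = simple_d_calibration(probs_at_event, n_bins=4)
```

A well-calibrated model spreads these probabilities roughly uniformly over the bins, so a small statistic and a small maximum deviation are the desirable outcome.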