xgbse.metrics

approx_brier_score(y_true, survival, aggregate='mean')

Estimate the Brier score for every survival time window. The per-window scores can be aggregated into an approximate integrated Brier score.

Parameters:

Name Type Description Default
y_true structured array(numpy.bool_, numpy.number)

Binary event indicator as first field, and time of event or time of censoring as second field.

required
survival [pd.DataFrame, np.array]

A dataframe of survival probabilities for all times (columns), from a time_bins array, for all samples of X (rows). If risk_strategy is 'precomputed', this is instead an array of risks, one per sample.

required
aggregate [string, None]

How to aggregate Brier scores from different time windows:

  • mean: takes the simple average

  • None: returns the full list of Brier scores, one per time window

'mean'

Returns:

Type Description
[Float, np.array]

  • Single value if aggregate is 'mean'
  • np.array if aggregate is None
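The y_true layout and the aggregate options described above can be illustrated with a minimal NumPy sketch. The field names "event" and "time" and the helper aggregate_scores are hypothetical, chosen only for illustration; the parameter description fixes the order of the fields, not their names, and the helper is not the library's implementation.

```python
import numpy as np

# Illustrative data: True means the event was observed, False means censored.
events = [True, False, True, True]
times = [5.0, 12.0, 7.5, 20.0]

# Field names are assumptions; only the order matters here:
# event indicator first, time of event or censoring second.
y_true = np.array(
    list(zip(events, times)),
    dtype=[("event", np.bool_), ("time", np.float64)],
)

# Fields can be read back by position through dtype.names.
event_field, time_field = y_true.dtype.names
print(y_true[event_field])
print(y_true[time_field])


def aggregate_scores(window_scores, aggregate="mean"):
    """Hypothetical helper mirroring the documented `aggregate` options."""
    scores = np.asarray(window_scores, dtype=float)
    if aggregate == "mean":
        return float(scores.mean())  # single value
    if aggregate is None:
        return scores  # one score per time window
    raise ValueError(f"unsupported aggregate: {aggregate!r}")
```

With aggregate='mean' the helper collapses the per-window scores to one float, while aggregate=None keeps the full array so individual windows can be inspected.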

concordance_index(y_true, survival, risk_strategy='mean', which_window=None)

Compute the C-index for a structured array of ground truth times and events and a predicted survival curve using different strategies for estimating risk from it.

Note

  • Computation of the C-index is \(\mathcal{O}(n^2)\).
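To see where the quadratic cost comes from, here is a naive Harrell-style C-index that visits every ordered pair of samples. It is a sketch under stated assumptions, not necessarily how the library computes the score: the function name naive_c_index is hypothetical, pairs tied in time are ignored, and tied risks count as half-concordant.

```python
def naive_c_index(events, times, risks):
    """Fraction of comparable pairs whose risks are ordered correctly."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A censored sample cannot be the earlier member of a pair,
            # because its true event time is unknown.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier event has higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risks count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


events = [True, False, True, True]
times = [5.0, 12.0, 7.5, 20.0]
risks = [0.9, 0.2, 0.6, 0.1]
print(naive_c_index(events, times, risks))
```

The two nested loops over n samples are what make the computation quadratic in the number of samples.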

Parameters:

Name Type Description Default
y_true structured array(numpy.bool_, numpy.number)

Binary event indicator as first field, and time of event or time of censoring as second field.

required
survival [pd.DataFrame, np.array]

A dataframe of survival probabilities for all times (columns), from a time_bins array, for all samples of X (rows). If risk_strategy is 'precomputed', this is instead an array of risks, one per sample.

required
risk_strategy string

Strategy to compute risks from the survival curve. For a given sample:

  • mean: averages probabilities across all times

  • window: lets the user choose one of the available time windows (via the which_window argument) and uses the probabilities of that window

  • midpoint: selects the central window, at index int(survival.columns.shape[0]/2), and uses the probabilities of that window

  • precomputed: assumes the user has already calculated the risks. The survival argument is then expected to contain an array of risks

'mean'
which_window object

Which window to use when risk_strategy is 'window'. Should be one of the columns of the dataframe. Raises ValueError if the column is not present

None

Returns:

Type Description
Float

Concordance index for y_true and survival
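The four risk_strategy options can be sketched with pandas as follows. This is an illustration rather than the library's code: the helper name risk_from_survival is hypothetical, and converting a survival probability to a risk as one minus the probability is an assumption not stated in the description above.

```python
import pandas as pd


def risk_from_survival(survival, risk_strategy="mean", which_window=None):
    """Hypothetical helper: derive one risk value per sample."""
    if risk_strategy == "mean":
        # Average survival probability across all time windows.
        return 1.0 - survival.mean(axis=1)
    if risk_strategy == "window":
        if which_window not in survival.columns:
            raise ValueError(f"{which_window!r} is not a column of survival")
        return 1.0 - survival[which_window]
    if risk_strategy == "midpoint":
        # Central column, as given by the documented index formula.
        midpoint = int(survival.columns.shape[0] / 2)
        return 1.0 - survival.iloc[:, midpoint]
    if risk_strategy == "precomputed":
        # The caller already passed risks, so return them unchanged.
        return survival
    raise ValueError(f"unknown risk_strategy: {risk_strategy!r}")


# Rows are samples, columns are time windows (illustrative values).
survival = pd.DataFrame(
    [[0.9, 0.7, 0.4], [0.8, 0.5, 0.2]],
    columns=[10, 20, 30],
)
print(risk_from_survival(survival, "midpoint"))
print(risk_from_survival(survival, "window", which_window=30))
```

With three columns the midpoint index is int(3/2) = 1, so the 'midpoint' strategy reads the column labelled 20 in this example.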

dist_calibration_score(y_true, survival, n_bins=10, returns='pval')

Estimate D-Calibration for the survival predictions.

Parameters:

Name Type Description Default
y_true structured array(numpy.bool_, numpy.number)

Binary event indicator as first field, and time of event or time of censoring as second field.

required
survival [pd.DataFrame, np.array]

A dataframe of survival probabilities for all times (columns), from a time_bins array, for all samples of X (rows). If risk_strategy is 'precomputed', this is instead an array of risks, one per sample.

required
n_bins Int

Number of equal-width bins into which the [0, 1] interval is divided

10
returns string

What information to return from the function:

  • statistic returns the chi squared test statistic

  • pval returns the chi squared test p value

  • max_deviation returns the maximum percentage deviation from the expected value, calculated as abs(expected_percentage - real_percentage), where expected_percentage = 1.0/n_bins

  • histogram returns the full calibration histogram per bin

  • all returns all of the above in a dictionary

'pval'

Returns:

Type Description
[Float, np.array, Dict]
  • Single value if returns is in `['statistic', 'pval', 'max_deviation']`
  • np.array if returns is 'histogram'
  • dict if returns is 'all'
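The max_deviation formula above can be worked through on a small example. The sketch below takes per-bin counts as given, since how the histogram is built from censored data is not described here. The helper name calibration_summary is hypothetical, and computing the statistic as a Pearson chi-squared on counts against a uniform expectation is an assumption.

```python
def calibration_summary(bin_counts):
    """Hypothetical helper: summarize a calibration histogram."""
    n_bins = len(bin_counts)
    total = sum(bin_counts)
    expected_percentage = 1.0 / n_bins
    real_percentages = [count / total for count in bin_counts]

    # Largest absolute gap between the uniform expectation and any bin.
    max_deviation = max(
        abs(expected_percentage - real) for real in real_percentages
    )

    # Pearson chi-squared statistic against equal expected counts.
    expected_count = total * expected_percentage
    statistic = sum(
        (count - expected_count) ** 2 / expected_count for count in bin_counts
    )
    return {
        "statistic": statistic,
        "max_deviation": max_deviation,
        "histogram": real_percentages,
    }


# 100 samples over 10 bins; the first bin is overfull and the last is empty.
summary = calibration_summary([20, 10, 10, 10, 10, 10, 10, 10, 10, 0])
print(summary["max_deviation"], summary["statistic"])
```

A perfectly uniform histogram gives a max_deviation and statistic of zero; the further the bins drift from 1.0/n_bins, the larger both values become.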