| Title: | Algorithmic Fairness Assessment for Clinical Prediction Models |
|---|---|
| Description: | Post-hoc fairness auditing toolkit for clinical prediction models. Unlike in-processing approaches that modify model training, this package evaluates existing models by computing group-wise fairness metrics (demographic parity, equalized odds, predictive parity, calibration disparity), visualizing disparities across protected attributes, and performing threshold-based mitigation. Supports intersectional analysis across multiple attributes and generates audit reports useful for fairness-oriented auditing in clinical AI settings. Methods described in Obermeyer et al. (2019) <doi:10.1126/science.aax2342> and Hardt, Price, and Srebro (2016) <doi:10.48550/arXiv.1610.02413>. |
| Authors: | Cuiwei Gao [aut, cre, cph] |
| Maintainer: | Cuiwei Gao <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.1 |
| Built: | 2026-05-19 10:01:26 UTC |
| Source: | https://github.com/cuiweig/clinicalfair |
Plot fairness metrics disparity
## S3 method for class 'fairness_metrics' autoplot(object, type = c("disparity", "roc", "calibration"), ...)## S3 method for class 'fairness_metrics' autoplot(object, type = c("disparity", "roc", "calibration"), ...)
object |
A |
type |
Plot type: |
... |
Additional arguments (unused). |
A ggplot object.
set.seed(42) fd <- fairness_data( predictions = c(runif(100, 0.2, 0.8), runif(100, 0.3, 0.9)), labels = c(rbinom(100, 1, 0.3), rbinom(100, 1, 0.5)), protected_attr = rep(c("A", "B"), each = 100) ) fm <- fairness_metrics(fd) autoplot(fm)set.seed(42) fd <- fairness_data( predictions = c(runif(100, 0.2, 0.8), runif(100, 0.3, 0.9)), labels = c(rbinom(100, 1, 0.3), rbinom(100, 1, 0.5)), protected_attr = rep(c("A", "B"), each = 100) ) fm <- fairness_metrics(fd) autoplot(fm)
A simulated dataset reflecting the documented racial disparities in recidivism prediction algorithms, based on published statistics from the ProPublica investigation (Angwin et al. 2016).
compas_simcompas_sim
A data frame with 1000 rows and 3 columns:
Predicted recidivism risk (numeric, 0–1).
Actual recidivism outcome (binary, 0/1).
Racial group: White or Black (character).
Simulated. Based on patterns from Angwin et al. (2016) "Machine Bias" and Obermeyer et al. (2019) doi:10.1126/science.aax2342.
data(compas_sim) fd <- fairness_data(compas_sim$risk_score, compas_sim$recidivism, compas_sim$race) fairness_metrics(fd)data(compas_sim) fd <- fairness_data(compas_sim$risk_score, compas_sim$recidivism, compas_sim$race) fairness_metrics(fd)
Bundles predictions, labels, and protected attributes into a standardized container for fairness analysis.
fairness_data( predictions, labels, protected_attr, threshold = 0.5, reference_group = NULL )fairness_data( predictions, labels, protected_attr, threshold = 0.5, reference_group = NULL )
predictions |
Numeric vector of predicted probabilities or risk scores (between 0 and 1). |
labels |
Binary integer vector of true outcomes (0 or 1). |
protected_attr |
Character or factor vector identifying the protected group membership (e.g., race, sex, age group). |
threshold |
Decision threshold for converting probabilities to binary predictions. Default 0.5. |
reference_group |
Name of the reference (privileged) group.
If |
A fairness_data object (list) with standardized components:
predictions, labels, protected, threshold, predicted_class,
reference_group, groups, n, prevalence.
set.seed(42) fd <- fairness_data( predictions = runif(200), labels = rbinom(200, 1, 0.3), protected_attr = sample(c("GroupA", "GroupB"), 200, replace = TRUE) ) fdset.seed(42) fd <- fairness_data( predictions = runif(200), labels = rbinom(200, 1, 0.3), protected_attr = sample(c("GroupA", "GroupB"), 200, replace = TRUE) ) fd
Calculates a comprehensive set of group-wise and comparative
fairness metrics from a fairness_data object, with optional
bootstrap confidence intervals.
fairness_metrics( data, metrics = c("selection_rate", "tpr", "fpr", "ppv", "accuracy", "auc", "brier"), ci = FALSE, n_boot = 2000L, ci_level = 0.95 )fairness_metrics( data, metrics = c("selection_rate", "tpr", "fpr", "ppv", "accuracy", "auc", "brier"), ci = FALSE, n_boot = 2000L, ci_level = 0.95 )
data |
A fairness_data object. |
metrics |
Character vector of metrics to compute. Default
computes all available metrics. Options: |
ci |
Logical; if |
n_boot |
Number of bootstrap replicates when |
ci_level |
Confidence level for the interval. Default 0.95. |
Fairness is assessed by comparing metric values across groups.
A ratio of 1.0 or difference of 0.0 indicates perfect parity.
Common thresholds: ratio in (four-fifths rule,
EEOC guidelines) or difference < 0.05.
When ci = TRUE, percentile bootstrap confidence intervals are
computed by resampling within each group. This accounts for
sampling variability and is recommended when reporting fairness
metrics for regulatory or publication purposes.
A fairness_metrics object (tibble) with columns:
group, metric, value, ratio (vs reference group),
difference (vs reference group). When ci = TRUE, additional
columns ci_lower and ci_upper are included.
Obermeyer Z, et al. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464):447–453. doi:10.1126/science.aax2342
set.seed(42) fd <- fairness_data( predictions = c(runif(100, 0.2, 0.8), runif(100, 0.3, 0.9)), labels = c(rbinom(100, 1, 0.3), rbinom(100, 1, 0.5)), protected_attr = rep(c("A", "B"), each = 100) ) fairness_metrics(fd) # With bootstrap CIs fairness_metrics(fd, ci = TRUE, n_boot = 500)set.seed(42) fd <- fairness_data( predictions = c(runif(100, 0.2, 0.8), runif(100, 0.3, 0.9)), labels = c(rbinom(100, 1, 0.3), rbinom(100, 1, 0.5)), protected_attr = rep(c("A", "B"), each = 100) ) fairness_metrics(fd) # With bootstrap CIs fairness_metrics(fd, ci = TRUE, n_boot = 500)
Generate a fairness summary report
fairness_report(data, metrics = NULL)fairness_report(data, metrics = NULL)
data |
A fairness_data object. |
metrics |
A fairness_metrics object. If |
A fairness_report (list) with $summary, $flags,
$recommendation.
set.seed(42) fd <- fairness_data( predictions = c(runif(100, 0.2, 0.8), runif(100, 0.3, 0.9)), labels = c(rbinom(100, 1, 0.3), rbinom(100, 1, 0.5)), protected_attr = rep(c("A", "B"), each = 100) ) fairness_report(fd)set.seed(42) fd <- fairness_data( predictions = c(runif(100, 0.2, 0.8), runif(100, 0.3, 0.9)), labels = c(rbinom(100, 1, 0.3), rbinom(100, 1, 0.5)), protected_attr = rep(c("A", "B"), each = 100) ) fairness_report(fd)
Evaluates fairness across combinations of multiple protected attributes (e.g., race x sex), revealing disparities hidden by single-attribute analysis.
intersectional_fairness( predictions, labels, ..., threshold = 0.5, min_group_size = 10L )intersectional_fairness( predictions, labels, ..., threshold = 0.5, min_group_size = 10L )
predictions |
Numeric vector of predicted probabilities. |
labels |
Binary integer vector of true outcomes. |
... |
Two or more named vectors of protected attributes. Names become the attribute labels. |
threshold |
Decision threshold. Default 0.5. |
min_group_size |
Minimum number of observations required per intersectional group. Groups below this threshold are dropped with a warning. Default 10. |
A fairness_metrics object with intersectional groups.
Groups with fewer than min_group_size observations are excluded.
Buolamwini J, Gebru T (2018). Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. Conference on Fairness, Accountability and Transparency.
set.seed(42) n <- 400 intersectional_fairness( predictions = runif(n), labels = rbinom(n, 1, 0.3), race = sample(c("White", "Black"), n, replace = TRUE), sex = sample(c("Male", "Female"), n, replace = TRUE) )set.seed(42) n <- 400 intersectional_fairness( predictions = runif(n), labels = rbinom(n, 1, 0.3), race = sample(c("White", "Black"), n, replace = TRUE), sex = sample(c("Male", "Female"), n, replace = TRUE) )
Assesses whether predicted probabilities match observed event rates within each protected group.
plot_calibration(data, n_bins = 10L)plot_calibration(data, n_bins = 10L)
data |
A fairness_data object. |
n_bins |
Number of calibration bins. Default 10. |
A ggplot object.
data(compas_sim) fd <- fairness_data(compas_sim$risk_score, compas_sim$recidivism, compas_sim$race) plot_calibration(fd)data(compas_sim) fd <- fairness_data(compas_sim$risk_score, compas_sim$recidivism, compas_sim$race) plot_calibration(fd)
Plot ROC curves by group
plot_roc(data)plot_roc(data)
data |
A fairness_data object. |
A ggplot object.
set.seed(42) fd <- fairness_data( predictions = c(runif(100, 0.2, 0.8), runif(100, 0.3, 0.9)), labels = c(rbinom(100, 1, 0.3), rbinom(100, 1, 0.5)), protected_attr = rep(c("A", "B"), each = 100) ) plot_roc(fd)set.seed(42) fd <- fairness_data( predictions = c(runif(100, 0.2, 0.8), runif(100, 0.3, 0.9)), labels = c(rbinom(100, 1, 0.3), rbinom(100, 1, 0.5)), protected_attr = rep(c("A", "B"), each = 100) ) plot_roc(fd)
Finds group-specific decision thresholds that maximize accuracy subject to a fairness constraint, or minimize disparity subject to a minimum accuracy constraint.
threshold_optimize( data, objective = c("equalized_odds", "demographic_parity"), min_accuracy = 0.5, grid_resolution = 0.01 )threshold_optimize( data, objective = c("equalized_odds", "demographic_parity"), min_accuracy = 0.5, grid_resolution = 0.01 )
data |
A fairness_data object. |
objective |
|
min_accuracy |
Minimum acceptable overall accuracy. Default 0.5. |
grid_resolution |
Step size for the threshold grid search. Default 0.01 (99 candidate thresholds). Smaller values give finer-grained optimization at modest computational cost. |
This implements post-processing threshold adjustment, the simplest and most transparent mitigation strategy. Each group receives its own threshold to equalize the chosen fairness criterion.
For "equalized_odds", the algorithm computes a pooled target
TPR and FPR across all groups at the original threshold, then
optimizes every group (including the reference) to match the
pooled target. This avoids the asymmetry of fixing the reference
group threshold while only adjusting others.
For clinical applications, group-specific thresholds are interpretable and auditable, unlike in-processing methods that modify the model itself.
A fairness_mitigation object (list) with:
$thresholds (named numeric, one per group),
$before and $after (fairness_metrics objects),
$accuracy_before and $accuracy_after.
Hardt M, Price E, Srebro N (2016). Equality of Opportunity in Supervised Learning. NeurIPS.
data(compas_sim) fd <- fairness_data(compas_sim$risk_score, compas_sim$recidivism, compas_sim$race) mit <- threshold_optimize(fd) mitdata(compas_sim) fd <- fairness_data(compas_sim$risk_score, compas_sim$recidivism, compas_sim$race) mit <- threshold_optimize(fd) mit