---
title: "Prediction Calibration and Conformal Inference"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Prediction Calibration and Conformal Inference}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 4
)
```

## Why calibration matters

A prediction interval that claims 95% coverage but actually covers
the true value only 70% of the time is worse than useless --- it
gives decision-makers false confidence. In genomic surveillance,
this can mean underestimating the probability that a variant reaches
a critical threshold, leading to delayed public health response.

Calibration asks a simple question: when a model says "there is a
95% chance the true frequency lies in this interval," does the
observation actually fall within that interval 95% of the time?
Surprisingly few surveillance forecasting tools assess this
systematically.

`lineagefreq` provides three capabilities for forecast calibration
that, to our knowledge, no other genomic surveillance package offers:
calibration diagnostics (`calibrate()`), post-hoc recalibration
(`recalibrate()`), and distribution-free conformal prediction
intervals (`conformal_forecast()`).

## PIT diagnostics on real CDC data

The Probability Integral Transform (PIT) provides a comprehensive
calibration diagnostic. A PIT value is the forecast CDF evaluated at
the observed value. For a perfectly calibrated forecaster, PIT
values are uniformly distributed on [0, 1]. Departures from
uniformity reveal specific miscalibration patterns: a U-shaped PIT
histogram indicates underdispersion (intervals too narrow), while a
hump shape indicates overdispersion (intervals too wide).

```{r pit-example, eval = FALSE}
library(lineagefreq)
data(cdc_ba2_transition)

x <- lfq_data(cdc_ba2_transition, lineage = lineage,
              date = date, count = count)
bt <- backtest(x, engines = "mlr",
               horizons = c(7, 14, 21, 28), min_train = 42)

cal <- calibrate(bt)
cal

# PIT histogram
plot(cal, type = "pit")

# Reliability diagram
plot(cal, type = "reliability")
```
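
To see where the U and hump shapes come from, the following base R
sketch simulates PIT values for three hypothetical Gaussian
forecasters. It does not use `lineagefreq`; the object names and the
Gaussian setup are assumptions made purely for illustration.

```{r pit-sketch, eval = FALSE}
set.seed(1)
y_sim <- rnorm(5000)   # simulated observations; the true distribution is N(0, 1)

# PIT value = forecast CDF evaluated at the observation
pit_calibrated <- pnorm(y_sim, mean = 0, sd = 1)    # forecast matches the truth
pit_narrow     <- pnorm(y_sim, mean = 0, sd = 0.5)  # forecast too narrow
pit_wide       <- pnorm(y_sim, mean = 0, sd = 2)    # forecast too wide

# Share of PIT values in the two outer deciles; uniform PIT values give about 0.2
tail_share <- function(p) mean(p < 0.1 | p > 0.9)
pit_tails <- c(calibrated = tail_share(pit_calibrated),
               narrow     = tail_share(pit_narrow),
               wide       = tail_share(pit_wide))
pit_tails

# Decile counts show the shapes directly: flat, U-shaped, and hump-shaped
pit_counts <- function(p) {
  hist(p, breaks = seq(0, 1, by = 0.1), plot = FALSE)$counts
}
rbind(calibrated = pit_counts(pit_calibrated),
      narrow     = pit_counts(pit_narrow),
      wide       = pit_counts(pit_wide))
```

The too-narrow forecaster piles PIT values into the outer bins, while
the too-wide forecaster concentrates them near 0.5.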

The reliability diagram plots observed coverage (vertical axis)
against nominal coverage (horizontal axis) at levels 10% through
90%. Perfect calibration lies on the diagonal. Points below the
diagonal indicate overconfidence (observed coverage falls short of
nominal, so the intervals are narrower than warranted); points above
indicate conservatism (intervals wider than necessary).
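
The numbers behind such a diagram can be sketched in base R. The
example below computes observed coverage of central intervals at
nominal levels 10% through 90% for a deliberately overconfident
Gaussian forecaster. The simulated PIT values and object names are
assumptions for illustration and are unrelated to the internals of
`calibrate()`.

```{r reliability-sketch, eval = FALSE}
set.seed(2)
# Hypothetical overconfident forecaster: forecast sd 0.5, true sd 1
pit_over <- pnorm(rnorm(5000), mean = 0, sd = 0.5)

nominal <- seq(0.1, 0.9, by = 0.1)
# The observation lies inside the central interval of level q
# exactly when its PIT value is within q / 2 of 0.5
observed <- vapply(nominal,
                   function(q) mean(abs(pit_over - 0.5) <= q / 2),
                   numeric(1))
reliability <- data.frame(nominal = nominal, observed = observed)
reliability   # observed coverage falls short of nominal at every level
```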

## Recalibration

When the PIT diagnostic reveals miscalibration, `recalibrate()`
adjusts prediction intervals to improve coverage. Two methods are
available:

**Isotonic regression** learns a nonparametric, monotone mapping
from nominal to observed coverage using the backtest data and
inverts it. This requires no distributional assumptions.

**Platt scaling** fits a logistic regression to the relationship
between standardised residuals and coverage, providing a smooth
parametric adjustment.

```{r recalibrate-example, eval = FALSE}
fit <- fit_model(x, engine = "mlr")
fc  <- forecast(fit, horizon = 28)

# Isotonic recalibration
fc_recal <- recalibrate(fc, bt, method = "isotonic")
autoplot(fc_recal)
```
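
The isotonic idea can be sketched with `stats::isoreg()`. This is a
minimal illustration under stated assumptions, not the implementation
behind `recalibrate()`: the backtest coverage estimates are simulated,
and `adjusted_level()` is a hypothetical helper that inverts the
fitted monotone map by linear interpolation.

```{r isotonic-sketch, eval = FALSE}
set.seed(3)
nominal <- seq(0.1, 0.9, by = 0.1)

# Hypothetical backtest: coverage at each nominal level is estimated from a
# separate batch of 200 forecasts from an overconfident model
# (forecast sd 0.6, true sd 1), so the raw estimates are noisy
observed <- vapply(nominal, function(q) {
  pit <- pnorm(rnorm(200), mean = 0, sd = 0.6)
  mean(abs(pit - 0.5) <= q / 2)
}, numeric(1))

# Monotone (isotonic) fit of observed coverage on nominal coverage
iso <- isoreg(nominal, observed)

# Invert the fitted map: which nominal level should be requested so that
# observed coverage reaches the target? Targets outside the fitted range
# are clamped to the nearest available level (rule = 2).
adjusted_level <- function(target) {
  approx(x = iso$yf, y = nominal, xout = target,
         ties = "ordered", rule = 2)$y
}
adjusted_level(0.5)   # expected to exceed 0.5 for an overconfident model
```

Because the simulated model is overconfident, reaching 50% observed
coverage requires requesting a nominal level well above 50%.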

## Conformal prediction intervals

Parametric prediction intervals from `forecast()` assume that the
multinomial logistic model is correctly specified and that
parameter estimation uncertainty is well captured by the inverse
Fisher information matrix. When these assumptions are violated, as
they may be during rapid variant replacement waves, the
intervals can be substantially miscalibrated.

Conformal prediction offers a distribution-free alternative.
Split conformal inference (Vovk et al. 2005) provides prediction
intervals with a finite-sample marginal coverage guarantee (coverage
of at least the nominal level) under the assumption of
exchangeability:

```{r conformal-example, eval = FALSE}
fc_conf <- conformal_forecast(fit, x, horizon = 28,
                              ci_level = 0.95, seed = 42)
autoplot(fc_conf)
```

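When the model is well specified, the conformal intervals are
typically similar in width to the parametric intervals, or slightly
wider. When the model is misspecified and the parametric intervals
are too narrow, the conformal intervals widen to match the observed
forecast errors, which is precisely the situation where accurate
uncertainty quantification matters most.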
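
The mechanics of split conformal inference can be sketched in base R
with a deliberately misspecified linear model. Everything here
(the simulated data, `lm_fit`, the absolute-residual conformity score)
is an assumption for illustration; it is not a description of how
`conformal_forecast()` is implemented.

```{r split-conformal-sketch, eval = FALSE}
set.seed(4)
n_sim <- 3000
sim <- data.frame(u = runif(n_sim, -2, 2))
sim$y <- sin(2 * sim$u) + rnorm(n_sim, sd = 0.3)   # nonlinear truth

train <- sim[1:500, ]
calib <- sim[501:1000, ]
test  <- sim[1001:3000, ]

# Deliberately misspecified point forecaster: a straight line
lm_fit <- lm(y ~ u, data = train)

alpha <- 0.05
# Conformity scores: absolute residuals on the held-out calibration set
cal_scores <- abs(calib$y - predict(lm_fit, newdata = calib))
n_cal <- length(cal_scores)
# Conformal quantile: the ceiling((n + 1)(1 - alpha))-th smallest score
k     <- ceiling((n_cal + 1) * (1 - alpha))
q_hat <- sort(cal_scores)[k]

# Symmetric interval around each test prediction
pred  <- predict(lm_fit, newdata = test)
lower <- pred - q_hat
upper <- pred + q_hat
coverage <- mean(test$y >= lower & test$y <= upper)
coverage   # close to, and on average at least, 1 - alpha
```

Even though the straight line fits the data poorly, coverage on the
test set stays near the nominal level because the interval width is
taken from held-out residuals rather than from model assumptions.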

### Adaptive conformal inference

For time-series data where the distribution shifts over time (as
during a variant replacement wave), the exchangeability assumption
is violated. Adaptive conformal inference (ACI; Gibbs and Candès
2021) addresses this by adjusting the miscoverage level online:

```{r aci-example, eval = FALSE}
fc_aci <- conformal_forecast(fit, x, horizon = 28,
                             method = "aci", gamma = 0.05)
```

The `gamma` parameter controls the learning rate: larger values
track distribution shifts more aggressively but increase interval
variability.
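
The online update at the heart of ACI can be sketched in a few lines
of base R. After each observation the working miscoverage level moves
by `gamma * (alpha - err)`, where `err` is 1 if the interval missed
and 0 otherwise. The simulated residuals, the burn-in length, and the
use of all past scores as the calibration set are assumptions for
illustration, not the internals of `conformal_forecast()`.

```{r aci-sketch, eval = FALSE}
set.seed(5)
n_steps <- 2000
# Residual scale doubles halfway through: a simple distribution shift
scores_aci <- abs(c(rnorm(n_steps / 2, sd = 1), rnorm(n_steps / 2, sd = 2)))

alpha   <- 0.1    # target miscoverage
gamma   <- 0.05   # learning rate
alpha_t <- alpha  # working miscoverage level, updated online
burn_in <- 100
err <- rep(NA_real_, n_steps)

for (i in (burn_in + 1):n_steps) {
  past <- scores_aci[1:(i - 1)]
  # alpha_t can leave (0, 1): use an infinite interval when alpha_t <= 0
  # and an empty one when alpha_t >= 1
  q_t <- if (alpha_t <= 0) {
    Inf
  } else if (alpha_t >= 1) {
    -Inf
  } else {
    quantile(past, probs = 1 - alpha_t, names = FALSE)
  }
  err[i]  <- as.numeric(scores_aci[i] > q_t)      # 1 = interval missed
  alpha_t <- alpha_t + gamma * (alpha - err[i])   # ACI update
}

mean(err, na.rm = TRUE)   # long-run miscoverage stays near alpha
```

After a miss, `alpha_t` decreases and the next interval widens; after
a hit it increases slightly and the interval tightens.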

## Proper scoring rules

`score_forecasts()` now supports four additional metrics alongside
the original MAE, RMSE, coverage, and WIS: three proper scoring
rules (CRPS, log score, and DSS) and a calibration score:

- **CRPS** (Continuous Ranked Probability Score): the integral of
  the squared difference between the forecast CDF and the empirical
  CDF of the observation. Simultaneously rewards calibration and
  sharpness.
- **Log score**: the negative log-likelihood of the observation
  under the forecast distribution. Heavily penalises confident
  wrong forecasts.
- **DSS** (Dawid-Sebastiani Score): depends only on the first two
  moments, providing a computationally efficient alternative to the
  log score.
- **Calibration score**: mean squared calibration error across
  nominal levels 10%--90%.

```{r scoring-example, eval = FALSE}
sc <- score_forecasts(bt, metrics = c("mae", "crps", "log_score",
                                      "dss", "calibration"))
compare_models(sc)
```

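CRPS, the log score, and DSS are proper in the sense of Gneiting and
Raftery (2007): they are minimised in expectation when the forecast
distribution equals the true data-generating distribution. This
means that a forecaster cannot improve their expected score by
issuing dishonest predictions. The calibration score is a diagnostic
summary rather than a proper scoring rule, so it is best read
alongside the proper scores and not optimised on its own.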
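
Propriety can be checked numerically for CRPS, the log score, and
DSS. The sketch below uses the standard closed forms for a Gaussian
forecast and compares an honest forecaster with an overconfident and
a conservative one. The helper functions are hypothetical and written
for this illustration only; they are not part of `lineagefreq`.

```{r propriety-sketch, eval = FALSE}
# Closed-form scores for a Gaussian forecast N(mu, sigma^2); lower is better
crps_norm <- function(y, mu, sigma) {
  z <- (y - mu) / sigma
  sigma * (z * (2 * pnorm(z) - 1) + 2 * dnorm(z) - 1 / sqrt(pi))
}
log_score_norm <- function(y, mu, sigma) {
  -dnorm(y, mean = mu, sd = sigma, log = TRUE)
}
dss_norm <- function(y, mu, sigma) {
  ((y - mu) / sigma)^2 + 2 * log(sigma)
}

set.seed(6)
obs <- rnorm(10000, mean = 0, sd = 1)   # true distribution is N(0, 1)

# Three forecasters that differ only in their stated standard deviation
forecast_sd <- c(honest = 1, overconfident = 0.5, conservative = 2)
mean_scores <- sapply(forecast_sd, function(s) {
  c(crps      = mean(crps_norm(obs, 0, s)),
    log_score = mean(log_score_norm(obs, 0, s)),
    dss       = mean(dss_norm(obs, 0, s)))
})
mean_scores   # rows are scores, columns are forecasters
```

Under all three rules the honest forecaster attains the lowest mean
score, so neither shrinking nor inflating the stated uncertainty pays
off.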

## References

- Gneiting T, Balabdaoui F, Raftery AE (2007). Probabilistic
  forecasts, calibration and sharpness. *JRSS-B* 69(2):243--268.
- Gneiting T, Raftery AE (2007). Strictly proper scoring rules,
  prediction, and estimation. *JASA* 102(477):359--378.
- Vovk V, Gammerman A, Shafer G (2005). *Algorithmic Learning in
  a Random World*. Springer.
- Gibbs I, Candès E (2021). Adaptive conformal inference under
  distribution shift. *NeurIPS* 34.