Docs
Four calls: fit, transform, score, drift_check.
1. fit()
Provide a held-out set of raw scores and their true outcomes. Calibrate fits isotonic, Platt/sigmoid, and beta maps and selects the one that minimizes log-loss on your data. Returns a versioned calibrator.
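To make the selection step concrete, here is a minimal sketch in plain Python. It is not Calibrate's implementation: the function names, the even/odd split used for selection, and the synthetic data are all assumptions made for illustration. The beta map and calibrator versioning are left out to keep it short.

```python
import bisect
import math
import random

def log_loss(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of binary labels, with clipping."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def fit_platt(scores, labels, lr=0.5, steps=2000):
    """Fit p = sigmoid(a * s + b) by plain gradient descent on log-loss."""
    a, b, n = 1.0, 0.0, len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return lambda s: 1.0 / (1.0 + math.exp(-(a * s + b)))

def fit_isotonic(scores, labels):
    """Pool-adjacent-violators: a non-decreasing step function of the score.

    Tied scores are not pooled here, which is fine for continuous scores.
    """
    blocks = []  # each block is [label_sum, count, highest_score_in_block]
    for s, y in sorted(zip(scores, labels)):
        blocks.append([float(y), 1, s])
        # Merge backwards while the previous block's mean exceeds this one's.
        while len(blocks) > 1 and (
            blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]
        ):
            label_sum, count, upper = blocks.pop()
            blocks[-1][0] += label_sum
            blocks[-1][1] += count
            blocks[-1][2] = upper
    uppers = [blk[2] for blk in blocks]
    means = [blk[0] / blk[1] for blk in blocks]

    def predict(s):
        i = bisect.bisect_left(uppers, s)
        return means[min(i, len(means) - 1)]
    return predict

def fit(scores, labels):
    """Fit each candidate on one half, pick the lowest log-loss on the other."""
    train_s, train_y = scores[0::2], labels[0::2]
    valid_s, valid_y = scores[1::2], labels[1::2]
    candidates = {
        "platt": fit_platt(train_s, train_y),
        "isotonic": fit_isotonic(train_s, train_y),
    }
    losses = {
        name: log_loss([f(s) for s in valid_s], valid_y)
        for name, f in candidates.items()
    }
    best = min(losses, key=losses.get)
    return best, candidates[best], losses

# Synthetic over-confident scores: the true positive rate is s**2, not s.
rng = random.Random(0)
scores = [rng.random() for _ in range(300)]
labels = [1 if rng.random() < s * s else 0 for s in scores]

name, calibrator, losses = fit(scores, labels)
print(name, {k: round(v, 4) for k, v in losses.items()})
```

The sketch selects on a separate half because an isotonic map can always match or beat a sigmoid on the data it was fitted to, so comparing in-sample log-loss would not be a fair contest.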
2. transform()
Pass any future raw score (or a batch) and get back a calibrated probability. Put it in front of your decision logic (sizing, thresholding, bidding) so the downstream math runs on probabilities that mean what they say.
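As a rough illustration of where the calibrated value sits in a pipeline, the sketch below applies a sigmoid map to a single score or a batch and then thresholds the result. The helper names, the coefficients `a=4.0, b=-3.0`, and the 0.5 threshold are hypothetical, not values taken from Calibrate.

```python
import math
from typing import Callable, List, Sequence, Union

def make_sigmoid_map(a: float, b: float) -> Callable[[float], float]:
    """Return a Platt-style map p = 1 / (1 + exp(-(a * s + b)))."""
    def calibrate_one(s: float) -> float:
        return 1.0 / (1.0 + math.exp(-(a * s + b)))
    return calibrate_one

def transform(
    calibrate_one: Callable[[float], float],
    raw: Union[float, Sequence[float]],
) -> Union[float, List[float]]:
    """Apply the map to one score or to a batch, mirroring the input shape."""
    if isinstance(raw, (int, float)):
        return calibrate_one(float(raw))
    return [calibrate_one(float(s)) for s in raw]

# Hypothetical coefficients, chosen only for illustration.
calibrate_one = make_sigmoid_map(a=4.0, b=-3.0)

raw_batch = [0.2, 0.6, 0.9]
calibrated = transform(calibrate_one, raw_batch)

# Decision logic consumes the calibrated probability, not the raw score.
BLOCK_THRESHOLD = 0.5
raw_decisions = ["block" if s >= BLOCK_THRESHOLD else "allow" for s in raw_batch]
decisions = ["block" if p >= BLOCK_THRESHOLD else "allow" for p in calibrated]

print(raw_decisions)  # ['allow', 'block', 'block']
print(decisions)      # ['allow', 'allow', 'block']
```

With this particular map, the raw score 0.6 calibrates to about 0.35, so the same threshold blocks one fewer item once it is applied to the calibrated value.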
3. score()
Get the Brier score, log-loss, and expected calibration error (ECE) before and after calibration, plus the reliability curve. These are the numbers that go on a model card or in front of a reviewer.
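For reference, the three metrics and the reliability curve can be computed as in the sketch below. It uses the standard definitions with equal-width bins; the bin count, the function names, and the toy data are assumptions, and Calibrate's own binning may differ.

```python
import math

def brier_score(probs, labels):
    """Mean squared difference between probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def log_loss(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of binary labels, with clipping."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def reliability_curve(probs, labels, n_bins=10):
    """Per non-empty bin: (mean predicted probability, observed rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    curve = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            rate = sum(y for _, y in members) / len(members)
            curve.append((mean_p, rate, len(members)))
    return curve

def expected_calibration_error(probs, labels, n_bins=10):
    """Count-weighted mean gap between predicted and observed rate per bin."""
    n = len(labels)
    return sum(
        count / n * abs(rate - mean_p)
        for mean_p, rate, count in reliability_curve(probs, labels, n_bins)
    )

# Toy data: scores of 0.9 are right 3 times in 4, scores of 0.1 are wrong 1 in 4.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
raw = [0.9] * 4 + [0.1] * 4
calibrated = [0.75] * 4 + [0.25] * 4

for name, probs in (("before", raw), ("after", calibrated)):
    print(
        name,
        round(brier_score(probs, labels), 4),
        round(log_loss(probs, labels), 4),
        round(expected_calibration_error(probs, labels), 4),
    )
```

On this toy data the Brier score drops from 0.21 to 0.1875 and the ECE from 0.15 to 0, because the calibrated values equal the observed rate in each bin.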
4. drift_check()
Stream live batches; Calibrate flags when the fitted map no longer holds and recommends a refit. Wire the alert to Slack or a webhook so calibration stays current automatically.
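One simple way to picture a drift check is shown below. The detection rule is an assumption, since this page does not say how Calibrate decides that a map no longer holds. The sketch recomputes ECE on each labeled live batch and flags a refit when it exceeds the fit-time baseline by a tolerance. The `DriftReport` fields, the 0.05 tolerance, and the payload keys are hypothetical. The alert is only serialized here; sending it to Slack or a webhook is left to the caller.

```python
import json
from dataclasses import asdict, dataclass

def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE: weighted gap between predicted and observed rate."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    n = len(labels)
    ece = 0.0
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            rate = sum(y for _, y in members) / len(members)
            ece += len(members) / n * abs(rate - mean_p)
    return ece

@dataclass
class DriftReport:
    live_ece: float
    baseline_ece: float
    drifted: bool
    recommendation: str

def drift_check(probs, labels, baseline_ece, tolerance=0.05):
    """Flag drift when live ECE exceeds the baseline by more than `tolerance`."""
    live = expected_calibration_error(probs, labels)
    drifted = live > baseline_ece + tolerance
    return DriftReport(live, baseline_ece, drifted, "refit" if drifted else "keep")

# Two live batches scored by the same calibrator; outcomes arrive later.
calibrated = [0.75] * 4 + [0.25] * 4
stable = drift_check(calibrated, [1, 1, 1, 0, 0, 0, 0, 1], baseline_ece=0.02)
shifted = drift_check(calibrated, [0, 0, 0, 1, 0, 0, 0, 0], baseline_ece=0.02)

# Serialize one alert per drifted batch; delivery is out of scope here.
alerts = [
    json.dumps({"event": "calibration_drift", **asdict(report)})
    for report in (stable, shifted)
    if report.drifted
]
print(alerts)
```

This rule needs true outcomes for the live batch, so in practice it runs with whatever delay the labels arrive at.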
When you need calibration
| Use case | Why a raw score breaks it |
|---|---|
| Risk / fraud thresholds | A threshold on an over-confident score blocks the wrong volume of transactions. |
| Bidding / pricing | Expected value = probability × payoff. A wrong probability misprices every bid. |
| Churn / lead scoring | You intervene on "0.9 likely" — if 0.9 really means 0.7, you spend on the wrong customers. |
| Forecasting & triage | Decisions ranked by confidence are mis-ranked when confidence isn't calibrated. |
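The expected-value row can be worked through with the table's own figures, where a raw 0.9 really means 0.7. The payoff of 100 and the intervention cost of 80 are hypothetical numbers picked so that the two probabilities lead to different decisions.

```python
def expected_value(probability: float, payoff: float) -> float:
    """Expected value = probability x payoff."""
    return probability * payoff

def should_act(probability: float, payoff: float, cost: float) -> bool:
    """Act only when the expected value exceeds the cost of acting."""
    return expected_value(probability, payoff) > cost

raw_score = 0.9    # what the model reports
calibrated = 0.7   # what that score corresponds to after calibration
payoff = 100.0     # hypothetical value of a successful intervention
cost = 80.0        # hypothetical cost of intervening

raw_ev = expected_value(raw_score, payoff)           # about 90
calibrated_ev = expected_value(calibrated, payoff)   # about 70

print(should_act(raw_score, payoff, cost))    # True: 90 > 80
print(should_act(calibrated, payoff, cost))   # False: 70 < 80
```

The raw score overstates the expected value by about 20 per decision, which is enough here to turn a loss-making intervention into one that looks worthwhile.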
Secret-sauce boundary: Calibrate ships the calibration engine, scoring, diagrams, and drift detection. It learns only from the labeled data you supply, and it does not contain — and will never share — our own thresholds, weights, or domain models. This is a methodology API, not a data product.