Docs

Four calls: fit, transform, score, drift_check.

1. fit()

Provide a held-out set of raw scores and their true outcomes. Calibrate fits isotonic, Platt/sigmoid, and beta maps and selects the one that minimizes log-loss on your data. Returns a versioned calibrator.
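To make the selection step concrete, here is a minimal pure-Python sketch of the idea, not the Calibrate client API. It fits a Platt/sigmoid map and an isotonic map to synthetic data and keeps whichever has the lower log-loss. Beta calibration is left out for brevity. The function names, the synthetic data, and the choice to score candidates on a separate half of the held-out set are all assumptions made for illustration.

```python
import math
import random

def log_loss(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of binary labels, with clipping."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(labels)

def fit_platt(scores, labels, lr=2.0, steps=800):
    """Fit p = sigmoid(a*s + b) by plain gradient descent on log-loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return lambda s: 1.0 / (1.0 + math.exp(-(a * s + b)))

def fit_isotonic(scores, labels):
    """Pool-adjacent-violators: a non-decreasing step map from score to rate."""
    blocks = []  # each block is [sum_of_labels, count, highest_score_in_block]
    for s, y in sorted(zip(scores, labels)):
        blocks.append([float(y), 1, s])
        # Merge backwards while the previous block's mean exceeds this one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            top = blocks.pop()
            blocks[-1][0] += top[0]
            blocks[-1][1] += top[1]
            blocks[-1][2] = top[2]

    def predict(s):
        for total, count, upper in blocks:
            if s <= upper:
                return total / count
        return blocks[-1][0] / blocks[-1][1]
    return predict

# Synthetic held-out set (assumption): the raw score s is over-confident,
# because the true outcome rate is s squared.
rng = random.Random(0)
scores = [rng.random() for _ in range(2000)]
labels = [1 if rng.random() < s ** 2 else 0 for s in scores]

# Fit on one half and select on the other, so the more flexible map
# is not rewarded for memorizing the data it was fitted on.
fit_s, fit_y = scores[::2], labels[::2]
sel_s, sel_y = scores[1::2], labels[1::2]

candidates = {
    "platt": fit_platt(fit_s, fit_y),
    "isotonic": fit_isotonic(fit_s, fit_y),
}
losses = {name: log_loss([f(s) for s in sel_s], sel_y)
          for name, f in candidates.items()}
best_name = min(losses, key=losses.get)
best_loss = losses[best_name]
raw_loss = log_loss(sel_s, sel_y)
print(f"selected={best_name} log-loss={best_loss:.4f} (raw={raw_loss:.4f})")
```

Splitting the held-out set is one design choice among several. Cross-validation would use the data more efficiently at the cost of more fitting.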

2. transform()

Pass any future raw score (or a batch) and get back a calibrated probability. Drop it in front of your decision logic (sizing, thresholding, bidding) so the math downstream runs on probabilities that match observed outcomes.
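A small sketch of where the calibrated value sits relative to a threshold. This is illustrative only: the `transform` helper, the sigmoid coefficients, and the 0.5 threshold are hypothetical stand-ins, not the Calibrate client or a fitted calibrator.

```python
import math

def make_sigmoid_map(a, b):
    """Return a score -> probability map of the form sigmoid(a*s + b)."""
    return lambda s: 1.0 / (1.0 + math.exp(-(a * s + b)))

def transform(calibrator, raw):
    """Apply a calibrator to a single score or to a batch of scores."""
    if isinstance(raw, (int, float)):
        return calibrator(float(raw))
    return [calibrator(float(s)) for s in raw]

# Hypothetical coefficients standing in for a previously fitted map.
calibrator = make_sigmoid_map(a=6.0, b=-4.0)

raw_batch = [0.2, 0.6, 0.9]
calibrated = transform(calibrator, raw_batch)

THRESHOLD = 0.5  # illustrative decision threshold
raw_decisions = [s >= THRESHOLD for s in raw_batch]
decisions = [p >= THRESHOLD for p in calibrated]

for s, p, d in zip(raw_batch, calibrated, decisions):
    print(f"raw={s:.2f} calibrated={p:.3f} act={d}")
```

With these made-up coefficients, the raw score of 0.6 clears the threshold but its calibrated probability does not, so the two decision lists differ on that item.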

3. score()

Get the Brier score, log-loss, and expected calibration error (ECE) before and after calibration, plus the reliability curve. These are the numbers that go on a model card or in front of a reviewer.
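For reference, the three metrics can be computed by hand in a few lines. This sketch uses the standard definitions with equal-width bins for ECE. The function names, the bin count, and the ten-row toy data are assumptions, and Calibrate's own binning may differ.

```python
import math

def brier(probs, labels):
    """Mean squared error between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def log_loss(probs, labels, eps=1e-12):
    """Mean negative log-likelihood, with clipping to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(labels)

def reliability_bins(probs, labels, n_bins=10):
    """Per non-empty bin: (mean predicted prob, observed rate, count)."""
    bins = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        i = min(int(p * n_bins), n_bins - 1)
        bins[i][0] += p
        bins[i][1] += y
        bins[i][2] += 1
    return [(sp / c, sy / c, c) for sp, sy, c in bins if c]

def ece(probs, labels, n_bins=10):
    """Count-weighted mean gap between confidence and observed rate."""
    n = len(labels)
    return sum(c / n * abs(conf - rate)
               for conf, rate, c in reliability_bins(probs, labels, n_bins))

# Toy data: ten cases all scored 0.9, of which seven are positive.
labels = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
before = [0.9] * 10   # over-confident raw score
after = [0.7] * 10    # probability matching the observed rate

for name, probs in (("before", before), ("after", after)):
    print(f"{name}: brier={brier(probs, labels):.3f} "
          f"log-loss={log_loss(probs, labels):.3f} "
          f"ece={ece(probs, labels):.3f}")
```

The tuples returned by `reliability_bins` are the points of the reliability curve: plot observed rate against mean predicted probability and compare with the diagonal.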

4. drift_check()

Stream live batches; Calibrate flags when the fitted map no longer holds and recommends a refit. Wire the alert to Slack or a webhook so calibration stays current automatically.
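One simple way to picture a drift check, sketched under stated assumptions: each live batch arrives with outcomes, drift is measured as the gap between mean predicted probability and observed rate, and an alert fires after a set number of consecutive breaches. The `tolerance`, `patience`, and `on_drift` names are hypothetical, and the callback stands in for a Slack or webhook call. Calibrate's actual detection rule is not described here.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Batch = Tuple[List[float], List[int]]  # (calibrated probabilities, outcomes)

def calibration_gap(probs: List[float], labels: List[int]) -> float:
    """Absolute gap between mean predicted probability and observed rate."""
    return abs(sum(probs) / len(probs) - sum(labels) / len(labels))

def drift_check(batches: Iterable[Batch],
                tolerance: float = 0.05,
                patience: int = 2,
                on_drift: Callable[[str], None] = print) -> Optional[int]:
    """Return the index of the batch where drift is flagged, else None."""
    consecutive = 0
    for i, (probs, labels) in enumerate(batches):
        gap = calibration_gap(probs, labels)
        consecutive = consecutive + 1 if gap > tolerance else 0
        if consecutive >= patience:
            on_drift(f"batch {i}: gap {gap:.2f} exceeds {tolerance}; refit advised")
            return i
    return None

alerts: List[str] = []  # stand-in for a webhook or chat notification
live_batches: List[Batch] = [
    ([0.2, 0.8, 0.5, 0.5], [0, 1, 1, 0]),                 # well calibrated
    ([0.8, 0.8, 0.8, 0.8, 0.8], [1, 0, 0, 1, 0]),         # first breach
    ([0.7, 0.7, 0.7, 0.7, 0.7], [0, 0, 1, 0, 0]),         # second breach
]
flagged_at = drift_check(live_batches, on_drift=alerts.append)
print(flagged_at, alerts)
```

Requiring consecutive breaches trades detection speed for fewer false alarms on small, noisy batches.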

When you need calibration

Use case	Why a raw score breaks it
Risk / fraud thresholds	A threshold on an over-confident score blocks the wrong volume of transactions.
Bidding / pricing	Expected value = probability × payoff. A wrong probability misprices every bid.
Churn / lead scoring	You intervene on "0.9 likely" — if 0.9 really means 0.7, you spend on the wrong customers.
Forecasting & triage	Decisions ranked by confidence are mis-ranked when confidence isn't calibrated.
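The bidding and churn rows come down to the same arithmetic. Here is a short worked example with made-up numbers: the payoff of 100 and the intervention cost of 80 are assumptions, while the 0.9 and 0.7 probabilities come from the churn row above.

```python
def expected_value(probability: float, payoff: float, cost: float = 0.0) -> float:
    """Expected value = probability x payoff, minus the cost of acting."""
    return probability * payoff - cost

payoff, cost = 100.0, 80.0  # illustrative figures only

ev_claimed = expected_value(0.9, payoff, cost)  # what the raw score implies
ev_actual = expected_value(0.7, payoff, cost)   # what the true rate implies

acts_on_claimed = ev_claimed > 0  # the raw score says the action pays off
should_act = ev_actual > 0        # the calibrated probability says it does not

print(f"claimed EV={ev_claimed:.1f}, actual EV={ev_actual:.1f}")
print(f"acts on raw score: {acts_on_claimed}, should act: {should_act}")
```

With these figures the raw score makes a losing action look profitable, which is the failure the table describes.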

Secret-sauce boundary: Calibrate ships the calibration engine, scoring, diagrams, and drift detection. It learns only from the labeled data you supply, and it does not contain — and will never share — our own thresholds, weights, or domain models. This is a methodology API, not a data product.
