Methodology

How we measure and report prediction market accuracy. This page explains our scoring system, time horizons, and data collection process.

Brier Score

The Brier Score measures the accuracy of probabilistic predictions. It's calculated as the squared difference between the predicted probability and the actual outcome.

Formula:

Brier Score = (prediction - outcome)²

Where outcome = 1 if event occurred, 0 if not

Interpretation:

  • 0.00 = Perfect prediction (predicted exactly what happened)
  • 0.25 = Random guessing (50/50 on every prediction)
  • 1.00 = Always completely wrong

Example:

Market: "Will AAPL beat Q4 earnings?"

Prediction 30 days before: 72% Yes

Actual outcome: Yes (AAPL beat earnings)

Brier Score = (0.72 - 1)² = 0.0784 ≈ 0.078

This is a good score: the market was confident and correct.

We report Brier scores averaged across all resolved markets, broken down by source, sector, company, and time horizon.
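The per-prediction formula and the averaging step can be sketched in a few lines of Python (function names are illustrative, not from our codebase):

```python
def brier_score(prediction: float, outcome: int) -> float:
    """Squared difference between the predicted probability and the 0/1 outcome."""
    return (prediction - outcome) ** 2

def average_brier(records) -> float:
    """Mean Brier score over a list of (prediction, outcome) pairs,
    i.e. the number we report per source, sector, company, or horizon."""
    return sum(brier_score(p, o) for p, o in records) / len(records)

# Worked example from above: market at 72% Yes, event occurred.
print(round(brier_score(0.72, 1), 3))  # 0.078
```

Note that always guessing 50% yields (0.5 - 1)² = (0.5 - 0)² = 0.25 regardless of the outcome, which is where the "random guessing" benchmark comes from.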

Time Horizons

We measure market predictions at multiple time points before resolution. This shows how accuracy improves as the event approaches.

Horizon     Description              Use Case
30 days     1 month before close     Early forecast quality
14 days     2 weeks before close     Medium-term accuracy
7 days      1 week before close      Short-term accuracy
1 day       24 hours before close    Near-term precision
12 hours    12 hours before close    Final forecast quality

Our headline Brier score uses the 30-day horizon (or the earliest available if the market existed for less than 30 days). This prevents gaming the metric by only measuring when the answer is nearly known.
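The headline-snapshot rule above can be sketched as follows. This is a simplified illustration, not our production code; `headline_snapshot` and its inputs are hypothetical names:

```python
from datetime import datetime, timedelta

def headline_snapshot(snapshots, close_time):
    """Return the probability at the 30-day horizon, or the earliest
    available snapshot if the market existed for less than 30 days.

    `snapshots` is a list of (timestamp, probability) pairs, any order.
    """
    snaps = sorted(snapshots)  # chronological order
    target = close_time - timedelta(days=30)
    # Latest snapshot at or before the 30-day mark, i.e. the market's
    # state 30 days out; if none exists, the market is young, so fall
    # back to its earliest snapshot.
    eligible = [s for s in snaps if s[0] <= target]
    return (eligible[-1] if eligible else snaps[0])[1]
```

Anchoring at the earliest horizon rather than the latest is what prevents the metric from being gamed: a snapshot taken hours before resolution would almost always look accurate.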

Calibration

Calibration measures whether predicted probabilities match observed frequencies. A well-calibrated forecaster should have their 70% predictions come true about 70% of the time.

How we assess calibration:

  1. Group predictions into probability bins (0-10%, 10-20%, etc.)
  2. Calculate the actual hit rate for each bin
  3. Compare predicted vs. actual frequencies

Good calibration: If markets predict 80% for a group of events, approximately 80% of those events should actually occur.

Poor calibration: Markets that consistently overpredict (saying 80% when events occur 60% of the time) or underpredict (saying 40% when events occur 60% of the time) are poorly calibrated.
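The three-step binning procedure can be sketched like this (a minimal illustration; the function name and tuple layout are ours, not the site's):

```python
def calibration_table(records, n_bins=10):
    """Bucket (prediction, outcome) pairs into probability bins and
    compare the mean predicted probability with the observed hit rate.

    Returns a list of (bin_low_pct, bin_high_pct, mean_predicted,
    observed_rate, count) tuples, skipping empty bins.
    """
    bins = [[] for _ in range(n_bins)]
    for p, o in records:
        idx = min(int(p * n_bins), n_bins - 1)  # 1.0 falls in the top bin
        bins[idx].append((p, o))
    table = []
    for i, members in enumerate(bins):
        if not members:
            continue
        predicted = sum(p for p, _ in members) / len(members)
        observed = sum(o for _, o in members) / len(members)
        table.append((i * 10, (i + 1) * 10, predicted, observed, len(members)))
    return table
```

A well-calibrated source shows `predicted` and `observed` tracking each other across bins; a consistent gap in one direction is the over- or underprediction described above.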

Data Collection

We collect prediction market data from two primary sources:

Polymarket

Decentralized prediction market on Polygon. We fetch market data, probabilities, and volume via their public API.

Kalshi

CFTC-regulated US exchange. We access market data through their public API for stock and company-related contracts.

What we store:

  • Market question and description
  • Current probabilities for each outcome
  • Historical probability snapshots at key horizons
  • Volume and liquidity metrics
  • Resolution status and outcome
  • Company ticker and sector classification

Data is refreshed regularly to capture current market state and historical snapshots for accuracy analysis.

Limitations & Caveats

Important considerations when interpreting our accuracy data:

  • Selection bias: We only track stock and company-related markets. Accuracy for other market types (politics, sports) may differ.
  • Sample size: Some categories have few resolved markets. Treat statistics from small samples (n < 10) with caution.
  • Market liquidity: Low-volume markets may have less accurate prices due to fewer participants correcting mispricing.
  • Time periods: Our data covers recent market history. Long-term accuracy trends require more time to establish.
  • Resolution criteria: How markets define "resolved" can vary. We use the official resolution from each platform.
  • Not financial advice: This data is for informational purposes. Past accuracy does not guarantee future performance.