Methodology
How we measure and report prediction market accuracy. This page explains our scoring system, time horizons, and data collection process.
What this page contains
- Brier Score - Our primary accuracy metric
- Time Horizons - When we measure predictions
- Calibration - How well probabilities match reality
- Data Collection - How we gather market data
- Limitations - Important caveats to consider
Brier Score
The Brier Score measures the accuracy of probabilistic predictions. It's calculated as the squared difference between the predicted probability and the actual outcome.
Formula:
Brier Score = (prediction - outcome)²
Where outcome = 1 if event occurred, 0 if not
Interpretation:
- 0.00 = Perfect prediction (full confidence in the correct outcome)
- 0.25 = The score from always predicting 50% (equivalent to random guessing)
- 1.00 = Full confidence in the wrong outcome
Example:
Market: "Will AAPL beat Q4 earnings?"
Prediction 30 days before: 72% Yes
Actual outcome: Yes (AAPL beat earnings)
Brier Score = (0.72 - 1)² = 0.0784
This is a good score - the market was confident and correct.
We report Brier scores averaged across all resolved markets, broken down by source, sector, company, and time horizon.
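The scoring above can be sketched in a few lines of Python. This is an illustrative implementation of the formula as stated, not our production code; the `records` format (a list of prediction/outcome pairs) is an assumption for the example.

```python
def brier_score(prediction: float, outcome: int) -> float:
    """Squared difference between the predicted probability and the
    actual outcome (1 if the event occurred, 0 if not)."""
    return (prediction - outcome) ** 2

def mean_brier(records: list[tuple[float, int]]) -> float:
    """Average Brier score over (prediction, outcome) pairs,
    as used when reporting across resolved markets."""
    return sum(brier_score(p, o) for p, o in records) / len(records)

# The AAPL example: 72% Yes, and the event occurred
print(round(brier_score(0.72, 1), 4))  # 0.0784
```

Note that a constant 50% forecast scores 0.25 on every market regardless of outcome, which is why 0.25 is the "random guessing" baseline.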
Time Horizons
We measure market predictions at multiple time points before resolution. This shows how accuracy improves as the event approaches.
| Horizon | Description | Use Case |
|---|---|---|
| 30 days | 1 month before close | Early forecast quality |
| 14 days | 2 weeks before close | Medium-term accuracy |
| 7 days | 1 week before close | Short-term accuracy |
| 1 day | 24 hours before close | Near-term precision |
| 12 hours | 12 hours before close | Final forecast quality |
Our headline Brier score uses the 30-day horizon (or the earliest available if the market existed for less than 30 days). This prevents gaming the metric by only measuring when the answer is nearly known.
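The headline-horizon rule can be sketched as follows. The snapshot storage format (a mapping from hours-before-close to the probability recorded at that time) is an assumption for illustration.

```python
THIRTY_DAYS = 720  # hours before close

def headline_snapshot(snapshots: dict[int, float]) -> tuple[int, float]:
    """Return (horizon_hours, probability) for the headline Brier score:
    the 30-day snapshot if it exists, otherwise the earliest available
    snapshot (the one farthest before close)."""
    horizon = THIRTY_DAYS if THIRTY_DAYS in snapshots else max(snapshots)
    return horizon, snapshots[horizon]

# A market that only existed for ~10 days falls back to its 7-day snapshot
print(headline_snapshot({168: 0.60, 24: 0.85}))  # (168, 0.6)
```

Scoring the earliest snapshot rather than the latest is the anti-gaming choice: a forecast made a day before resolution is nearly always "accurate" and would inflate the metric.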
Calibration
Calibration measures whether predicted probabilities match observed frequencies. A well-calibrated forecaster should have their 70% predictions come true about 70% of the time.
How we assess calibration:
- Group predictions into probability bins (0-10%, 10-20%, etc.)
- Calculate the actual hit rate for each bin
- Compare predicted vs. actual frequencies
Good calibration: If markets predict 80% for a group of events, approximately 80% of those events should actually occur.
Poor calibration: Markets that consistently overpredict (saying 80% when events occur 60% of the time) or underpredict (saying 40% when events occur 60% of the time) are poorly calibrated.
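The binning procedure described above can be sketched like this. The input format (prediction/outcome pairs) and bin count are assumptions for the example.

```python
def calibration_bins(records: list[tuple[float, int]], n_bins: int = 10):
    """Group (prediction, outcome) pairs into probability bins and compare
    the mean predicted probability with the observed hit rate in each bin."""
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, o in records:
        idx = min(int(p * n_bins), n_bins - 1)  # a 1.0 prediction goes in the top bin
        bins[idx].append((p, o))

    report = []
    for i, cell in enumerate(bins):
        if not cell:
            continue  # skip empty bins rather than dividing by zero
        mean_pred = sum(p for p, _ in cell) / len(cell)
        hit_rate = sum(o for _, o in cell) / len(cell)
        report.append((i * (100 // n_bins), mean_pred, hit_rate, len(cell)))
    return report  # (bin_start_%, mean prediction, observed frequency, count)
```

Well-calibrated data shows `mean_pred` close to `hit_rate` in every bin; a consistent gap in one direction is the over- or underprediction described above.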
Data Collection
We collect prediction market data from two primary sources:
Polymarket
Decentralized prediction market on Polygon. We fetch market data, probabilities, and volume via their public API.
Kalshi
CFTC-regulated US exchange. We access market data through their public API for stock and company-related contracts.
What we store:
- Market question and description
- Current probabilities for each outcome
- Historical probability snapshots at key horizons
- Volume and liquidity metrics
- Resolution status and outcome
- Company ticker and sector classification
Data is refreshed regularly to capture current market state and historical snapshots for accuracy analysis.
Limitations & Caveats
Important considerations when interpreting our accuracy data:
- Selection bias: We only track stock and company-related markets. Accuracy for other market types (politics, sports) may differ.
- Sample size: Some categories have few resolved markets. Treat statistics based on small samples (n < 10) with caution.
- Market liquidity: Low-volume markets may have less accurate prices due to fewer participants correcting mispricing.
- Time periods: Our data covers recent market history. Long-term accuracy trends require more time to establish.
- Resolution criteria: How markets define "resolved" can vary. We use the official resolution from each platform.
- Not financial advice: This data is for informational purposes. Past accuracy does not guarantee future performance.