Methodology
How we measure and report prediction market accuracy. This page explains our scoring system, time horizons, and data collection process.
What this page contains
- Brier Score - Our primary accuracy metric
- Time Horizons - When we measure predictions
- Calibration - How well probabilities match reality
- Data Collection - How we gather market data
- Limitations - Important caveats to consider
Brier Score
The Brier Score measures the accuracy of probabilistic predictions. It's calculated as the squared difference between the predicted probability and the actual outcome.
Formula:
Brier Score = (prediction - outcome)²
Where outcome = 1 if event occurred, 0 if not
Interpretation:
- 0.00 = Perfect prediction (full confidence in the correct outcome)
- 0.25 = The score from always predicting 50% (equivalent to random guessing)
- 1.00 = Full confidence in the wrong outcome
Example:
Market: "Will AAPL beat Q4 earnings?"
Prediction 30 days before: 72% Yes
Actual outcome: Yes (AAPL beat earnings)
Brier Score = (0.72 - 1)² = 0.0784
This is a good score - the market was confident and correct.
We report Brier scores averaged across all resolved markets, broken down by source, sector, company, and time horizon.
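The scoring above can be sketched in a few lines of Python. This is an illustrative implementation of the formula as stated, not our production code; the `records` format (a list of prediction/outcome pairs) is an assumption for the example.

```python
def brier_score(prediction: float, outcome: int) -> float:
    """Squared difference between the predicted probability and the
    actual outcome (1 if the event occurred, 0 if not)."""
    return (prediction - outcome) ** 2

def mean_brier(records: list[tuple[float, int]]) -> float:
    """Average Brier score over (prediction, outcome) pairs,
    as used when reporting across resolved markets."""
    return sum(brier_score(p, o) for p, o in records) / len(records)

# The AAPL example: 72% Yes, and the event occurred
print(round(brier_score(0.72, 1), 4))  # 0.0784
```

Note that a constant 50% forecast scores 0.25 on every market regardless of outcome, which is why 0.25 is the "random guessing" baseline.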
Time Horizons
We measure market predictions at multiple time points before resolution. This shows how accuracy improves as the event approaches.
| Horizon | Description | Use Case |
|---|---|---|
| 30 days | 1 month before close | Early forecast quality |
| 14 days | 2 weeks before close | Medium-term accuracy |
| 7 days | 1 week before close | Short-term accuracy |
| 1 day | 24 hours before close | Near-term precision |
| 12 hours | 12 hours before close | Final forecast quality |
Our headline Brier score uses the 30-day horizon (or the earliest available if the market existed for less than 30 days). This prevents gaming the metric by only measuring when the answer is nearly known.
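The headline-horizon rule can be sketched as follows. The snapshot storage format (a mapping from hours-before-close to the probability recorded at that time) is an assumption for illustration.

```python
THIRTY_DAYS = 720  # hours before close

def headline_snapshot(snapshots: dict[int, float]) -> tuple[int, float]:
    """Return (horizon_hours, probability) for the headline Brier score:
    the 30-day snapshot if it exists, otherwise the earliest available
    snapshot (the one farthest before close)."""
    horizon = THIRTY_DAYS if THIRTY_DAYS in snapshots else max(snapshots)
    return horizon, snapshots[horizon]

# A market that only existed for ~10 days falls back to its 7-day snapshot
print(headline_snapshot({168: 0.60, 24: 0.85}))  # (168, 0.6)
```

Scoring the earliest snapshot rather than the latest is the anti-gaming choice: a forecast made a day before resolution is nearly always "accurate" and would inflate the metric.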
Calibration
Calibration measures whether predicted probabilities match observed frequencies. A well-calibrated forecaster should have their 70% predictions come true about 70% of the time.
How we assess calibration:
- Group predictions into probability bins (0-10%, 10-20%, etc.)
- Calculate the actual hit rate for each bin
- Compare predicted vs. actual frequencies
Good calibration: If markets predict 80% for a group of events, approximately 80% of those events should actually occur.
Poor calibration: Markets that consistently overpredict (saying 80% when events occur 60% of the time) or underpredict (saying 40% when events occur 60% of the time) are poorly calibrated.
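The binning procedure described above can be sketched like this. The input format (prediction/outcome pairs) and bin count are assumptions for the example.

```python
def calibration_bins(records: list[tuple[float, int]], n_bins: int = 10):
    """Group (prediction, outcome) pairs into probability bins and compare
    the mean predicted probability with the observed hit rate in each bin."""
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, o in records:
        idx = min(int(p * n_bins), n_bins - 1)  # a 1.0 prediction goes in the top bin
        bins[idx].append((p, o))

    report = []
    for i, cell in enumerate(bins):
        if not cell:
            continue  # skip empty bins rather than dividing by zero
        mean_pred = sum(p for p, _ in cell) / len(cell)
        hit_rate = sum(o for _, o in cell) / len(cell)
        report.append((i * (100 // n_bins), mean_pred, hit_rate, len(cell)))
    return report  # (bin_start_%, mean prediction, observed frequency, count)
```

Well-calibrated data shows `mean_pred` close to `hit_rate` in every bin; a consistent gap in one direction is the over- or underprediction described above.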
Data Collection
We collect prediction market data from two primary sources:
Polymarket
Decentralized prediction market on Polygon. We fetch market data, probabilities, and volume via their public API.
Kalshi
CFTC-regulated US exchange. We access market data through their public API for stock and company-related contracts.
What we store:
- Market question and description
- Current probabilities for each outcome
- Historical probability snapshots at key horizons
- Volume and liquidity metrics
- Resolution status and outcome
- Company ticker and sector classification
Data is refreshed regularly to capture current market state and historical snapshots for accuracy analysis.
Limitations & Caveats
Important considerations when interpreting our accuracy data:
- Selection bias: We only track stock and company-related markets. Accuracy for other market types (politics, sports) may differ.
- Sample size: Some categories have few resolved markets. Treat statistics based on small samples (n < 10) with caution.
- Market liquidity: Low-volume markets may have less accurate prices due to fewer participants correcting mispricing.
- Time periods: Our data covers recent market history. Long-term accuracy trends require more time to establish.
- Resolution criteria: How markets define "resolved" can vary. We use the official resolution from each platform.
- Not financial advice: This data is for informational purposes. Past accuracy does not guarantee future performance.