About — FIFA WC 2026 Predictor

Training dataset

49,000+ international fixtures analysed

World Cup · Qualifiers · Continental Championships · Nations League · Friendlies

How It Works

From raw match history to calibrated predictions — the full 6-step pipeline.

01 Data Ingestion: 49,000+ international fixtures from 1993–2026, spanning World Cups, qualifiers, continental championships, and friendlies.

02 Feature Engineering: Elo ratings, rolling form windows, head-to-head (H2H) records, defence ratings, rest days, and contextual flags, all computed only from information available before kick-off.
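A minimal sketch of how Elo features can be built "pre-match" is shown below. It uses the standard Elo expected-score curve and records each match's features before that match updates the ratings. The page does not state its K-factor, home-advantage value or starting rating, so `K_FACTOR`, `HOME_ADVANTAGE` and the 1500 default are hypothetical.

```python
# Hypothetical parameters: the page does not state the K-factor or home advantage used.
K_FACTOR = 40.0
HOME_ADVANTAGE = 100.0  # Elo points added to the home side when the venue is not neutral
START_RATING = 1500.0   # hypothetical rating for a team's first recorded match


def elo_win_probability(home_elo: float, away_elo: float, neutral: bool) -> float:
    """Expected score for the home side under the standard Elo logistic curve."""
    diff = home_elo - away_elo + (0.0 if neutral else HOME_ADVANTAGE)
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def build_prematch_features(matches: list[dict]) -> list[dict]:
    """Walk matches in date order, emitting features BEFORE updating ratings.

    Each match is a dict with keys: home, away, home_goals, away_goals, neutral.
    """
    ratings: dict[str, float] = {}
    features = []
    for m in matches:
        h = ratings.get(m["home"], START_RATING)
        a = ratings.get(m["away"], START_RATING)
        p_home = elo_win_probability(h, a, m["neutral"])
        features.append({"elo_diff": h - a, "elo_win_prob": p_home})

        # Only now use the result, so this match never influences its own features.
        if m["home_goals"] > m["away_goals"]:
            result = 1.0
        elif m["home_goals"] == m["away_goals"]:
            result = 0.5
        else:
            result = 0.0
        delta = K_FACTOR * (result - p_home)
        ratings[m["home"]] = h + delta
        ratings[m["away"]] = a - delta
    return features
```

The important design choice is ordering. Features are appended before the ratings change, which is what keeps the final score of a match out of its own inputs.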

03 Chronological Split: a train / validation / test split that preserves time order, so no future information leaks into training features.
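A time-ordered split can be sketched with pandas as follows. The oldest matches go to training, the next block to validation and the most recent block to test. The 15% / 15% fractions and the `date` column name are assumptions, not values taken from the page.

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, date_col: str = "date",
                        val_frac: float = 0.15, test_frac: float = 0.15):
    """Split by time: oldest rows -> train, then validation, newest rows -> test.

    The fractions are hypothetical defaults; the page does not state the split sizes.
    """
    df = df.sort_values(date_col, kind="stable").reset_index(drop=True)
    n = len(df)
    n_test = int(round(n * test_frac))
    n_val = int(round(n * val_frac))
    train_end = n - n_val - n_test
    train = df.iloc[:train_end]
    val = df.iloc[train_end:n - n_test]
    test = df.iloc[n - n_test:]
    return train, val, test
```

Unlike a random `train_test_split`, every validation and test match here is later than every training match. This mirrors how the model is actually used: it must predict fixtures that have not happened yet.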

04 Model Training: XGBoost, Logistic Regression, and an MLP trained independently on identical feature vectors, with sample weighting.
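The page mentions sample weighting but not the scheme. The sketch below uses a purely hypothetical one: exponential time decay multiplied by a per-competition weight. It shows how such weights are passed to scikit-learn's `LogisticRegression.fit` through `sample_weight`. XGBoost's `XGBClassifier.fit` accepts the same `sample_weight` argument.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical weighting scheme (not from the page): recent and competitive matches count more.
COMPETITION_WEIGHT = {"world_cup": 1.0, "continental": 0.9, "qualifier": 0.8,
                      "nations_league": 0.7, "friendly": 0.5}
HALF_LIFE_YEARS = 8.0  # hypothetical: a match 8 years old counts half as much


def sample_weights(years_ago: np.ndarray, competitions: list[str]) -> np.ndarray:
    decay = 0.5 ** (years_ago / HALF_LIFE_YEARS)
    comp = np.array([COMPETITION_WEIGHT[c] for c in competitions])
    return decay * comp


# Toy data standing in for the real feature vectors; labels 0 = H, 1 = D, 2 = A.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = rng.integers(0, 3, size=300)
w = sample_weights(rng.uniform(0, 30, 300),
                   rng.choice(list(COMPETITION_WEIGHT), 300).tolist())

model = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=w)
```

Because all three models see the same feature matrix and the same weights, differences between their predictions come from the model families rather than from their inputs.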

05 Ensemble Blending: per-class blend weights optimised with SLSQP on the validation set. A dedicated draw submodel corrects the base models' systematic underestimation of draws.
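Per-class blending can be sketched with SciPy's `minimize(method="SLSQP")`. In this sketch each model gets one weight per outcome class, the weights for each class are constrained to sum to 1, and validation log loss is minimised. Two details are assumptions, since the page does not specify them: log loss as the objective, and row renormalisation after mixing.

```python
import numpy as np
from scipy.optimize import minimize


def blend(probs: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """probs: (n_models, n_samples, n_classes); weights: (n_models, n_classes).

    Each class column is a weighted mix of the models' probabilities for that class;
    rows are then renormalised so the three outcomes sum to 1.
    """
    mixed = np.einsum("msk,mk->sk", probs, weights)
    return mixed / mixed.sum(axis=1, keepdims=True)


def fit_blend_weights(probs: np.ndarray, y: np.ndarray) -> np.ndarray:
    n_models, _, n_classes = probs.shape

    def log_loss(flat: np.ndarray) -> float:
        p = blend(probs, flat.reshape(n_models, n_classes))
        p_true = np.clip(p[np.arange(len(y)), y], 1e-12, 1.0)
        return float(-np.mean(np.log(p_true)))

    # One equality constraint per class: that class's model weights sum to 1.
    constraints = [
        {"type": "eq", "fun": (lambda w, k=k: w.reshape(n_models, n_classes)[:, k].sum() - 1.0)}
        for k in range(n_classes)
    ]
    x0 = np.full(n_models * n_classes, 1.0 / n_models)  # start from an equal blend
    result = minimize(log_loss, x0, method="SLSQP",
                      bounds=[(0.0, 1.0)] * x0.size, constraints=constraints)
    return result.x.reshape(n_models, n_classes)
```

Per-class weights let, for example, one model dominate the home-win column while another contributes more to the draw column. A single global weight per model would not allow that.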

06 Calibration: per-class one-vs-rest (OvR) isotonic calibrators, fitted so that outcomes given 60% confidence occur roughly 60% of the time.
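One way to implement per-class OvR isotonic calibration uses scikit-learn's `IsotonicRegression`. The approach fits one monotone curve per outcome, with that class against the rest, and then renormalises each row. The renormalisation is an assumption about how the page keeps the three probabilities summing to 1, and the class name `OvRIsotonicCalibrator` is hypothetical.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression


class OvRIsotonicCalibrator:
    """Hypothetical helper: one isotonic curve per class (class k vs. rest), then renormalise."""

    def fit(self, probs: np.ndarray, y: np.ndarray) -> "OvRIsotonicCalibrator":
        self.models_ = []
        for k in range(probs.shape[1]):
            iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
            iso.fit(probs[:, k], (y == k).astype(float))  # 1 if outcome k happened
            self.models_.append(iso)
        return self

    def predict_proba(self, probs: np.ndarray) -> np.ndarray:
        cal = np.column_stack([m.predict(probs[:, k]) for k, m in enumerate(self.models_)])
        cal = np.clip(cal, 1e-6, None)  # avoid an all-zero row before renormalising
        return cal / cal.sum(axis=1, keepdims=True)
```

Isotonic regression only assumes that a higher raw score should never mean a lower true frequency. This makes it more flexible than Platt scaling, but it needs a reasonably large calibration set to avoid overfitting.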

Ensemble Architecture

XGBoost: gradient-boosted decision trees. Handles non-linear interactions and class imbalance, and is the dominant backbone of the ensemble.

Logistic Regression: linear baseline classifier. Highly interpretable; provides stable, well-calibrated probability estimates that anchor the ensemble.

MLP: multi-layer perceptron. Captures higher-order feature interactions and patterns that tree-based models can miss.

SLSQP-optimised blending + draw submodel

Ensemble Model

Per-class weights optimised on the held-out validation set. Draw probability corrected via a dedicated binary logistic submodel. Output isotonically calibrated per class.
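The page says only that a binary logistic submodel corrects the draw probability. The sketch below shows one plausible way to combine it with the blend, and that combination rule is an assumption: the submodel's P(draw) replaces the blended value, and P(home) and P(away) are rescaled to share the remaining mass in their original ratio. The label convention 0 = H, 1 = D, 2 = A is also assumed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_draw_submodel(X: np.ndarray, y: np.ndarray) -> LogisticRegression:
    """Binary logistic model: draw vs. not draw (labels assumed 0=H, 1=D, 2=A)."""
    return LogisticRegression(max_iter=1000).fit(X, (y == 1).astype(int))


def apply_draw_correction(blended: np.ndarray, p_draw: np.ndarray) -> np.ndarray:
    """Hypothetical combination rule: swap in P(draw), rescale H and A to keep their ratio."""
    out = blended.copy()
    scale = (1.0 - p_draw) / (blended[:, 0] + blended[:, 2])
    out[:, 0] = blended[:, 0] * scale
    out[:, 1] = p_draw
    out[:, 2] = blended[:, 2] * scale
    return out


# Usage with toy data: P(draw) is column 1 of the binary model's predict_proba.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = rng.integers(0, 3, size=200)
draw_model = fit_draw_submodel(X, y)
p_draw = draw_model.predict_proba(X)[:, 1]
```

Keeping the home/away ratio intact means the correction changes only how likely a draw is. It does not change which side the ensemble favours.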

Monte Carlo Simulation

We simulate the full 48-team tournament 5,000 times. Each run samples match outcomes from the model's probability distributions and tracks every team's path through the bracket. A team's final win chance is the fraction of runs in which it wins the tournament.
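The Monte Carlo idea can be shown with a deliberately simplified sketch. It uses an 8-team single-elimination bracket instead of the real 48-team group-plus-knockout format. It also replaces the ensemble with a hypothetical Elo-based `match_probs` using a fixed 25% draw rate, and it settles drawn knockout ties with a coin flip. All of these are stand-ins; only the counting logic mirrors the page's description.

```python
import random
from collections import Counter


def match_probs(home: str, away: str, elo: dict[str, float]) -> tuple[float, float, float]:
    """Hypothetical stand-in for the ensemble: Elo-based P(H), P(D), P(A), neutral venue."""
    p_home_exp = 1.0 / (1.0 + 10 ** (-(elo[home] - elo[away]) / 400.0))
    p_draw = 0.25  # fixed draw rate, purely illustrative
    p_home = (1.0 - p_draw) * p_home_exp
    return p_home, p_draw, 1.0 - p_draw - p_home


def play_knockout(home: str, away: str, elo: dict[str, float], rng: random.Random) -> str:
    p_home, p_draw, _ = match_probs(home, away, elo)
    r = rng.random()
    if r < p_home:
        return home
    if r < p_home + p_draw:
        return home if rng.random() < 0.5 else away  # draw: coin-flip "shoot-out"
    return away


def simulate(bracket: list[str], elo: dict[str, float],
             n_runs: int = 5000, seed: int = 0) -> dict[str, float]:
    """Play the bracket n_runs times; return each team's share of titles."""
    rng = random.Random(seed)
    titles = Counter()
    for _ in range(n_runs):
        alive = list(bracket)
        while len(alive) > 1:
            alive = [play_knockout(alive[i], alive[i + 1], elo, rng)
                     for i in range(0, len(alive), 2)]
        titles[alive[0]] += 1
    return {team: titles[team] / n_runs for team in bracket}


elo = {"A": 2000, "B": 1900, "C": 1800, "D": 1700,
       "E": 1600, "F": 1550, "G": 1500, "H": 1450}
champion_odds = simulate(list(elo), elo)
```

Sampling whole tournaments, rather than multiplying per-round probabilities, automatically accounts for who each team is likely to meet along the way.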

5,000 simulations per request (full tournament, group stage → final)

520,000 match outcomes modelled per request (5,000 runs × 104 matches)


Backtest Accuracy

Chronological backtest · 9,167-match held-out test set · Isotonic calibration applied · 3 classes: home win (H) / draw (D) / away win (A)

60% test accuracy (random baseline ~33%): the fraction of matches where the model's most likely outcome (H / D / A) was the actual result.

0.513 Brier score (lower is better): measures overall probabilistic accuracy. Perfect = 0; random baseline ~0.64.

0.87 log loss (lower is better): penalises confident wrong predictions. Random baseline ~1.06.
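The three backtest metrics can be computed with NumPy as sketched below. This assumes the common multi-class Brier definition, which sums squared error over the three outcomes and ranges from 0 to 2; that convention is consistent with the quoted 0.513 and ~0.64 figures. For reference, a flat 1/3-1/3-1/3 forecast scores exactly 2/3 ≈ 0.667 on Brier and ln 3 ≈ 1.099 on log loss. The slightly lower ~0.64 and ~1.06 baselines above would fit a baseline that predicts historical outcome frequencies, but that is an inference; the page does not say.

```python
import numpy as np


def evaluate(probs: np.ndarray, y: np.ndarray) -> dict[str, float]:
    """probs: (n, 3) rows of P(H), P(D), P(A); y: integer labels 0=H, 1=D, 2=A."""
    n, k = probs.shape
    onehot = np.eye(k)[y]
    # Accuracy: the most probable outcome matches the actual result.
    accuracy = float(np.mean(probs.argmax(axis=1) == y))
    # Multi-class Brier: squared error summed over the 3 classes, averaged over matches.
    brier = float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))
    # Log loss: mean negative log probability assigned to the actual outcome.
    log_loss = float(-np.mean(np.log(np.clip(probs[np.arange(n), y], 1e-15, 1.0))))
    return {"accuracy": accuracy, "brier": brier, "log_loss": log_loss}


# Uniform baseline: every match predicted as 1/3, 1/3, 1/3.
y_demo = np.array([0, 1, 2, 0, 1, 2])
uniform = evaluate(np.full((6, 3), 1 / 3), y_demo)
```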

Feature Importance (SHAP)

Mean absolute SHAP value across 161 test samples — what actually drives the prediction.

Dominant Signal

Elo Win Probability outweighs every other feature by roughly 9× (mean |SHAP| 0.196 vs 0.022 for the next feature, Elo Difference). After the Elo-based features and H2H goal difference, defence ratings are the next meaningful signal. FIFA rank, streaks, and competition category are near-zero contributors.

Feature                 Mean |SHAP|
Elo Win Probability     0.196
Elo Difference          0.022
H2H Goal Difference     0.020
Effective Elo Diff      0.015
Home Defense Rating     0.014
Away Defense Rating     0.012
H2H Match Count         0.010
Home Adj. Defense       0.009
Neutral Venue           0.009
Away Elo Rating         0.009
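Mean absolute SHAP importance is the average of |SHAP value| per feature across the explained samples. The NumPy sketch below ranks features that way. The array layout returned by the `shap` package varies by version and model type: multi-class models may add a trailing class axis. The helper therefore averages over every axis except the feature axis. Averaging over classes, rather than summing, is an assumption about how the page aggregates them.

```python
import numpy as np


def mean_abs_shap(shap_values: np.ndarray, feature_names: list[str]) -> list[tuple[str, float]]:
    """Rank features by mean |SHAP|.

    Accepts (n_samples, n_features) or (n_samples, n_features, n_classes);
    absolute values are averaged over every axis except the feature axis (axis 1).
    """
    abs_vals = np.abs(shap_values)
    axes = tuple(i for i in range(abs_vals.ndim) if i != 1)
    importance = abs_vals.mean(axis=axes)
    order = np.argsort(importance)[::-1]
    return [(feature_names[i], float(importance[i])) for i in order]
```

Taking absolute values matters. A feature that pushes some predictions up and others down would otherwise average out to near zero, even though it strongly influences individual matches.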

Match Data

49,000+ historical fixtures: international results from 1993–2026 spanning World Cups, qualifiers, continental championships, and friendlies.

Source: football-data.org

Live Squads

37 / 48 squads announced

Player names, ages, clubs, and portraits are updated as each nation announces its final 26-man squad.

Sources: SofaScore · ESPN · FotMob

What Each Output Means

Win Probability

P(home win), P(draw), P(away win) from the ensemble model. The three values always sum to 1.0.

Predicted Score

Most-likely scoreline from a team-dependent Poisson model whose expected goals are calibrated to reproduce the ensemble's exact outcome probabilities.

Expected Goals (xG)

The Poisson lambda: the mean number of goals each team is expected to score. Fractional values are normal: an xG of 1.4 means the team would average 1.4 goals per match if the same fixture were replayed many times.
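To see what a fractional xG implies, the goal distribution for λ = 1.4 can be computed with `scipy.stats.poisson`. The team scores exactly 1 goal about 34.5% of the time, 0 goals about 24.7% of the time and 2 goals about 24.2% of the time, while the long-run average is 1.4.

```python
from scipy.stats import poisson


def goal_distribution(lam: float, max_goals: int = 5) -> list[float]:
    """P(team scores exactly k goals) for k = 0..max_goals under a Poisson(lam) model."""
    return [float(poisson.pmf(k, lam)) for k in range(max_goals + 1)]


dist = goal_distribution(1.4)
for k, p in enumerate(dist):
    print(f"{k} goals: {p:.3f}")
```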

Confidence

The highest of the three outcome probabilities. Above ~55% is considered meaningful; below 40% the match is too close to call reliably.
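The confidence rule reduces to taking the largest of the three probabilities and reading it against the two thresholds above. The sketch uses those thresholds as stated. The "lean" label for the 40–55% band is a hypothetical name, since the page does not label that range.

```python
def confidence(probs: dict[str, float]) -> tuple[str, float, str]:
    """Return the favoured outcome, its probability, and a rough reading of it.

    Thresholds follow the page (above ~55% meaningful, below 40% too close to call);
    the middle-band label "lean" is hypothetical.
    """
    outcome, p = max(probs.items(), key=lambda kv: kv[1])
    if p > 0.55:
        band = "meaningful"
    elif p < 0.40:
        band = "too close to call"
    else:
        band = "lean"
    return outcome, p, band
```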

Limitations

01 Squad injuries and suspensions are not modelled; the system uses historical team-level performance only.
02 Individual player ratings are not used. A squad-strength feature is planned but not yet implemented.
03 Predictions are probabilistic, so upsets are expected and statistically normal. A 75% favourite still fails to win about 1 match in 4.
04 The model trains on matches from 1993 onwards, so national teams with short competitive histories have limited training data.

Tech Stack

ML & Data

Python · XGBoost · scikit-learn · SciPy · NumPy · pandas · SHAP

API

FastAPI · Uvicorn · Pydantic · joblib

Frontend

Next.js 14 · TypeScript · Tailwind CSS · Framer Motion · Recharts

Built by