
Prediction: Machine Learning picks Norris to win the 2025 Qatar GP

Using Machine Learning and official F1 telemetry, we ran the numbers on the 2025 Qatar Grand Prix before the lights went out.

Predicting a Formula 1 race is notoriously difficult. Strategy calls, safety cars, and first-lap chaos can turn the expected order upside down within seconds. But underneath all that noise there is signal: patterns in the data that tell you, before a single lap is run, who genuinely has the pace to win. We built a machine learning model using official F1 telemetry to find that signal. Here is what it told us about this weekend in Qatar.

The Predicted Podium

P1: Lando Norris — McLaren

P2: Oscar Piastri — McLaren

P3: Max Verstappen — Red Bull

McLaren lock out the top two. The MCL39 has been the class of the field in race trim throughout the second half of 2025, and Lusail’s long, high-speed corners suit it perfectly. Verstappen completes the podium. Even in a year where Red Bull has fallen behind, Max consistently extracts more from his car than anyone else on the grid.

How The Model Works
The Data

The foundation is FastF1, an open-source Python library licensed under the MIT License that pulls official Formula 1 timing data, the same feed the teams themselves use at the circuit. From the 2024 Qatar Grand Prix, we load every lap driven by every driver. Before any of it reaches the model, two cleaning steps run automatically.

First, a track status filter keeps only laps run under green flag conditions. Safety car laps and yellow flag periods are discarded entirely. Those artificially slow times would corrupt the pace data and skew the model’s understanding of each driver’s true speed.
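A minimal sketch of this filter, assuming the laps sit in a pandas DataFrame shaped like the one FastF1 returns, with a string `TrackStatus` column in which "1" means the track was clear for the whole lap. The helper name and the sample lap times are illustrative, not the article's actual code or data.

```python
import pandas as pd


def keep_green_flag_laps(laps: pd.DataFrame) -> pd.DataFrame:
    """Keep only laps whose track status is exactly "1" (track clear).

    Assumption: any other value (for example "4" for a safety car, or a
    combined code such as "12") means the lap was affected by a yellow
    flag, safety car, or similar, so the whole lap is dropped.
    """
    status = laps["TrackStatus"].astype(str)
    return laps[status == "1"].reset_index(drop=True)


# Illustrative laps only; times are in seconds and made up for the example.
laps = pd.DataFrame(
    {
        "Driver": ["NOR", "NOR", "NOR", "VER"],
        "LapNumber": [1, 2, 3, 1],
        "LapTime": [86.1, 118.4, 85.9, 86.3],
        "TrackStatus": ["1", "4", "1", "12"],
    }
)

green = keep_green_flag_laps(laps)
print(green[["Driver", "LapNumber", "LapTime"]])
```

Matching on the exact string "1" is the strict reading of "green flag conditions": a lap that was green for only part of its length is discarded too.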

Second, an IQR outlier filter removes laps that fall unusually far outside each driver’s normal range. This automatically strips in-laps and out-laps (the slow laps entering and leaving the pits) without any manual work. What remains is a clean dataset of laps that genuinely represent race pace.
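One way a per-driver IQR filter can look in pandas is sketched here. The conventional 1.5 × IQR fence is an assumption, since the exact multiplier used in the model is not stated, and the function name and lap times are illustrative.

```python
import pandas as pd


def drop_iqr_outliers(laps: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Drop laps outside [Q1 - k*IQR, Q3 + k*IQR], computed per driver."""
    grouped = laps.groupby("Driver")["LapTime"]
    # transform() broadcasts each driver's quartile back onto that driver's rows
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    keep = (laps["LapTime"] >= lower) & (laps["LapTime"] <= upper)
    return laps[keep].reset_index(drop=True)


# Illustrative lap times in seconds; the last lap of each driver mimics a pit lap.
laps = pd.DataFrame(
    {
        "Driver": ["NOR"] * 6 + ["PIA"] * 6,
        "LapTime": [85.0, 85.2, 85.4, 85.1, 85.3, 104.0,
                    85.5, 85.6, 85.7, 85.8, 85.9, 99.0],
    }
)

clean = drop_iqr_outliers(laps)
print(clean.groupby("Driver")["LapTime"].agg(["count", "max"]))
```

Computing the fences per driver rather than across the whole field matters: a backmarker's normal lap could otherwise be flagged as an outlier simply because it is slower than the leaders' laps.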

The Five Inputs

With clean data in hand, the model builds a profile for each driver using five inputs.

Qualifying Time turned out to be the single most influential factor in the model, and the feature importance chart makes this striking. At Lusail, where overtaking opportunities are limited and track position compounds over the course of a stint, where you start is a powerful predictor of where you finish. The model learned this entirely from the data, without being told.

Clean Air Race Pace measures how fast each driver laps when running freely, away from traffic. This is the truest measure of outright speed, not distorted by a slow car ahead or a charging driver behind. Calculated from 2025 race stint data, it tells us what each driver is genuinely capable of on a clean lap in race conditions.

Team Performance Score encodes the gap in raw machinery. Built from 2025 constructor championship points and normalised against the leader, it gives every driver a score between 0 and 1 representing how competitive their car is — from McLaren at the top to Alpine at the bottom.
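As a rough illustration, normalising against the leader can be read as dividing each team's points by the leading team's points. That reading, the function name, and the placeholder teams and points below are assumptions for the sketch, not the real 2025 standings.

```python
def team_performance_scores(points: dict[str, float]) -> dict[str, float]:
    """Scale constructor points so the championship leader scores 1.0."""
    leader_points = max(points.values())
    if leader_points <= 0:
        raise ValueError("the leading team must have scored at least one point")
    return {team: pts / leader_points for team, pts in points.items()}


# Placeholder standings, purely illustrative.
constructor_points = {"Team A": 600.0, "Team B": 300.0, "Team C": 30.0}

scores = team_performance_scores(constructor_points)
for team, score in scores.items():
    print(f"{team}: {score:.2f}")
```

Under this reading the leader always scores exactly 1, and a team only reaches 0 if it has scored no points at all.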

Rain Probability and Temperature come from a live weather forecast pulled from the OpenWeatherMap API for race day at Lusail. Qatar in late November is almost always dry and warm, and this weekend is no different. Both factors register near-zero importance in the model, which is exactly what you would expect.

The Algorithm

All five inputs are fed into XGBoost (Extreme Gradient Boosting), one of the most battle-tested machine learning algorithms in the world, used across finance, medicine, and sports analytics. It learns the relationship between driver and team characteristics and expected race pace, then ranks every driver accordingly.

Three design choices keep the model honest. A low learning rate of 0.05 with 500 decision trees means it learns slowly and carefully, avoiding the overfitting that plagues models trained on small datasets. Regularization penalizes unnecessary complexity. Monotone constraints enforce physical logic: a better qualifying time must always produce a better predicted result, and stronger team performance must always help. Without these constraints, the model could find patterns in the data that defy basic physics.
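A sketch of how these three choices could be expressed with XGBoost's scikit-learn wrapper. The learning rate and tree count come from the text above; the feature names, the regularization value, and the choice to leave clean air pace and the weather inputs unconstrained are assumptions. The target is assumed to be lap time in seconds, so lower is better.

```python
# Hypothetical column names for the five inputs, in training order.
FEATURES = [
    "QualifyingTime",
    "CleanAirRacePace",
    "TeamPerformanceScore",
    "RainProbability",
    "Temperature",
]

# +1: prediction may never decrease as the feature increases.
# -1: prediction may never increase as the feature increases.
#  0: unconstrained.
MONOTONE = {
    "QualifyingTime": 1,         # slower qualifying never predicts faster race pace
    "CleanAirRacePace": 0,
    "TeamPerformanceScore": -1,  # a stronger team never predicts slower race pace
    "RainProbability": 0,
    "Temperature": 0,
}

PARAMS = {
    "n_estimators": 500,    # from the article
    "learning_rate": 0.05,  # from the article
    "reg_lambda": 1.0,      # assumed L2 penalty; the article gives no value
    "monotone_constraints": "(" + ",".join(str(MONOTONE[f]) for f in FEATURES) + ")",
}


def build_model():
    """Create an untrained regressor from PARAMS.

    The import is deferred so the configuration above can be inspected
    without xgboost installed.
    """
    from xgboost import XGBRegressor

    return XGBRegressor(**PARAMS)


print(PARAMS["monotone_constraints"])
```

Because the constraint string is positional, the training matrix has to present its columns in the same order as `FEATURES`.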

Validation is done using Leave-One-Out Cross-Validation: train on all drivers except one, predict that driver, repeat for every driver in turn. The model’s average prediction error across all drivers came out at 3.60 seconds per lap, which is meaningful signal for separating the front runners from the field.
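The leave-one-out loop itself is short enough to sketch in plain Python. To stay self-contained, this version uses a trivial mean-of-training-targets predictor as a stand-in for XGBoost and treats "average prediction error" as mean absolute error; both are assumptions, and the numbers are toy values rather than real lap times.

```python
from statistics import mean


def loocv_mae(rows, targets, fit_predict):
    """Hold out each sample once, train on the rest, average the absolute errors."""
    errors = []
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_targets = targets[:i] + targets[i + 1:]
        prediction = fit_predict(train_rows, train_targets, rows[i])
        errors.append(abs(prediction - targets[i]))
    return mean(errors)


def mean_baseline(train_rows, train_targets, test_row):
    """Stand-in model: ignore the features and predict the training mean."""
    return mean(train_targets)


# Toy data: one feature row and one target per "driver".
rows = [[0.0], [1.0], [2.0], [3.0]]
targets = [10.0, 12.0, 14.0, 16.0]

mae = loocv_mae(rows, targets, mean_baseline)
print(f"LOOCV mean absolute error: {mae:.3f}")
```

Any model can be dropped in by passing a different `fit_predict` callable, which is the appeal of this scheme on a grid of only twenty drivers: every driver is used for both training and testing, never at the same time.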

The feature importance chart tells a clear story. Qualifying time dominates: the model learned from the data that at Lusail, where you start is the most powerful single predictor of where you finish. Team performance score is second, reflecting the genuine gap in machinery across the grid. Clean air pace, rain, and temperature play supporting roles.

Race data sourced via FastF1 by Tobias Oehrly, licensed under the MIT License. Prediction model built in Python using XGBoost and scikit-learn. Copyright notice: Permission is hereby granted, free of charge, to any person obtaining a copy of this software, to deal in the Software without restriction, including the rights to use, copy, modify, merge, publish, and distribute, subject to the condition that the above copyright notice and this permission notice are included in all copies or substantial portions of the Software.
