AI / ML2025

Weather Market Probabilistic Forecasting

This Ospina case study documents how Carlos Rico-Ospina approached a specific risk, infrastructure, revenue, or research problem and what was built to address it.

Built two probabilistic models that identify conditions where forecast skill beats market consensus—turning calibrated edge into directional trading signals.

Probabilistic ForecastingBayesian InferenceHidden Markov ModelsPrediction MarketsTime-Integrated Brier Score
Weather Market Probabilistic Forecasting

The Problem

Prediction markets platforms offer temperature derivative contracts—binary options on whether daily temperatures exceed thresholds. The market prices reflect crowd consensus, but crowds don't have access to ensemble weather models or sophisticated probabilistic fusion techniques. The question wasn't 'can I beat the market?'—it was 'under what conditions can I beat the market?'

The Insight

Comparing raw prediction accuracy misses the point. What matters is calibrated probabilistic skill—measured by Time-Integrated Brier Score (TIBS). A model doesn't need to be universally better than the market; it needs to identify conditions where it has measurable edge. Those conditions become the trading signal.

What I Built

  • Developed Bayesian posterior fusion model combining GFS, HRRR, and NBM ensemble forecasts with precision-weighted updating
  • Built mixture model treating time of temperature extrema as additional random variable—capturing timing uncertainty, not just value uncertainty
  • Implemented Hidden Markov Model for weather regime classification (stable, transitional, extreme) to dynamically adjust model weights
  • Created backtesting engine measuring TIBS against market-implied probabilities

Outcomes

  • Quantified exactly when and where each model outperformed market consensus using TIBS
  • Identified specific conditions (temperature ranges, weather regimes, forecast horizons) showing consistent alpha
  • Edge conditions inform directional market-making strategy—providing liquidity when model confidence is high
  • Demonstrates proper scoring rule evaluation (calibration + discrimination), not just accuracy

Why It Matters

Standard weather apps give point forecasts. This system outputs calibrated probability distributions that can be compared directly to market prices.

Focused on skill measurement (TIBS) rather than performance chasing—prevents the common trap of overfitting to limited historical data.

Have a Similar Challenge?

Let's discuss how I can help you achieve similar results.

Start a Conversation