Hermes

A fully autonomous prediction-market bot that converts NWS ensemble forecasts into calibrated bracket probabilities and sized bets on Kalshi.

Private repository. Source code is not public. This page covers the architecture and engineering decisions. No API keys, credentials, or trading account details appear anywhere in this writeup.

What it is

Hermes is a single-file Python monolith (~1,900 lines) that runs as a systemd service on a Linux server. It polls Kalshi for open weather markets across five US cities, fetches a 31-member NWS ensemble forecast from Open-Meteo, computes bracket-hit probabilities using a Gaussian model fit to the ensemble spread, sizes positions with a dampened Kelly criterion, and optionally routes borderline bets through a Claude Sonnet veto gate before placing orders via the Kalshi REST API.

A Discord bot (Hermes#6760) doubles as a command interface and audit log: you can check live positions, trigger a manual scan, flip auto-trading on or off, and see the reasoning behind any position. All from a phone.

The probability model

The core of the system is ensemble_probability(). It takes the 31 forecast members fetched from the Open-Meteo ensemble API, fits a Gaussian to the member distribution (sample mean and sample variance), and evaluates P(lo ≤ temp ≤ hi) as the difference of the fitted CDF at the two bracket bounds. Raw counting with 31 members is too coarse for narrow (1-2°F) brackets: probabilities move in steps of 1/31 (about 3.2%), and you get hard zeros where the true probability might be 5-15%. The Gaussian fit smooths this.
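
To make the coarseness concrete, here is a minimal sketch comparing raw counting with a Gaussian fit on a narrow bracket. The member values and bracket bounds are made up for illustration (they are not real forecast data), and the sketch uses the standard library's statistics.NormalDist rather than the erf-based code Hermes uses.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical 31-member ensemble of daily highs (°F), evenly spread 70.0-76.0.
members = [70.0 + 0.2 * k for k in range(31)]
lo, hi = 77.0, 78.0  # a narrow bracket just above the warmest member

# Raw counting: fraction of members that land inside the bracket.
count_prob = sum(lo <= t <= hi for t in members) / len(members)

# Gaussian fit: sample mean and sample standard deviation (n - 1 denominator).
fit = NormalDist(mu=mean(members), sigma=stdev(members))
gauss_prob = fit.cdf(hi) - fit.cdf(lo)

print(f"counting: {count_prob:.3f}")
print(f"gaussian: {gauss_prob:.3f}")
```

Counting returns a hard zero because no member lands in the bracket, while the fitted Gaussian still assigns it a small positive probability (roughly 1% for these made-up values).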

After an April 2026 calibration audit found the bot betting at 3% confidence on events that hit ~56% of the time, an MAE sigma floor was added: the effective sigma is max(ensemble_sigma, mae * sqrt(pi/2)). This prevents false precision when the ensemble members agree tightly but empirical NWS forecast error for that city is substantially larger. City-specific MAE values (2.0°F for Miami, 4.5°F for Denver) are tuned from historical data. Final output is clamped to [0.10, 0.90]. Weather forecasts 1-2 days out do not support greater than 90% confidence on a binary threshold crossing.

main.py: ensemble probability with MAE sigma floor (post-April 2026 calibration)
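import math

# DEFAULT_CITY_MAE_FALLBACK is a module-level constant defined elsewhere in main.py.

def ensemble_probability(ensemble_highs, threshold_f, direction="above",
                         bracket_bounds=None, city_mae=None):
    """Probability of a daily-high event, from ensemble member highs in °F.

    direction="above" prices P(high > threshold_f); direction="bracket" prices
    P(lo <= high <= hi) for bracket_bounds=(lo, hi). Both are clamped to
    [0.10, 0.90]. Any other direction returns 0.0.
    """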
    n = len(ensemble_highs)
    if n < 2:
        return 0.5  # insufficient data: maximally uncertain

    mean = sum(ensemble_highs) / n
    variance = sum((t - mean) ** 2 for t in ensemble_highs) / (n - 1)
    ensemble_sigma = math.sqrt(variance) if variance > 0 else 0.0

    # MAE floor: never trust ensemble tightness below empirical NWS error.
    # Pre-Apr-21: unclamped sigma caused 3%-confidence bets on 56%-hit events.
    mae = city_mae if (city_mae and city_mae > 0) else DEFAULT_CITY_MAE_FALLBACK
    mae_sigma = mae * math.sqrt(math.pi / 2.0)
    sigma = max(ensemble_sigma, mae_sigma)

    if direction == "above":
        z = (threshold_f - mean) / sigma
        cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))
        prob = 1.0 - cdf
    elif direction == "bracket":
        lo, hi = bracket_bounds
        z_lo = (lo - mean) / sigma
        z_hi = (hi - mean) / sigma
        cdf_lo = 0.5 * (1.0 + math.erf(z_lo / math.sqrt(2)))
        cdf_hi = 0.5 * (1.0 + math.erf(z_hi / math.sqrt(2)))
        prob = max(0.0, cdf_hi - cdf_lo)
    else:
        return 0.0

    # [0.10, 0.90] clamp: weather 1-2 days out does not support >90% confidence.
    return max(0.10, min(0.90, prob))
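
The sqrt(pi/2) factor comes from a standard identity: for a zero-mean Gaussian error with standard deviation sigma, the expected absolute error is sigma * sqrt(2/pi), so a measured MAE implies sigma = MAE * sqrt(pi/2), about 1.2533 times the MAE. The sketch below is a standalone illustration of what the floor does, assuming Gaussian forecast errors; the ensemble mean, spread, and threshold are hypothetical, and prob_above is a helper written for this example, not a function from main.py.

```python
import math
from statistics import NormalDist


def mae_to_sigma(mae: float) -> float:
    """Std dev of a zero-mean Gaussian whose mean absolute error equals `mae`."""
    return mae * math.sqrt(math.pi / 2.0)


def prob_above(mean_f: float, sigma_f: float, threshold_f: float) -> float:
    """P(high > threshold) under a Gaussian with the given mean and sigma."""
    return 1.0 - NormalDist(mu=mean_f, sigma=sigma_f).cdf(threshold_f)


# Hypothetical tight ensemble: mean 80°F, spread 0.5°F, threshold 82°F.
ensemble_mean, ensemble_sigma, threshold = 80.0, 0.5, 82.0

floor = mae_to_sigma(2.0)  # 2.0°F is the Miami MAE quoted in the text
p_raw = prob_above(ensemble_mean, ensemble_sigma, threshold)
p_floored = prob_above(ensemble_mean, max(ensemble_sigma, floor), threshold)

print(f"sigma floor: {floor:.2f}°F")
print(f"unfloored:   {p_raw:.5f}")
print(f"floored:     {p_floored:.3f}")
```

With the tight spread alone, the threshold sits four standard deviations out and the probability is effectively zero. With the floor (about 2.51°F) the same event comes out near 0.21, before the [0.10, 0.90] clamp is applied.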

Kelly sizing, guardrails, and the veto gate

Position sizing uses a fractional Kelly formula with a per-confidence-level dampener (0.125×, 0.25×, or 0.375× of full Kelly) and a hard cap at 6% of bankroll per trade (3% in survival mode when bankroll drops below $30). The Kalshi taker fee is incorporated into the effective payout before computing Kelly f.
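
A minimal sketch of the sizing rule described above, under stated assumptions. It uses the textbook Kelly fraction f = (p*b - q) / b for a $1 binary contract and treats the taker fee as an extra per-contract cost. The function names, the low/medium/high labels, the assignment of dampeners to those labels, and the fee value in the usage lines are all illustrative, not taken from main.py.

```python
# Dampeners from the text; which confidence level maps to which is an assumption.
DAMPENERS = {"low": 0.125, "medium": 0.25, "high": 0.375}
NORMAL_CAP = 0.06            # max fraction of bankroll per trade
SURVIVAL_CAP = 0.03          # cap in survival mode
SURVIVAL_BANKROLL = 30.0     # dollars; below this, survival mode applies


def kelly_fraction(p_win: float, price: float, fee: float = 0.0) -> float:
    """Full-Kelly bankroll fraction for buying a $1 binary contract.

    `price` and `fee` are dollars per contract. The fee is modeled as extra
    cost, so a win nets (1 - price - fee) and a loss costs (price + fee).
    """
    cost = price + fee
    if not 0.0 < cost < 1.0:
        return 0.0
    b = (1.0 - cost) / cost                   # net odds per dollar risked
    f = (p_win * b - (1.0 - p_win)) / b       # classic Kelly: (p*b - q) / b
    return max(0.0, f)                        # never size a negative-edge bet


def position_fraction(p_win: float, price: float, fee: float,
                      confidence: str, bankroll: float) -> float:
    """Dampened Kelly fraction, capped at 6% (3% in survival mode)."""
    cap = SURVIVAL_CAP if bankroll < SURVIVAL_BANKROLL else NORMAL_CAP
    return min(kelly_fraction(p_win, price, fee) * DAMPENERS[confidence], cap)


# Hypothetical inputs: 60% model probability, 45-cent contract, 2-cent fee.
print(position_fraction(0.60, 0.45, 0.02, "medium", 100.0))
print(position_fraction(0.60, 0.45, 0.02, "medium", 25.0))
```

In this example full Kelly is about 24.5% of bankroll, the 0.25× dampener brings it to about 6.1%, and the hard cap trims it to 6% (or 3% once the bankroll is under $30).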

For bets where the model's edge falls in a middle band (configurable SONNET_VETO_EDGE_LOW to SONNET_VETO_EDGE_HIGH), the decision is routed to Claude Sonnet with a structured prompt covering city, market type, ensemble probability, NWS forecast, market price, and reasoning context. Sonnet returns an approve/reject verdict with a clamped probability adjustment. Results are cached by ticker for 30 minutes to avoid repeated API calls during a scan cycle.
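
The gate and its cache can be sketched roughly as follows. This is not the code from main.py: the band values, the adjustment clamp, and the function names are placeholders, and the model call is injected as a plain callable so the sketch makes no claims about any particular API client.

```python
import time
from typing import Callable, Dict, Optional, Tuple

SONNET_VETO_EDGE_LOW = 0.05    # placeholder value; the real one is configurable
SONNET_VETO_EDGE_HIGH = 0.15   # placeholder value
CACHE_TTL_SECONDS = 30 * 60    # 30-minute cache, as described in the text
MAX_ADJUSTMENT = 0.10          # hypothetical clamp on the probability adjustment

Verdict = Tuple[bool, float]   # (approved, probability adjustment)
_cache: Dict[str, Tuple[float, Verdict]] = {}


def needs_veto(edge: float) -> bool:
    """Only edges inside the middle band are routed to the model."""
    return SONNET_VETO_EDGE_LOW <= edge <= SONNET_VETO_EDGE_HIGH


def cached_verdict(ticker: str, ask_model: Callable[[str], Verdict],
                   now: Optional[float] = None) -> Verdict:
    """Return a fresh cached verdict for `ticker`, else call the model once."""
    now = time.monotonic() if now is None else now
    hit = _cache.get(ticker)
    if hit is not None and now - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]
    approved, adjustment = ask_model(ticker)
    # Clamp the adjustment so the model can nudge, not override, the estimate.
    adjustment = max(-MAX_ADJUSTMENT, min(MAX_ADJUSTMENT, adjustment))
    verdict = (approved, adjustment)
    _cache[ticker] = (now, verdict)
    return verdict
```

Passing `now` explicitly keeps the cache testable without sleeping; in a live loop it would default to the monotonic clock.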

Additional guards: per-city-date position limit (max 2 open positions), total portfolio exposure cap (25%), daily trade limit (6), and a drawdown halt. A data-driven audit in April 2026 permanently disabled LA, Denver, and Miami high-temp markets (all net-negative across their history) and LOWT bracket markets (41% win rate).
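
As a rough illustration, the numeric guards above could be checked in one pre-trade function like this. The limits come from the text; the PortfolioState shape, the way exposure is measured (open dollars at risk plus the new stake, divided by bankroll), and the boolean drawdown flag are assumptions, since the writeup does not give the drawdown threshold or the internal data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

MAX_POSITIONS_PER_CITY_DATE = 2
MAX_PORTFOLIO_EXPOSURE = 0.25   # fraction of bankroll
MAX_TRADES_PER_DAY = 6


@dataclass
class PortfolioState:
    """Hypothetical snapshot of the account at decision time."""
    bankroll: float
    open_exposure: float = 0.0        # dollars currently at risk
    trades_today: int = 0
    drawdown_halted: bool = False     # set elsewhere when the halt trips
    positions: Dict[Tuple[str, str], int] = field(default_factory=dict)


def blocking_reasons(state: PortfolioState, city: str, date: str,
                     stake: float) -> List[str]:
    """Return every guard that blocks this trade; an empty list means go."""
    reasons = []
    if state.drawdown_halted:
        reasons.append("drawdown halt")
    if state.trades_today >= MAX_TRADES_PER_DAY:
        reasons.append("daily trade limit")
    if state.positions.get((city, date), 0) >= MAX_POSITIONS_PER_CITY_DATE:
        reasons.append("city-date position limit")
    if (state.bankroll <= 0
            or (state.open_exposure + stake) / state.bankroll > MAX_PORTFOLIO_EXPOSURE):
        reasons.append("portfolio exposure cap")
    return reasons
```

Returning the full list of reasons, rather than stopping at the first failure, suits an audit log where each skipped trade should explain itself.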

What the audit found

The bot was halted in May 2026 after a sustained drawdown. The post-mortem (see the companion prediction-market-bot-postmortem repository) traced 100% of the lifetime loss to the period before April 21, when the MAE sigma floor was added. The strategy targeted single-degree (1-2°F) temperature brackets: markets that hit ~45% of the time with a realized reward:risk ratio of 0.51, which requires a win rate of about 66% (1 / (1 + 0.51)) just to break even. No probability model can fix a market with no edge. After the sigma floor, the bot stopped losing but also stopped winning: breakeven, not profitable. The strategy was retired.
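
The arithmetic behind the required win rate is short enough to check directly. With reward:risk ratio R, a bet wins R units with probability p and loses 1 unit otherwise, so expected value is p*R - (1 - p) and the breakeven win rate is 1 / (1 + R). The function names below are illustrative.

```python
def breakeven_win_rate(reward_to_risk: float) -> float:
    """Win rate at which expected value is zero for a given reward:risk ratio."""
    return 1.0 / (1.0 + reward_to_risk)


def ev_per_unit_risked(win_rate: float, reward_to_risk: float) -> float:
    """Expected profit per unit risked: win R with prob p, lose 1 otherwise."""
    return win_rate * reward_to_risk - (1.0 - win_rate)


required = breakeven_win_rate(0.51)      # figures quoted in the post-mortem
ev = ev_per_unit_risked(0.45, 0.51)

print(f"breakeven win rate: {required:.3f}")
print(f"EV at 45% hit rate: {ev:.3f} per unit risked")
```

At a 0.51 ratio the breakeven win rate is about 0.66, and a 45% hit rate works out to an expected loss of about 0.32 per unit risked.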