Fitting a parabola to a video game pitch
A computer-vision accessibility tool that watches a capture-card feed, predicts where the pitch crosses the plate, and nudges the controller. Built for offline play only, on purpose.
A computer-vision accessibility tool that watches a capture-card feed, predicts where the pitch crosses the plate, and nudges the controller. Built for offline play only, on purpose.
Zone hitting in MLB The Show asks for fast, precise left-stick corrections as the pitch comes in. For a player with a motor disability that makes small stick movements unreliable, that interface is not a difficulty setting. It is a wall. You either make those inputs or you do not get to play the game. I built this to take the wall down for offline play.
The shape of the system is three processes talking over ZMQ on localhost. ingest.py reads the capture card and publishes raw frames. Two trackers subscribe and emit detection events. bridge.py subscribes to both, computes the aim error, and writes a command frame to a Titan Two adapter, which applies stick input to the controller passthrough. Splitting it this way means a slow tracker cannot stall frame capture, and I can reason about each stage on its own.
A pitch gives you a fraction of a second. Every millisecond the pipeline spends is a millisecond the prediction is stale. Mean read latency at 1080p60 came in at 16.3 ms, with a p95 of 17.4 ms, and I measured that rather than guessing because the whole design lives or dies on it.
The subscribers use CONFLATE=1, so each worker always processes the newest frame and never falls behind under load. That one flag was the difference between a tracker that drifts further behind reality the longer it runs and one that stays current. When you are predicting the future, processing a stale frame is worse than processing no frame.
Once the rolling trail has enough points spanning enough vertical distance, the tracker fits two independent polynomials against wall-clock time. Y gets a degree-2 fit, because the apparent drop is parabolic from gravity rendering and camera perspective. X gets a degree-1 fit, because lateral drift is close to linear for in-game pitches.
Then it solves the y polynomial for the plate crossing time, taking the forward root of the quadratic, and evaluates x at that time. The output is a plate position and an eta in milliseconds. The bridge uses the eta to schedule the stick input early, accounting for the adapter's processing and the controller's USB poll, so the deflection lands when the ball arrives rather than after. Fits that project a crossing more than two seconds out, or in the past, get thrown away as noise. Most of the engineering effort went into deciding what to discard, not what to trust.
The classical HSV-and-circularity detector handles most frames. A YOLO model trained on real gameplay captures is the fallback for hard lighting, because HDR tone mapping washes the ball's color range out and the classical mask loses it. A static-detection ban zone suppresses UI elements: any cluster of detections that barely moves gets added to a timed exclusion list, so a stationary on-screen graphic never gets mistaken for a pitch.
The constraint I care about most is not in the CV at all. The tool runs exclusively against CPU difficulty in offline modes and disarms the instant it detects any online-mode UI element. Capture loss, a menu, or an F12 keypress all immediately return full passthrough to the controller. The license adds an explicit clause forbidding online or multiplayer use. An accessibility tool that could be pointed at other people is not an accessibility tool. The safeguards are the reason this project exists in the form it does.