One interface for a homelab that grew teeth

The honest origin of Perseus is that I was tired of SSH-ing into five machines and juggling terminals to answer questions I asked the homelab constantly. Is the GPU box up. What is Ollama loaded with. Did that cron job fail again. Every one of those is a known command on a known host, and every one of them cost me a context switch.

Perseus is the layer that collapses that. I send a natural-language message in Discord, a router classifies the intent, and the right one of twenty agents handles it. One interface, one history of what ran and what it cost. The Discord part means I can check a failing job from my phone without opening a terminal, which sounds minor until it is eleven at night and you are in bed.

The router is deliberately dumb

Every message flows through an intent classifier before anything else. The classifier is a fast keyword scan over the lowercased command, not an LLM call. That was a design decision I would defend in any review: routing itself costs nothing, adds no meaningful latency, and never hallucinates. Spending a model call just to decide which model to call is the kind of cleverness that looks good in a diagram and bleeds money in production.

If the keyword scan does not match, the message falls through to a general-purpose brain. The simple path is fast and free. The expensive path only runs when the cheap one cannot answer.
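A minimal sketch of that shape, not the actual Perseus router: the intent names, the keywords, and the fallback name general_brain are all made up for illustration. The point is only that classification is a substring scan over the lowercased message, and that a miss falls through to the general-purpose path.

```python
from typing import Optional

# Hypothetical keyword table. The real agent names and keywords are not
# given in this post; these exist only to show the mechanism.
INTENT_KEYWORDS = {
    "gpu_status": ("gpu", "nvidia", "vram"),
    "ollama_status": ("ollama", "model loaded"),
    "cron_check": ("cron", "job fail"),
}

# Hypothetical name for the general-purpose fallback.
FALLBACK_AGENT = "general_brain"


def classify(message: str) -> Optional[str]:
    """Return the first intent whose keyword appears in the message, else None."""
    text = message.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return None


def route(message: str) -> str:
    """Cheap path first; fall through to the general brain on a miss."""
    return classify(message) or FALLBACK_AGENT
```

Calling route("Did that cron job fail again") picks the cron intent without touching a model, while a message with no known keyword lands on the fallback. Dict order decides ties, so in a table like this the more specific intents would need to come first.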

Backends matched to workloads

The split across three LLM backends follows the work, not the hype. Latency-sensitive, high-volume tasks like content curation and trend analysis go to a local Ollama model at zero marginal cost. Tasks that genuinely need reasoning, like a compliance risk assessment, hit GPT-4o-mini. Strategic analysis goes to Grok.
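As a rough illustration, the split can be written down as a plain lookup table. The task keys below are hypothetical labels for the workloads named above, and defaulting unknown tasks to the local model is an assumption of this sketch, not something the post specifies.

```python
from enum import Enum


class Backend(Enum):
    OLLAMA = "ollama"            # local, zero marginal cost
    GPT_4O_MINI = "gpt-4o-mini"  # paid, for tasks that need reasoning
    GROK = "grok"                # paid, for strategic analysis


# Hypothetical task labels mapped to the backends described in the text.
TASK_BACKENDS = {
    "content_curation": Backend.OLLAMA,
    "trend_analysis": Backend.OLLAMA,
    "compliance_risk": Backend.GPT_4O_MINI,
    "strategic_analysis": Backend.GROK,
}


def pick_backend(task: str) -> Backend:
    """Look up the backend for a task; unknown tasks stay on the free local model.

    The local default is an assumption made for this sketch.
    """
    return TASK_BACKENDS.get(task, Backend.OLLAMA)
```

Keeping the mapping as data rather than branching logic means moving a workload between backends is a one-line change.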

Every request that touches a paid backend computes its cost from the token usage in the response and logs it to PostgreSQL against a monthly ceiling. The system tells me when I am approaching the limit before I get there. I have never had a surprise API bill from Perseus, and that is not luck. It is the cost tracking doing its job.
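The arithmetic behind that is small enough to sketch. Everything specific here is a placeholder: the model name paid-model, the per-million-token prices, the 80% warning fraction, and the in-memory list standing in for the PostgreSQL table. Only the shape (cost from token usage, running total against a monthly ceiling, warning before the limit) comes from the description above.

```python
from dataclasses import dataclass, field
from decimal import Decimal

# Placeholder prices in USD per million tokens: (input, output).
# These are NOT real provider prices; substitute current published rates.
PRICE_PER_MILLION = {
    "paid-model": (Decimal("1.00"), Decimal("2.00")),
}


@dataclass
class CostLedger:
    monthly_ceiling: Decimal
    warn_fraction: Decimal = Decimal("0.8")   # assumed warning threshold
    spent: Decimal = Decimal("0")
    rows: list = field(default_factory=list)  # stand-in for the PostgreSQL log

    def record(self, model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
        """Compute one request's cost from its token usage and log it."""
        in_price, out_price = PRICE_PER_MILLION[model]
        cost = (in_price * prompt_tokens + out_price * completion_tokens) / Decimal(1_000_000)
        self.spent += cost
        self.rows.append((model, prompt_tokens, completion_tokens, cost))
        return cost

    def approaching_limit(self) -> bool:
        """True once spend reaches the warning fraction of the monthly ceiling."""
        return self.spent >= self.monthly_ceiling * self.warn_fraction
```

Decimal is used instead of float so that many tiny per-request costs sum without rounding drift. A real version would also reset or partition the total by month, which this sketch leaves out.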

The tier that decides what runs

Letting an AI system run shell commands on real infrastructure is where this could have gone badly, so the SSH layer has three tiers and no exceptions. Commands on a blocklist of destructive operations, like rm -rf /, dd if=/dev/zero, and shutdown, are rejected outright, with no approval available. Commands on a read-only auto-approve list (ps, nvidia-smi, df, systemctl status, and similar) run immediately. Everything else parks in a pending queue until I send an explicit approval.
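The three tiers can be sketched as a single classification function. This is an illustration under assumptions, not the Perseus implementation: the matching strategy (substring match for the blocklist, token-prefix match for the auto-approve list) and the rule that any shell metacharacter disqualifies a command from auto-approval are choices made for this sketch. The lists contain only the commands named above.

```python
import shlex
from enum import Enum


class Verdict(Enum):
    BLOCK = "block"      # rejected outright, cannot be approved
    RUN = "run"          # read-only, runs immediately
    PENDING = "pending"  # waits for explicit approval


BLOCKED_SUBSTRINGS = ("rm -rf /", "dd if=/dev/zero", "shutdown")
AUTO_APPROVE_PREFIXES = (("ps",), ("nvidia-smi",), ("df",), ("systemctl", "status"))

# Assumption of this sketch: chaining, piping, redirection, or substitution
# means the command is no longer provably read-only, so it cannot auto-run.
SHELL_METACHARS = set(";|&$`<>\n")


def classify_command(command: str) -> Verdict:
    normalized = " ".join(command.split())  # collapse runs of whitespace
    if any(bad in normalized for bad in BLOCKED_SUBSTRINGS):
        return Verdict.BLOCK
    if not SHELL_METACHARS.intersection(command):
        try:
            tokens = tuple(shlex.split(normalized))
        except ValueError:  # unbalanced quotes: do not auto-approve
            tokens = ()
        if any(tokens[: len(prefix)] == prefix for prefix in AUTO_APPROVE_PREFIXES):
            return Verdict.RUN
    return Verdict.PENDING
```

The function has exactly one way to return RUN and one way to return BLOCK; every other input, including anything it fails to parse, ends at PENDING. Substring blocking is deliberately over-broad here (it also catches rm -rf /tmp/x), which errs in the same direction as the rest of the design.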

The default for an unrecognized command is wait, not run. That default is the whole safety model. A system that errs toward asking is a system I can hand more responsibility to over time. One that errs toward acting is one I would have to babysit, which would defeat the point of building it.

Hosts resolve by role, so I issue commands to gpu or vpn rather than memorizing IPs, and the same map feeds a health monitor that checks every node in parallel and alerts when a threshold trips.
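A small sketch of the role map and the parallel check, with several assumptions: the addresses are placeholders from the documentation range 192.0.2.0/24, the metric is a single number per host, and the probe is passed in as a function so the example runs without a network. How the real monitor collects its readings is not described here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Role names come from the text; the addresses are placeholders.
HOSTS = {
    "gpu": "192.0.2.10",
    "vpn": "192.0.2.20",
}


def resolve(role: str) -> str:
    """Translate a role like 'gpu' into the address to connect to."""
    return HOSTS[role]


def check_all(probe: Callable[[str], float], threshold: float) -> List[str]:
    """Probe every host concurrently; return the roles whose reading exceeds threshold.

    `probe` is a hypothetical callable taking an address and returning one
    metric, for example disk usage as a percentage.
    """
    roles = list(HOSTS)
    with ThreadPoolExecutor(max_workers=len(roles)) as pool:
        # pool.map preserves input order, so readings line up with roles.
        readings = list(pool.map(lambda role: probe(resolve(role)), roles))
    return [role for role, value in zip(roles, readings) if value > threshold]
```

With a stub probe such as `lambda addr: {"192.0.2.10": 91.0, "192.0.2.20": 40.0}[addr]` and a threshold of 85.0, check_all returns only the gpu role. Threads suit this job because each probe spends its time waiting on the network rather than on the CPU.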