A standalone live status board at status.busymate.net. It probes every Busymate service — Dashboard, Backend (Supabase), MCP, Proxy, Docs, and itself — and renders overall health, per-service detail, VPS host metrics, a Proxy Security card (open-relay flood + the #001 source-IP gate), and a Supabase Project card (per-service health for our exact Cloud Pro project). It is a separate Next.js + Turbopack + shadcn/ui app (React 19, dark-only) with its own systemd unit (busymate-status), its own port (:3940), and its own deploy.
For the operator-facing task guide (reading the board, local dev, deploy), see Status board — How-to.
Why it's a separate component
A status page embedded in the thing it monitors goes down exactly when you need it. If the board lived inside the dashboard, a dashboard outage would take the status page down with it — and the page exists precisely to tell you the dashboard is down.
So the board is deliberately decoupled:
- Its own Next.js app, systemd unit, port, and nginx vhost: it imports nothing from the dashboard and shares no process with it.
- Its health JSON at GET /api/status is unauthenticated and returns coarse health only, never secrets. Auth being down is one of the failure modes it has to survive, so it can't depend on auth to render.
- It runs on the VPS it reports on, which is what makes a real status page possible without a separate monitoring agent: host load, memory, and uptime come straight from Node's os.* APIs, and systemd state comes from systemctl show on the same box.
The only thing it shares with the rest of the monorepo is the repo-root version.json (read off disk for build numbers) and the optional Supabase ingest endpoint (fire-and-forget, see below) — neither is on the critical render path.
Topology
The probe model
Every monitored service is a Component in a static TARGETS array in web/status/app/api/status/route.ts. A collect() pass probes all six in parallel (Promise.all) and assembles a StatusBody.
Each target carries metadata plus tuning flags:
| Field | Meaning |
|---|---|
| key | Stable identity (dashboard, supabase, mcp, proxy-server, docs, status). Also the lookup key into version.json for the build number. |
| label / role | Display name + one-line description. |
| host | Public hostname (rendered as the "open" link). |
| url | What probe() fetches. For internal services this is loopback (http://127.0.0.1:3838/api/version) to avoid bouncing through nginx. |
| critical | A core service. Dashboard and Backend are core; everything else is non-core. |
| unit | systemd unit name for systemctl show, or null for cloud-hosted services (Backend, MCP). |
| self | This very board: it's serving the request, so it's trivially up (no self-fetch). |
| wantBuild | Whether to parse build from the probe's JSON body (only the dashboard's /api/version returns one). |
| upWhen | Custom up/down predicate. Default is status < 500; MCP uses s > 0 (any answer at all proves the edge function is reachable). |
| metricsUrl | Loopback flood/abuse metrics URL, proxy only (http://127.0.0.1:8888/metrics). Fetched alongside the main probe and attached as probe.flood; stripped from the serialised body (it's an internal 127.0.0.1 URL). Drives the Proxy Security card. |
probe(t):
- self: true short-circuits to { up: true, httpStatus: 200, latencyMs: 0 }.
- Otherwise it calls fetch(url) with a 4.5 s timeout (AbortSignal.timeout), redirect: "manual", cache: "no-store", and a busymate-status/1 user-agent.
- Latency is the performance.now() delta in ms.
- If wantBuild is set, the response is cloned and parsed for a numeric build field.
- If metricsUrl is set (proxy only), fetchFloodMetrics() best-effort-fetches the loopback /metrics (2.5 s timeout) and attaches it as probe.flood. Any failure (proxy down, old build without /metrics, malformed body) yields null and never affects the up/down verdict.
- up = upWhen(status) if provided, else status < 500.
- Any throw (timeout, connection refused) returns { up: false, httpStatus: null, error }; a TimeoutError is normalised to "timeout".
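The probe steps above can be condensed into one function. This is a minimal sketch assuming Node 18+ (global fetch, AbortSignal.timeout, performance); the Target and Probe interfaces are trimmed to the fields the steps use, the metricsUrl/flood step is left out, returning latencyMs: null on failure is an assumption, and the real code in route.ts may be structured differently.

```typescript
// Trimmed-down target/probe shapes (assumed; see the field table above).
interface Target {
  key: string;
  url: string;
  self?: boolean;
  wantBuild?: boolean;
  upWhen?: (status: number) => boolean;
}

interface Probe {
  up: boolean;
  httpStatus: number | null;
  latencyMs: number | null;
  build?: number | null;
  error?: string;
}

async function probe(t: Target): Promise<Probe> {
  // The board is serving this very request, so it is trivially up: no self-fetch.
  if (t.self) return { up: true, httpStatus: 200, latencyMs: 0 };

  const started = performance.now();
  try {
    const res = await fetch(t.url, {
      signal: AbortSignal.timeout(4500),
      redirect: "manual", // a 3xx is an answer in itself; don't follow it
      cache: "no-store",
      headers: { "user-agent": "busymate-status/1" },
    });
    const result: Probe = {
      up: t.upWhen ? t.upWhen(res.status) : res.status < 500,
      httpStatus: res.status,
      latencyMs: performance.now() - started,
    };
    if (t.wantBuild) {
      // Parse a clone, leaving the original response body untouched.
      const body: unknown = await res.clone().json().catch(() => null);
      const build = (body as { build?: unknown } | null)?.build;
      result.build = typeof build === "number" ? build : null;
    }
    return result;
  } catch (err) {
    // AbortSignal.timeout rejects with a DOMException whose name is "TimeoutError".
    const name = (err as { name?: unknown } | null)?.name;
    return {
      up: false,
      httpStatus: null,
      latencyMs: null,
      error: name === "TimeoutError" ? "timeout" : String(err),
    };
  }
}
```

For MCP, a target would pass upWhen: (s) => s > 0, so any HTTP answer at all counts as up.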
Liveness probes are picked to be cheap and auth-free. Backend hits https://api.busymate.net/auth/v1/health — any non-5xx (200, or a 401 when the edge is picky about the apikey header) proves the auth API is up. Proxy hits /ca/bundle. Docs hits /. MCP hits / and treats any status as up.
systemd uptime
For targets with a unit, unitState(unit) runs systemctl show <unit> --property=ActiveState,SubState,ActiveEnterTimestamp (2.5 s timeout, read-only). ActiveEnterTimestamp is parsed into uptimeSec. If systemctl isn't present (local dev on a Mac), it returns { active: "unknown", … } — and the client treats "unknown" as OK, not a failure, so the board renders cleanly off-VPS.
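The unit lookup can be sketched as a pure parser plus a thin execFile wrapper. This assumes systemctl prints ActiveEnterTimestamp in its default "Mon 2024-01-15 10:23:45 UTC" form on a UTC host; any other time zone falls back to uptimeSec: null in this sketch, and the sub/uptimeSec values in the "unknown" result are filled in as assumptions.

```typescript
import { execFile } from "node:child_process";

interface UnitState {
  active: string;
  sub: string;
  uptimeSec: number | null;
}

// Parse `systemctl show ... --property=ActiveState,SubState,ActiveEnterTimestamp` output.
function parseUnitState(stdout: string, nowMs: number = Date.now()): UnitState {
  const props = new Map<string, string>();
  for (const line of stdout.split("\n")) {
    const eq = line.indexOf("=");
    if (eq > 0) props.set(line.slice(0, eq), line.slice(eq + 1).trim());
  }
  // Assumed UTC timestamp format; anything else yields null uptime.
  const m = /(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) UTC$/.exec(props.get("ActiveEnterTimestamp") ?? "");
  const enteredMs = m ? Date.parse(`${m[1]}T${m[2]}Z`) : NaN;
  return {
    active: props.get("ActiveState") ?? "unknown",
    sub: props.get("SubState") ?? "unknown",
    uptimeSec: Number.isFinite(enteredMs) ? Math.max(0, Math.round((nowMs - enteredMs) / 1000)) : null,
  };
}

// Read-only query with a 2.5 s timeout; no systemctl (e.g. macOS dev) reports "unknown".
function unitState(unit: string): Promise<UnitState> {
  return new Promise((resolve) => {
    execFile(
      "systemctl",
      ["show", unit, "--property=ActiveState,SubState,ActiveEnterTimestamp"],
      { timeout: 2500, encoding: "utf8" },
      (err, stdout) => {
        if (err) resolve({ active: "unknown", sub: "unknown", uptimeSec: null });
        else resolve(parseUnitState(stdout));
      },
    );
  });
}
```

Keeping the parser separate from the process call means it can be checked against captured systemctl output without a systemd host.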
Build numbers
manifestBuilds() reads version.json from ../version.json or ../../version.json relative to process.cwd(), and maps components.<key>.build. The dashboard's live build comes from its own /api/version probe; for the others the manifest value is the source. The board therefore shows the deployed build per component, surfacing drift at a glance.
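A sketch of the manifest lookup, assuming version.json has the shape { components: { <key>: { build: number } } } implied above. buildsFrom is a hypothetical helper split out so the key mapping can be tested without touching the filesystem; returning an empty map for a missing or malformed file is also an assumption.

```typescript
import { existsSync, readFileSync } from "node:fs";
import * as path from "node:path";

// Assumed manifest shape: { components: { [key]: { build: number } } }.
type Manifest = { components?: Record<string, { build?: unknown }> };

// Map components.<key>.build to key -> build (null when the value isn't a number).
function buildsFrom(manifest: Manifest): Record<string, number | null> {
  const out: Record<string, number | null> = {};
  for (const [key, entry] of Object.entries(manifest.components ?? {})) {
    out[key] = typeof entry?.build === "number" ? entry.build : null;
  }
  return out;
}

// Try ../version.json, then ../../version.json, relative to the working directory.
function manifestBuilds(cwd: string = process.cwd()): Record<string, number | null> {
  for (const rel of ["../version.json", "../../version.json"]) {
    const file = path.resolve(cwd, rel);
    if (!existsSync(file)) continue;
    try {
      return buildsFrom(JSON.parse(readFileSync(file, "utf8")) as Manifest);
    } catch {
      return {}; // unreadable or malformed manifest: show no builds rather than crash
    }
  }
  return {};
}
```

The dashboard row would then prefer its live value, e.g. probe.build ?? builds["dashboard"]; that precedence expression is illustrative, not necessarily the one in route.ts.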
Overall status
After the parallel collect(), the overall verdict is computed:
- down: any core service is down (critical && !up). Backend or Dashboard being unreachable is a major outage.
- degraded: some non-core service is down but all core services are up.
- operational: everything is responding.
The client maps these to "All systems operational" (emerald), "Partial degradation" (amber), and "Major outage" (red), with a pulsing dot for the operational state.
A per-service card is also considered unhealthy if its systemd unit reports any state other than active or unknown, even when the HTTP probe succeeded: a process can answer while its unit is failed mid-restart.
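The verdict rules and the per-card unit check can be sketched as two small functions. ComponentResult keeps only the fields the rules read, and the names computeOverall and cardHealthy are hypothetical.

```typescript
type Overall = "operational" | "degraded" | "down";

interface ComponentResult {
  key: string;
  critical: boolean;
  probe: { up: boolean };
  unitState: { active: string } | null;
}

function computeOverall(components: ComponentResult[]): Overall {
  // Any core (critical) service down is a major outage.
  if (components.some((c) => c.critical && !c.probe.up)) return "down";
  // Past the first check every down service is non-core.
  if (components.some((c) => !c.probe.up)) return "degraded";
  return "operational";
}

// A card also needs a healthy unit; "unknown" (no systemctl, e.g. off-VPS) counts as OK.
function cardHealthy(c: ComponentResult): boolean {
  const unitOk =
    c.unitState === null || c.unitState.active === "active" || c.unitState.active === "unknown";
  return c.probe.up && unitOk;
}
```

Note that the overall verdict reads only probe.up, while the card check additionally consults the unit, matching the two rules as described.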
Proxy Security card
Because the status app runs on the same VPS as the proxy, it can read the proxy's open-relay-abuse counters off loopback for free. The proxy target carries metricsUrl: "http://127.0.0.1:8888/metrics"; fetchFloodMetrics() fetches it during the proxy probe and attaches the result as probe.flood (a FloodMetrics mirroring FloodStats.snapshot() in web/proxy-server/src/floodStats.ts). The client renders a dedicated Proxy Security card directly below the Host card and above the Supabase Project card — but only when flood is non-null, so an old proxy build without /metrics simply omits the card.
The card surfaces the issue-#001 source-IP gate: the gateMode badge (shadow = logging only, amber; enforce = refusing un-bound source IPs, emerald), an under attack / clean verdict from underAttack, and a metric grid of last-60 s abuse counters (abuse/min, distinct abusive IPs, refused = denied + noBinding, cap-drops), live pool pressure (active conns, throttled IPs, top-abuser conns), and devices/knownIps gated, plus a lifetime allowed · refused · cap-dropped footer. fetchFloodMetrics() is fully best-effort: a 2.5 s timeout, a body-shape sanity check (gateMode string + last60s + pool present), and any error returns null.
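A best-effort fetch with the body-shape sanity check just described might look like this. The FloodMetrics interface here types only the fields the check inspects (the real type mirrors FloodStats.snapshot() and has more), and treating a non-2xx reply as null is an assumption.

```typescript
interface FloodMetrics {
  gateMode: string; // "shadow" or "enforce"
  underAttack?: boolean;
  last60s: Record<string, unknown>;
  pool: Record<string, unknown>;
  [field: string]: unknown; // lifetime, devices, knownIps, etc.
}

// Sanity check: gateMode is a string and last60s + pool are objects.
function isFloodMetrics(body: unknown): body is FloodMetrics {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  const isObj = (v: unknown) => typeof v === "object" && v !== null;
  return typeof b.gateMode === "string" && isObj(b.last60s) && isObj(b.pool);
}

async function fetchFloodMetrics(metricsUrl: string): Promise<FloodMetrics | null> {
  try {
    const res = await fetch(metricsUrl, { signal: AbortSignal.timeout(2500), cache: "no-store" });
    if (!res.ok) return null; // e.g. an old proxy build with no /metrics route
    const body: unknown = await res.json();
    return isFloodMetrics(body) ? body : null;
  } catch {
    return null; // proxy down, timeout, or a non-JSON body
  }
}
```

Because every path ends in a FloodMetrics or null, the caller can attach the result as probe.flood without it ever influencing the proxy's up/down verdict.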
Supabase Project card
The Backend probe (/auth/v1/health) only answers up/down. The Supabase Project card adds the detail: it reports our exact Cloud Pro project (SUPABASE_PROJECT_REF, default xfjplaganjqowkcnznbr), not generic upstream Supabase health. collectSupabase() assembles a SupabaseInfo from three server-side fetches run in parallel, all stripped of secrets before serialising:
- Anonymous version probes (fetchSupabaseVersions): GoTrue /auth/v1/health (sent with the browser-safe publishable apikey) and Storage /storage/v1/version. Both answer unauthenticated, double as the latency probe (versionLatencyMs), and prove Auth + Storage are live even with no token.
- Management API per-service health (fetchSupabaseServiceHealth): GET https://api.supabase.com/v1/projects/{ref}/health?services=… for db · rest · auth · realtime · storage · pooler. Token-gated: returns null (and the four db/rest/realtime/pooler rows render muted as needs token) unless SUPABASE_ACCESS_TOKEN is set.
- Project meta (fetchSupabaseProject): GET /v1/projects/{ref} for project status (ACTIVE_HEALTHY, …), region, and Postgres version. Also token-gated.
Each service row merges the authoritative Management-API status (source: "management") with the anon version string where we have one; with no token, Auth/Storage fall back to the anon liveness verdict and the rest report healthy: null (unknown, rendered muted — never red).
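The row-merge rule can be sketched as follows. The Management API health result is assumed to have already been reduced to a name to { status, healthy } map (null without a token), and anonymous probes exist only for auth and storage; giving fully-unknown rows source: "anon" is an assumption, as is omitting the display label.

```typescript
type Source = "management" | "anon";

interface ServiceRow {
  name: string;
  status: string;
  healthy: boolean | null;
  version: string | null;
  source: Source;
}

// Reduced Management-API health entry (assumed shape).
interface MgmtHealth { status: string; healthy: boolean }
// Anonymous liveness + version for auth / storage.
interface AnonProbe { up: boolean; version: string | null }

const SERVICES = ["db", "rest", "auth", "realtime", "storage", "pooler"] as const;

function mergeRows(
  mgmt: Map<string, MgmtHealth> | null, // null when SUPABASE_ACCESS_TOKEN is unset
  anon: Partial<Record<string, AnonProbe>>,
): ServiceRow[] {
  return SERVICES.map((name): ServiceRow => {
    const m = mgmt?.get(name);
    const a = anon[name];
    // Authoritative status wins; the anon probe still contributes a version string.
    if (m) return { name, status: m.status, healthy: m.healthy, version: a?.version ?? null, source: "management" };
    // No token: fall back to the anonymous liveness verdict where one exists.
    if (a) return { name, status: a.up ? "up" : "down", healthy: a.up, version: a.version, source: "anon" };
    // Nothing known: healthy stays null so the row renders muted, never red.
    return { name, status: "unknown", healthy: null, version: null, source: "anon" };
  });
}
```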
Region comes from the Management API (project.region); the SUPABASE_REGION env/default (eu-west-1) is only the no-token fallback. Our project runs in eu-west-1 (Ireland) — intentionally not the lon1/London VPS region, so the default was corrected to match reality.
Client-side Statuspage fetch
Supabase's platform-wide public Statuspage (status.supabase.com/api/v2/summary.json) is fetched from the viewer's browser, never the server — AWS WAF CAPTCHA-blocks our VPS datacenter IP, but the summary API is CORS-open (access-control-allow-origin: *), so each visitor's own residential/office IP fetches it cleanly. The useSupabasePlatform hook polls it every 60 s (paused when the tab is hidden), and the card shows only the slices that name our region: the matching compute-capacity dot, plus any unresolved incidents or scheduled maintenance affecting our region. A failed fetch leaves the platform overlay null and the card still renders the server-probed versions.
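The region filter can be sketched as a pure function over the summary.json body. It assumes the standard Statuspage v2 fields (components, incidents, scheduled_maintenances, with incidents optionally listing affected components) and takes the region matcher as a parameter, because how Supabase names its regional components isn't stated here; the eu-west-1 substring match in the note below is hypothetical, as is picking the first matching component as the compute-capacity dot.

```typescript
// Minimal Statuspage v2 summary.json shape (only the fields used here).
interface SpComponent { name: string; status: string }
interface SpEvent { name: string; status: string; components?: SpComponent[] }
interface SpSummary {
  components?: SpComponent[];
  incidents?: SpEvent[];
  scheduled_maintenances?: SpEvent[];
}

interface RegionSlice {
  compute: SpComponent | null;
  incidents: SpEvent[];
  maintenances: SpEvent[];
}

// Keep only what names our region: the matching component plus events touching it.
function regionSlice(summary: SpSummary, matchesRegion: (text: string) => boolean): RegionSlice {
  const touchesRegion = (e: SpEvent) =>
    matchesRegion(e.name) || (e.components ?? []).some((c) => matchesRegion(c.name));
  return {
    compute: (summary.components ?? []).find((c) => matchesRegion(c.name)) ?? null,
    incidents: (summary.incidents ?? []).filter(touchesRegion),
    maintenances: (summary.scheduled_maintenances ?? []).filter(touchesRegion),
  };
}

// Browser-side fetch; any failure yields null so the card keeps its server-probed data.
async function fetchPlatformSummary(): Promise<SpSummary | null> {
  try {
    const res = await fetch("https://status.supabase.com/api/v2/summary.json", { cache: "no-store" });
    return res.ok ? ((await res.json()) as SpSummary) : null;
  } catch {
    return null;
  }
}
```

A polling hook could call regionSlice(summary, (t) => t.toLowerCase().includes("eu-west-1")) after each 60 s fetch and leave the overlay null whenever fetchPlatformSummary returns null.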
The Management API token itself lives in the VPS supabase.conf systemd drop-in (SUPABASE_ACCESS_TOKEN). Off-VPS or without it, the four token-gated service rows stay muted while Auth + Storage continue reporting live.
/api/status response shape
The route is force-dynamic, runtime: "nodejs", revalidate: 0. The serialised body (StatusBody in web/status/app/api/status/route.ts, mirrored in StatusClient.tsx):
interface StatusBody {
generatedAt: string; // ISO-8601
overall: "operational" | "degraded" | "down";
host: {
hostname: string;
platform: string; // `${os.platform()} ${os.release()}`
uptimeSec: number;
cpus: number;
load: [number, number, number]; // os.loadavg() — 1m, 5m, 15m
loadRatio: number; // load[0] / cpus
memTotal: number; // bytes
memUsed: number; // bytes (memTotal − freemem)
memUsedRatio: number;
};
components: Array<{
key: string;
label: string;
role: string;
host: string;
url: string | null;
critical: boolean;
unit: string | null;
self?: boolean;
probe: {
up: boolean;
httpStatus: number | null;
latencyMs: number | null;
build?: number | null;
error?: string;
flood?: FloodMetrics | null; // proxy only — loopback /metrics snapshot
};
unitState: { active: string; sub: string; uptimeSec: number | null } | null;
}>;
supabase: { // our project's detail (collectSupabase)
projectRef: string;
region: string; // Management API; eu-west-1 fallback
versionLatencyMs: number | null;
projectStatus: string | null; // ACTIVE_HEALTHY | … (null without token)
postgresVersion: string | null; // null without token
hasToken: boolean; // SUPABASE_ACCESS_TOKEN present?
services: Array<{
name: string; // db | rest | auth | realtime | storage | pooler
label: string;
status: string; // ACTIVE_HEALTHY | up | down | unknown
healthy: boolean | null; // null = couldn't determine (no token / no anon probe)
version: string | null; // GoTrue / Storage version where known
source: "management" | "anon";
}>;
};
}
The wantBuild, upWhen, and metricsUrl fields are internal tuning knobs; they're stripped before serialisation. FloodMetrics mirrors the proxy's /metrics snapshot (gateMode, underAttack, last60s, pool, lifetime, …) and appears only on the proxy component's probe; it's null when the proxy is unreachable or predates the endpoint. The platform-wide Supabase Statuspage is not in this body; it's fetched client-side (see the Supabase Project card). The response carries Cache-Control: no-store and an X-Status-Cache: hit|miss header so you can tell whether you got the cached snapshot or a fresh collect().
There are no secrets in this body: no tokens, no env, no internal IPs beyond the loopback probe URLs. That's deliberate — the endpoint is public so it survives an auth outage.
Host metrics
Because the app runs on the VPS, host metrics come straight from Node's os module — no agent, no shell scraping:
- Load: os.loadavg() gives the 1m/5m/15m averages; loadRatio is the 1-minute figure divided by os.cpus().length. The board surfaces it as "% of capacity (1m)".
- Memory: memUsed = os.totalmem() − os.freemem(), rendered as used/total GB and a used-ratio.
- Uptime: os.uptime() (seconds since boot), formatted as Xd Yh.
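Assembling the host block from node:os could look like the sketch below. The field set follows the host comments in StatusBody; the guards against a zero CPU count or zero total memory and the formatUptime helper name are assumptions.

```typescript
import * as os from "node:os";

interface HostMetrics {
  hostname: string;
  platform: string;
  uptimeSec: number;
  cpus: number;
  load: [number, number, number];
  loadRatio: number;
  memTotal: number;
  memUsed: number;
  memUsedRatio: number;
}

function hostMetrics(): HostMetrics {
  const [l1, l5, l15] = os.loadavg(); // 1m, 5m, 15m (always zeros on Windows)
  const cpus = os.cpus().length;
  const memTotal = os.totalmem();
  const memUsed = memTotal - os.freemem();
  return {
    hostname: os.hostname(),
    platform: `${os.platform()} ${os.release()}`,
    uptimeSec: os.uptime(),
    cpus,
    load: [l1, l5, l15],
    loadRatio: cpus > 0 ? l1 / cpus : 0, // 1-minute load per core
    memTotal,
    memUsed,
    memUsedRatio: memTotal > 0 ? memUsed / memTotal : 0,
  };
}

// Seconds since boot rendered as "Xd Yh".
function formatUptime(sec: number): string {
  const days = Math.floor(sec / 86_400);
  const hours = Math.floor((sec % 86_400) / 3_600);
  return `${days}d ${hours}h`;
}
```

For example, formatUptime(90_000) gives "1d 1h".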
The client renders load and memory as discrete 20-cell meter gauges (Meter in StatusClient.tsx) — pure Tailwind cells, no inline widths. Tone is driven by ratio: emerald below 60%, amber 60–85%, red at or above 85% (ratioTone).
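The tone thresholds and the 20-cell meter can be sketched as pure helpers. The clamp, the round-to-nearest rule for lit cells, and the Tailwind class names (bg-emerald-500 and friends, bg-muted for unlit cells) are assumptions; the class strings are spelled out in full because Tailwind only generates classes it finds verbatim in source.

```typescript
type Tone = "emerald" | "amber" | "red";

// Thresholds from the text: below 60% emerald, 60–85% amber, 85% and above red.
function ratioTone(ratio: number): Tone {
  if (ratio >= 0.85) return "red";
  if (ratio >= 0.6) return "amber";
  return "emerald";
}

// Complete class names (no interpolation) so Tailwind's source scan picks them up.
const LIT_CLASS: Record<Tone, string> = {
  emerald: "bg-emerald-500",
  amber: "bg-amber-500",
  red: "bg-red-500",
};
const UNLIT_CLASS = "bg-muted";

// Lit cell count: ratio clamped to [0, 1], then rounded to the nearest cell.
function litCells(ratio: number, cells = 20): number {
  return Math.round(Math.min(1, Math.max(0, ratio)) * cells);
}

// One class per cell; the component renders these as fixed-size cells, no inline widths.
function meterCells(ratio: number, cells = 20): string[] {
  const lit = litCells(ratio, cells);
  const litClass = LIT_CLASS[ratioTone(ratio)];
  return Array.from({ length: cells }, (_, i) => (i < lit ? litClass : UNLIT_CLASS));
}
```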
Caching, polling, and the self-heartbeat
Three timers keep the board fresh without hammering the probed services:
- Server cache (4 s): GET /api/status serves a cached StatusBody for up to CACHE_TTL_MS (4000 ms). Concurrent viewers share one snapshot; the probed services see at most one collect() burst every 4 s regardless of viewer count.
- Client poll (7 s): StatusClient polls /api/status every REFRESH_MS (7000 ms) and pauses while the tab is hidden (document.hidden). A separate 1 s ticker only re-renders the "updated Ns ago" relative timestamp; it does not re-fetch.
- Self-heartbeat (10 s): a module-level setInterval (HEARTBEAT_MS, 10000 ms) recomputes a snapshot and pushes it to the Supabase ingest endpoint even when nobody has the board open, so the stored history never has a gap. It's unref()-ed (won't keep the event loop alive on its own), guarded by a globalThis flag against dev-HMR duplicate intervals, and a running latch prevents overlapping ticks.
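The server cache and the heartbeat guards can be sketched together. Sharing one in-flight collect() between concurrent cache misses is an assumption (it is one way to get the "at most one burst every 4 s" property), the __statusHeartbeat global name is hypothetical, and Snapshot stands in for the full StatusBody.

```typescript
type Snapshot = { generatedAt: string };

const CACHE_TTL_MS = 4000;
const HEARTBEAT_MS = 10000;

let cached: { body: Snapshot; at: number } | null = null;
let inflight: Promise<Snapshot> | null = null;

// Serve the cached snapshot while younger than CACHE_TTL_MS; otherwise run one
// collect() that every concurrent caller awaits. `hit` can drive X-Status-Cache.
async function getSnapshot(
  collect: () => Promise<Snapshot>,
): Promise<{ body: Snapshot; hit: boolean }> {
  if (cached && Date.now() - cached.at < CACHE_TTL_MS) {
    return { body: cached.body, hit: true };
  }
  if (!inflight) {
    inflight = collect()
      .then((body) => {
        cached = { body, at: Date.now() };
        return body;
      })
      .finally(() => {
        inflight = null;
      });
  }
  return { body: await inflight, hit: false };
}

// Hypothetical global slot so a dev-HMR module re-evaluation can't start a second interval.
const g = globalThis as typeof globalThis & {
  __statusHeartbeat?: ReturnType<typeof setInterval>;
};

function startHeartbeat(tick: () => Promise<void>): boolean {
  if (g.__statusHeartbeat) return false; // already running in this process
  let running = false;
  const handle = setInterval(() => {
    if (running) return; // previous tick still in flight: skip rather than overlap
    running = true;
    tick()
      .catch((err) => console.warn("heartbeat tick failed:", err))
      .finally(() => {
        running = false;
      });
  }, HEARTBEAT_MS);
  handle.unref(); // don't keep the Node event loop alive for this timer alone
  g.__statusHeartbeat = handle;
  return true;
}
```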
Supabase ingest
Each computed snapshot is also pushed, fire-and-forget, to a Supabase Edge Function so it's queryable over REST/MCP and fanned out over Realtime/WS for any historical or dashboard consumer.
- Where: STATUS_INGEST_URL (defaults to https://api.busymate.net/functions/v1/status-ingest).
- Auth: a shared secret in STATUS_INGEST_SECRET, sent as the x-status-ingest-secret header. Without the secret the push is skipped, logging a single warning, so local dev never errors.
- Never blocks: pushSnapshot() never throws; any failure (timeout, non-2xx, network) is logged and swallowed. The GET handler calls it with void (un-awaited), so ingest being down can never block or fail a status response.
- Two callers: the GET handler (on a cache miss) and the 10 s heartbeat both push, so the stored snapshot tracks both real traffic and the no-viewer baseline.
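A never-throwing push can be sketched as below. POST with a JSON body and the 5 s timeout are assumptions; the env variable names, the default URL, and the header name come from the list above.

```typescript
const INGEST_URL =
  process.env.STATUS_INGEST_URL ?? "https://api.busymate.net/functions/v1/status-ingest";

let warnedNoSecret = false;

// Fire-and-forget push: resolves in every case, so callers can safely `void` it.
async function pushSnapshot(
  body: unknown,
  url: string = INGEST_URL,
  secret: string | undefined = process.env.STATUS_INGEST_SECRET,
): Promise<void> {
  if (!secret) {
    // No secret (e.g. local dev): skip, and warn only the first time.
    if (!warnedNoSecret) {
      warnedNoSecret = true;
      console.warn("status ingest: STATUS_INGEST_SECRET not set, skipping push");
    }
    return;
  }
  try {
    const res = await fetch(url, {
      method: "POST",
      headers: { "content-type": "application/json", "x-status-ingest-secret": secret },
      body: JSON.stringify(body),
      signal: AbortSignal.timeout(5000),
    });
    if (!res.ok) console.warn(`status ingest: HTTP ${res.status}`);
  } catch (err) {
    // Timeout, DNS failure, connection refused: log and swallow.
    console.warn("status ingest failed:", err);
  }
}
```

In the GET handler this would be called as void pushSnapshot(body), so the response never waits on ingest.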
Deploy note: enabling ingest in production means adding STATUS_INGEST_SECRET to the busymate-status systemd drop-in env. See Status board — How-to for the deploy mechanics.
File map
Where to look next
- Status board — How-to: viewing the board, reading the gauges, local dev, deploy.
- Shipping pipeline: where the status row sits in the post-ship version table.
- web/status/README.md: the component's own readme + first-time VPS provisioning.