decisionshared last reviewed 2026-05-21

Coolify multi-server topology — staging+control vs prod

Context

Aspire's customer-facing apps deploy via Coolify. As the app count grew, running everything on one box created risk: a staging deploy could starve a production app of resources, and the control plane (Coolify itself) shared a host with customer workloads. The decision splits deployment across two servers with clear roles.

Detail

Topology

Server	IP	Role	SSH
Staging + control plane	`112.121.151.46`	Coolify control plane + staging deploys	`ssh coolify-staging` (key `~/.ssh/coolify_vps`)
Production	`103.243.116.249`	Customer-facing prod apps	`ssh coolify-prod` (key `~/.ssh/id_ed25519`, UUID `x9dcph0q2mtwql9illb6umkp`)

Control plane = app.dssc.co.th
SSH aliases live in ~/.ssh/config (since 2026-05-27)
To land a new app on prod: pass server_uuid for prod-runner-1 in the Coolify create call

Why split

Risk on single box	Mitigated by split
Staging build starves prod app	Separate hosts, separate resources
Control plane outage takes down apps	Control plane on staging box; prod apps keep serving
Resource contention during concurrent deploys	Builds happen on staging-side; prod just runs

Companion: OpenClaw VPS

Note this is SEPARATE from the OpenClaw VPS (112.121.151.239) which hosts the AI agent fleet + Knowledge OS — see openclaw-vps-architecture. Three distinct servers total:

112.121.151.46 — Coolify staging + control
103.243.116.249 — Coolify production
112.121.151.239 — OpenClaw AI infrastructure

Coolify deploy gotchas (hard-won, per MEMORY.md `reference_coolify_gotchas`)

28 documented quirks. The ones that bite most often:

#	Quirk	Workaround
1	`ports_exposes` API param silently rejected (defaults to 80)	Encode port in FQDN as `https://host:4000`
2	`update_application` doesn't regenerate Traefik labels	Delete + recreate to change port routing
3	`limits_memory` bytes bug	Watch the unit
—	dockercompose default path `.yaml` not `.yml`	Name files `.yaml`
21-23	helper-prep race / container-recycle race / helper-killed-mid-build	"Wait until in_progress queue EMPTY, then retry on idle VPS"
24	GitLab PAT in `~/.claude.json` rotates mid-session	Re-read on 401
25	Coolify auto-cancels queued >2h	Retry
26	`force=true` amplifies contention	Avoid during busy periods
28	SvelteKit-proxies-Go-API needs explicit `/api/*` proxy routes	Add them

Build-time env injection gotcha (per MEMORY.md `feedback_coolify_env_add_race`)

Adding envs while a deploy is queued triggers duplicate dispatch → docker-stops the helper mid-build
Inter-stage COPY node_modules OOMs on busy VPS → collapse to single stage
NODE_ENV=production as buildtime env breaks npm ci (skips devDeps) → force NODE_ENV=development in deps stage

Build log access (per MEMORY.md `reference_coolify_build_logs`)

GET /api/v1/deployments/applications/{uuid}?take=N returns logs as JSON-string
--no-cache means every deploy is a full rebuild
Deployments list JSON truncates ~2KB

Image-size killer (per MEMORY.md `feedback_docker_image_size`)

Single-stage with full devDeps = export >300s = "context deadline exceeded"
Use multi-stage + npm prune (but watch the 2-stage pnpm prune --prod OOM — needs 3-stage)
Drizzle: Coolify CMD must use node scripts/migrate.mjs, NEVER drizzle-kit migrate at runtime (prod runner strips devDeps)

Constraints we accepted

Two Coolify servers to patch/monitor instead of one
Cross-server deploys need explicit server_uuid targeting

Revisit trigger

App count exceeds what 2 servers handle comfortably → add a third runner
Need for geo-distributed deploys (e.g., AU + Asia edge)

Actions

[x] Split topology live since 2026-05-27
[x] SSH aliases configured
[x] 28 gotchas documented in MEMORY.md reference_coolify_gotchas

openclaw-vps-architecture — the separate AI infrastructure VPS
coolify-deployment-default — why Coolify at all
gitlab-self-hosted-not-github — GitLab triggers Coolify deploys

🔗 Relationships

graph LR coolify_multi_server_topology["coolify-multi-server-topology"]:::self coolify_multi_server_topology --> openclaw_vps_architecture["openclaw-vps-architecture"] coolify_multi_server_topology --> coolify_deployment_default["coolify-deployment-default"] coolify_multi_server_topology --> gitlab_self_hosted_not_github["gitlab-self-hosted-not-github"] classDef self fill:#715EE3,color:#fff,stroke:#291F50;

Generated from the Knowledge OS markdown vault · diagrams via Mermaid · source of truth = .md