Coolify multi-server topology: staging+control vs prod
Context
Aspire's customer-facing apps deploy via Coolify. As the app count grew, running everything on one box created risk: a staging deploy could starve a production app of resources, and the control plane (Coolify itself) shared a host with customer workloads. The decision splits deployment across two servers with clear roles.
Detail
Topology
| Server | IP | Role | SSH |
|---|---|---|---|
| Staging + control plane | 112.121.151.46 | Coolify control plane + staging deploys | ssh coolify-staging (key ~/.ssh/coolify_vps) |
| Production | 103.243.116.249 | Customer-facing prod apps | ssh coolify-prod (key ~/.ssh/id_ed25519, UUID x9dcph0q2mtwql9illb6umkp) |
- Control plane = app.dssc.co.th
- SSH aliases live in ~/.ssh/config (since 2026-05-27)
- To land a new app on prod: pass server_uuid for prod-runner-1 in the Coolify create call
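A minimal sketch of pinning a new app to prod at create time, assuming the UUID in the topology table is the Coolify server UUID for prod-runner-1 and that the create call is a JSON POST with a bearer token. The endpoint path, every payload field other than server_uuid, and the helper name build_create_request are illustrative assumptions rather than confirmed details of this setup. The request is only built here, never sent.

```python
import json
import urllib.request

# From the topology table; assumed to be the Coolify server UUID of prod-runner-1.
PROD_SERVER_UUID = "x9dcph0q2mtwql9illb6umkp"
# Control plane host from this note; the API path used below is an assumption.
CONTROL_PLANE = "https://app.dssc.co.th"


def build_create_request(token: str, app_fields: dict) -> urllib.request.Request:
    """Build (but do not send) a create-application request pinned to prod."""
    payload = dict(app_fields)
    # Explicit targeting, as the note requires: always overwrite server_uuid.
    payload["server_uuid"] = PROD_SERVER_UUID
    return urllib.request.Request(
        url=f"{CONTROL_PLANE}/api/v1/applications/public",  # hypothetical path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_create_request("example-token", {"name": "demo-app"})
    print(req.get_method(), req.full_url)
    print(json.loads(req.data)["server_uuid"])
```

Overwriting server_uuid inside the helper, rather than trusting the caller to pass it, keeps a copied staging payload from landing on the wrong box.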
Why split
| Risk on single box | Mitigated by split |
|---|---|
| Staging build starves prod app | Separate hosts, separate resources |
| Control plane outage takes down apps | Control plane on staging box; prod apps keep serving |
| Resource contention during concurrent deploys | Builds happen on staging-side; prod just runs |
Companion: OpenClaw VPS
Note this is SEPARATE from the OpenClaw VPS (112.121.151.239), which hosts the AI agent fleet + Knowledge OS; see openclaw-vps-architecture. Three distinct servers total:
- 112.121.151.46: Coolify staging + control
- 103.243.116.249: Coolify production
- 112.121.151.239: OpenClaw AI infrastructure
Coolify deploy gotchas (hard-won, per MEMORY.md reference_coolify_gotchas)
28 documented quirks. The ones that bite most often:
| # | Quirk | Workaround |
|---|---|---|
| 1 | ports_exposes API param silently rejected (defaults to 80) | Encode port in FQDN as https://host:4000 |
| 2 | update_application doesn't regenerate Traefik labels | Delete + recreate to change port routing |
| 3 | limits_memory bytes bug | Watch the unit |
| - | dockercompose default path .yaml not .yml | Name files .yaml |
| 21-23 | helper-prep race / container-recycle race / helper-killed-mid-build | "Wait until in_progress queue EMPTY, then retry on idle VPS" |
| 24 | GitLab PAT in ~/.claude.json rotates mid-session | Re-read on 401 |
| 25 | Coolify auto-cancels queued >2h | Retry |
| 26 | force=true amplifies contention | Avoid during busy periods |
| 28 | SvelteKit-proxies-Go-API needs explicit /api/* proxy routes | Add them |
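Two of the workarounds above (quirk 1 and quirks 21-23) lend themselves to small helpers. The sketch below is illustrative only: fqdn_with_port and retry_when_idle are hypothetical names, and the two callables passed to retry_when_idle stand in for whatever queue check and deploy trigger are actually in use. No specific Coolify API call is assumed.

```python
import time
from typing import Callable


def fqdn_with_port(host: str, port: int) -> str:
    """Quirk 1: carry the app port in the FQDN instead of ports_exposes."""
    return f"https://{host}:{port}"


def retry_when_idle(
    in_progress_count: Callable[[], int],
    trigger_deploy: Callable[[], bool],
    attempts: int = 3,
    poll_seconds: float = 30.0,
    max_polls: int = 240,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Quirks 21-23: retry only once the in_progress queue is empty.

    Returns True as soon as trigger_deploy() succeeds, False if every
    attempt fails or the queue does not drain within max_polls checks.
    """
    for _ in range(attempts):
        polls = 0
        # Wait for the queue to drain before each attempt.
        while in_progress_count() > 0:
            polls += 1
            if polls >= max_polls:
                return False
            sleep(poll_seconds)
        if trigger_deploy():
            return True
    return False


if __name__ == "__main__":
    print(fqdn_with_port("app.example.com", 4000))
```

The sleep function is injectable so the loop can be exercised without real waiting. The trigger is deliberately a plain retry, since quirk 26 notes that force=true amplifies contention.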
Build-time env injection gotcha (per MEMORY.md feedback_coolify_env_add_race)
- Adding envs while a deploy is queued triggers duplicate dispatch → docker-stops the helper mid-build
- Inter-stage COPY node_modules OOMs on busy VPS → collapse to single stage
- NODE_ENV=production as buildtime env breaks npm ci (skips devDeps) → force NODE_ENV=development in deps stage
Build log access (per MEMORY.md reference_coolify_build_logs)
- GET /api/v1/deployments/applications/{uuid}?take=N returns logs as JSON-string
- --no-cache means every deploy is a full rebuild
- Deployments list JSON truncates ~2KB
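Because the logs arrive as a JSON string rather than a nested array, they need a second decode after the response itself is parsed. A minimal sketch, assuming each deployment object carries a logs field holding a JSON-encoded list of entries with an output key. That shape and the helper name decode_logs are assumptions to verify against a real response.

```python
import json
from typing import Any


def decode_logs(deployment: dict[str, Any]) -> list[str]:
    """Return log lines from one deployment object.

    Assumes 'logs' is a JSON-encoded string (hence the second decode) holding
    a list of objects with an 'output' key. Undecodable logs yield [].
    """
    raw = deployment.get("logs")
    if not isinstance(raw, str) or not raw:
        return []
    try:
        entries = json.loads(raw)
    except json.JSONDecodeError:
        # A truncated payload will not parse; return nothing rather than guess.
        return []
    if not isinstance(entries, list):
        return []
    return [str(e["output"]) for e in entries if isinstance(e, dict) and "output" in e]


if __name__ == "__main__":
    sample = {"logs": json.dumps([{"output": "npm ci"}, {"output": "build ok"}])}
    print(decode_logs(sample))
```

Returning an empty list on a parse failure makes the truncation case visible to the caller without raising mid-deploy.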
Image-size killer (per MEMORY.md feedback_docker_image_size)
- Single-stage with full devDeps = export >300s = "context deadline exceeded"
- Use multi-stage + npm prune (but watch the 2-stage pnpm prune --prod OOM → needs 3-stage)
- Drizzle: Coolify CMD must use node scripts/migrate.mjs, NEVER drizzle-kit migrate at runtime (prod runner strips devDeps)
Constraints we accepted
- Two Coolify servers to patch/monitor instead of one
- Cross-server deploys need explicit server_uuid targeting
Revisit trigger
- App count exceeds what 2 servers handle comfortably → add a third runner
- Need for geo-distributed deploys (e.g., AU + Asia edge)
Actions
- [x] Split topology live since 2026-05-27
- [x] SSH aliases configured
- [x] 28 gotchas documented in MEMORY.md reference_coolify_gotchas
Related
- openclaw-vps-architecture: the separate AI infrastructure VPS
- coolify-deployment-default: why Coolify at all
- gitlab-self-hosted-not-github: GitLab triggers Coolify deploys
Relationships
graph LR
coolify_multi_server_topology["coolify-multi-server-topology"]:::self
coolify_multi_server_topology --> openclaw_vps_architecture["openclaw-vps-architecture"]
coolify_multi_server_topology --> coolify_deployment_default["coolify-deployment-default"]
coolify_multi_server_topology --> gitlab_self_hosted_not_github["gitlab-self-hosted-not-github"]
classDef self fill:#715EE3,color:#fff,stroke:#291F50;