Coolify is the default deployment platform for customer-facing apps
Context
Aspire needs a way to deploy ~40+ web apps + APIs without each project re-inventing the deploy story. Options reviewed in early 2026: Vercel/Netlify (managed PaaS), Coolify (self-hosted PaaS), bare Docker on a VPS, Kubernetes.
The first Coolify standard let Coolify pull source and build Dockerfiles directly. That worked, but it made Coolify responsible for expensive build work, increased deploy queue contention, and made staging/prod less artifact-identical. DOCX (docx.aspiredigitalgroup.com.au, api-docx.aspiredigitalgroup.com.au) proved the better pattern: GitLab builds and stores images; Coolify only pulls and runs them.
Detail
Decision
Coolify remains the default runtime/orchestration platform for customer-facing webapps. GitLab CI is the default build and promotion platform.
The standard deployment flow is:
- GitLab validates, builds, scans, and pushes an immutable image/artifact.
- Staging deploys that exact artifact through Coolify.
- Staging smoke tests pass.
- Prod promotes the same immutable artifact through Coolify.
- Coolify handles env vars, routing, volumes, health checks, logs, deploy hooks, restarts, and rollback redeploys.
Coolify-side Dockerfile builds are now a legacy/fallback path, not the preferred default.
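The promotion rule in the flow above (prod receives exactly what staging ran, and only after smoke tests pass) can be sketched as a small gate. This is a minimal illustration, not the DOCX pipeline itself: `Artifact`, `promote`, and the `deploy` and `smoke_test` callables are hypothetical names, and the real deploy step would call a Coolify deploy hook.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Artifact:
    """An image pinned to an immutable tag so staging and prod match."""
    image: str  # hypothetical registry path
    tag: str    # immutable tag, e.g. a commit SHA (assumed tag policy)

    @property
    def ref(self) -> str:
        return f"{self.image}:{self.tag}"


def promote(
    artifact: Artifact,
    deploy: Callable[[str, str], None],
    smoke_test: Callable[[str], bool],
) -> List[str]:
    """Deploy to staging, gate on smoke tests, then promote the same ref."""
    deployed: List[str] = []
    deploy("staging", artifact.ref)
    deployed.append("staging")
    if not smoke_test("staging"):
        return deployed  # prod is never touched when staging fails
    deploy("prod", artifact.ref)  # same ref: no rebuild between environments
    deployed.append("prod")
    return deployed
```

Passing the deploy and smoke steps in as callables keeps the gate testable without a live Coolify instance.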
OpenClaw VPS (112.121.151.239) remains reserved for internal AI infrastructure (28 agents, Knowledge OS, MCP servers).
Rationale
- Self-hosted = $0 marginal per-app cost as the portfolio grows
- API-driven: Coolify MCP enables agent-driven deploys; the entire deploy lifecycle is scriptable
- GitLab-built artifacts: build load stays on GitLab runners instead of Coolify production/staging hosts
- Artifact promotion: prod can run the same image that passed staging instead of rebuilding from source
- Auto-deploy via GitLab CI: push/MR pipelines trigger Coolify deploy hooks after image push and smoke gates
- Traefik built-in for TLS + routing
- Postgres + Redis as managed services within Coolify: no separate DB management
- No vendor lock-in: Docker images underneath; could lift to any host
What's NOT Coolify
- Knowledge OS production stack lives on OpenClaw VPS (per KNOWLEDGE_OS_SPEC §4), co-located with the 28 agents to remove a network hop. Coolify remains the fallback for any KO public-facing UI.
- WordPress merchant sites (Alby Place WP storefront, brand-specific landing pages): these may be WP-on-Coolify but follow different runtime patterns than Aspire-engineered apps. Custom themes/plugins/MU plugins, production Code Snippets, Elementor/ACF config, and deploy manifests still belong in Git when they affect production behavior.
Known Coolify quirks (operational hazards captured 2026-04 to 2026-05)
| # | Quirk | Workaround |
|---|---|---|
| 1 | ports_exposes API param silently rejected; defaults to 80 | Encode port in FQDN: https://<host>:4000. Coolify reads this to generate Traefik loadbalancer.server.port=4000 |
| 2 | update_application doesn't regenerate Traefik labels | Once an app exists, port routing is frozen. Delete + recreate to change. |
| 3 | Build-time env injection: bulk_update_app_envs defaults to is_buildtime=true, so secrets are baked into the image | Force-rebuild after env changes (not just restart) |
| 4 | UI env edits sometimes don't propagate via API | First attempt may need a manual UI save; API update_app_env worked reliably |
| 5 | Deploy queue holds 4+ concurrent jobs | Expect 5-15min wait during busy times |
| 6 | Dockerfile-only builds can't COPY external files | Inline everything via heredoc: RUN cat > /file <<EOF...EOF |
| 7 | Helper-prep race / container-recycle race / helper-killed-mid-build | Recover via "wait until in_progress queue EMPTY then retry on idle VPS" |
| 8 | Volume-prefix doubling on app updates → fresh-empty volume mounted (Postiz outage 2026-05-13–16, 8 hours) | DO NOT run Coolify ops on Postiz until permanent volume fix applied |
| 9 | GitLab PAT in ~/.claude.json rotates mid-session | Re-read on 401 |
Full quirks list lives in MEMORY.md (reference_coolify_gotchas). Read it before any Coolify deploy.
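Two of the workarounds in the table are mechanical enough to sketch: encoding the port in the FQDN (quirk 1) and re-reading a rotated token on a 401 (quirk 9). The helper names below are hypothetical, and nothing here calls Coolify or GitLab. The snippet only builds the FQDN string and wraps a caller-supplied request function.

```python
from typing import Callable, Tuple
from urllib.parse import urlsplit


def fqdn_with_port(host: str, port: int) -> str:
    """Quirk 1: carry the container port in the FQDN, e.g. https://<host>:4000."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return f"https://{host}:{port}"


def port_from_fqdn(fqdn: str, default: int = 80) -> int:
    """Return the port encoded in the FQDN, or the default of 80 if absent."""
    port = urlsplit(fqdn).port
    return port if port is not None else default


def call_with_token_reread(
    request: Callable[[str], Tuple[int, str]],
    read_token: Callable[[], str],
) -> Tuple[int, str]:
    """Quirk 9: on a 401, re-read the token once and retry the request."""
    status, body = request(read_token())
    if status == 401:
        status, body = request(read_token())  # token may have rotated
    return status, body
```

Checking `port_from_fqdn` against the expected port before creating an app is cheap insurance, since quirk 2 means the routing cannot be corrected afterwards without delete and recreate.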
Standard deploy flow
git push / merge request
↓
GitLab validate + build + scan
↓
GitLab pushes immutable image tag
↓
Coolify deploy hook pulls that tag
↓
Coolify swaps container (with health-check)
↓
Traefik routes traffic to new container
↓
HTTP 200 on /health
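The final step of the flow, waiting for HTTP 200 on /health, could be checked from a pipeline job roughly as follows. This is a sketch under assumptions: the attempt count and delay are illustrative defaults, `wait_healthy` and `http_status` are hypothetical helper names, and the URL is whatever health path the project records.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for url, or 0 if the request could not complete."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # server answered, but with an error status
    except OSError:
        return 0  # DNS failure, refused connection, timeout, and similar


def wait_healthy(
    url: str,
    attempts: int = 10,
    delay: float = 3.0,
    fetch: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # give the new container time to start
    return False
```

A job would typically exit non-zero when `wait_healthy` returns False, so that a failed health check blocks promotion.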
Required project records:
- Registry image and immutable tag policy.
- Staging/prod Coolify app UUIDs, domains, server UUIDs, health paths, and deploy hook variable names.
- Smoke commands and expected success criteria.
- Last known-good production image tag.
- Rollback command and DB migration considerations.
- WordPress Git boundary, when applicable.
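One way to keep these records honest is to hold them in a typed structure and fail a pipeline when a field is blank. The field names below are illustrative assumptions derived from the list above, not an existing Aspire schema.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import List, Optional, Tuple


@dataclass
class Environment:
    app_uuid: str = ""
    domain: str = ""
    server_uuid: str = ""
    health_path: str = ""
    deploy_hook_var: str = ""  # name of the CI variable, never its value


@dataclass
class DeployRecord:
    registry_image: str = ""
    tag_policy: str = ""
    staging: Environment = field(default_factory=Environment)
    prod: Environment = field(default_factory=Environment)
    smoke_commands: Tuple[str, ...] = ()
    smoke_success_criteria: str = ""
    last_known_good_tag: str = ""
    rollback_command: str = ""
    migration_notes: str = ""
    wordpress_git_boundary: Optional[str] = None  # only when applicable


OPTIONAL_FIELDS = {"wordpress_git_boundary"}


def missing_fields(record, prefix: str = "") -> List[str]:
    """List dotted names of required fields that are still empty."""
    missing: List[str] = []
    for f in fields(record):
        value = getattr(record, f.name)
        if is_dataclass(value):
            missing += missing_fields(value, f"{prefix}{f.name}.")
        elif f.name not in OPTIONAL_FIELDS and not value:
            missing.append(f"{prefix}{f.name}")
    return missing
```

Running `missing_fields` against each project's record gives a concrete checklist for the inventory work.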
Constraints we accepted
- Coolify remains in the runtime path, so Coolify/control-plane health still matters. Mitigation: multi-server topology, deploy inventory, monitoring, and image rollback procedures.
- GitLab runner/registry health becomes part of release operations. Mitigation: keep last known-good images pullable, document rollback tags, and avoid deleting recent image tags.
- Existing apps need migration work. Mitigation: migrate when touching projects, starting with revenue-touching apps and apps with heavy builds.
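The registry mitigation (keep last known-good images pullable and avoid deleting recent tags) can be expressed as a cleanup filter. This is a sketch only: `keep_recent=10` is an assumed number, not an Aspire policy, and the function works on a plain list of tag names rather than calling any registry API.

```python
from typing import Iterable, List, Sequence


def tags_safe_to_delete(
    tags_newest_first: Sequence[str],
    protected: Iterable[str],
    keep_recent: int = 10,
) -> List[str]:
    """Return tags that are neither recent nor recorded as known-good."""
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")
    protected_set = set(protected)  # e.g. last known-good prod tags
    older = tags_newest_first[keep_recent:]  # everything past the recent window
    return [tag for tag in older if tag not in protected_set]
```

Feeding the last known-good production tag from each project record into `protected` keeps rollback targets out of any cleanup run.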
Revisit trigger
- Coolify single-VPS reliability drops below 99.5% monthly uptime → multi-host Coolify
- A regulated tenant requires single-tenant infra → Kubernetes-per-tenant
- The Coolify quirks list grows past 30 → deploy reliability is degrading, not improving
- GitLab registry or runner reliability blocks releases more than twice in a quarter → review registry mirroring or runner capacity
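For scale, 99.5% monthly uptime allows about 216 minutes (3.6 hours) of downtime in a 30-day month. The numeric triggers above can be checked mechanically, as in this sketch. The function names and the short trigger labels are hypothetical, and the regulated-tenant trigger is left out because it is a business event rather than a metric.

```python
from typing import List

MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(uptime_target_pct: float, days: int = 30) -> float:
    """Minutes of downtime allowed by an uptime target over the given days."""
    return (100.0 - uptime_target_pct) / 100.0 * days * MINUTES_PER_DAY


def revisit_triggers(
    monthly_uptime_pct: float,
    quirk_count: int,
    release_blocks_in_quarter: int,
) -> List[str]:
    """Return labels for the numeric revisit triggers that have fired."""
    fired: List[str] = []
    if monthly_uptime_pct < 99.5:
        fired.append("uptime")  # consider multi-host Coolify
    if quirk_count > 30:
        fired.append("quirks")  # deploy reliability is degrading
    if release_blocks_in_quarter > 2:
        fired.append("gitlab")  # review registry mirroring or runner capacity
    return fired
```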
Actions
- [x] Coolify is the default runtime for Aspire webapps
- [x] DOCX proves GitLab-built image + Coolify deploy-hook flow
- [ ] Migrate active services to GitLab-built immutable images and Coolify dockerimage apps
- [ ] Add deploy inventory/rollback manifest to each active project
- [ ] Put WordPress custom code/config into Git where it affects production behavior
- [x] KO Stage 1 NOT on Coolify (OpenClaw VPS instead), by design
- [ ] Future: quarterly review of the quirks list; items that haven't fired in 2 quarters should be archived
Related
- aspire-llm-gateway: runs on Coolify
- aspire-hub: runs on Coolify
- knowledge-os-stage-1: runs on OpenClaw VPS, NOT Coolify (exception by design)