Aspire LLM Gateway: OAuth-first fallback chain
Context
Aspire's LLM Gateway (per aspire-llm-gateway) routes traffic to multiple upstream providers. Each upstream has different cost, latency, and availability characteristics. The OAuth-first fallback chain (Phase 2/3 of the gateway project, LIVE 2026-05-14) tells the gateway which upstream to try first and what to fall back to when an upstream fails or runs out of quota.
Detail
Core principle: OAuth-first, then paid API, then SaaS
Aspire's 3 OAuth subscriptions (~$486-638 AUD/mo total) absorb the per-token cost, so the gateway prefers an OAuth upstream over a paid-API upstream: same model quality, $0 marginal cost.
Per-model-class fallback chain
Claude family (claude-haiku-4-5, claude-sonnet-4-6, claude-opus-4-7)
- Anthropic OAuth (Claude Max subscription): $0 marginal
- Anthropic API direct: paid, only used when the OAuth quota is exhausted (rare)
- No further fallback: fail loudly
GPT family (gpt-4.1, gpt-5.2, gpt-5.5)
- codex-shim sidecar (ChatGPT Pro OAuth): Phase 1b DEFERRED, currently broken
- OpenAI API direct: paid, last resort
- No further fallback
Note: codex-shim is currently NOT working; gpt-* aliases return Connection error or 502.
Qwen family (qwen-3.6, qwen-3.6-omni)
- cloud-first.ai LiteLLM passthrough: Aspire-owned upstream
- qwen-3.6-omni now falls back to claude-haiku-4-5 (see status note)
Status history for qwen-3.6-omni:
- 2026-05-09–19: BROKEN. Hard HTTP 500, AuthenticationError: api_key must be set (cloud-first.ai's upstream Qwen/Qwen3-Omni-30B-A3B-Instruct model entry is missing its api_key). No fallback; vision ingest down. The fix requires the cloud-first.ai Proxy Admin role (Kom is an Internal User).
- 2026-05-30 (re-verified): qwen-3.6-omni now returns HTTP 200, BUT the response model field is claude-haiku-4-5-20251001, i.e. the gateway now falls back to Claude Haiku rather than serving Qwen-Omni. Confirmed via both a text probe and a real base64-image vision probe (1×1 PNG → "White"). The cloud-first.ai Qwen-Omni upstream is still not independently confirmed working: every response came from Claude Haiku, never a Qwen model.
- Net: the vision OUTAGE is mitigated at the gateway layer (consumers calling qwen-3.6-omni get working Claude-Haiku vision), but the cloud-first.ai upstream fix is still outstanding. This contradicts the "no silent vendor fallback" principle below. The fallback was added between 2026-05-19 and 2026-05-30; flag for review whether it should be explicit.
KO recommendation: keep LLM_VISION_MODEL=claude-haiku-4-5 EXPLICIT in KO's .env (per MR aspire/infrastructure/knowledge-os!5) rather than relying on the hidden qwen-3.6-omni → claude-haiku fallback. Explicit is honest about what is actually serving and survives any future fallback-config change.
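A check like the 2026-05-30 probe can be automated by comparing the requested alias with the model field of the response. The sketch below is a minimal illustration, not the gateway's or KO's actual code: served_family, is_silent_fallback, and the prefix rule they rely on are hypothetical names and assumptions introduced here.

```python
# Hypothetical helper: detect when a response was served by a different
# model family than the one requested. The prefix rule is an assumption.
KNOWN_FAMILIES = ("claude", "gpt", "qwen", "gemini")


def served_family(model_id: str) -> str:
    """Return a coarse family label for a model identifier."""
    lowered = model_id.lower()
    for family in KNOWN_FAMILIES:
        # Matches "qwen-3.6-omni" as well as "Qwen/Qwen3-Omni-30B-A3B-Instruct".
        if lowered.startswith(family):
            return family
    return "unknown"


def is_silent_fallback(requested: str, response_model: str) -> bool:
    """True when the response's model field belongs to another family."""
    return served_family(requested) != served_family(response_model)
```

With the values reported in the status history, is_silent_fallback("qwen-3.6-omni", "claude-haiku-4-5-20251001") is true, while an explicit claude-haiku-4-5 request answered by claude-haiku-4-5-20251001 is not flagged.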
Gemini family (gemini-2.5-pro, etc.)
- Google Code Assist OAuth (free tier): capacity unpredictable per agent-praew issues 2026-05-13/14
- AI Studio paid API: the $0 free tier still works
- No further fallback
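The per-family chains above can be written down as ordered data and walked in sequence. This is a minimal sketch under stated assumptions: the upstream labels, UpstreamError, ChainExhausted, and call_with_chain are hypothetical names, and the send callable stands in for whatever actually issues the request. It models the documented design (no cross-family fallback), not the qwen-3.6-omni behaviour observed on 2026-05-30.

```python
from typing import Callable, Dict, List


class UpstreamError(Exception):
    """Hypothetical: a single upstream failed (quota, 5xx, connection error)."""


class ChainExhausted(Exception):
    """Hypothetical: every upstream in a family's chain failed."""


# Ordered chains per model family; the labels are illustrative, not real config keys.
FALLBACK_CHAINS: Dict[str, List[str]] = {
    "claude": ["anthropic-oauth", "anthropic-api"],
    "gpt": ["codex-shim", "openai-api"],
    "qwen": ["cloud-first-litellm"],
    "gemini": ["google-code-assist-oauth", "ai-studio-api"],
}


def call_with_chain(family: str, send: Callable[[str], str]) -> str:
    """Try each upstream of one family in order; never cross into another family."""
    errors: List[str] = []
    for upstream in FALLBACK_CHAINS[family]:
        try:
            return send(upstream)
        except UpstreamError as exc:
            errors.append(f"{upstream}: {exc}")
    # No further fallback: surface every failure instead of switching vendor.
    raise ChainExhausted(f"{family}: all upstreams failed ({'; '.join(errors)})")
```

Keeping the chains as plain data makes an added cross-family entry visible in review, which is the property the qwen-3.6-omni case currently lacks.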
Why "no silent vendor fallback"
The gateway deliberately does NOT silently fall back to a different model family when a requested model is down. Reasons:
- Cost surprises: silent fallback to a paid API can spike spend
- Quality regressions: a different model returns different-shaped output; downstream parsers break
- Debug confusion: "why did the agent get a different answer today?" is hard to trace if the fallback is silent
Instead: fail loudly. Agent or operator decides what to do (retry, switch model, accept failure).
What "fail loudly" means in code
| Error | What gateway returns | What downstream sees |
|---|---|---|
| Upstream API key invalid | 401 with model-group context | VisionError / LLMError |
| Upstream model down (502) | 502 with upstream error message | Same |
| Upstream rate limit (429) | 429 | Same |
| Gateway DB down | 500 | Same |
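On the consumer side, the table above amounts to turning any gateway error status into an exception instead of substituting another model. The sketch below assumes VisionError is a subclass of LLMError and that both carry the HTTP status; the source only names the two error types, so the class layout and raise_for_gateway_status are hypothetical.

```python
# Status meanings taken from the table above.
GATEWAY_ERRORS = {
    401: "upstream API key invalid",
    429: "upstream rate limit",
    500: "gateway DB down",
    502: "upstream model down",
}


class LLMError(Exception):
    """Assumed base error that keeps the gateway's HTTP status and message."""

    def __init__(self, status: int, detail: str) -> None:
        reason = GATEWAY_ERRORS.get(status, "unrecognised gateway error")
        super().__init__(f"HTTP {status} ({reason}): {detail}")
        self.status = status
        self.detail = detail


class VisionError(LLMError):
    """Assumed subclass raised for failed vision requests."""


def raise_for_gateway_status(status: int, detail: str = "", *, vision: bool = False) -> None:
    """Raise on any 4xx/5xx so the caller decides: retry, switch model, or give up."""
    if status < 400:
        return
    error_cls = VisionError if vision else LLMError
    raise error_cls(status, detail)
```

Because the status travels with the exception, an agent can choose to retry on 429 or 502 and stop on 401, without the gateway making that choice silently.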
Provisioned virtual keys (as of 2026-05-21)
| Consumer | Key | Created |
|---|---|---|
| postiz | sk-woK… (redacted) | 2026-05-09 |
| knowledge-os | sk-Tkg… (redacted) | 2026-05-17 |
Fleet migration (2026-05-14)
10 OpenClaw agents migrated to use the Aspire Gateway as their LLM endpoint. Per-agent virtual keys are NOT yet provisioned; the agents share a single key via OpenClaw's LITELLM_API_KEY env. To be split out per-agent at some point.
Sensitivity routing (designed but not enforced)
The original design (per AGENT_LLM_PLAN §7) called for "sensitivity routing": different upstream choices based on data classification (PII, financial, legal). Current state: all data flows through the cloud-first.ai upstream because their no-retention policy covers Aspire's use cases. Sensitivity routing is dormant; revisit if a regulated tenant signs.
Related
- aspire-llm-gateway: the gateway project
- aspire-llm-gateway-only-egress: the "always gateway" decision
- knowledge-os-stage-1: first major consumer (KO worker-ocr)