concept · shared · last reviewed 2026-05-21

Aspire LLM Gateway — OAuth-first fallback chain

Context

Aspire's LLM Gateway (per aspire-llm-gateway) routes traffic to multiple upstream providers. Each upstream has different cost, latency, and availability characteristics. The OAuth-first fallback chain (Phase 2/3 of the gateway project, LIVE 2026-05-14) tells the gateway which upstream to try first and which to fall back to when that upstream fails or runs out of quota.

Detail

Core principle: OAuth-first, then paid API, then SaaS

Aspire's 3 OAuth subscriptions (~$486-638 AUD/mo total) absorb per-token cost, so the gateway prefers an OAuth upstream over a paid-API upstream: same model quality, $0 marginal cost.

Per-model-class fallback chain

Claude family (`claude-haiku-4-5`, `claude-sonnet-4-6`, `claude-opus-4-7`)

Anthropic OAuth (Claude Max subscription) — $0 marginal
Anthropic API direct — paid, only used when OAuth quota exhausted (rare)
No further fallback — fail loudly

GPT family (`gpt-4.1`, `gpt-5.2`, `gpt-5.5`)

codex-shim sidecar (ChatGPT Pro OAuth) — Phase 1b DEFERRED, currently broken
OpenAI API direct — paid, last resort
No further fallback

Note: codex-shim is currently NOT working; gpt-* aliases return Connection error or 502.

Qwen family (`qwen-3.6`, `qwen-3.6-omni`)

cloud-first.ai LiteLLM passthrough — Aspire-owned upstream
qwen-3.6-omni now falls back to claude-haiku-4-5 (see status note)

Status history for qwen-3.6-omni:

2026-05-09 to 2026-05-19: BROKEN. Hard HTTP 500 `AuthenticationError: api_key must be set` (cloud-first.ai's upstream model entry for Qwen/Qwen3-Omni-30B-A3B-Instruct is missing its api_key). No fallback; vision ingest down. The fix requires the cloud-first.ai Proxy Admin role (Kom only has Internal User).
2026-05-30 (re-verified): qwen-3.6-omni now returns HTTP 200 — BUT the response model field is claude-haiku-4-5-20251001, i.e. the gateway now falls back to Claude Haiku rather than serving Qwen-Omni. Confirmed via both a text probe and a real base64-image vision probe (1×1 PNG → "White"). The cloud-first.ai Qwen-Omni upstream is still not independently confirmed working — every response came from Claude Haiku, never a Qwen model.
Net: the vision OUTAGE is mitigated at the gateway layer (consumers calling qwen-3.6-omni get working Claude-Haiku vision), but the cloud-first.ai upstream fix is still outstanding. This contradicts the "no silent vendor fallback" principle below — the fallback was added between 2026-05-19 and 2026-05-30; flag for review whether it should be explicit.
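The 2026-05-30 finding came from comparing the requested alias with the `model` field of the response. A minimal sketch of that check follows. The helper names are hypothetical, and it assumes a model's family can be read from the leading letters of its id, which holds for the ids quoted in this note but is not a gateway guarantee.

```python
import re


def model_family(model_id: str) -> str:
    """Return the leading alphabetic prefix of a model id, e.g. 'qwen'.

    Any 'Vendor/' path prefix is dropped first, so an upstream id such as
    'Qwen/Qwen3-Omni-30B-A3B-Instruct' also maps to 'qwen'.
    """
    short_name = model_id.rsplit("/", 1)[-1].lower()
    match = re.match(r"[a-z]+", short_name)
    return match.group(0) if match else short_name


def is_silent_fallback(requested: str, response_model: str) -> bool:
    """True when the response was served by a different model family."""
    return model_family(requested) != model_family(response_model)


# Values from the 2026-05-30 probe described above.
print(is_silent_fallback("qwen-3.6-omni", "claude-haiku-4-5-20251001"))
```

A consumer could log or raise whenever this returns True, which makes a cross-family fallback visible even when the gateway answers HTTP 200.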

KO recommendation: keep LLM_VISION_MODEL=claude-haiku-4-5 EXPLICIT in KO's .env (per MR aspire/infrastructure/knowledge-os!5) rather than relying on the hidden qwen-3.6-omni→claude-haiku fallback — explicit is honest about what's actually serving and survives any future fallback-config change.
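One way a consumer might enforce the explicit setting is to refuse to start when `LLM_VISION_MODEL` is unset, instead of defaulting to an alias whose backing model can change. This is a sketch only: the function name is hypothetical and it is not taken from KO's code.

```python
import os
from typing import Mapping


def vision_model_from_env(env: Mapping[str, str] = os.environ) -> str:
    """Read LLM_VISION_MODEL and fail if it is missing or blank.

    There is deliberately no default: the operator has to name the model
    that should actually serve vision requests.
    """
    value = env.get("LLM_VISION_MODEL", "").strip()
    if not value:
        raise RuntimeError(
            "LLM_VISION_MODEL is not set; set it explicitly, "
            "e.g. LLM_VISION_MODEL=claude-haiku-4-5"
        )
    return value


print(vision_model_from_env({"LLM_VISION_MODEL": "claude-haiku-4-5"}))
```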

Gemini family (`gemini-2.5-pro`, etc.)

Google Code Assist OAuth (free tier) — capacity unpredictable per agent-praew issues 2026-05-13/14
AI Studio API (paid tier; the $0 free tier still works)
No further fallback
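The per-family chains above can be pictured as an ordered lookup table. The sketch below is illustrative, not the gateway's actual configuration: the upstream labels are hypothetical short names for the entries listed above, and the qwen-3.6-omni to claude-haiku-4-5 fallback is left out because it is still flagged for review.

```python
# Ordered upstreams per model family; the first entry is tried first.
# Labels are hypothetical short names, not real gateway config keys.
FALLBACK_CHAINS: dict[str, tuple[str, ...]] = {
    "claude": ("anthropic-oauth", "anthropic-api"),
    "gpt": ("codex-shim", "openai-api"),
    "qwen": ("cloud-first-litellm",),
    "gemini": ("google-code-assist-oauth", "ai-studio-api"),
}


def chain_for(model: str) -> tuple[str, ...]:
    """Return the ordered upstream chain for a model alias.

    The family is taken from the text before the first '-', which matches
    the aliases in this note (claude-*, gpt-*, qwen-*, gemini-*).
    """
    family = model.split("-", 1)[0]
    if family not in FALLBACK_CHAINS:
        raise LookupError(f"no fallback chain configured for model {model!r}")
    return FALLBACK_CHAINS[family]


print(chain_for("claude-sonnet-4-6"))
```

Each chain contains upstreams for one family only, so a lookup can never hand back a different vendor's model.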

Why "no silent vendor fallback"

The gateway deliberately does NOT silently fall back to a different model family when a requested model is down. Reasons:

Cost surprises — silent fallback to paid API can spike spend
Quality regressions — different model returns different-shaped output; downstream parsers break
Debug confusion — "why did the agent get a different answer today?" — hard to trace if fallback is silent

Instead: fail loudly. Agent or operator decides what to do (retry, switch model, accept failure).
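As a rough sketch of the behaviour described here, a caller can walk a single family's chain in order and raise once it is exhausted, never switching to another family. All names are hypothetical; `call` stands in for whatever actually sends the request to an upstream.

```python
from typing import Callable, Sequence


class AllUpstreamsFailed(Exception):
    """Every upstream in the requested model's own chain failed."""

    def __init__(self, model: str, errors: list[tuple[str, Exception]]) -> None:
        self.model = model
        self.errors = errors
        detail = "; ".join(f"{name}: {err}" for name, err in errors)
        super().__init__(f"{model}: all upstreams failed ({detail})")


def complete(model: str, chain: Sequence[str],
             call: Callable[[str, str], str]) -> str:
    """Try each upstream in order; raise loudly when none succeeds."""
    errors: list[tuple[str, Exception]] = []
    for upstream in chain:
        try:
            return call(upstream, model)
        except Exception as exc:  # a real client would catch narrower types
            errors.append((upstream, exc))
    # No cross-family retry here: the caller decides what happens next.
    raise AllUpstreamsFailed(model, errors)
```

The exception carries every upstream error, so the agent or operator can see why each attempt failed before choosing to retry, switch model, or accept the failure.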

What "fail loudly" means in code

| Error | What gateway returns | What downstream sees |
| --- | --- | --- |
| Upstream API key invalid | 401 with model-group context | `VisionError` / `LLMError` |
| Upstream model down (502) | 502 with upstream error message | Same |
| Upstream rate limit (429) | 429 | Same |
| Gateway DB down | 500 | Same |
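On the consumer side, the mapping in this table might look like the sketch below. Only the names `LLMError` and `VisionError` come from this note; their constructors, the subclass relationship, and the `raise_for_gateway_status` helper are assumptions made for illustration.

```python
class LLMError(Exception):
    """Raised by a downstream client when the gateway reports a failure."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(f"gateway returned HTTP {status}: {detail}")
        self.status = status
        self.detail = detail


class VisionError(LLMError):
    """Same failure on the vision path (subclassing is an assumption)."""


# Statuses from the table above and the failure each one signals.
GATEWAY_FAILURES = {
    401: "upstream API key invalid",
    429: "upstream rate limit",
    500: "gateway DB down",
    502: "upstream model down",
}


def raise_for_gateway_status(status: int, body: str, *, vision: bool = False) -> None:
    """Raise for any 4xx/5xx gateway status; return None otherwise."""
    if status < 400:
        return
    reason = GATEWAY_FAILURES.get(status, "unlisted gateway error")
    error_type = VisionError if vision else LLMError
    raise error_type(status, f"{reason}: {body}")
```

Keeping the HTTP status on the exception lets callers decide per status whether a retry makes sense, without parsing the message.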

Provisioned virtual keys (as of 2026-05-21)

| Consumer | Key | Created |
| --- | --- | --- |
| postiz | `sk-woK9Fvz_lu6yLU8CyU5vSQ` | 2026-05-09 |
| knowledge-os | `sk-Tkg…` (redacted) | 2026-05-17 |

Fleet migration (2026-05-14)

10 OpenClaw agents were migrated to use the Aspire Gateway as their LLM endpoint. Per-agent virtual keys are NOT yet provisioned; all agents share one key via OpenClaw's LITELLM_API_KEY env variable. Splitting this into per-agent keys is still to do, with no date set.
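If per-agent keys are introduced, one possible transition is to prefer an agent-specific variable and fall back to the shared key until every agent has its own. The sketch below assumes a hypothetical `LITELLM_API_KEY_<AGENT>` naming scheme; only `LITELLM_API_KEY` exists today according to this note.

```python
import os
from typing import Mapping


def key_for_agent(agent: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the agent's own virtual key, else the shared fleet key.

    The per-agent variable name is a hypothetical convention, for example
    'agent-praew' -> LITELLM_API_KEY_AGENT_PRAEW.
    """
    suffix = agent.upper().replace("-", "_")
    per_agent_key = env.get(f"LITELLM_API_KEY_{suffix}")
    if per_agent_key:
        return per_agent_key
    shared_key = env.get("LITELLM_API_KEY")
    if not shared_key:
        raise KeyError(f"no gateway key configured for agent {agent!r}")
    return shared_key
```

With separate keys, spend and errors can be attributed to one agent, and a single key can be revoked without taking down the rest of the fleet.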

Sensitivity routing (designed but not enforced)

The original design (per AGENT_LLM_PLAN §7) called for "sensitivity routing" — different upstream choices based on data classification (PII, financial, legal). Current state: all data flows through cloud-first.ai upstream because their no-retention policy covers Aspire's use cases. Sensitivity routing is dormant; revisit if a regulated tenant signs.

Related

aspire-llm-gateway — the gateway project
aspire-llm-gateway-only-egress — the "always gateway" decision
knowledge-os-stage-1 — first major consumer (KO worker-ocr)

🔗 Relationships

graph LR
  aspire_llm_gateway_fallback_chain["aspire-llm-gateway-fallback-chain"]:::self
  aspire_llm_gateway_fallback_chain --> aspire_llm_gateway["aspire-llm-gateway"]
  aspire_llm_gateway_fallback_chain --> aspire_llm_gateway_only_egress["aspire-llm-gateway-only-egress"]
  aspire_llm_gateway_fallback_chain --> knowledge_os_stage_1["knowledge-os-stage-1"]
  classDef self fill:#715EE3,color:#fff,stroke:#291F50;

Generated from the Knowledge OS markdown vault · diagrams via Mermaid · source of truth = .md