concept · shared · last reviewed 2026-05-21

Aspire LLM Gateway: OAuth-first fallback chain

Context

Aspire's LLM Gateway (per aspire-llm-gateway) routes traffic to multiple upstream providers. Each upstream has different cost, latency, and availability characteristics. The OAuth-first fallback chain (Phase 2/3 of the gateway project, LIVE 2026-05-14) tells the gateway which upstream to try first and what to fall back to when things go wrong.

Detail

Core principle: OAuth-first, then paid API, then SaaS

Aspire's 3 OAuth subscriptions (~$486-638 AUD/mo total) absorb per-token cost, so the gateway prefers an OAuth upstream over a paid-API upstream: same model quality, $0 marginal cost.

Per-model-class fallback chain

Claude family (claude-haiku-4-5, claude-sonnet-4-6, claude-opus-4-7)

  1. Anthropic OAuth (Claude Max subscription): $0 marginal
  2. Anthropic API direct: paid, only used when OAuth quota exhausted (rare)
  3. No further fallback: fail loudly
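
The ordered walk above can be sketched roughly as follows. This is a minimal illustration, not the gateway's actual code: `Upstream`, `QuotaExhausted`, `AllUpstreamsFailed`, and `call_with_fallback` are hypothetical names, and the assumption that only quota exhaustion triggers a fallback comes from the "only used when OAuth quota exhausted" wording.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class QuotaExhausted(Exception):
    """Hypothetical signal that an upstream's subscription quota is used up."""


class AllUpstreamsFailed(Exception):
    """Raised when the chain is exhausted: the 'fail loudly' step."""


@dataclass(frozen=True)
class Upstream:
    name: str                   # label only, e.g. "anthropic-oauth"
    send: Callable[[str], str]  # hypothetical transport: prompt -> completion


def call_with_fallback(chain: Sequence[Upstream], prompt: str) -> str:
    """Try each upstream in order; move on only when quota is exhausted."""
    tried = []
    for upstream in chain:
        try:
            return upstream.send(prompt)
        except QuotaExhausted:
            tried.append(upstream.name)  # remember it, then try the next entry
    # Step 3 of the chain: no further fallback, surface the failure.
    raise AllUpstreamsFailed(f"no upstream left after: {', '.join(tried)}")


def _oauth_out_of_quota(prompt: str) -> str:
    # Stand-in for the OAuth upstream when the subscription quota is gone.
    raise QuotaExhausted("Claude Max quota used up")


def _paid_api(prompt: str) -> str:
    # Stand-in for the paid API upstream.
    return f"paid-api answered: {prompt}"


claude_chain = [
    Upstream("anthropic-oauth", _oauth_out_of_quota),
    Upstream("anthropic-api", _paid_api),
]
```

Calling `call_with_fallback(claude_chain, "hi")` falls through to the paid API stand-in. Any exception other than `QuotaExhausted` is deliberately left to propagate, so an unexpected failure is never masked by a quiet retry elsewhere.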

GPT family (gpt-4.1, gpt-5.2, gpt-5.5)

  1. codex-shim sidecar (ChatGPT Pro OAuth): Phase 1b DEFERRED, currently broken
  2. OpenAI API direct: paid, last resort
  3. No further fallback

Note: codex-shim is currently NOT working; gpt-* aliases return Connection error or 502.

Qwen family (qwen-3.6, qwen-3.6-omni)

  1. cloud-first.ai LiteLLM passthrough: Aspire-owned upstream
  2. qwen-3.6-omni now falls back to claude-haiku-4-5 (see status note)

Status history for qwen-3.6-omni:

KO recommendation: keep LLM_VISION_MODEL=claude-haiku-4-5 EXPLICIT in KO's .env (per MR aspire/infrastructure/knowledge-os!5) rather than relying on the hidden qwen-3.6-omni→claude-haiku fallback. The explicit setting is honest about what's actually serving and survives any future fallback-config change.
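
One way a consumer could enforce that explicitness is to refuse to start when the variable is missing, instead of quietly picking a default. The helper below is a sketch under that assumption. `resolve_vision_model` and `MissingModelConfig` are hypothetical names, and KO's real config loading is not shown on this page.

```python
from typing import Mapping


class MissingModelConfig(Exception):
    """Raised when the vision model is not pinned explicitly."""


def resolve_vision_model(env: Mapping[str, str]) -> str:
    """Return the pinned vision model, refusing to guess a default."""
    model = env.get("LLM_VISION_MODEL", "").strip()
    if not model:
        # No implicit default: a missing value is a config error, not a hint
        # to fall back to whatever the gateway happens to route to.
        raise MissingModelConfig("LLM_VISION_MODEL must be set explicitly in .env")
    return model
```

In a real process this would typically be called once at startup as `resolve_vision_model(os.environ)`.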

Gemini family (gemini-2.5-pro, etc.)

  1. Google Code Assist OAuth (free tier): capacity unpredictable, per the agent-praew issues of 2026-05-13/14
  2. AI Studio paid API: $0 free tier still works
  3. No further fallback
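
Taken together, the four chains amount to a lookup table keyed by model family. The sketch below shows one possible shape for it. The upstream labels, `FALLBACK_CHAINS`, `family_of`, and `chain_for` are all hypothetical, and the family is guessed from the model-name prefix purely for illustration. The qwen-3.6-omni to claude-haiku-4-5 swap is left out because it changes the model, not just the upstream.

```python
from __future__ import annotations

# Ordered upstream labels per model family, first entry tried first.
# Labels are illustrative, not the gateway's real config keys.
FALLBACK_CHAINS: dict[str, list[str]] = {
    "claude": ["anthropic-oauth", "anthropic-api"],
    "gpt": ["codex-shim", "openai-api"],
    "qwen": ["cloud-first-litellm"],
    "gemini": ["google-code-assist-oauth", "ai-studio-api"],
}


class UnknownModelFamily(Exception):
    """Raised for a model whose family has no configured chain."""


def family_of(model: str) -> str:
    """Derive the family from the text before the first hyphen."""
    prefix = model.split("-", 1)[0]
    if prefix not in FALLBACK_CHAINS:
        raise UnknownModelFamily(model)
    return prefix


def chain_for(model: str) -> list[str]:
    """Return a copy of the chain so callers cannot mutate the table."""
    return list(FALLBACK_CHAINS[family_of(model)])
```

Because every chain lives under a single family key, a lookup can never hand back an upstream from a different vendor family.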

Why "no silent vendor fallback"

The gateway deliberately does NOT silently fall back to a different model family when a requested model is down. Reasons:

Instead: fail loudly. Agent or operator decides what to do (retry, switch model, accept failure).

What "fail loudly" means in code

| Error | What gateway returns | What downstream sees |
| --- | --- | --- |
| Upstream API key invalid | 401 with model-group context | VisionError / LLMError |
| Upstream model down (502) | 502 with upstream error message | Same |
| Upstream rate limit (429) | 429 | Same |
| Gateway DB down | 500 | Same |
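
On the consumer side, the mapping above might look something like the sketch below. Only the exception names `LLMError` and `VisionError` come from the table. Their constructor, the subclass relationship between them, and the `raise_for_gateway_status` helper are assumptions made for illustration.

```python
class LLMError(Exception):
    """Downstream error carrying the gateway's status and message."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(f"gateway returned {status}: {detail}")
        self.status = status
        self.detail = detail


class VisionError(LLMError):
    """Same failure, raised from a vision call (assumed to subclass LLMError)."""


def raise_for_gateway_status(status: int, detail: str, *, vision: bool = False) -> None:
    """Turn any 4xx/5xx gateway response into a loud exception."""
    if status < 400:
        return  # success and redirects pass through untouched
    error_cls = VisionError if vision else LLMError
    # The status is preserved so the caller can decide: retry, switch, or give up.
    raise error_cls(status, detail)
```

Keeping the original status code on the exception leaves the retry-or-abandon decision with the agent or operator.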

Provisioned virtual keys (as of 2026-05-21)

| Consumer | Key | Created |
| --- | --- | --- |
| postiz | sk-woK9Fvz_lu6yLU8CyU5vSQ | 2026-05-09 |
| knowledge-os | sk-Tkg… (redacted) | 2026-05-17 |

Fleet migration (2026-05-14)

10 OpenClaw agents migrated to use the Aspire Gateway as their LLM endpoint. Per-agent virtual keys are NOT yet provisioned; the agents share a single key via OpenClaw's LITELLM_API_KEY env. Splitting this out into one key per agent is still to be done.

Sensitivity routing (designed but not enforced)

The original design (per AGENT_LLM_PLAN §7) called for "sensitivity routing": different upstream choices based on data classification (PII, financial, legal). Current state: all data flows through cloud-first.ai upstream because their no-retention policy covers Aspire's use cases. Sensitivity routing is dormant; revisit if a regulated tenant signs.
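
A dormant routing layer of this kind could be as small as a lookup with an empty override table. The sketch below is only a guess at the shape. `Sensitivity`, `OVERRIDES`, and `upstream_for` are hypothetical names, and AGENT_LLM_PLAN §7 may specify something quite different.

```python
from __future__ import annotations

from enum import Enum


class Sensitivity(Enum):
    """Data classifications named in the original design."""

    PII = "pii"
    FINANCIAL = "financial"
    LEGAL = "legal"


DEFAULT_UPSTREAM = "cloud-first.ai"

# Empty while routing is dormant; a regulated tenant would add entries here.
OVERRIDES: dict[Sensitivity, str] = {}


def upstream_for(classification: Sensitivity) -> str:
    """Pick the upstream for a classification, defaulting to cloud-first.ai."""
    return OVERRIDES.get(classification, DEFAULT_UPSTREAM)
```

With `OVERRIDES` empty, every classification resolves to the same upstream, which matches the current state described above.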

Related

🔗 Relationships

graph LR
  aspire_llm_gateway_fallback_chain["aspire-llm-gateway-fallback-chain"]:::self
  aspire_llm_gateway_fallback_chain --> aspire_llm_gateway["aspire-llm-gateway"]
  aspire_llm_gateway_fallback_chain --> aspire_llm_gateway_only_egress["aspire-llm-gateway-only-egress"]
  aspire_llm_gateway_fallback_chain --> knowledge_os_stage_1["knowledge-os-stage-1"]
  classDef self fill:#715EE3,color:#fff,stroke:#291F50;