Run AI models from Workers using the right provider for the task. Cloudflare-native inference is the default; use TanStack AI for application-level chat, streaming, tools, and agent state, then route to external providers when you need specific models, fallbacks, or centralized observability.
| Provider | Use when |
| --- | --- |
| Workers AI (@cloudflare/tanstack-ai) | Default for Cloudflare edge inference. No latency from routing outside the Cloudflare network. |
| Cloudflare AI Gateway (@cloudflare/tanstack-ai) | You need caching, retries, fallback between providers, or unified observability. |
| Replicate (TanStack AI adapter or via AI Gateway) | Image generation models not available on Workers AI. |
Workers AI: serverless GPU inference on Cloudflare's network, available directly from Workers through an ai binding. Install the packages and declare the binding in the Wrangler config:
pnpm add @tanstack/ai @cloudflare/tanstack-ai
// wrangler.jsonc
"ai": { "binding": "AI" }
Generate text from a Worker:
import { chat, streamToText } from "@tanstack/ai";
import { createWorkersAiChat } from "@cloudflare/tanstack-ai";

export default {
  async fetch(_: Request, env: Env) {
    const stream = chat({
      adapter: createWorkersAiChat("@cf/meta/llama-3.1-8b-instruct", { binding: env.AI }),
      messages: [{ role: "user", content: "Explain edge computing in one sentence." }],
    });
    return new Response(await streamToText(stream));
  },
};
Structured output, validated with Zod (also run pnpm add zod):
import { chat, streamToText } from "@tanstack/ai";
import { createWorkersAiChat } from "@cloudflare/tanstack-ai";
import { z } from "zod";
const RecipeSchema = z.object({ ingredients: z.array(z.string()) });
export default {
  async fetch(_: Request, env: Env) {
    const stream = chat({
      adapter: createWorkersAiChat("@cf/meta/llama-3.1-8b-instruct", { binding: env.AI }),
      messages: [{ role: "user", content: "Generate a lasagna recipe as JSON." }],
    });
    // JSON.parse throws if the model wraps the JSON in prose or code fences.
    return Response.json(RecipeSchema.parse(JSON.parse(await streamToText(stream))));
  },
};
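Models do not always return bare JSON. Some wrap it in a Markdown code fence or put a sentence in front of it, and a plain JSON.parse then throws.

The sketch below is a minimal defensive parse step. It assumes the reply contains exactly one top-level JSON object. extractJsonObject is a hypothetical helper and is not part of TanStack AI or Zod. Its result can be passed to RecipeSchema.parse in place of the bare JSON.parse call.

```typescript
// Hypothetical helper: pull the outermost {...} span out of a model reply.
// Assumes the reply contains exactly one top-level JSON object.
function extractJsonObject(text: string): unknown {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model output");
  }
  // JSON.parse still throws if the extracted span is not valid JSON.
  return JSON.parse(text.slice(start, end + 1));
}
```

Schema validation still matters after extraction. The helper only recovers the JSON text; it does not check that the fields are the ones you asked for.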
Stream the response as server-sent events:
import { chat, toServerSentEventsResponse } from "@tanstack/ai";
import { createWorkersAiChat } from "@cloudflare/tanstack-ai";
export default {
  async fetch(_: Request, env: Env) {
    const stream = chat({
      adapter: createWorkersAiChat("@cf/meta/llama-3.1-8b-instruct", { binding: env.AI }),
      messages: [{ role: "user", content: "Write a haiku about cloud computing." }],
    });
    return toServerSentEventsResponse(stream);
  },
};
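On the receiving side, a server-sent events stream arrives as text frames. Frames are separated by a blank line, and each one carries one or more data: lines.

The sketch below is a minimal frame parser that buffers partial chunks between calls. It makes these assumptions:
- Line endings are LF.
- Only the data field matters.
- Each event's payload is whatever TanStack AI serializes, so decoding that payload is left to the caller.

parseSseChunk is a hypothetical name, and in practice a client library may already handle this.

```typescript
// Hypothetical incremental SSE parser: feed it decoded text chunks as they
// arrive; it returns complete event payloads plus the unfinished remainder.
function parseSseChunk(buffer: string, chunk: string): { events: string[]; rest: string } {
  const frames = (buffer + chunk).split("\n\n");
  // The last element is an incomplete frame, or "" if the chunk ended cleanly.
  const rest = frames.pop() ?? "";
  const events: string[] = [];
  for (const frame of frames) {
    const dataLines = frame
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      // One optional space after "data:" is not part of the payload.
      .map((line) => line.slice(5).replace(/^ /, ""));
    if (dataLines.length > 0) events.push(dataLines.join("\n"));
  }
  return { events, rest };
}
```

To use it, pass the returned rest back in as buffer on the next call. That way, a frame split across network chunks is only emitted once it is complete.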
Cloudflare AI Gateway routes requests to multiple providers through a single gateway. It adds caching, retries, rate limiting, spend controls, and fallback without changing call sites. For production, prefer AI Gateway BYOK (stored provider keys). Workers and AI agents then reference approved keys without ever reading their plaintext values.
pnpm add @tanstack/ai @cloudflare/tanstack-ai
Recommended ownership split:
Security/admins create and rotate stored provider keys.
Developers reference gateway routes or stored-key names in code.
Agents can edit routing/config code, but should not receive raw provider keys.
Gateway budgets and rate limits are mandatory for autonomous agent loops.
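Gateway budgets are configured in AI Gateway itself. An autonomous loop also benefits from a hard stop in application code, so that a runaway agent fails fast instead of spending until the gateway limit trips.

The sketch below is a minimal guard of that kind. It assumes you can count calls and estimate token usage per step. AgentBudget and its limits are hypothetical, and the guard complements gateway budgets and rate limits rather than replacing them.

```typescript
// Hypothetical per-run budget guard for an autonomous agent loop.
class AgentBudget {
  private calls = 0;
  private tokens = 0;
  private readonly maxCalls: number;
  private readonly maxTokens: number;

  constructor(maxCalls: number, maxTokens: number) {
    this.maxCalls = maxCalls;
    this.maxTokens = maxTokens;
  }

  // Call before each model request; throws once either limit would be exceeded.
  charge(estimatedTokens: number): void {
    if (this.calls + 1 > this.maxCalls) {
      throw new Error(`Agent budget exceeded: more than ${this.maxCalls} calls`);
    }
    if (this.tokens + estimatedTokens > this.maxTokens) {
      throw new Error(`Agent budget exceeded: more than ${this.maxTokens} tokens`);
    }
    this.calls += 1;
    this.tokens += estimatedTokens;
  }

  get usage(): { calls: number; tokens: number } {
    return { calls: this.calls, tokens: this.tokens };
  }
}
```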
Call external providers through the gateway:
import { createAnthropicChat, createOpenAiChat } from "@cloudflare/tanstack-ai";

// Inside the fetch handler, where env is in scope:
const claude = createAnthropicChat("claude-haiku-4-5", {
  binding: env.AI.gateway("my-gateway"),
  // Prefer a stored provider key / gateway route in production.
  // Use env keys only for dev or providers that still require direct signing.
  apiKey: env.ANTHROPIC_API_KEY,
});
const gpt = createOpenAiChat("gpt-4o-mini", {
  binding: env.AI.gateway("my-gateway"),
  apiKey: env.OPENAI_API_KEY,
});
// Keep fallback selection in application code so behavior is explicit.
const adapters = [claude, gpt];
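With adapters listed in preference order, the fallback itself can be a plain loop. It tries each adapter, returns the first success, and reports every error if all of them fail.

The sketch below assumes the per-adapter call is passed in as a function, for example one that runs chat with that adapter and collects the text. withFallback is a hypothetical helper. The tests substitute plain strings for real TanStack AI adapters so the sketch runs on its own.

```typescript
// Hypothetical helper: try candidates in order and return the first success.
async function withFallback<A, T>(
  candidates: readonly A[],
  run: (candidate: A) => Promise<T>,
): Promise<T> {
  const errors: unknown[] = [];
  for (const candidate of candidates) {
    try {
      return await run(candidate);
    } catch (err) {
      // Record the failure and move on to the next candidate.
      errors.push(err);
    }
  }
  throw new Error(`All ${candidates.length} candidates failed: ${errors.map(String).join("; ")}`);
}
```

Falling back is simplest before any output has reached the client. If a streamed response fails halfway, switching adapters would mix output from two models.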
Workers AI models can be routed through the same gateway:
import { chat } from "@tanstack/ai";
import { createWorkersAiChat } from "@cloudflare/tanstack-ai";
// Inside the fetch handler, where env is in scope:
const stream = chat({
  adapter: createWorkersAiChat("@cf/meta/llama-3.1-8b-instruct", {
    binding: env.AI.gateway("my-gateway"),
  }),
  messages: [{ role: "user", content: "Classify this email as urgent or not." }],
});
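Providers that can be routed through AI Gateway:
| Provider | Models |
| --- | --- |
| OpenAI | GPT-4o, GPT-4o-mini, o1, o3 |
| Anthropic | Claude 3.5, Claude 3 |
| DeepSeek | DeepSeek Chat |
| Google AI | Gemini |
| Grok | xAI models |
| Mistral | Mistral models |
| Perplexity | Sonar |
| Replicate | Flux, Ideogram, Stable Diffusion |
| Groq | Llama, Mixtral |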
Replicate is best for image generation models that are not on Workers AI. Prefer routing Replicate through Cloudflare AI Gateway for caching, fallback, and centralized observability. If no TanStack AI adapter covers the exact image workflow, call Replicate from a narrow server-side service instead of adding a second AI toolkit just for images.
import { createOpenAiChat } from "@cloudflare/tanstack-ai";

// For OpenAI-compatible image providers routed through AI Gateway, keep the
// gateway configuration in one server-side adapter module.
const imageAdapter = createOpenAiChat("recraft-ai/recraft-v3", {
  binding: env.AI.gateway("my-gateway"),
  apiKey: env.REPLICATE_API_TOKEN,
});

// Otherwise, call Replicate directly from a narrow server-side service:
const response = await fetch("https://api.replicate.com/v1/predictions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${env.REPLICATE_API_TOKEN}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    version: "black-forest-labs/flux-fill-pro",
    // Inpainting also needs the source image and mask inputs (omitted here).
    input: { prompt: "Replace the background with a sunset over mountains" },
  }),
});
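The predictions endpoint usually responds before the image is ready. Its response is a prediction object whose status starts as starting or processing, and whose urls.get field is the URL to poll.

The sketch below is a minimal polling loop. It assumes those Replicate response fields (id, status, urls.get, output, error) and a fixed poll interval. waitForPrediction is a hypothetical helper. fetch and sleep are injected so that it can run outside a Worker.

```typescript
// Assumed subset of Replicate's prediction object.
interface Prediction {
  id: string;
  status: "starting" | "processing" | "succeeded" | "failed" | "canceled";
  urls: { get: string };
  output?: unknown;
  error?: unknown;
}

type FetchJson = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ json(): Promise<unknown> }>;

// Hypothetical helper: poll a prediction until it reaches a terminal status.
async function waitForPrediction(
  initial: Prediction,
  token: string,
  fetchFn: FetchJson,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
  intervalMs = 1000,
  maxPolls = 120,
): Promise<Prediction> {
  let prediction = initial;
  for (let polls = 0; ; polls++) {
    if (prediction.status === "succeeded") return prediction;
    if (prediction.status === "failed" || prediction.status === "canceled") {
      throw new Error(`Prediction ${prediction.id} ${prediction.status}: ${String(prediction.error)}`);
    }
    if (polls >= maxPolls) {
      throw new Error(`Prediction ${prediction.id} still ${prediction.status} after ${maxPolls} polls`);
    }
    await sleep(intervalMs);
    const res = await fetchFn(prediction.urls.get, {
      headers: { Authorization: `Bearer ${token}` },
    });
    prediction = (await res.json()) as Prediction;
  }
}
```

In the Worker, wire it up as follows:
- initial is the parsed body of the POST response above.
- fetchFn is fetch.
- token is env.REPLICATE_API_TOKEN.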
Text generation:
| Model | Provider | Best for |
| --- | --- | --- |
| @cf/meta/llama-3.1-8b-instruct | Workers AI | Fast, cheap, edge inference |
| @cf/meta/llama-3.1-70b-instruct | Workers AI | Higher quality, still serverless |
| kimi-k2.5 | Workers AI | Long context (256K tokens), tool calling, vision |
| gpt-oss-120b | Workers AI | Open-weight, high reasoning |
| mistral-small-3.1-24b-instruct | Workers AI | Vision + long context (128K tokens) |
| qwen3-30b-a3b-fp8 | Workers AI | Reasoning, function calling, multilingual |
| deepseek-r1-distill-qwen-32b | Workers AI | Strong reasoning benchmarks |
| qwq-32b | Workers AI | Chain-of-thought reasoning |
| llama-4-scout-17b-16e-instruct | Workers AI | Multimodal MoE, 16 experts |
Code generation:
| Model | Provider | Best for |
| --- | --- | --- |
| @cf/qwen/qwen2.5-coder-32b-instruct | Workers AI | Code-specific, 32B parameters |
| @cf/meta/llama-3.1-8b-instruct | Workers AI | Lightweight code assist |
Embeddings:
| Model | Provider | Best for |
| --- | --- | --- |
| @cf/baai/bge-base-en-v1.5 | Workers AI | General English embeddings |
| @cf/baai/bge-large-en-v1.5 | Workers AI | Higher quality embeddings |
| @cf/google/gemma-3-embedding-300m | Workers AI | Lightweight, multilingual |
| @cf/qwen/qwen3-embedding-0.6b | Workers AI | Compact embedding |
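Embedding models turn text into vectors, and retrieval typically ranks stored vectors by their cosine similarity to the query vector.

The sketch below is a minimal in-memory version. It assumes the query and document embeddings come from the same model, so their dimensions match. cosineSimilarity and topK are hypothetical helpers. At scale, a vector index would perform this search instead of a linear scan.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Assumes equal-length, non-zero vectors.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("Vector dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the indices of the k documents most similar to the query, best first.
function topK(query: readonly number[], docs: readonly (readonly number[])[], k: number): number[] {
  return docs
    .map((doc, index) => ({ index, score: cosineSimilarity(query, doc) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((hit) => hit.index);
}
```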
Reranking:
| Model | Provider |
| --- | --- |
| @cf/baai/bge-reranker-base | Workers AI |
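A reranker scores each candidate passage directly against the query. This is typically more accurate than vector similarity but slower, so a reranker is usually applied to a short list first retrieved by embeddings.

The sketch below covers only the reordering step. It assumes the reranker returns one { id, score } entry per candidate, where id is the candidate's index in the input list. Check the model's actual response shape before relying on this. applyRerank is a hypothetical helper.

```typescript
// Assumed shape of one reranker result: index into the candidate list plus a relevance score.
interface RerankResult {
  id: number;
  score: number;
}

// Reorder candidates by descending reranker score and keep the best `limit`.
function applyRerank<T>(candidates: readonly T[], results: readonly RerankResult[], limit: number): T[] {
  return [...results]
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((r) => {
      const item = candidates[r.id];
      if (item === undefined) throw new Error(`Reranker returned unknown id ${r.id}`);
      return item;
    });
}
```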
Image generation:
| Model | Provider | Best for |
| --- | --- | --- |
| @cf/black-forest-labs/flux-2-klein-9b | Workers AI | Fast distilled, interactive |
| @cf/black-forest-labs/flux-2-dev | Workers AI | High quality, multi-reference |
| @cf/black-forest-labs/flux-1-schnell | Workers AI | Speed (1-4 steps) |
| black-forest-labs/flux-1.1-pro-ultra | Replicate | Highest quality, high cost |
| black-forest-labs/flux-schnell | Replicate | Fast local/image work |
| recraft-ai/recraft-v3 | Replicate | SVG generation |
| ideogram-ai/ideogram-v2-turbo | Replicate | Text rendering in images |
| luma/photon | Replicate | Photorealistic generation |
| stability-ai/stable-diffusion-3.5-large | Replicate | Complex compositions |
Speech:
| Model | Provider | Best for |
| --- | --- | --- |
| @cf/openai/whisper-large-v3-turbo | Workers AI | Speech-to-text, multilingual |
| @cf/deepgram/nova-3 | Workers AI | Fast speech recognition (ASR) |
| @cf/deepgram/aura-2-en | Workers AI | Natural text-to-speech |
| @cf/myshell-ai/melotts | Workers AI | Lightweight multilingual text-to-speech |
| @cf/deepgram/flux | Workers AI | Conversational speech |
Vision:
| Model | Provider | Best for |
| --- | --- | --- |
| @cf/meta/llama-3.2-11b-vision-instruct | Workers AI | Image understanding |
| @cf/google/gemma-3-12b-it | Workers AI | Multimodal, 140+ languages |
| @cf/llava-hf/llava-1.5-7b-hf | Workers AI | Image-to-text (beta) |
| kimi-k2.5 | Workers AI | Long context, tool calling, vision |
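Model IDs change as the catalog evolves, so it can help to keep the choices from these tables in one module instead of scattering string literals across handlers.

The sketch below assumes a task-based lookup fits your app. MODELS and modelFor are hypothetical names, and the IDs are copied from the tables above.

```typescript
// Hypothetical central registry of Workers AI model IDs, one per task.
const MODELS = {
  chatFast: "@cf/meta/llama-3.1-8b-instruct",
  chatQuality: "@cf/meta/llama-3.1-70b-instruct",
  code: "@cf/qwen/qwen2.5-coder-32b-instruct",
  embedding: "@cf/baai/bge-base-en-v1.5",
  rerank: "@cf/baai/bge-reranker-base",
  speechToText: "@cf/openai/whisper-large-v3-turbo",
  vision: "@cf/meta/llama-3.2-11b-vision-instruct",
} as const;

type Task = keyof typeof MODELS;

// Look up the model ID for a task; the return type is one of the literal IDs above.
function modelFor(task: Task): (typeof MODELS)[Task] {
  return MODELS[task];
}
```

A handler would then call createWorkersAiChat(modelFor("chatFast"), { binding: env.AI }). Swapping a model becomes a one-line change.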
Workers AI:
Your primary inference target.
No external routing latency.
Serverless, pay-per-request, no GPU management.
Cloudflare AI Gateway, when you need:
BYOK / stored provider keys that developers and agents cannot read.
Budgets and rate limits for autonomous agent loops.
Caching to reduce cost on repeated prompts.
Retry logic with exponential backoff.
Automatic fallback between models or providers.
Unified observability across all AI calls.
Multi-provider routing without changing call sites.
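AI Gateway can handle retries for you. For calls that bypass the gateway, the same idea is usually implemented as exponential backoff with jitter:
- The wait before retry n grows as base × 2^n, up to a cap.
- The wait is then randomized so that many clients do not retry in lockstep.

The sketch below shows the "full jitter" variant. backoffDelayMs, retryWithBackoff and their defaults are hypothetical. The random source and sleep are injectable so that the schedule can be tested.

```typescript
// Full-jitter exponential backoff: wait a random time in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs = 200,
  capMs = 10_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Retry fn up to maxAttempts times, sleeping a backoff delay between attempts.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Only wait if another attempt follows.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

This version retries on any thrown error. A production version would usually retry only rate-limit and server errors.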
Replicate, for:
Models not on Workers AI (Flux Pro, Ideogram V2, Recraft V3, etc.).
Specific image generation capabilities (inpainting, multi-reference, fine-tuned styles).
High-volume image generation where Replicate’s pricing fits better.
Replicate through AI Gateway, when:
You want caching and retries on image generation calls.
You need fallback from Replicate to Workers AI for text tasks.
You want centralized logs across all providers.