AI Providers

Run AI models from Workers using the right provider for the task. Cloudflare-native inference is the default; use TanStack AI for application-level chat, streaming, tools, and agent state, then route to external providers when you need specific models, fallbacks, or centralized observability.

| Provider | Use when |
| --- | --- |
| Workers AI (@cloudflare/tanstack-ai) | Default for Cloudflare edge inference. No added latency from routing outside Cloudflare's network. |
| Cloudflare AI Gateway (@cloudflare/tanstack-ai) | You need caching, retries, fallback between providers, or unified observability. |
| Replicate (TanStack AI adapter or via AI Gateway) | You need image generation models that are not available on Workers AI. |

Workers AI

Serverless GPU inference on Cloudflare’s network, available directly from Workers through an ai binding.

Terminal window

    pnpm add @tanstack/ai @cloudflare/tanstack-ai

wrangler.jsonc

    {
      "ai": { "binding": "AI" }
    }

    // Basic text generation: return the full completion as plain text.
    import { chat, streamToText } from "@tanstack/ai";
    import { createWorkersAiChat } from "@cloudflare/tanstack-ai";

    type Env = { AI: Ai };

    export default {
      async fetch(_: Request, env: Env) {
        const stream = chat({
          adapter: createWorkersAiChat("@cf/meta/llama-3.1-8b-instruct", { binding: env.AI }),
          messages: [{ role: "user", content: "Explain edge computing in one sentence." }],
        });
        // streamToText is awaited here to get the whole reply as one string.
        return new Response(await streamToText(stream));
      },
    };

    // Structured output: parse the model's reply as JSON and validate it with Zod.
    import { chat, streamToText } from "@tanstack/ai";
    import { createWorkersAiChat } from "@cloudflare/tanstack-ai";
    import { z } from "zod";

    type Env = { AI: Ai };

    const RecipeSchema = z.object({
      recipe: z.object({
        ingredients: z.array(z.string()),
        description: z.string(),
      }),
    });

    export default {
      async fetch(_: Request, env: Env) {
        const stream = chat({
          adapter: createWorkersAiChat("@cf/meta/llama-3.1-8b-instruct", { binding: env.AI }),
          messages: [{ role: "user", content: "Generate a lasagna recipe as JSON." }],
        });
        // JSON.parse and RecipeSchema.parse both throw on malformed output.
        return Response.json(RecipeSchema.parse(JSON.parse(await streamToText(stream))));
      },
    };
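
Models do not always return bare JSON. Replies sometimes arrive with surrounding commentary or inside a code fence, in which case JSON.parse throws before the schema check runs. A minimal sketch of a defensive extraction step, assuming the reply contains at most one JSON object; extractJsonObject is a hypothetical helper, not part of TanStack AI or Zod:

```typescript
// Hypothetical helper: pull the outermost {...} span out of model text.
// Returns undefined instead of throwing so the caller can decide how to respond.
function extractJsonObject(text: string): unknown {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return undefined;
  try {
    return JSON.parse(text.slice(start, end + 1));
  } catch {
    // The span looked like JSON but was not valid.
    return undefined;
  }
}
```

Passing the result to RecipeSchema.safeParse would then surface a validation failure as a value rather than an exception, which makes it easier to return a 4xx/5xx response deliberately.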

    // Streaming: forward chunks to the client as server-sent events.
    import { chat, toServerSentEventsResponse } from "@tanstack/ai";
    import { createWorkersAiChat } from "@cloudflare/tanstack-ai";

    export default {
      async fetch(_: Request, env: { AI: Ai }) {
        const stream = chat({
          adapter: createWorkersAiChat("@cf/meta/llama-3.1-8b-instruct", { binding: env.AI }),
          messages: [{ role: "user", content: "Write a haiku about cloud computing." }],
        });
        return toServerSentEventsResponse(stream);
      },
    };
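
For reference, server-sent events use a plain-text framing: each event is one or more data: lines followed by a blank line. The sketch below shows that framing for a single text chunk. It is illustrative only; the exact payload that toServerSentEventsResponse emits is not described on this page, and formatSseEvent is a hypothetical name.

```typescript
// Hypothetical helper illustrating SSE wire framing.
// A multi-line payload needs one "data:" line per line of text.
function formatSseEvent(payload: string, eventName?: string): string {
  const lines: string[] = [];
  if (eventName) lines.push(`event: ${eventName}`);
  for (const line of payload.split(/\r?\n/)) lines.push(`data: ${line}`);
  // A blank line terminates the event.
  return lines.join("\n") + "\n\n";
}
```

A client reading the stream with EventSource receives each event's data lines joined by newlines, so the framing is lossless for multi-line chunks.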

AI Gateway request flow

Route requests to multiple providers through a single gateway. Get caching, retries, rate limiting, spend controls, and fallback without changing call sites. For production, prefer AI Gateway BYOK / stored provider keys so Workers and AI agents reference approved keys without reading plaintext values.

Terminal window

    pnpm add @tanstack/ai @cloudflare/tanstack-ai

Recommended ownership split:

  • Security/admins create and rotate stored provider keys.
  • Developers reference gateway routes or stored-key names in code.
  • Agents can edit routing/config code, but should not receive raw provider keys.
  • Gateway budgets and rate limits are mandatory for autonomous agent loops.
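
Gateway-side budgets are the control that matters, because the agent cannot edit them away. An in-process guard can still stop a runaway loop earlier and with a clearer error. A minimal sketch, assuming the caller knows roughly how many tokens each step used; LoopBudget and its limits are hypothetical and not part of AI Gateway or TanStack AI:

```typescript
// Hypothetical in-process guard; gateway budgets remain the source of truth.
class LoopBudget {
  private steps = 0;
  private tokens = 0;
  private readonly maxSteps: number;
  private readonly maxTokens: number;

  constructor(maxSteps: number, maxTokens: number) {
    this.maxSteps = maxSteps;
    this.maxTokens = maxTokens;
  }

  // Record one model call; throws once either limit is exceeded.
  charge(tokensUsed: number): void {
    this.steps += 1;
    this.tokens += tokensUsed;
    if (this.steps > this.maxSteps) {
      throw new Error(`Agent loop exceeded ${this.maxSteps} steps`);
    }
    if (this.tokens > this.maxTokens) {
      throw new Error(`Agent loop exceeded ${this.maxTokens} tokens`);
    }
  }

  get remainingSteps(): number {
    return Math.max(0, this.maxSteps - this.steps);
  }
}
```

An agent loop would call charge() once per model call and let the thrown error end the run.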

    import { createAnthropicChat, createOpenAiChat } from "@cloudflare/tanstack-ai";

    // Fragment: runs inside a Worker handler, where `env` is in scope.
    const claude = createAnthropicChat("claude-haiku-4-5", {
      binding: env.AI.gateway("my-gateway"),
      // Prefer a stored provider key / gateway route in production.
      // Use env keys only for dev or providers that still require direct signing.
      apiKey: env.ANTHROPIC_API_KEY,
    });

    const gpt = createOpenAiChat("gpt-4o-mini", {
      binding: env.AI.gateway("my-gateway"),
      apiKey: env.OPENAI_API_KEY,
    });

    // Keep fallback selection in application code so behavior is explicit.
    const adapters = [claude, gpt];
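
One way to make that fallback explicit is a small helper that tries each attempt in order and moves on only for failures that look transient. This is a sketch under assumptions: ProviderError and its status field are hypothetical (real adapters may surface errors differently), and the set of retryable statuses is a judgment call.

```typescript
// Assumption: timeouts, rate limits, and server errors justify trying the next
// provider; other client errors (400, 401, ...) should surface immediately.
function shouldFallback(status: number): boolean {
  return status === 408 || status === 429 || status >= 500;
}

// Hypothetical error shape used only for this sketch.
class ProviderError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

// Try each attempt in order; stop at the first success or non-retryable error.
async function firstSuccessful<T>(attempts: Array<() => Promise<T>>): Promise<T> {
  let lastError: unknown = new Error("No attempts provided");
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (error) {
      lastError = error;
      const retryable = error instanceof ProviderError && shouldFallback(error.status);
      if (!retryable) throw error;
    }
  }
  throw lastError;
}
```

Each attempt would wrap one chat call for one adapter, so the order of the adapters array is the fallback order.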
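
    import { chat } from "@tanstack/ai";
    import { createWorkersAiChat } from "@cloudflare/tanstack-ai";

    // Fragment: runs inside a Worker handler, where `env` is in scope.
    const stream = chat({
      adapter: createWorkersAiChat("@cf/meta/llama-3.1-8b-instruct", {
        binding: env.AI.gateway("my-gateway"),
        gateway: {
          cacheTtl: 3600, // seconds: serve identical requests from cache for one hour
          skipCache: false, // set to true to bypass the cache for this request
        },
      }),
      messages: [{ role: "user", content: "Classify this email as urgent or not." }],
    });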
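
To make the two cache options concrete, here is a small in-memory model of their semantics: an entry is served until its TTL elapses, and a skip flag bypasses the lookup for one request. This only illustrates the behavior; it says nothing about how AI Gateway implements caching, and TtlCache is a hypothetical name.

```typescript
// Illustrative model of TTL caching with a per-request bypass.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  set(key: string, value: V, ttlSeconds: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }

  // skipCache mirrors the per-request bypass: ignore any stored entry.
  get(key: string, skipCache = false): V | undefined {
    if (skipCache) return undefined;
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```

Caching pays off for prompts like the classification example above, where the same input recurs and a cached answer is still acceptable an hour later.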

| Provider | Models |
| --- | --- |
| OpenAI | GPT-4o, GPT-4o-mini, o1, o3 |
| Anthropic | Claude 3.5, Claude 3 |
| DeepSeek | DeepSeek Chat |
| Google AI | Gemini |
| Grok | xAI models |
| Mistral | Mistral models |
| Perplexity | Sonar |
| Replicate | Flux, Ideogram, Stable Diffusion |
| Groq | Llama, Mixtral |

Replicate

Best for image generation models that are not on Workers AI. Prefer routing Replicate through Cloudflare AI Gateway for caching, fallback, and centralized observability. If a direct TanStack AI adapter is not available for the exact image workflow, call Replicate from a narrow server-side service instead of adding a second AI toolkit just for images.

    import { createOpenAiChat } from "@cloudflare/tanstack-ai";

    // For OpenAI-compatible image providers routed through AI Gateway, keep the
    // gateway configuration in one server-side adapter module.
    const imageAdapter = createOpenAiChat("recraft-ai/recraft-v3", {
      binding: env.AI.gateway("my-gateway"),
      apiKey: env.REPLICATE_API_TOKEN,
    });

    // Direct call to Replicate's HTTP API. Keep this server-side so the token
    // never reaches the client. inputImageUrl and maskImageUrl come from the caller.
    const response = await fetch("https://api.replicate.com/v1/predictions", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${env.REPLICATE_API_TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        version: "black-forest-labs/flux-fill-pro",
        input: {
          prompt: "Replace the background with a sunset over mountains",
          image: inputImageUrl,
          mask: maskImageUrl,
          guidance_scale: 7.5,
          num_inference_steps: 30,
        },
      }),
    });
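
The request above creates a prediction, and the response usually describes a job that is still running rather than a finished image. A minimal polling sketch, assuming the response carries status, output, and urls.get fields as in Replicate's HTTP API; waitForPrediction, the one-second interval, and the poll cap are illustrative choices, not values from this page:

```typescript
type PredictionStatus = "starting" | "processing" | "succeeded" | "failed" | "canceled";

// Assumed subset of the prediction object; only the fields used here.
interface Prediction {
  status: PredictionStatus;
  output?: unknown;
  error?: string | null;
  urls?: { get?: string };
}

// A prediction stops changing once it has succeeded, failed, or been canceled.
function isTerminal(status: PredictionStatus): boolean {
  return status === "succeeded" || status === "failed" || status === "canceled";
}

// Poll the prediction's own URL until it reaches a terminal status or the cap.
async function waitForPrediction(
  initial: Prediction,
  token: string,
  intervalMs = 1000,
  maxPolls = 60,
): Promise<Prediction> {
  let current = initial;
  for (let i = 0; i < maxPolls && !isTerminal(current.status); i++) {
    const url = current.urls?.get;
    if (!url) throw new Error("Prediction response has no polling URL");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    const res = await fetch(url, { headers: { Authorization: `Bearer ${token}` } });
    if (!res.ok) throw new Error(`Replicate polling failed: ${res.status}`);
    current = (await res.json()) as Prediction;
  }
  // May still be non-terminal if maxPolls was reached; callers should check status.
  return current;
}
```

In a Worker, long polls count against request duration, so a webhook or a queue-driven follow-up may be a better fit for slow models.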

Text generation

| Model | Provider | Best for |
| --- | --- | --- |
| @cf/meta/llama-3.1-8b-instruct | Workers AI | Fast, cheap, edge inference |
| @cf/meta/llama-3.1-70b-instruct | Workers AI | Higher quality, still serverless |
| kimi-k2.5 | Workers AI | Long context (256k), tool calling, vision |
| gpt-oss-120b | Workers AI | Open-weight, high reasoning |
| mistral-small-3.1-24b-instruct | Workers AI | Vision + long context (128k) |
| qwen3-30b-a3b-fp8 | Workers AI | Reasoning, function calling, multilingual |
| deepseek-r1-distill-qwen-32b | Workers AI | Strong reasoning benchmarks |
| qwq-32b | Workers AI | Chain-of-thought reasoning |
| llama-4-scout-17b-16e-instruct | Workers AI | Multimodal MoE, 16 experts |

Code

| Model | Provider | Best for |
| --- | --- | --- |
| @cf/qwen/qwen2.5-coder-32b-instruct | Workers AI | Code-specific, 32B params |
| @cf/meta/llama-3.1-8b-instruct | Workers AI | Lightweight code assist |

Embeddings

| Model | Provider | Best for |
| --- | --- | --- |
| @cf/baai/bge-base-en-v1.5 | Workers AI | General English embeddings |
| @cf/baai/bge-large-en-v1.5 | Workers AI | Higher quality embeddings |
| @cf/google/gemma-3-embedding-300m | Workers AI | Lightweight, multilingual |
| @cf/qwen/qwen3-embedding-0.6b | Workers AI | Compact embedding |

Reranking

| Model | Provider |
| --- | --- |
| @cf/baai/bge-reranker-base | Workers AI |

Image generation

| Model | Provider | Best for |
| --- | --- | --- |
| @cf/black-forest-labs/flux-2-klein-9b | Workers AI | Fast distilled, interactive |
| @cf/black-forest-labs/flux-2-dev | Workers AI | High quality, multi-reference |
| @cf/black-forest-labs/flux-1-schnell | Workers AI | Speed (1-4 steps) |
| black-forest-labs/flux-1.1-pro-ultra | Replicate | Highest quality, high cost |
| black-forest-labs/flux-schnell | Replicate | Fast local/image work |
| recraft-ai/recraft-v3 | Replicate | SVG generation |
| ideogram-ai/ideogram-v2-turbo | Replicate | Text rendering in images |
| luma/photon | Replicate | Photorealistic generation |
| stability-ai/stable-diffusion-3.5-large | Replicate | Complex compositions |

Audio

| Model | Provider | Best for |
| --- | --- | --- |
| @cf/openai/whisper-large-v3-turbo | Workers AI | Speech-to-text, multilingual |
| @cf/deepgram/nova-3 | Workers AI | Fast ASR |
| @cf/deepgram/aura-2-en | Workers AI | Natural TTS |
| @cf/myshell-ai/melotts | Workers AI | Lightweight multilingual TTS |
| @cf/deepgram/flux | Workers AI | Conversational speech |

Vision

| Model | Provider | Best for |
| --- | --- | --- |
| @cf/meta/llama-3.2-11b-vision-instruct | Workers AI | Image understanding |
| @cf/google/gemma-3-12b-it | Workers AI | Multimodal, 140+ languages |
| @cf/llava-hf/llava-1.5-7b-hf | Workers AI | Image-to-text (beta) |
| kimi-k2.5 | Workers AI | Long context, tool calling, vision |
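
Keeping model choices in one typed map makes it easier to swap a model without touching call sites. The sketch below picks one default per task from the model lists above; the specific defaults, the task names, and modelFor are illustrative assumptions rather than recommendations from this page.

```typescript
// One default model per task, using identifiers listed above.
const DEFAULT_MODELS = {
  text: "@cf/meta/llama-3.1-8b-instruct",
  code: "@cf/qwen/qwen2.5-coder-32b-instruct",
  embedding: "@cf/baai/bge-base-en-v1.5",
  rerank: "@cf/baai/bge-reranker-base",
  speechToText: "@cf/openai/whisper-large-v3-turbo",
  vision: "@cf/meta/llama-3.2-11b-vision-instruct",
} as const;

type Task = keyof typeof DEFAULT_MODELS;

// Resolve the model for a task, letting a caller or environment override it.
function modelFor(task: Task, overrides: Partial<Record<Task, string>> = {}): string {
  return overrides[task] ?? DEFAULT_MODELS[task];
}
```

For example, modelFor("text", { text: "@cf/meta/llama-3.1-70b-instruct" }) upgrades text generation for one deployment while every other task keeps its default.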

When to use Workers AI

  • Your primary inference target.
  • No external routing latency.
  • Serverless, pay-per-request, no GPU management.

When to use AI Gateway

  • BYOK / stored provider keys that developers and agents cannot read.
  • Budgets and rate limits for autonomous agent loops.
  • Caching to reduce cost on repeated prompts.
  • Retry logic with exponential backoff.
  • Automatic fallback between models or providers.
  • Unified observability across all AI calls.
  • Multi-provider routing without changing call sites.

When to use Replicate

  • Models not on Workers AI (Flux Pro, Ideogram V2, Recraft V3, etc.).
  • Specific image generation capabilities (inpainting, multi-reference, fine-tuned styles).
  • High-volume image generation where Replicate’s pricing fits better.

When to use Replicate through AI Gateway

  • You want caching and retries on image generation calls.
  • You need fallback from Replicate to Workers AI for text tasks.
  • You want centralized logs across all providers.
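
The "exponential backoff" mentioned in the lists above comes down to simple arithmetic: the wait doubles with each attempt until it hits a cap, and a random factor keeps many clients from retrying in lockstep. A small sketch; the base delay, the cap, and both function names are assumptions for illustration, not AI Gateway defaults.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 250, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// "Full jitter": pick a delay between 0 and the capped backoff value.
// The random source is injectable so the result can be made deterministic.
function jitteredDelayMs(attempt: number, random: () => number = Math.random): number {
  return Math.floor(random() * backoffDelayMs(attempt));
}
```

With these defaults the un-jittered waits are 250 ms, 500 ms, 1 s, 2 s, 4 s, and then 8 s for every later attempt.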