Fallbacks
Specify model or provider fallbacks with your Universal endpoint to handle request failures and ensure reliability.
Cloudflare can trigger your fallback provider in response to request errors or predetermined request timeouts. The response header cf-aig-step indicates which step successfully processed the request.
By default, Cloudflare triggers your fallback if a model request returns an error.
In the following example, a request first goes to the Workers AI Inference API. If the request fails, it falls back to OpenAI.
```mermaid
graph TD
  A[AI Gateway] --> B[Request to Workers AI Inference API]
  B -->|Success| C[Return Response]
  B -->|Failure| D[Request to OpenAI API]
  D --> E[Return Response]
```
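The flow above can be sketched as a simple try-in-order loop. The following is a hypothetical client-side simulation of what the gateway does for you, not the gateway's actual implementation; the function and provider names are illustrative stand-ins:

```python
# Hypothetical sketch of fallback behavior: try each provider step in
# order and return the first successful response, together with the
# step index that the gateway would report in the cf-aig-step header.
def run_with_fallbacks(steps):
    errors = []
    for i, call in enumerate(steps):
        try:
            return call(), i  # i corresponds to the cf-aig-step value
        except Exception as exc:
            errors.append(exc)  # on failure, fall through to the next step
    raise RuntimeError(f"all {len(steps)} steps failed: {errors}")

# Stand-in callables instead of real provider requests:
def workers_ai():
    raise TimeoutError("model timed out")

def openai():
    return "Cloudflare is a connectivity cloud company."

result, step = run_with_fallbacks([workers_ai, openai])
# step == 1: the request fell back to the second provider
```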
You can add as many fallbacks as you need by adding more objects to the array.
```sh
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id} \
  --header 'Content-Type: application/json' \
  --data '[
    {
      "provider": "workers-ai",
      "endpoint": "@cf/meta/llama-3.1-8b-instruct",
      "headers": {
        "Authorization": "Bearer {cloudflare_token}",
        "Content-Type": "application/json"
      },
      "query": {
        "messages": [
          { "role": "system", "content": "You are a friendly assistant" },
          { "role": "user", "content": "What is Cloudflare?" }
        ]
      }
    },
    {
      "provider": "openai",
      "endpoint": "chat/completions",
      "headers": {
        "Authorization": "Bearer {open_ai_token}",
        "Content-Type": "application/json"
      },
      "query": {
        "model": "gpt-4o-mini",
        "stream": true,
        "messages": [
          { "role": "user", "content": "What is Cloudflare?" }
        ]
      }
    }
  ]'
```

If set, a request timeout triggers a fallback provider after a predetermined response time (measured in milliseconds). This feature is helpful for latency-sensitive applications because your gateway does not have to wait for a request error before moving to a fallback provider.
You can configure request timeouts by using one or more of the following properties, which are listed in order of priority:
| Priority | Property |
|---|---|
| 1 | requestTimeout (added as a universal attribute) |
| 2 | cf-aig-request-timeout (header included at the provider level) |
| 3 | cf-aig-request-timeout (header included at the request level) |
Your gateway follows this hierarchy to determine how long to wait before triggering a fallback. These timeout values can be combined to fine-tune the behavior of your Universal endpoint.
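As an illustration, the hierarchy in the table above can be expressed as a small resolver. This is a hypothetical sketch of the precedence rule, not the gateway's internals; the parameter names are invented for clarity:

```python
def resolve_timeout(universal_request_timeout=None,
                    provider_header_timeout=None,
                    request_header_timeout=None,
                    default_ms=None):
    """Pick the effective timeout (ms) following the priority table:
    1. requestTimeout universal attribute
    2. provider-level cf-aig-request-timeout header
    3. request-level cf-aig-request-timeout header
    """
    for value in (universal_request_timeout,
                  provider_header_timeout,
                  request_header_timeout):
        if value is not None:
            return int(value)
    return default_ms

# requestTimeout (1000) wins over the provider header (2000)
# and the request-level header (3000):
resolve_timeout(1000, 2000, 3000)  # -> 1000
resolve_timeout(None, 2000, 3000)  # -> 2000
resolve_timeout(None, None, 3000)  # -> 3000
```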
In this example, the request tries to answer "What is Cloudflare?" within 1000 milliseconds using the @cf/meta/llama-3.1-8b-instruct model. The requestTimeout property takes precedence over the cf-aig-request-timeout header for @cf/meta/llama-3.1-8b-instruct.

If that request times out, the gateway moves to the fallback @cf/meta/llama-3.1-8b-instruct-fast model. This model has 3000 milliseconds, determined by the request-level cf-aig-request-timeout value, to complete the request and provide an answer.
```sh
curl 'https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}' \
  --header 'cf-aig-request-timeout: 3000' \
  --header 'Content-Type: application/json' \
  --data '[
    {
      "provider": "workers-ai",
      "endpoint": "@cf/meta/llama-3.1-8b-instruct",
      "headers": {
        "Authorization": "Bearer {cloudflare_token}",
        "Content-Type": "application/json",
        "cf-aig-request-timeout": "2000"
      },
      "config": {
        "requestTimeout": 1000
      },
      "query": {
        "messages": [
          { "role": "system", "content": "You are a friendly assistant" },
          { "role": "user", "content": "What is Cloudflare?" }
        ]
      }
    },
    {
      "provider": "workers-ai",
      "endpoint": "@cf/meta/llama-3.1-8b-instruct-fast",
      "headers": {
        "Authorization": "Bearer {cloudflare_token}",
        "Content-Type": "application/json"
      },
      "query": {
        "messages": [
          { "role": "system", "content": "You are a friendly assistant" },
          { "role": "user", "content": "What is Cloudflare?" }
        ]
      }
    }
  ]'
```

When using the Universal endpoint with fallbacks, the response header cf-aig-step indicates which model successfully processed the request by returning the step number. This header provides visibility into whether a fallback was triggered and which model ultimately produced the response.
- cf-aig-step: 0 – The first (primary) model was used successfully.
- cf-aig-step: 1 – The request fell back to the second model.
- cf-aig-step: 2 – The request fell back to the third model.
- Subsequent steps – Each additional fallback increments the step number by 1.
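A client can map the cf-aig-step header back to the provider array it sent. The helper below is a hypothetical sketch, not part of any Cloudflare SDK:

```python
def model_that_answered(steps, cf_aig_step_header):
    """Given the ordered list of provider objects sent to the Universal
    endpoint and the cf-aig-step response header, describe which step
    served the request."""
    step = int(cf_aig_step_header)
    entry = steps[step]
    return f"step {step}: {entry['provider']} / {entry['endpoint']}"

# The same provider order as the first example above:
steps = [
    {"provider": "workers-ai", "endpoint": "@cf/meta/llama-3.1-8b-instruct"},
    {"provider": "openai", "endpoint": "chat/completions"},
]
model_that_answered(steps, "1")  # -> "step 1: openai / chat/completions"
```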