
Fredy Acuna / December 8, 2025 / 7 min read
Cloudflare Workers AI lets you run AI models serverlessly without managing infrastructure. The best part? You get 10,000 free Neurons daily—enough for hundreds of conversations per day.
In this guide, we'll build a simple chatbot using the REST API that you can call from any application.
| Feature | Benefit |
|---|---|
| Free Tier | 10,000 Neurons/day (~hundreds of conversations) |
| No GPU Management | Serverless inference on Cloudflare's edge |
| 50+ Models | Llama 4, Mistral, Gemma, DeepSeek, and more |
| Low Latency | Runs on Cloudflare's global network |
| Simple API | REST API works from anywhere |
In the Cloudflare dashboard, copy your Account ID and create an API token with Workers AI permissions, then save these values:
CLOUDFLARE_ACCOUNT_ID=your-account-id
CLOUDFLARE_API_TOKEN=your-api-token
The API endpoint format is:
https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}
Let's test with Llama 3.1 8B using curl:
curl https://api.cloudflare.com/client/v4/accounts/YOUR_ACCOUNT_ID/ai/run/@cf/meta/llama-3.1-8b-instruct \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
Response:
{
  "result": {
    "response": "The capital of France is Paris."
  },
  "success": true
}
Here are some popular models you can use:
| Model | ID | Best For |
|---|---|---|
| Llama 3.1 8B | @cf/meta/llama-3.1-8b-instruct | General chat, fast responses |
| Llama 3.1 70B | @cf/meta/llama-3.1-70b-instruct | Complex tasks, better quality |
| Llama 3.3 70B | @cf/meta/llama-3.3-70b-instruct | Latest Llama, optimized |
| Gemma 3 12B | @cf/google/gemma-3-12b-it | Multilingual (140+ languages) |
| Mistral Small | @cf/mistralai/mistral-small-3.1-24b-instruct | Vision + tool calling |
| DeepSeek R1 | @cf/deepseek-ai/deepseek-r1-distill-qwen-32b | Reasoning tasks |
| QwQ 32B | @cf/qwen/qwq-32b | Competitive reasoning |
Tip: Start with `llama-3.1-8b-instruct` for testing. It's fast and uses fewer Neurons.
Here's a complete JavaScript example that maintains conversation history:
const ACCOUNT_ID = 'your-account-id';
const API_TOKEN = 'your-api-token';
const MODEL = '@cf/meta/llama-3.1-8b-instruct';

const conversationHistory = [
  { role: 'system', content: 'You are a helpful assistant. Be concise.' }
];

async function chat(userMessage) {
  // Add user message to history
  conversationHistory.push({ role: 'user', content: userMessage });

  const response = await fetch(
    `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/run/${MODEL}`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${API_TOKEN}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ messages: conversationHistory })
    }
  );

  const data = await response.json();

  if (data.success) {
    const assistantMessage = data.result.response;
    // Add assistant response to history
    conversationHistory.push({ role: 'assistant', content: assistantMessage });
    return assistantMessage;
  } else {
    throw new Error(data.errors?.[0]?.message || 'API Error');
  }
}

// Usage
const answer = await chat('What is machine learning?');
console.log(answer);

const followUp = await chat('Can you give me an example?');
console.log(followUp);
If you want to expose this as an API, here's a simple Express.js server:
import express from 'express';

const app = express();
app.use(express.json());

const ACCOUNT_ID = process.env.CLOUDFLARE_ACCOUNT_ID;
const API_TOKEN = process.env.CLOUDFLARE_API_TOKEN;
const MODEL = '@cf/meta/llama-3.1-8b-instruct';

app.post('/api/chat', async (req, res) => {
  const { messages } = req.body;

  if (!messages || !Array.isArray(messages)) {
    return res.status(400).json({ error: 'Messages array required' });
  }

  try {
    const response = await fetch(
      `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/run/${MODEL}`,
      {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${API_TOKEN}`,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({ messages })
      }
    );

    const data = await response.json();

    if (data.success) {
      res.json({ response: data.result.response });
    } else {
      res.status(500).json({ error: data.errors?.[0]?.message });
    }
  } catch (error) {
    res.status(500).json({ error: 'Failed to get response' });
  }
});

app.listen(3000, () => console.log('Server running on port 3000'));
Call it from your frontend:
const response = await fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Hello!' }
    ]
  })
});

const data = await response.json();
console.log(data.response);
For a ChatGPT-like typing effect, enable streaming:
const response = await fetch(
  `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/run/${MODEL}`,
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_TOKEN}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      messages: [{ role: 'user', content: 'Tell me a story' }],
      stream: true
    })
  }
);

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Chunks can split mid-line, so buffer until we have complete lines
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep the last (possibly partial) line for next chunk

  // Parse SSE format: data: {"response": "..."}
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice(6);
    if (payload === '[DONE]') break; // end-of-stream sentinel
    const json = JSON.parse(payload);
    process.stdout.write(json.response);
  }
}
| Parameter | Type | Description |
|---|---|---|
| `messages` | array | Array of message objects with `role` and `content` |
| `stream` | boolean | Enable streaming responses (default: false) |
| `max_tokens` | number | Maximum tokens in the response |
| `temperature` | number | Creativity (0-2, default: 0.6) |
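As a quick illustration, here's a request body combining these parameters (the message contents and values are arbitrary examples):

```javascript
// A request body for a short, factual answer: low temperature reduces
// randomness, and a small max_tokens cap saves Neurons.
const body = {
  messages: [
    { role: 'system', content: 'Answer in one sentence.' },
    { role: 'user', content: 'What is DNS?' }
  ],
  max_tokens: 128,
  temperature: 0.2,
  stream: false
};

// Pass it to fetch() as: body: JSON.stringify(body)
```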
| Role | Description |
|---|---|
| `system` | Sets the AI's behavior and personality |
| `user` | The human's message |
| `assistant` | Previous AI responses (for context) |
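A multi-turn conversation uses all three roles together. The assistant's earlier reply is sent back in the array so the model has context for follow-up questions (the contents below are made-up examples):

```javascript
const messages = [
  { role: 'system', content: 'You are a helpful assistant. Be concise.' },
  { role: 'user', content: 'What is the capital of France?' },
  { role: 'assistant', content: 'The capital of France is Paris.' },
  // "its" only makes sense because the previous turns are included
  { role: 'user', content: 'What is its population?' }
];
```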
Cloudflare uses "Neurons" as the billing unit:
| Tier | Neurons | Cost |
|---|---|---|
| Free | 10,000/day | $0 |
| Paid | Per 1,000 Neurons | $0.011 |
Example: A typical conversation (prompt + response) uses ~100-500 Neurons depending on length. The free tier supports 20-100 conversations/day.
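The numbers above can be sketched as back-of-envelope math (actual Neuron usage varies by model and message length, so treat these as rough estimates):

```javascript
// Rough Neuron budgeting based on the pricing table above.
const FREE_NEURONS_PER_DAY = 10_000;
const PAID_COST_PER_1000 = 0.011; // USD per 1,000 Neurons on the paid tier

// How many conversations fit in the free tier at a given per-conversation cost
function conversationsPerDay(neuronsPerConversation) {
  return Math.floor(FREE_NEURONS_PER_DAY / neuronsPerConversation);
}

// Daily cost once usage exceeds the free allotment
function dailyCostBeyondFree(totalNeurons) {
  const billable = Math.max(0, totalNeurons - FREE_NEURONS_PER_DAY);
  return (billable / 1000) * PAID_COST_PER_1000;
}

console.log(conversationsPerDay(500));      // 20 (long conversations)
console.log(conversationsPerDay(100));      // 100 (short conversations)
console.log(dailyCostBeyondFree(110_000));  // 100k billable Neurons ≈ $1.10
```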
Always check `data.success` before using the response, and cap the conversation history so long chats don't burn through Neurons:

// Example: Limit conversation history to the system prompt + last 19 messages
if (conversationHistory.length > 20) {
  // splice mutates the array in place, so this works even with const
  conversationHistory.splice(1, conversationHistory.length - 20);
}
Finally, validate that `messages` is an array before calling the API, and set `max_tokens` when you don't need long responses.

You now have a working AI chatbot powered by Cloudflare Workers AI, complete with conversation history, an Express API endpoint, and streaming support.
This is a cost-effective alternative to OpenAI or other paid APIs for personal projects and prototypes.