
Fredy Acuna / December 8, 2025 / 7 min read
Cloudflare Workers AI lets you run AI models serverlessly without managing infrastructure. The best part? You get 10,000 free Neurons daily—enough for hundreds of conversations per day.
In this guide, we'll build a simple chatbot using the REST API that you can call from any application.
| Feature | Benefit |
|---|---|
| Free Tier | 10,000 Neurons/day (~hundreds of conversations) |
| No GPU Management | Serverless inference on Cloudflare's edge |
| 50+ Models | Llama 4, Mistral, Gemma, DeepSeek, and more |
| Low Latency | Runs on Cloudflare's global network |
| Simple API | REST API works from anywhere |
In the Cloudflare dashboard, copy your Account ID and create an API token with Workers AI permissions, then save these values:
CLOUDFLARE_ACCOUNT_ID=your-account-id
CLOUDFLARE_API_TOKEN=your-api-token
The API endpoint format is:
https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}
Let's test with Llama 3.1 8B using curl:
curl https://api.cloudflare.com/client/v4/accounts/YOUR_ACCOUNT_ID/ai/run/@cf/meta/llama-3.1-8b-instruct \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
Response:
{
  "result": {
    "response": "The capital of France is Paris."
  },
  "success": true
}
Here are some popular models you can use:
| Model | ID | Best For |
|---|---|---|
| Llama 3.1 8B | @cf/meta/llama-3.1-8b-instruct | General chat, fast responses |
| Llama 3.1 70B | @cf/meta/llama-3.1-70b-instruct | Complex tasks, better quality |
| Llama 3.3 70B | @cf/meta/llama-3.3-70b-instruct | Latest Llama, optimized |
| Gemma 3 12B | @cf/google/gemma-3-12b-it | Multilingual (140+ languages) |
| Mistral Small | @cf/mistralai/mistral-small-3.1-24b-instruct | Vision + tool calling |
| DeepSeek R1 | @cf/deepseek-ai/deepseek-r1-distill-qwen-32b | Reasoning tasks |
| QwQ 32B | @cf/qwen/qwq-32b | Competitive reasoning |
Tip: Start with `llama-3.1-8b-instruct` for testing. It's fast and uses fewer Neurons.
Here's a complete JavaScript example that maintains conversation history:
const ACCOUNT_ID = 'your-account-id';
const API_TOKEN = 'your-api-token';
const MODEL = '@cf/meta/llama-3.1-8b-instruct';

const conversationHistory = [
  { role: 'system', content: 'You are a helpful assistant. Be concise.' }
];

async function chat(userMessage) {
  // Add user message to history
  conversationHistory.push({ role: 'user', content: userMessage });

  const response = await fetch(
    `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/run/${MODEL}`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${API_TOKEN}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ messages: conversationHistory })
    }
  );

  const data = await response.json();

  if (data.success) {
    const assistantMessage = data.result.response;
    // Add assistant response to history
    conversationHistory.push({ role: 'assistant', content: assistantMessage });
    return assistantMessage;
  } else {
    throw new Error(data.errors?.[0]?.message || 'API Error');
  }
}

// Usage
const answer = await chat('What is machine learning?');
console.log(answer);

const followUp = await chat('Can you give me an example?');
console.log(followUp);
If you want to expose this as an API, here's a simple Express.js server:
import express from 'express';

const app = express();
app.use(express.json());

const ACCOUNT_ID = process.env.CLOUDFLARE_ACCOUNT_ID;
const API_TOKEN = process.env.CLOUDFLARE_API_TOKEN;
const MODEL = '@cf/meta/llama-3.1-8b-instruct';

app.post('/api/chat', async (req, res) => {
  const { messages } = req.body;

  if (!messages || !Array.isArray(messages)) {
    return res.status(400).json({ error: 'Messages array required' });
  }

  try {
    const response = await fetch(
      `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/run/${MODEL}`,
      {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${API_TOKEN}`,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({ messages })
      }
    );

    const data = await response.json();

    if (data.success) {
      res.json({ response: data.result.response });
    } else {
      res.status(500).json({ error: data.errors?.[0]?.message });
    }
  } catch (error) {
    res.status(500).json({ error: 'Failed to get response' });
  }
});

app.listen(3000, () => console.log('Server running on port 3000'));
Call it from your frontend:
const response = await fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Hello!' }
    ]
  })
});

const data = await response.json();
console.log(data.response);
For a ChatGPT-like typing effect, enable streaming:
const response = await fetch(
  `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/run/${MODEL}`,
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_TOKEN}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      messages: [{ role: 'user', content: 'Tell me a story' }],
      stream: true
    })
  }
);

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Chunks can split mid-line, so buffer until we have complete lines
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep the last (possibly partial) line for next chunk

  // Parse SSE format: data: {"response": "..."}
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice(6);
    if (payload === '[DONE]') break; // end-of-stream sentinel
    const json = JSON.parse(payload);
    process.stdout.write(json.response);
  }
}
| Parameter | Type | Description |
|---|---|---|
| `messages` | array | Array of message objects with `role` and `content` |
| `stream` | boolean | Enable streaming responses (default: false) |
| `max_tokens` | number | Maximum tokens in the response |
| `temperature` | number | Creativity (0-2, default: 0.6) |
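As a quick illustration, here's a request body combining these parameters (the message contents and values are arbitrary examples):

```javascript
// A request body for a short, factual answer: low temperature reduces
// randomness, and a small max_tokens cap saves Neurons.
const body = {
  messages: [
    { role: 'system', content: 'Answer in one sentence.' },
    { role: 'user', content: 'What is DNS?' }
  ],
  max_tokens: 128,
  temperature: 0.2,
  stream: false
};

// Pass it to fetch() as: body: JSON.stringify(body)
```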
| Role | Description |
|---|---|
| `system` | Sets the AI's behavior and personality |
| `user` | The human's message |
| `assistant` | Previous AI responses (for context) |
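A multi-turn conversation uses all three roles together. The assistant's earlier reply is sent back in the array so the model has context for follow-up questions (the contents below are made-up examples):

```javascript
const messages = [
  { role: 'system', content: 'You are a helpful assistant. Be concise.' },
  { role: 'user', content: 'What is the capital of France?' },
  { role: 'assistant', content: 'The capital of France is Paris.' },
  // "its" only makes sense because the previous turns are included
  { role: 'user', content: 'What is its population?' }
];
```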
Cloudflare uses "Neurons" as the billing unit:
| Tier | Neurons | Cost |
|---|---|---|
| Free | 10,000/day | $0 |
| Paid | Per 1,000 Neurons | $0.011 |
Example: A typical conversation (prompt + response) uses ~100-500 Neurons depending on length. The free tier supports 20-100 conversations/day.
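The numbers above can be sketched as back-of-envelope math (actual Neuron usage varies by model and message length, so treat these as rough estimates):

```javascript
// Rough Neuron budgeting based on the pricing table above.
const FREE_NEURONS_PER_DAY = 10_000;
const PAID_COST_PER_1000 = 0.011; // USD per 1,000 Neurons on the paid tier

// How many conversations fit in the free tier at a given per-conversation cost
function conversationsPerDay(neuronsPerConversation) {
  return Math.floor(FREE_NEURONS_PER_DAY / neuronsPerConversation);
}

// Daily cost once usage exceeds the free allotment
function dailyCostBeyondFree(totalNeurons) {
  const billable = Math.max(0, totalNeurons - FREE_NEURONS_PER_DAY);
  return (billable / 1000) * PAID_COST_PER_1000;
}

console.log(conversationsPerDay(500));      // 20 (long conversations)
console.log(conversationsPerDay(100));      // 100 (short conversations)
console.log(dailyCostBeyondFree(110_000));  // 100k billable Neurons ≈ $1.10
```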
Always check `data.success` before using the response, and cap the conversation history so long chats don't burn through Neurons:

// Example: Limit conversation history to the system prompt + last 19 messages
if (conversationHistory.length > 20) {
  // splice mutates the array in place, so this works even with const
  conversationHistory.splice(1, conversationHistory.length - 20);
}
Finally, validate that `messages` is an array before calling the API, and set `max_tokens` when you don't need long responses.

You now have a working AI chatbot powered by Cloudflare Workers AI, complete with conversation history, an Express API endpoint, and streaming support.
This is a cost-effective alternative to OpenAI or other paid APIs for personal projects and prototypes.