How to Build a Free Chatbot with Cloudflare Workers AI

Fredy Acuna / December 8, 2025 / 7 min read

Cloudflare Workers AI lets you run AI models serverlessly without managing infrastructure. The best part? You get 10,000 free Neurons daily, enough for roughly 20-100 conversations, depending on length.

In this guide, we'll build a simple chatbot using the REST API that you can call from any application.


What You'll Learn

  • Setting up Cloudflare Workers AI credentials
  • Making API calls to AI models
  • Building a simple chat interface
  • Choosing the right model for your use case

Prerequisites

  • A Cloudflare account (free)
  • Basic knowledge of JavaScript/API calls
  • A way to make HTTP requests (curl, Postman, or code)

Why Cloudflare Workers AI?

| Feature | Benefit |
| --- | --- |
| Free Tier | 10,000 Neurons/day (roughly 20-100 conversations) |
| No GPU Management | Serverless inference on Cloudflare's edge |
| 50+ Models | Llama 4, Mistral, Gemma, DeepSeek, and more |
| Low Latency | Runs on Cloudflare's global network |
| Simple API | REST API works from anywhere |

Step 1: Get Your API Credentials

  1. Log in to Cloudflare Dashboard
  2. Go to AI → Workers AI in the sidebar
  3. Click Use REST API
  4. Click Create a Workers AI API Token and copy it
  5. Copy your Account ID from the dashboard URL or overview page

Save these values:

CLOUDFLARE_ACCOUNT_ID=your-account-id
CLOUDFLARE_API_TOKEN=your-api-token

Step 2: Make Your First API Call

The API endpoint format is:

https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}

Let's test with Llama 3.1 8B using curl:

curl https://api.cloudflare.com/client/v4/accounts/YOUR_ACCOUNT_ID/ai/run/@cf/meta/llama-3.1-8b-instruct \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

Response:

{
  "result": {
    "response": "The capital of France is Paris."
  },
  "success": true
}
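Building the endpoint URL by hand is error-prone, so it can help to wrap it in a tiny function. A minimal sketch (buildEndpoint is a hypothetical helper, not part of any Cloudflare SDK):

```javascript
// Hypothetical helper: build the Workers AI endpoint for a given account and model
function buildEndpoint(accountId, model) {
  return `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`;
}

// Usage
const url = buildEndpoint('your-account-id', '@cf/meta/llama-3.1-8b-instruct');
// → https://api.cloudflare.com/client/v4/accounts/your-account-id/ai/run/@cf/meta/llama-3.1-8b-instruct
```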

Step 3: Available Models

Here are some popular models you can use:

Text Generation (Chat)

| Model | ID | Best For |
| --- | --- | --- |
| Llama 3.1 8B | @cf/meta/llama-3.1-8b-instruct | General chat, fast responses |
| Llama 3.1 70B | @cf/meta/llama-3.1-70b-instruct | Complex tasks, better quality |
| Llama 3.3 70B | @cf/meta/llama-3.3-70b-instruct | Latest Llama, optimized |
| Gemma 3 12B | @cf/google/gemma-3-12b-it | Multilingual (140+ languages) |
| Mistral Small | @cf/mistralai/mistral-small-3.1-24b-instruct | Vision + tool calling |
| DeepSeek R1 | @cf/deepseek-ai/deepseek-r1-distill-qwen-32b | Reasoning tasks |
| QwQ 32B | @cf/qwen/qwq-32b | Competitive reasoning |

Tip: Start with llama-3.1-8b-instruct for testing—it's fast and uses fewer Neurons.


Step 4: Build a Simple Chatbot

Here's a complete JavaScript example that maintains conversation history:

const ACCOUNT_ID = 'your-account-id';
const API_TOKEN = 'your-api-token';
const MODEL = '@cf/meta/llama-3.1-8b-instruct';

const conversationHistory = [
  { role: 'system', content: 'You are a helpful assistant. Be concise.' }
];

async function chat(userMessage) {
  // Add user message to history
  conversationHistory.push({ role: 'user', content: userMessage });

  const response = await fetch(
    `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/run/${MODEL}`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${API_TOKEN}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ messages: conversationHistory })
    }
  );

  const data = await response.json();

  if (data.success) {
    const assistantMessage = data.result.response;
    // Add assistant response to history
    conversationHistory.push({ role: 'assistant', content: assistantMessage });
    return assistantMessage;
  } else {
    throw new Error(data.errors?.[0]?.message || 'API Error');
  }
}

// Usage
const answer = await chat('What is machine learning?');
console.log(answer);

const followUp = await chat('Can you give me an example?');
console.log(followUp);

Step 5: Create an API Endpoint

If you want to expose this as an API, here's a simple Express.js server:

import express from 'express';

const app = express();
app.use(express.json());

const ACCOUNT_ID = process.env.CLOUDFLARE_ACCOUNT_ID;
const API_TOKEN = process.env.CLOUDFLARE_API_TOKEN;
const MODEL = '@cf/meta/llama-3.1-8b-instruct';

app.post('/api/chat', async (req, res) => {
  const { messages } = req.body;

  if (!messages || !Array.isArray(messages)) {
    return res.status(400).json({ error: 'Messages array required' });
  }

  try {
    const response = await fetch(
      `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/run/${MODEL}`,
      {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${API_TOKEN}`,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({ messages })
      }
    );

    const data = await response.json();

    if (data.success) {
      res.json({ response: data.result.response });
    } else {
      res.status(500).json({ error: data.errors?.[0]?.message });
    }
  } catch (error) {
    res.status(500).json({ error: 'Failed to get response' });
  }
});

app.listen(3000, () => console.log('Server running on port 3000'));
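The server above silently continues if either environment variable is missing, which surfaces later as confusing 401 errors. A small startup guard makes it fail fast instead (requireEnv is a hypothetical helper, not an Express or Cloudflare API):

```javascript
// Hypothetical helper: read a required environment variable or fail fast at startup
function requireEnv(name) {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage (replaces the plain process.env reads above):
// const ACCOUNT_ID = requireEnv('CLOUDFLARE_ACCOUNT_ID');
// const API_TOKEN = requireEnv('CLOUDFLARE_API_TOKEN');
```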

Call it from your frontend:

const response = await fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Hello!' }
    ]
  })
});

const data = await response.json();
console.log(data.response);

Step 6: Streaming Responses (Optional)

For a ChatGPT-like typing effect, enable streaming:

const response = await fetch(
  `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/run/${MODEL}`,
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_TOKEN}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      messages: [{ role: 'user', content: 'Tell me a story' }],
      stream: true
    })
  }
);

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // SSE events can split across chunks, so buffer until we have complete lines
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep any partial line for the next chunk

  // Parse SSE format: data: {"response": "..."}
  for (const line of lines) {
    if (line.startsWith('data: ') && line !== 'data: [DONE]') {
      const json = JSON.parse(line.slice(6));
      process.stdout.write(json.response);
    }
  }
}

API Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| messages | array | Array of message objects with role and content |
| stream | boolean | Enable streaming responses (default: false) |
| max_tokens | number | Maximum tokens in response |
| temperature | number | Creativity (0-2, default: 0.6) |
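Putting the parameters together, a request body that caps output length and lowers randomness might look like this (the specific values are illustrative, not recommendations):

```javascript
// Illustrative request body combining the parameters above
const body = {
  messages: [
    { role: 'system', content: 'You are a helpful assistant. Be concise.' },
    { role: 'user', content: 'What are Neurons?' }
  ],
  max_tokens: 256,   // cap output length to avoid runaway responses
  temperature: 0.3,  // lower values give more deterministic answers
  stream: false
};

const payload = JSON.stringify(body); // this is what goes in the fetch body
```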

Message Roles

| Role | Description |
| --- | --- |
| system | Sets the AI's behavior and personality |
| user | The human's message |
| assistant | Previous AI responses (for context) |
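A quick sanity check against these three roles can catch malformed history before it produces a 400 error (validateMessages is a hypothetical helper, not part of the Cloudflare API):

```javascript
// Hypothetical helper: check that a messages array is well-formed before sending
const VALID_ROLES = new Set(['system', 'user', 'assistant']);

function validateMessages(messages) {
  return Array.isArray(messages) && messages.every(
    (m) => m && VALID_ROLES.has(m.role) && typeof m.content === 'string'
  );
}

// Usage
validateMessages([{ role: 'user', content: 'Hello!' }]);  // → true
validateMessages([{ role: 'robot', content: 'Hello!' }]); // → false
```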

Cost Estimation

Cloudflare uses "Neurons" as the billing unit:

| Tier | Neurons | Cost |
| --- | --- | --- |
| Free | 10,000/day | $0 |
| Paid | Per 1,000 Neurons | $0.011 |

Example: A typical conversation (prompt + response) uses ~100-500 Neurons depending on length. The free tier supports 20-100 conversations/day.
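That capacity estimate is simple arithmetic. Note that the ~100-500 Neurons per conversation figure is a rough assumption from this post, not an official number; actual usage varies by model and message length:

```javascript
// Rough capacity estimate: how many conversations fit in the daily free tier?
// The per-conversation Neuron cost is an assumption, not an official figure.
const FREE_NEURONS_PER_DAY = 10000;

function conversationsPerDay(neuronsPerConversation) {
  return Math.floor(FREE_NEURONS_PER_DAY / neuronsPerConversation);
}

console.log(conversationsPerDay(500)); // long conversations: 20 per day
console.log(conversationsPerDay(100)); // short conversations: 100 per day
```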


Best Practices

  1. Use system prompts: Define the AI's personality and constraints
  2. Limit history: Keep only the last 10-20 messages to save Neurons
  3. Handle errors: Always check data.success before using the response
  4. Set max_tokens: Prevent runaway responses by limiting output length
  5. Cache responses: For repeated queries, consider caching

// Example: limit conversation history (keep the system prompt + the last 19 messages)
if (conversationHistory.length > 20) {
  conversationHistory.splice(1, conversationHistory.length - 20);
}

Troubleshooting

401 Unauthorized

  • Check your API token is correct
  • Ensure the token has Workers AI permissions

400 Bad Request

  • Verify your JSON structure
  • Check that messages is an array

Rate Limited

  • You've exceeded the free tier
  • Wait until the next day or upgrade to paid

Slow Responses

  • Try a smaller model (8B instead of 70B)
  • Reduce max_tokens if you don't need long responses

Conclusion

You now have a working AI chatbot powered by Cloudflare Workers AI with:

  • Free daily usage (10,000 Neurons)
  • Access to 50+ models including Llama, Gemma, and Mistral
  • Simple REST API that works from any language
  • Streaming support for real-time responses

This is a cost-effective alternative to OpenAI or other paid APIs for personal projects and prototypes.


Related Resources

  • Cloudflare Workers AI Documentation
  • Workers AI Models Catalog
  • Cloudflare AI Gateway
  • Workers AI REST API Guide
