Deploy Llama 3 to Cloudflare Workers


Did you know that you can also run Llama 3 serverless? With Cloudflare Workers, this is not a problem.

And best of all: as long as you stay within the limits of Cloudflare's free tier, it won't cost you anything.

What is Llama 3?

Llama 3 is Meta's latest family of open-source large language models (LLMs). It's basically the Facebook parent company's response to OpenAI's GPT and Google's Gemini, but with one key difference: it's available for almost anyone to use for research and commercial purposes.

That's a pretty big deal. Over the past year, Llama 2, the previous family of models, has become a staple of open-source AI development, and Llama 3 continues that promise. Let me explain.

Like GPT-4 and Google Gemini, Llama 3 is a family of LLMs. It's the successor to Llama 2, Meta's previous generation of AI models. While there are some technical differences between Llama and other LLMs, you would need to be deep into AI for them to matter much. All of these LLMs were developed and work in essentially the same way: they use the same transformer architecture and rely on the same ideas, such as pretraining and fine-tuning.

When you enter a text prompt, or provide text in some other way, Llama 3 tries to predict the most plausible text to follow using its neural network, a cascading algorithm loosely modeled after the human brain with billions of variables (called parameters). Because the values of these parameters are tuned during training and some randomness is added when text is generated, Llama 3 is able to produce remarkably human-like responses.
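To make the "prediction plus some randomness" idea more concrete, here is a toy sketch of how a next word could be picked from a handful of candidates. The vocabulary, the scores, and the temperature value are all made up for illustration; a real model scores tens of thousands of tokens, and this is not how Llama 3 is actually implemented internally.

```javascript
// Toy illustration of next-token sampling. All names and numbers are hypothetical.

// Turn raw scores (logits) into probabilities, scaled by a temperature.
// Lower temperatures make the most likely token dominate even more.
function softmax(logits, temperature = 1.0) {
  const scaled = logits.map((x) => x / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Pick one index at random, weighted by the given probabilities.
function sampleIndex(probs, random = Math.random) {
  const r = random();
  let cumulative = 0;
  for (let i = 0; i < probs.length; i++) {
    cumulative += probs[i];
    if (r < cumulative) return i;
  }
  return probs.length - 1; // guard against floating-point rounding
}

const vocabulary = ["mat", "sofa", "moon"]; // hypothetical candidate tokens
const logits = [2.0, 1.0, 0.1];             // hypothetical model scores
const probs = softmax(logits, 0.8);
console.log(vocabulary[sampleIndex(probs)]);
```

Running this several times will usually print "mat", but not always, which is the kind of controlled randomness the paragraph above describes.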

Deploy Llama 3 on a worker

To deploy Llama 3 on a Cloudflare Worker, you must first register with Cloudflare.

We then create our first worker.

Click on "Create application" to create a new worker.

This gives us a new worker that returns a simple "Hello World" when its URL is called.

To get a response from Llama 3, however, we want to call the model via HTTP requests.

After clicking "Deploy", we edit the code of the worker. The code below reaches the model through env.AI, which assumes that a Workers AI binding named AI has been added in the worker's settings.

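export default {
    async fetch(request, env) {
        // Forward the JSON request body (a prompt or a list of messages) to the model.
        const body = await request.json();
        // env.AI is the Workers AI binding configured for this worker.
        const response = await env.AI.run("@cf/meta/llama-3-8b-instruct", body);
        // Return the model output as JSON.
        return new Response(JSON.stringify(response), {
            headers: { "Content-Type": "application/json" },
        });
    },
};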
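The worker above assumes every request is a POST with a valid JSON body. As a rough sketch of a more defensive variant, the handler below rejects other requests with a clear error instead of letting the worker throw. The function name handleRequest, the error messages, and the validation rules are my own assumptions, not something Cloudflare prescribes.

```javascript
// Sketch of a more defensive handler; names and validation rules are assumptions.
async function handleRequest(request, env) {
  // Small helper to build JSON responses with the right header.
  const json = (data, status = 200) =>
    new Response(JSON.stringify(data), {
      status,
      headers: { "Content-Type": "application/json" },
    });

  if (request.method !== "POST") {
    return json({ error: "Send a POST request with a JSON body." }, 405);
  }

  let body;
  try {
    body = await request.json();
  } catch {
    return json({ error: "Request body is not valid JSON." }, 400);
  }

  // Accept either a plain prompt or a list of chat messages.
  const hasPrompt = typeof body?.prompt === "string";
  const hasMessages = Array.isArray(body?.messages);
  if (!hasPrompt && !hasMessages) {
    return json({ error: 'Expected a "prompt" or "messages" field.' }, 400);
  }

  const result = await env.AI.run("@cf/meta/llama-3-8b-instruct", body);
  return json(result);
}

// In the worker itself this would be wired up as:
// export default { fetch: handleRequest };
```

Keeping the logic in a plain function also makes it easy to try out locally with a fake env object before deploying.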

Once the modified code has been rolled out via "Deploy", the model is available at the worker's URL.

We can easily test the whole thing with curl:

curl --location 'https://worker-muddy-shadow-260f.fwartner.workers.dev/' \
--header 'Content-Type: application/json' \
--data '{
    "prompt": "What can you do?"
}'

The response would be something like:

{"response":"Thank you for asking! I'm a conversational AI assistant, designed to help and assist you in various ways. Here are some examples of what I can do:\n\n1. **Answer your questions**: I can process natural language queries and provide accurate and informative answers on a wide range of topics, from science and history to entertainment and culture.\n2. **Generate text**: I can create custom text for you, such as email drafts, letters, or even entire articles. I can also help with proofreading and editing.\n3. **Translate languages**: I can translate text from one language to another, including popular languages like Spanish, French, German, Chinese, and many more.\n4. **Offer suggestions**: If you're stuck or need ideas, I can provide suggestions for things like gift ideas, travel destinations, or even recipe ideas.\n5. **Play games and chat**: I can engage in conversations, play simple games like 20 Questions or Hangman, and even create stories or poetry with you.\n6. **Provide definitions**: If you come across an unfamiliar word or phrase, I can define it for you and explain its context.\n7. **Help with calculations**: I can perform mathematical calculations, convert units, and even create charts and graphs.\n8. **Assist with organization**: I can help you keep track of tasks, appointments, and reminders, and even provide tips for staying organized.\n9. **Provide information**: I can provide information on topics like weather, news, sports scores, and more.\n10. **Learn and adapt**: As you interact with me, I can learn your preferences and adapt my responses to better match your needs and interests.\n\nThese are just a few examples of what I can do. If you have a specific request or question in mind, feel free to ask me, and I'll do my best to help!"}
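If you would rather call the worker from JavaScript than from the command line, the same request can be sketched with fetch. The helper name askWorker is hypothetical, and the sketch assumes the worker answers with a JSON object that has a response field, as in the output above.

```javascript
// Minimal client sketch; `askWorker` is a hypothetical helper name.
// `fetchImpl` defaults to the global fetch (Node 18+, browsers) and can be swapped out for testing.
async function askWorker(workerUrl, prompt, fetchImpl = fetch) {
  const res = await fetchImpl(workerUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  if (!res.ok) {
    throw new Error(`Worker responded with HTTP ${res.status}`);
  }
  const data = await res.json();
  return data.response; // the generated text
}

// Usage (replace the URL with your own worker's address):
// askWorker("https://<your-worker>.workers.dev/", "What can you do?").then(console.log);
```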

We can go a step further and send a list of messages instead of a single prompt:

curl --location 'https://worker-muddy-shadow-260f.fwartner.workers.dev/' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        { "role": "system", "content": "You are a helpful assistant." },
        { "role": "user", "content": "What was the weather like on August 30, 1990?" }
    ]
}'

This gives you a response that follows the instructions we gave the model up front in the system message.
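For longer conversations, the messages array can be assembled programmatically. The sketch below uses a hypothetical helper, buildChatPayload, and assumes that earlier model replies are passed back with the role "assistant", which is the common convention for chat-style models.

```javascript
// Hypothetical helper that assembles the `messages` array for a chat-style request.
function buildChatPayload(systemPrompt, history, userMessage) {
  return {
    messages: [
      { role: "system", content: systemPrompt },
      ...history, // earlier { role, content } turns, oldest first
      { role: "user", content: userMessage },
    ],
  };
}

const payload = buildChatPayload(
  "You are a helpful assistant.",
  [
    { role: "user", content: "Hi!" },
    { role: "assistant", content: "Hello! How can I help?" },
  ],
  "What can you do?"
);
console.log(JSON.stringify(payload, null, 2));
```

The resulting object can be sent as the request body in exactly the same way as the curl example.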

That's it!

We have successfully deployed our own Llama 3 endpoint on Cloudflare Workers, without paying anything as long as we stay within the free tier.

Still have questions?

Hit me up in the comments!