How We Improved Celery’s Azure Integration: From Redis to Blob Storage

Pieter Blomme

Machine Learning Engineer

Celery is a widely-used task queue library that enables developers to distribute workload across machines or threads efficiently. Its ability to define asynchronous tasks and execute them later makes it a go-to choice for managing distributed systems. However, when integrating Celery with Azure, certain challenges arise—especially when aligning with Azure’s preferred architecture and tooling. Here’s how we tackled one such challenge and improved our setup for the cloud.

Understanding Celery’s Core Components: Broker and Backend

To appreciate the problem (and the solution!), let’s first cover the basics:

  • Broker: This is where task messages are stored before workers pick them up for execution. Common brokers include RabbitMQ, Redis, and Azure Service Bus.

  • Backend: After tasks are executed, their results need to be stored somewhere. The backend is where Celery records these task results. Examples include Redis, SQL databases, and specialized backends.

A typical Celery app looks like this:

from celery import Celery
celery_app = Celery(
    "tasks",
    broker="BROKER_CONNECTION_STRING",
    backend="BACKEND_CONNECTION_STRING",
)

Both the broker and backend play distinct roles, but in some cases, they can overlap—like Redis, which can function as both.

FastAPI + Celery setup with Redis broker and result backend. Creator: Lucy Linder

This overlap simplifies configuration but isn't optimal in all cloud environments.
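For example, a minimal sketch of the Redis-as-both setup might look like this (the localhost URL is just a placeholder for your own Redis instance):

from celery import Celery

# Minimal sketch: a single Redis instance serves as both broker and backend.
celery_app = Celery(
    "tasks",
    broker="redis://localhost:6379/0",   # Redis as the message broker
    backend="redis://localhost:6379/0",  # Redis as the result backend
)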

The Challenge: Aligning Celery with Azure's Ecosystem

When working in Azure, the preferred message broker is Azure Service Bus, a robust, highly available message queueing service. However, Azure Service Bus is a pure message broker. Unlike Redis, it cannot act as a backend because Celery’s result backend requires key-value support for storing and retrieving task results. Message brokers like Service Bus operate on a first-in, first-out (FIFO) basis, meaning they only let you fetch messages sequentially, not by a specific key.
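To see why key-value access matters, consider how results are normally retrieved: Celery looks up a result by the task's ID, which a FIFO queue cannot do. A rough illustration (the tasks.add task name is hypothetical, and celery_app is the app defined earlier):

# Hypothetical illustration: results are fetched by task ID, so the backend
# must support key-value style retrieval rather than sequential reads.
result = celery_app.send_task("tasks.add", args=[2, 3])
print(result.id)               # the task's UUID, used as the lookup key
print(result.get(timeout=10))  # fetches the stored result for that key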

Our Original Workaround

In one project, we adopted a workaround: storing task results in Azure Blob Storage. By using the task’s unique identifier (UUID) as the blob name, we could store results as blobs. While this solved the immediate problem, it came with drawbacks:

  • Code complexity: We had to handle serialization, error handling, and result storage manually, which cluttered the codebase.

  • Custom logic: Each time we needed a result backend, we essentially built it from scratch.

This approach worked but wasn't elegant or scalable.
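As an illustration, the manual approach looked roughly like the sketch below (the container name, connection string, and helper functions are placeholders, not our actual code):

import json
from azure.storage.blob import BlobServiceClient

# Rough sketch of the manual workaround: serialize results ourselves and
# store them in Blob Storage under the task's UUID.
blob_service = BlobServiceClient.from_connection_string("STORAGE_CONNECTION_STRING")
container = blob_service.get_container_client("task-results")

def store_result(task_id: str, result: dict) -> None:
    # Use the task UUID as the blob name so the result can be found later.
    container.upload_blob(name=task_id, data=json.dumps(result), overwrite=True)

def load_result(task_id: str) -> dict:
    # Fetch and deserialize the result by its UUID.
    blob = container.download_blob(task_id)
    return json.loads(blob.readall())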

The Fix: Using Celery’s azureblockblob Backend

It turns out there’s a better way: Celery offers an azureblockblob backend, specifically designed for storing results in Azure Blob Storage. By leveraging this backend, you can seamlessly integrate Celery’s result storage with Azure Blob, eliminating the need for custom logic.

Using the azureblockblob backend is straightforward. Just update your Celery configuration:

celery_app = Celery(
    "tasks",
    broker="SERVICE_BUS_CONNECTION_STRING",
    backend="azureblockblob://CONNECTION_STRING",
)

This setup automatically handles result storage in Blob Storage, including serialization and deserialization. It also simplifies maintenance: if you later decide to switch to another backend, there’s no need to rewrite custom logic—just update the backend configuration.
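From the application's point of view, nothing else changes: tasks are defined and their results retrieved exactly as before (the add task below is just an illustrative example, using the celery_app configured above):

@celery_app.task
def add(x, y):
    return x + y

# Calling code is unchanged: the result is transparently written to and
# read from Azure Blob Storage by the azureblockblob backend.
async_result = add.delay(2, 3)
print(async_result.get(timeout=10))  # -> 5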

The Caveat: Secure Authentication with Managed Identities

While the azureblockblob backend solves most of the problem, there’s one caveat. The library requires a fully qualified connection string, such as:

azureblockblob://ACCOUNT_NAME:ACCOUNT_KEY@BLOB_STORAGE_URL

However, in some environments with strict security policies, storing connection strings that contain sensitive credentials isn't allowed. Instead, authentication must use Azure's Managed Identities or DefaultAzureCredential.

Our Solution

To address this limitation, we extended the existing AzureBlockBlobBackend to support secure authentication methods. By customizing the backend, we enabled URIs like the following:

  • azureblockblob://DefaultAzureCredential@STORAGE_ACCOUNT_URL

  • azureblockblob://ManagedIdentityCredential@STORAGE_ACCOUNT_URL

This modification lets us use Azure’s recommended authentication practices while maintaining compatibility with the azureblockblob backend.
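The idea behind the extension is sketched below (this is an illustration of the approach, not the actual patch, and the storage account URL is hypothetical): instead of embedding an account key in the URI, the backend builds its Blob Storage client from a credential object.

from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# Illustration: construct the Blob Storage client from a credential object
# rather than an account key, so no secret ever appears in the configuration.
credential = DefaultAzureCredential()
blob_service = BlobServiceClient(
    account_url="https://mystorageaccount.blob.core.windows.net",
    credential=credential,
)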

Contributing Back

We contributed this enhancement back to the Celery project through this pull request, which will make the feature available in Celery versions newer than 5.4.0. As engineers, we benefit enormously from the work done by the open-source community, so we're proud to play a small part in pushing things forward.

Key Takeaways

  • Challenge: Azure Service Bus is an excellent broker but doesn’t support acting as a Celery backend, leading to complications when integrating Celery with Azure.

  • Fix: The azureblockblob backend simplifies result storage in Azure Blob, offering a clean and robust solution.

  • Improvement: By extending the backend to support secure authentication, we aligned Celery with Azure’s security best practices and streamlined our setup.

With this solution in place, we’ve not only optimized our Celery-Azure integration but also contributed to making the open-source tooling around Celery more secure and user-friendly. If you’re navigating similar challenges, the azureblockblob backend might just be the answer you're looking for!

Get in touch!
