How We Improved Celery’s Azure Integration: From Redis to Blob Storage
Pieter Blomme
Machine Learning Engineer
Understanding Celery’s Core Components: Broker and Backend
To appreciate the problem (and the solution!), let’s first cover the basics:
- Broker: This is where task messages are stored before workers pick them up for execution. Common brokers include RabbitMQ, Redis, and Azure Service Bus.
- Backend: After tasks are executed, their results need to be stored somewhere. The backend is where Celery records these task results. Examples include Redis, SQL databases, and specialized backends.
A typical Celery app looks like this:
from celery import Celery

celery_app = Celery(
    "tasks",
    broker="BROKER_CONNECTION_STRING",
    backend="BACKEND_CONNECTION_STRING",
)
Both the broker and backend play distinct roles, but in some cases, they can overlap—like Redis, which can function as both.
This overlap simplifies configuration but isn't optimal in all cloud environments.
The Challenge: Aligning Celery with Azure's Ecosystem
When working in Azure, the preferred message broker is Azure Service Bus, a robust, highly available message queueing service. However, Azure Service Bus is a pure message broker. Unlike Redis, it cannot act as a backend because Celery’s result backend requires key-value support for storing and retrieving task results. Message brokers like Service Bus operate on a first-in, first-out (FIFO) basis, meaning they only let you fetch messages sequentially, not by a specific key.
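The difference between the two access patterns can be illustrated with a small standard-library sketch. The queue and dictionary below are stand-ins chosen for illustration, not Service Bus or Celery APIs, and the task IDs and payloads are made up:

```python
from collections import deque

# Stand-in for a message queue: messages come out in arrival order only.
queue = deque()
queue.append({"task_id": "a1", "payload": "resize image"})
queue.append({"task_id": "b2", "payload": "send email"})

# A consumer can only take whatever is at the front of the queue.
first_message = queue.popleft()

# Stand-in for a result backend: results are stored and fetched by task ID.
results = {}
results["b2"] = {"status": "SUCCESS", "result": 42}
results["a1"] = {"status": "SUCCESS", "result": 7}

# A caller holding a task ID can look up that specific result directly,
# regardless of the order in which the results were written.
result_for_a1 = results["a1"]
```

A result backend needs the second pattern: the caller holds a task ID and asks for exactly that result.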
Our Original Workaround
In one project, we adopted a workaround: storing task results in Azure Blob Storage. By using the task’s unique identifier (UUID) as the blob name, we could store results as blobs. While this solved the immediate problem, it came with drawbacks:
- Code complexity: We had to handle serialization, error handling, and result storage manually, which cluttered the codebase.
- Custom logic: Each time we needed a result backend, we essentially built it from scratch.
This approach worked but wasn't elegant or scalable.
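To give a feel for the kind of hand-rolled code this involves, here is a minimal sketch rather than the code from that project. It assumes `container_client` is an `azure.storage.blob.ContainerClient` (or any object with the same `upload_blob` and `download_blob` methods); the function names and the JSON layout are hypothetical, and error handling for missing blobs is left out:

```python
import json


def save_result(container_client, task_id, result):
    """Serialize a task result to JSON and store it under the task's ID."""
    payload = json.dumps({"task_id": task_id, "result": result})
    # overwrite=True lets a retried task replace an earlier result.
    container_client.upload_blob(name=task_id, data=payload, overwrite=True)


def load_result(container_client, task_id):
    """Fetch and deserialize the result stored for a task ID."""
    raw = container_client.download_blob(task_id).readall()
    return json.loads(raw)["result"]
```

Even in this stripped-down form, serialization, naming, and retrieval all live in application code, which is the clutter described above.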
The Fix: Using Celery’s azureblockblob Backend
It turns out there’s a better way: Celery offers an azureblockblob backend, specifically designed for storing results in Azure Blob Storage. By leveraging this backend, you can seamlessly integrate Celery’s result storage with Azure Blob, eliminating the need for custom logic.
Using the azureblockblob backend is straightforward. Just update your Celery configuration:
celery_app = Celery(
    "tasks",
    broker="SERVICE_BUS_CONNECTION_STRING",
    backend="azureblockblob://CONNECTION_STRING",
)
This setup automatically handles result storage in Blob Storage, including serialization and deserialization. It also simplifies maintenance: if you later decide to switch to another backend, there’s no need to rewrite custom logic—just update the backend configuration.
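One way to keep that switch a pure configuration change is to read both URLs from the environment. The following is a sketch under assumptions: the environment variable names and the helper `celery_urls_from_env` are hypothetical, not something Celery defines for you.

```python
import os


def celery_urls_from_env(env=None):
    """Build broker and backend URLs from environment variables.

    The variable names are hypothetical; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    broker_url = env["CELERY_BROKER_URL"]
    # Default to Blob Storage, but allow a full override such as "redis://...".
    backend_url = env.get("CELERY_RESULT_BACKEND")
    if backend_url is None:
        backend_url = "azureblockblob://" + env["AZURE_STORAGE_CONNECTION_STRING"]
    return {"broker": broker_url, "backend": backend_url}


# Usage (requires celery to be installed):
# from celery import Celery
# celery_app = Celery("tasks", **celery_urls_from_env())
```

With this in place, moving to a different backend means changing one environment variable rather than touching task code.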
The Caveat: Secure Authentication with Managed Identities
While the azureblockblob backend solves most of the problem, there’s one caveat. The library requires a fully qualified connection string, such as:
azureblockblob://ACCOUNT_NAME:ACCOUNT_KEY@BLOB_STORAGE_URL
However, in environments where strict security policies are enforced, storing connection strings with sensitive credentials isn’t allowed. Instead, authentication must use Azure’s Managed Identities or DefaultAzureCredential.
Our Solution
To address this limitation, we extended the existing AzureBlockBlobBackend to support secure authentication methods. By customizing the backend, we enabled URIs like the following:
- azureblockblob://DefaultAzureCredential@STORAGE_ACCOUNT_URL
- azureblockblob://ManagedIdentityCredential@STORAGE_ACCOUNT_URL
This modification lets us use Azure’s recommended authentication practices while maintaining compatibility with the azureblockblob backend.
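To make the URI scheme concrete, here is an illustrative sketch of how such a URI could be split and turned into a client. It is not the code merged into Celery: the helper names `parse_credential_uri` and `build_blob_service_client` are hypothetical, and the only SDK pieces assumed are `BlobServiceClient` from azure-storage-blob and the two credential classes from azure-identity.

```python
SCHEME = "azureblockblob://"
SUPPORTED_CREDENTIALS = ("DefaultAzureCredential", "ManagedIdentityCredential")


def parse_credential_uri(uri):
    """Split 'azureblockblob://<Credential>@<account URL>' into its two parts.

    Returns (credential_name, account_url), or None if the URI does not use
    one of the supported credential names (for example, a connection string).
    """
    if not uri.startswith(SCHEME):
        raise ValueError(f"expected a URI starting with {SCHEME!r}")
    remainder = uri[len(SCHEME):]
    credential_name, separator, account_url = remainder.partition("@")
    if separator and credential_name in SUPPORTED_CREDENTIALS and account_url:
        return credential_name, account_url
    return None


def build_blob_service_client(uri):
    """Create a BlobServiceClient for a credential-style URI."""
    # Imported lazily so the parser above works without the Azure SDK installed.
    from azure.identity import DefaultAzureCredential, ManagedIdentityCredential
    from azure.storage.blob import BlobServiceClient

    parsed = parse_credential_uri(uri)
    if parsed is None:
        raise ValueError("URI does not name a supported credential")
    credential_name, account_url = parsed
    credential_classes = {
        "DefaultAzureCredential": DefaultAzureCredential,
        "ManagedIdentityCredential": ManagedIdentityCredential,
    }
    credential = credential_classes[credential_name]()
    return BlobServiceClient(account_url=account_url, credential=credential)
```

The point of the design is that no secret appears in the URI: the credential name only selects how the client authenticates, and the identity itself comes from the environment the worker runs in.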
Contributing Back
We contributed this enhancement back to the Celery project through this pull request, which will make the feature available in Celery versions after 5.4.0. As engineers, we benefit enormously from the work done by the open-source community, so we’re proud to play a small part in pushing things forward.
Key Takeaways
- Challenge: Azure Service Bus is an excellent broker but doesn’t support acting as a Celery backend, leading to complications when integrating Celery with Azure.
- Fix: The azureblockblob backend simplifies result storage in Azure Blob, offering a clean and robust solution.
- Improvement: By extending the backend to support secure authentication, we aligned Celery with Azure’s security best practices and streamlined our setup.
With this solution in place, we’ve not only optimized our Celery-Azure integration but also contributed to making the open-source tooling around Celery more secure and user-friendly. If you’re navigating similar challenges, the azureblockblob backend might just be the answer you're looking for!