The Need for Scalability
In the first part of this series, we identified business use cases that require scalability on top of a basic RAG system:
E-commerce businesses leveraging RAG to generate (personalised) product descriptions
Tech companies using RAG-assisted writing for handling FAQ and customer complaints
The need for scalability points towards cloud-based RAG solutions. Luckily, a lot of the work has already been done. Popular cloud services and other parties offer pre-built components and tools that streamline the deployment of a RAG system, which makes RAG accessible to businesses with limited AI capabilities. We’ve summarised some of the most important tools you can use to operate RAG in (the cloud environment of) your business.
The Cloud is the Limit: What Tools Do Cloud Service Providers Offer?
Following the rise of generative AI and LLMs, a lot has been written about how to properly deploy LLM-based applications in a business environment. Similar to MLOps, LLMOps (short for Large Language Model Operations) is a hot topic nowadays. LLMOps is a set of best practices and principles that aim to integrate the development, deployment, and management of LLM-based applications.
It may not come as a surprise, but the prominent players in the field of cloud services all deliver the necessary infrastructure and tools in their respective machine learning platforms to facilitate building and deploying LLM-based applications, such as RAG systems. Whether you work with Microsoft Azure ML, Amazon SageMaker or Google’s Vertex AI, it is possible to build, train, deploy and manage RAG systems for enhanced text generation. Next to these more high-level tools, each of the big three cloud providers also offers additional tooling that makes your life easier when building your end-to-end RAG pipeline.
In part 2, we discussed the basic architecture of a RAG system as consisting of three main components: the retrieval component, the augmentation component, and the generation component. Microsoft Azure, Amazon Web Services (AWS), and Google Cloud Platform (GCP) all offer opportunities to streamline the processes in each of these components. The table below provides an overview of well-known tools, within the environment of the Big Three cloud providers, that make our lives easier when it comes to deploying a RAG system. Note that new tools are added almost on a daily basis, and the tools listed below are certainly not the only possibilities for the tasks mentioned, but they do provide a good starting point before diving deeper to find a tool that is perfectly suited to your specific needs. Let’s have a look at each of them.
Microsoft Azure
As explained before, setting up a RAG system starts with establishing a vector database that contains all internal information needed as input for the text generation process. To obtain this vector representation of textual data, an embeddings model is needed to convert text to vectors. Through Azure OpenAI Service, Microsoft offers OpenAI’s Ada embeddings models for this purpose.
Once the vector representation is obtained, the embeddings need to be stored in a vector database before the retrieval component can do its work. Azure’s Cognitive Search (since renamed Azure AI Search) can fill this role: it is a search service that provides infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content.
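To make the embedding and retrieval steps concrete, here is a minimal sketch in plain Python. It does not reflect how Azure implements either step: `ToyVectorStore` and its term-count `embed` method are hypothetical stand-ins for a managed vector database and a real embeddings model such as Ada, and the example documents are invented.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    """Split text into lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyVectorStore:
    """In-memory stand-in for a managed vector database (illustrative only)."""

    def __init__(self, documents: list[str]) -> None:
        self.documents = documents
        # One vector dimension per distinct token in the corpus.
        vocab = sorted({tok for doc in documents for tok in tokenize(doc)})
        self.index = {tok: i for i, tok in enumerate(vocab)}
        self.vectors = [self.embed(doc) for doc in documents]

    def embed(self, text: str) -> list[float]:
        """Toy embedding: L2-normalised term counts.

        A real system would call an embeddings model here instead.
        """
        vec = [0.0] * len(self.index)
        for tok in tokenize(text):
            if tok in self.index:
                vec[self.index[tok]] += 1.0
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else vec

    def search(self, query: str, k: int = 2) -> list[tuple[float, str]]:
        """Return the k documents with the highest cosine similarity."""
        q = self.embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), doc)
            for vec, doc in zip(self.vectors, self.documents)
        ]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


store = ToyVectorStore([
    "Oak flooring planks, 20 mm, suitable for underfloor heating",
    "Returns are accepted within 30 days of delivery",
    "Stainless steel kitchen sink with mixer tap",
])
results = store.search("Can I return an order after delivery?", k=1)
print(results[0][1])
```

Because the toy embedding only counts literal tokens, "return" and "returns" do not match here; a learned embeddings model is precisely what closes that gap in a real deployment.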
For both the augmentation component and the generation component, Microsoft Azure provides architecture for prompt engineering and for training and deploying LLMs in its machine learning environment, Azure ML. Besides that, Microsoft also offers Prompt Flow, a development tool with a visual interface that can be used to streamline the entire development cycle of AI applications powered by LLMs.
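As an illustration of what the augmentation step boils down to, the sketch below combines retrieved passages with the user’s question into a single prompt. The function names, the prompt wording, and the example passages are our own assumptions, and `generate` is a deliberate placeholder rather than a real Azure ML or Prompt Flow call.

```python
def build_augmented_prompt(
    question: str, passages: list[str], max_passages: int = 3
) -> str:
    """Combine retrieved passages and the user question into one prompt."""
    context = "\n".join(
        f"[{i}] {passage.strip()}"
        for i, passage in enumerate(passages[:max_passages], start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


def generate(prompt: str) -> str:
    """Hypothetical placeholder for the call to a deployed LLM endpoint."""
    raise NotImplementedError("Replace with a call to your model endpoint")


prompt = build_augmented_prompt(
    "What is the return period?",
    [
        "Returns are accepted within 30 days of delivery.",
        "Shipping is free above 50 euros.",
    ],
)
print(prompt)
```

Numbering the passages is a small design choice that lets the model, and anyone reviewing its output, point back to the source of an answer.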
Amazon Web Services
AWS also offers an in-house embeddings model called Titan Embeddings, which is part of Amazon’s Titan family of foundation models.
Similarly to Azure’s Cognitive Search, AWS offers OpenSearch Service, a search and analytics suite that can be used for a broad set of use cases, such as RAG. OpenSearch Service, previously known as Amazon Elasticsearch Service, is a managed service for deploying OpenSearch at scale. OpenSearch is an open-source search and analytics engine that was forked from Elasticsearch (the engine developed by Elastic).
Its ML platform, Amazon SageMaker, provides the necessary tooling for developing AI models. Besides that, AWS also offers Amazon Bedrock, a service that allows its users to choose from several foundation models by leading AI companies and provides a wide range of capabilities to build generative AI applications.
Google Cloud Platform
Google has developed multiple text embeddings models in-house, of which the two best known are Word2Vec and BERT. While the former is a shallow neural network that produces static embeddings, the latter is a deep neural network that provides dynamic, context-aware embeddings. Note that the capabilities of these two models are certainly not limited to generating embeddings.
GCP’s Cloud Search basically offers the same functionalities as Azure’s and AWS’ alternatives: it allows employees of a company to search and retrieve information from the company’s internal data repositories.
Similar to Azure and AWS, GCP provides the necessary tooling to streamline the development of AI applications in Vertex AI, its native ML platform. On top of the platform, GCP’s Vertex AI Extensions provides a set of fully-managed developer tools for extensions, which connect models to APIs for real-time data and real-world actions. Within this set, there is tooling that allows users to deploy generative AI applications linked to proprietary data sources.
Third-Party Tools and Frameworks for RAG System Deployment
Besides the tools that the big cloud service providers offer, a broad range of third-party tools and frameworks is available. These tools and frameworks are designed to make the deployment of a RAG system easier and more accessible, while also allowing for integrations with your cloud environment. We’ve listed the most important tools and their main applications.
LangChain is a framework for developing applications that are powered by LLMs. LangChain is particularly interesting for deploying RAG systems as it enables applications that are context-aware (connecting the LLM to sources of context, i.e. your internal and external data sources) and applications that rely on an LLM to reason.
LlamaIndex is a data framework for LLM-based applications to ingest, structure, and access private or domain-specific data. It offers a flexible approach to connect various data types with LLM applications. LlamaIndex is specifically designed for RAG applications, whereas LangChain is a more general framework that offers other functionalities as well.
Elasticsearch is a distributed search and analytics engine. It can be used in combination with LLMs to create text generation applications that use real-time, proprietary data. Microsoft Azure’s Cognitive Search and AWS’ OpenSearch Service have their roots in Elasticsearch. While GCP’s Cloud Search leverages Google’s own search and machine learning technologies, GCP does offer an Elasticsearch service on its platform.
Pinecone is a vector database and search service, comparable with Azure’s Cognitive Search, Google’s Cloud Search, and AWS' OpenSearch Service. It basically has the same functionalities as the other search services, allowing for quick and relevant searching within your company’s databases.
Similarly, Weaviate is an open-source vector database that is designed for building and scaling AI applications. It allows for efficient vector embeddings storage and retrieval. Just like Pinecone, Weaviate integrates with the machine learning environments of popular cloud providers.
Faiss and NMSLIB are both programming libraries designed for efficient similarity search and clustering of dense vectors. Such libraries often serve as the basis for vector database tools as they leverage the algorithms and techniques included in these libraries.
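One idea such libraries use to avoid comparing a query against every stored vector is to partition the vectors into cells and only scan the cells closest to the query (Faiss calls this family of indexes IVF, for inverted file). The sketch below is a toy version of that idea in plain Python, not Faiss code: `ToyIVFIndex` is a hypothetical name, the data is invented, and the centroids are hand-picked here, whereas a real library would learn them from the data, typically with k-means.

```python
import math

Vector = tuple[float, ...]


class ToyIVFIndex:
    """Inverted-file sketch: each vector is stored in the cell of its nearest centroid."""

    def __init__(self, centroids: list[Vector]) -> None:
        self.centroids = centroids
        self.cells: list[list[tuple[int, Vector]]] = [[] for _ in centroids]

    def _nearest_cells(self, vec: Vector, n: int) -> list[int]:
        """Indices of the n centroids closest to vec (Euclidean distance)."""
        order = sorted(
            range(len(self.centroids)),
            key=lambda c: math.dist(vec, self.centroids[c]),
        )
        return order[:n]

    def add(self, vec_id: int, vec: Vector) -> None:
        self.cells[self._nearest_cells(vec, 1)[0]].append((vec_id, vec))

    def search(self, query: Vector, k: int = 1, nprobe: int = 1) -> list[int]:
        """Scan only the nprobe cells closest to the query, then rank exactly."""
        candidates = [
            item
            for cell in self._nearest_cells(query, nprobe)
            for item in self.cells[cell]
        ]
        candidates.sort(key=lambda item: math.dist(query, item[1]))
        return [vec_id for vec_id, _ in candidates[:k]]


index = ToyIVFIndex(centroids=[(0.0, 0.0), (10.0, 10.0)])
vectors = [(0.5, 0.2), (1.0, 1.5), (9.0, 9.5), (11.0, 10.0)]
for vec_id, vec in enumerate(vectors):
    index.add(vec_id, vec)

nearest = index.search((9.2, 9.4), k=1)
print(nearest)
```

The trade-off is the usual one for approximate search: a low `nprobe` is fast but can miss a true neighbour that sits just across a cell boundary, while probing every cell degenerates into an exact, brute-force scan.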
Note that the tools listed above are just some of the tools that can be used when deploying your own RAG system. In reality, there are plenty of other tools that do similar and other things.
The Benefits of a Cloud-Based Solution
From the above, we can conclude that significant effort has been made towards facilitating the deployment of RAG systems in an enterprise context. From the preparatory work of setting up a vector database to the full lifecycle of a RAG system, the entire pipeline for building and deploying RAG systems can run in the cloud.
The main benefit of a cloud-based solution over a locally deployed RAG system is scalability throughout the organisation. Large numbers of requests and large volumes of data can be processed, and in times of peak demand, upscaling is readily available. Next to that, a cloud-based solution offers the possibility to integrate the RAG system with other cloud services, allowing for company-wide access to the system and its associated applications.
Precautions Towards Data Security and Protection
Of course, relying on cloud services for deploying and managing AI applications raises some privacy- and security-related questions. People frequently ask themselves questions like:
“Where is my data handled and stored?”
“Is my data safe in case of a breach or a hack?”
“Who can access and see my data?”
Luckily for these people, cloud service providers like AWS, Microsoft, and Google don’t like to take any risks by leaking your data to that one exotic prince who promised you half of his fortune if you pay for his visa. Advanced firewalls and data encryption provide a first layer of protection. Access control systems provide an additional layer: they make sure that only authorised individuals have access to your data, using mechanisms such as multi-factor authentication and role-based access control. Cloud service providers also maintain detailed logs of system and user activities, which can be used to detect unusual activity. Overall, you can rest assured that your data is safe in the cloud.
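To give a feel for how role-based access control and activity logging fit together, here is a small sketch. It is purely illustrative and not any provider’s actual IAM API: the roles, permissions, user names, and the `AccessController` class are all invented for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role-to-permission mapping, invented for illustration.
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "viewer": {"read"},
    "editor": {"read", "write"},
    "admin": {"read", "write", "delete"},
}


@dataclass
class AccessController:
    """Checks permissions by role and records every decision in an audit log."""

    user_roles: dict[str, str]
    audit_log: list[dict] = field(default_factory=list)

    def is_allowed(self, user: str, action: str, resource: str) -> bool:
        role = self.user_roles.get(user)  # None for unknown users
        allowed = action in ROLE_PERMISSIONS.get(role, set())
        # Denied attempts are logged too, since they can signal unusual activity.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "resource": resource,
            "allowed": allowed,
        })
        return allowed


controller = AccessController(user_roles={"alice": "editor", "bob": "viewer"})
can_write = controller.is_allowed("alice", "write", "customer-index")
bob_delete = controller.is_allowed("bob", "delete", "customer-index")
stranger_read = controller.is_allowed("mallory", "read", "customer-index")

denied = [entry for entry in controller.audit_log if not entry["allowed"]]
print(len(denied), "denied attempts logged")
```

Unknown users fall through to an empty permission set, so the default is to deny; that is the safer choice when a role assignment is missing.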
In the Land of Data, Quality is King: Lessons in Data Matching
Data matching is the process of identifying and linking similar or identical records from different datasets. The goal is to consolidate and merge duplicate or similar entries to create a single, accurate, and comprehensive view of each entity or record. Data matching is an important step in data quality optimisation before deploying a RAG system.
Suppose you’re working for a B2B building materials supplier. There are currently five departments in your company: lumber, steel, bricks, insulation, and plumbing. As your customers are primarily contractors, they typically order supplies from several departments. Until now, each department has kept its own database of clients, orders, inventory, etc. The marketing team wants to deploy a RAG system to generate a personalised e-mail campaign with client-specific product recommendations and promotions. In this example, it is important that the data different departments hold on one specific client is matched in a central database. This ensures efficient and relevant information retrieval and prevents information overload or confusion.
Of course, manually matching data entities to each other can be a tedious and time-consuming task. Faktion’s IDQO toolbox provides a solution to automate the manual matching process. By deploying an unsupervised clustering model, like k-means or hierarchical clustering, identical or similar entities can be found and linked to each other.
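The sketch below illustrates the general idea of automated matching on the building materials example. It is not the IDQO toolbox and it does not use k-means: as a simpler stand-in, it links client names whose string similarity reaches a threshold and groups them transitively, which amounts to a single-linkage hierarchical clustering cut at that threshold. The client names, the threshold of 0.85, and all function names are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between two normalised names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def match_records(names: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Group names linked, directly or transitively, by similarity >= threshold."""
    parent = list(range(len(names)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if similarity(names[i], names[j]) >= threshold:
                parent[find(j)] = find(i)

    groups: dict[int, list[str]] = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


clients = [
    "Janssens Construction NV",    # as stored by the lumber department
    "Janssens Construction N.V.",  # as stored by the steel department
    "Jansens construction nv",     # typo in the plumbing department's data
    "Peeters Roofing BV",          # a different client
]
groups = match_records(clients)
for group in groups:
    print(group)
```

Comparing every pair of records is quadratic in the number of records, so on real datasets you would first narrow down candidate pairs (for example by postcode or VAT number) before scoring them.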
A Square Peg in a Round Hole: How to Convert Generic Tooling into a Specific Solution?
The platforms and tools that cloud service providers offer contain pre-built components that simplify the process of deploying a RAG system. However, it’s worth mentioning that even with these pre-built components, NLP expertise is still required. Someone needs to understand the underlying technology and be able to design the pipelines, do the data wrangling, evaluate the output, and refine the model if needed.
Besides that, the off-the-shelf solutions on offer are all generic. They are a good starting point, but businesses often require a specific solution, tailored to their specific situation. This specific solution can’t be obtained by solely using the pre-built components. At Faktion, we believe that the only way to move forward is to make use of the existing tools and build upon them such that we convert a generic solution into a specific one.
In the next part of this series, we’ll discover how we can go from a generic to a specific RAG solution, or as we like to call it: productisation of AI applications. On to part 4!