RAG for Text Generation Processes in Businesses


Iñaki Peeters

AI Solutions Analyst

Welcome to the second part of our series on Retrieval-Augmented Generation (RAG) for text generation processes in businesses. In the previous part, we identified three clusters of business cases where implementing a RAG system can bring multiple benefits. In this part, we’ll take a closer look at the basic architecture of a RAG system.
Office worker in his cluttered office, working meticulously to connect his boxes overflowing with documents to his computer using various cords and adapters, hyperrealistic digital art – Dall-E 3

The Need for Automation

In the previous part we identified a first cluster of business cases where companies want to automate text generation processes and where the generated text mainly serves as ‘evidence-based inspiration’ for employees tasked with writing a specific piece of content. More specifically, we identified the following use cases:

  • RAG-assisted writing for content creation.

  • Support documents for employee onboarding, such as guidelines, codes of conduct, etc.

This need for automation and recent advances in the field of generative AI inspired companies to see how they could leverage the power of generative AI to speed up text generation processes in their day-to-day operations.

All Roads Lead to RAG: the Drawbacks of Deploying a (Custom) LLM

But why should you use RAG? Aren’t LLMs sufficient on their own? Well, let’s have a closer look at LLMs. By now, we know that a possible solution for automating the text generation processes mentioned above is to leverage the power of Generative AI. For text generation, Large Language Models (LLMs) are the common choice. However, these models, which make our lives easier in so many ways, also come with some drawbacks.

  • Public training data
    First of all, public LLMs, like OpenAI’s ChatGPT or Google’s Bard, have been trained on publicly available data, which has two important implications. They cannot capture information released after their training cut-off, and they did not have access to non-public or company-specific data during training. Consequently, these models miss essential and up-to-date information and context, which makes them a poor fit for automatically generating text that requires company-specific information.

  • Lack of sources
    Besides that, public LLMs are not always able to provide the sources they used to generate the output. This lack of transparency makes it difficult to assess the accuracy and correctness of the generated text.

  • Hallucinations
    Also, public LLMs are prone to hallucinations. Hallucinations occur when the LLM produces compelling output that is perceived as high quality because of the human-like language that is produced, when in reality the output contains incorrect information.

  • Other problems
    Last but not least, public LLMs also raise questions concerning data privacy, context awareness, and lack of control for the end user.

You might also consider deploying your own custom LLM. It is perfectly possible to train your own LLM on your company-specific data. However, some drawbacks are worth mentioning here as well.

  • Training data requirements
    First of all, you would need a substantial amount of high-quality training data. This might be challenging for small and/or niche companies.

  • Resource intensiveness
    Next to that, training your own LLM is computationally intensive and time-consuming. It requires a lot of computing capacity, which obviously doesn’t come for free.

  • Model maintenance
    Also, if you want your custom LLM to stay up-to-date with the most recent developments in your company, regular re-training, maintenance, and model fine-tuning are necessary. Again, these processes are computationally expensive and time-consuming.

  • Lack of sources
    As was the case with public LLMs, custom LLMs are also not always able to provide the sources that were used to generate a specific piece of text.

Luckily, there is a solution that tackles many of these LLM drawbacks. RAG allows you to link the power of Generative AI to your own specific database in a cost-effective way. This link makes it possible to generate output that is up-to-date, more accurate, and traceable. Basically, it boils down to text generation through an LLM where the model has access to internal and external data sources, so that it can incorporate recent and previously unseen information and provide the sources used to generate the output. Let’s have a look at the architecture of this solution.

RAG Architecture

Basically, a RAG system starts with a user who wants to write a specific piece of text, e.g. in the Microsoft Word environment. While writing, he or she submits a query for assistance to the RAG system, which contains three components: a retrieval component, an augmentation component, and a generation component. The retrieval component accesses your own database and/or other external data sources and looks for relevant information and additional context for the query. The augmentation component augments the prompt for the generation component with the information from the retrieval component. The output of the augmentation component is fed to the generation component, which is basically an LLM that generates the desired output, incorporating the additional context.
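To make the flow between the three components concrete, here is a minimal sketch in Python. It only shows how the components hand data to each other. The function names (run_rag, fake_retrieve, fake_augment, fake_generate) and the example policy sentence are assumptions made up for illustration, and the stubs stand in for a real retriever, prompt builder, and LLM.

```python
from typing import Callable, List


def run_rag(
    query: str,
    retrieve: Callable[[str], List[str]],
    augment: Callable[[str, List[str]], str],
    generate: Callable[[str], str],
) -> str:
    """Chain the three components: retrieval, augmentation, generation."""
    context = retrieve(query)          # look up relevant pieces of text
    prompt = augment(query, context)   # enrich the query with that context
    return generate(prompt)            # let the LLM write the output


# Hypothetical stand-ins, only here so the sketch runs end to end.
def fake_retrieve(query: str) -> List[str]:
    return ["Remote work is allowed two days per week."]


def fake_augment(query: str, context: List[str]) -> str:
    return "Context:\n" + "\n".join(context) + "\n\nRequest: " + query


def fake_generate(prompt: str) -> str:
    return f"[draft based on {len(prompt)} prompt characters]"


draft = run_rag(
    "Summarise our remote work policy.",
    fake_retrieve,
    fake_augment,
    fake_generate,
)
print(draft)
```

Passing the components in as plain functions keeps each one replaceable, which mirrors the way the components are described separately in the rest of this article.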


Three Components of a RAG System

Retrieval Component

First things first, the overall system starts with setting up a vector database that contains preprocessed information from internal data sources. These sources can be any type of information that can be found within the company, like organisational databases, specific documents, templates, domain-specific databases, etc. The data can be stored in local files, cloud storage, data warehouses, etc. Once located, the data needs to be converted to text and then chunked into smaller pieces. These chunks of text are converted into numerical vectors, or embeddings, using an embedding model. Finally, the vectors are indexed and stored in the vector database. The purpose of indexing is to structure the vectors in a way that allows for efficient searching and retrieval of vectors that are semantically similar to a given query vector.
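The indexing steps above (chunk, embed, store) could be sketched as follows. This is a toy illustration under clear assumptions: toy_embed is a hashed bag-of-words counter standing in for a real embedding model, the "vector database" is just a Python list, and the file names, chunk sizes, and example sentences are invented.

```python
import hashlib
import math


def chunk_text(text, max_words=50, overlap=10):
    """Split text into overlapping windows of at most max_words words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text, dim=64):
    """Stand-in for an embedding model: hashed word counts, unit length."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_index(documents, max_words=50, overlap=10):
    """Chunk every document and store each chunk with its vector and source."""
    index = []
    for source, text in documents.items():
        for chunk in chunk_text(text, max_words, overlap):
            index.append(
                {"source": source, "text": chunk, "vector": toy_embed(chunk)}
            )
    return index


# Invented example documents.
documents = {
    "code_of_conduct.txt": "Employees treat colleagues and customers with respect at all times.",
    "onboarding_guide.txt": "New employees receive a laptop and a badge on their first day.",
}
index = build_index(documents, max_words=8, overlap=2)
print(len(index), "chunks indexed")
```

Keeping the source next to every chunk is what later allows the system to tell the user where a piece of generated text came from. A real setup would replace toy_embed with a trained embedding model and the list with an indexed vector store.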

Once the vector database is all set up, we can get this party started. A business user submits a request for a piece of text through the user interface, providing the system with appropriate context concerning the request. The query is converted to a vector using the same embedding model. Through information retrieval techniques, like semantic search, both internal and external data sources are accessed in search of relevant information. The internal data sources are accessed directly via the vector database. For external data, a query for additional information is formulated, enabling information retrieval from a variety of external data sources, like websites, external APIs, external databases, etc. This information is again embedded and incorporated ad hoc into the vector database. Once all necessary information is collected, the most relevant information is selected through relevance ranking and passed on to the augmentation component.
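As a small illustration of the relevance ranking step, the sketch below scores stored vectors against a query vector with cosine similarity and keeps the best matches. Cosine similarity is one common choice for semantic search and is an assumption here, as are the function names, the file names, and the tiny hand-written three-dimensional vectors (real embeddings have far more dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, index, k=2):
    """Return the k index entries most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_vector, entry["vector"]), entry)
        for entry in index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Invented entries; in practice these come from the vector database.
index = [
    {"source": "pricing.txt", "vector": [0.0, 0.1, 0.9]},
    {"source": "hr_policy.txt", "vector": [0.9, 0.1, 0.0]},
    {"source": "onboarding.txt", "vector": [0.7, 0.7, 0.0]},
]
query_vector = [1.0, 0.2, 0.0]

for score, entry in top_k(query_vector, index, k=2):
    print(f"{entry['source']}: {score:.3f}")
```

This brute-force scan compares the query with every stored vector. The indexing mentioned earlier exists precisely to avoid that full scan once the database grows.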

Recently, a lot has been written about the shortcomings of vector databases in RAG applications. These limitations include:

  • Vectors may not fully capture the semantic intent of a query or document sentences

  • Loss of nuances when condensing text paragraphs into single vectors

  • Vectors only take semantic similarity into account, so any other structure or relationships between different pieces of content are lost

Knowledge graphs are often mentioned as a potential solution to overcome these drawbacks. A knowledge graph is a structured way to represent and store (textual) data. The data is organized as a network of entities and relationships between these entities. It’s essentially a graphical representation, stored in a graph database, that allows for an interconnected understanding of information. The entities can be any object or concept in a document. Each entity has a unique identifier within the graph. The relationships between entities are labeled and a direction of the relationship is given. Entities in the knowledge graph can have attributes that provide additional information about the entity. Compared to a vector database, knowledge graphs not only capture the semantic meaning of a piece of text, but also the relationships between multiple pieces of text, allowing for a better understanding of the text at hand.

In a very simple setting, a knowledge graph might look something like this. As input, we take a text that contains information about different types of animals, their habitat, diet, etc. The output is a graphical representation with entities, relationships, and attributes.

Knowledge Graph
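For readers who prefer code over pictures, a knowledge graph of this kind can be sketched with a few lines of Python. This is a minimal in-memory structure, not a graph database, and the class name, method names, and animal facts are assumptions chosen to match the example described above.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    id: str                                   # unique identifier in the graph
    attributes: dict = field(default_factory=dict)


class KnowledgeGraph:
    def __init__(self):
        self.entities = {}
        self.edges = []  # directed, labeled: (source_id, label, target_id)

    def add_entity(self, entity_id, **attributes):
        self.entities[entity_id] = Entity(entity_id, attributes)

    def add_relation(self, source, label, target):
        for entity_id in (source, target):
            if entity_id not in self.entities:
                raise KeyError(f"unknown entity: {entity_id}")
        self.edges.append((source, label, target))

    def neighbors(self, source, label=None):
        """Targets reachable from source, optionally filtered by label."""
        return [
            t for s, l, t in self.edges
            if s == source and (label is None or l == label)
        ]


graph = KnowledgeGraph()
graph.add_entity("lion", diet="carnivore")
graph.add_entity("zebra", diet="herbivore")
graph.add_entity("savanna", kind="habitat")
graph.add_relation("lion", "lives_in", "savanna")
graph.add_relation("zebra", "lives_in", "savanna")
graph.add_relation("lion", "eats", "zebra")

print(graph.neighbors("lion", "eats"))
print(graph.entities["lion"].attributes)
```

The direction of each relationship matters: the lion eats the zebra, not the other way around, and that is exactly the kind of structure a single similarity score between two vectors cannot express.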

Augmentation Component

The augmentation component receives the relevant context from the retrieval component. It also takes the original query as an input. The purpose of this component is to use prompt engineering to augment, or enrich, the original query with the additional information that was retrieved in the previous step. This can be done in multiple ways. The easiest way is to just append the relevant context to the original query. Other possibilities include interpolation, contextual embeddings, and attention mechanisms. Once the original query is augmented with the additional context, the whole query is passed on to the third component, the generation component.
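The simplest approach mentioned above, appending the retrieved context to the original query, might look like the sketch below. The prompt wording, the function name build_augmented_prompt, and the example chunks are assumptions. Numbering the context pieces is one possible way to make it easier for the LLM to refer back to its sources.

```python
def build_augmented_prompt(query, retrieved_chunks):
    """Combine numbered context pieces and the original query in one prompt."""
    context_lines = [
        f"[{i}] ({chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(retrieved_chunks, start=1)
    ]
    return (
        "Use only the context below and cite sources by their number.\n\n"
        "Context:\n" + "\n".join(context_lines) + "\n\n"
        "Request: " + query
    )


# Invented retrieval output.
retrieved_chunks = [
    {"source": "hr_policy.txt", "text": "Remote work is allowed two days per week."},
    {"source": "onboarding.txt", "text": "New employees receive a laptop on day one."},
]
prompt = build_augmented_prompt(
    "Write a short welcome note for a new colleague.", retrieved_chunks
)
print(prompt)
```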

Generation Component

The generation component, which is basically a Large Language Model (LLM), receives the augmented prompt from the augmentation component. This LLM can be a publicly available LLM, like ChatGPT or Bard, or preferably a locally deployed open-source LLM such as Llama 2. The LLM processes the augmented query, acts as a language interface, and generates the output for the initial query of the business user, while also providing the sources used to generate the output.
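One way to keep the output traceable is to return the generated text together with the list of sources that went into the prompt. The sketch below assumes the LLM is available as a plain Python function that takes a prompt and returns text. The names GenerationResult, generate_with_sources, and echo_llm are hypothetical, and echo_llm is only a placeholder for a call to a public or locally deployed model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class GenerationResult:
    text: str            # what the LLM wrote
    sources: List[str]   # where the supporting context came from


def generate_with_sources(
    prompt: str,
    retrieved_chunks: List[Dict[str, str]],
    llm: Callable[[str], str],
) -> GenerationResult:
    """Call the LLM and attach the distinct sources of the retrieved context."""
    text = llm(prompt)
    sources: List[str] = []
    for chunk in retrieved_chunks:
        if chunk["source"] not in sources:  # keep order, drop repeats
            sources.append(chunk["source"])
    return GenerationResult(text=text, sources=sources)


def echo_llm(prompt: str) -> str:
    """Placeholder model: repeats the last line of the prompt."""
    return "DRAFT: " + prompt.splitlines()[-1]


chunks = [
    {"source": "hr_policy.txt", "text": "Remote work is allowed two days per week."},
    {"source": "hr_policy.txt", "text": "Requests go through your team lead."},
]
result = generate_with_sources(
    "Context: ...\nRequest: Explain the remote work rules.", chunks, echo_llm
)
print(result.text)
print(result.sources)
```

Because the model is passed in as a function, swapping a public LLM for a locally deployed one does not change the rest of the pipeline.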


In the Land of Data, Quality is King: Lessons in Data Cleaning & Processing

At Faktion, we strongly believe in leveraging the power of AI to automate data quality tasks, which simultaneously improves data quality. Two of the first and most crucial tasks in data quality are data cleaning and data processing.

Data cleaning is the process of identifying and correcting errors, inconsistencies, and inaccuracies in a dataset to improve its quality and reliability. The goal of data cleaning is to ensure that the data is accurate, complete, and suitable for analysis and other purposes. It involves, among other things, identifying and handling missing values, outliers, and duplicates.
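A very small example of what such cleaning can mean for text that is about to be indexed is sketched below. It is not a description of any particular toolbox. The function name clean_records, the length thresholds, and the example records are assumptions, and using text length as an outlier signal is a deliberately crude proxy.

```python
def clean_records(records, min_chars=20, max_chars=2000):
    """Drop missing texts, length outliers, and duplicates; tidy whitespace."""
    seen = set()
    cleaned = []
    for record in records:
        text = record.get("text")
        if not text or not text.strip():
            continue  # missing value
        normalised = " ".join(text.split())
        if not (min_chars <= len(normalised) <= max_chars):
            continue  # suspiciously short or long: treated as an outlier
        key = normalised.lower()
        if key in seen:
            continue  # duplicate, ignoring case and spacing
        seen.add(key)
        cleaned.append({**record, "text": normalised})
    return cleaned


# Invented records.
records = [
    {"source": "a.txt", "text": "Employees must badge in at the main entrance."},
    {"source": "b.txt", "text": "employees  must badge in at the main   entrance."},
    {"source": "c.txt", "text": None},
    {"source": "d.txt", "text": "ok"},
]
cleaned = clean_records(records)
print([record["source"] for record in cleaned])
```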

Data processing, on the other hand, is the process of manipulating, transforming, and organising data to generate meaningful information. It involves a series of operations performed on raw data to convert it into a format that can be used for analysis, reporting, and other purposes.

Data cleaning and processing are prerequisites to guarantee proper data quality for your RAG system. They ensure that users are able to retrieve complete and correct information in an efficient way. Through the use of AI, Faktion’s IDQO toolbox automates the data cleaning and processing tasks needed to set the scene before deploying your RAG system, whether it’s duplicate removal, outlier handling, or embeddings generation.

That’s All, Folks! Isn’t it?

Et voilà, the output from the RAG system now contains up-to-date and relevant information, and the sources used to generate it can be retrieved as well. In this way, the RAG system addresses some of the drawbacks of conventional LLMs: hallucinations, outdated information, and non-traceable output. In the next part of this series, we’ll elaborate on the second cluster of business cases, which adds the need for scalability, and see what this implies for a RAG system.