RAG series: Intro

Matouš Eibich
July 4, 2024

What is the buzz about? 

In the modern data-centric world, the ability to harness an organization's unique dataset is critical. Large Language Models (LLMs) are powerful tools, yet their off-the-shelf versions have usually never seen the specific, nuanced data that gives a business its competitive edge. Retrieval Augmented Generation (RAG) addresses this gap by letting an LLM draw on proprietary data sources at query time. By integrating RAG, companies can enrich LLMs with their internal datasets, turning these models into customized tools that deliver relevant and precise responses, even in highly specialized internal applications. Because the data is retrieved when a question is asked rather than used to retrain the model, it can stay in systems the organization controls, which helps keep sensitive information within the organization's boundaries when the system is deployed accordingly.

How does it work? 

In its basic form, a RAG system is straightforward: a series of connected components that take your data from ingestion to a generated response. The accompanying image depicts this process.

source: https://www.deeplearning.ai/short-courses/langchain-chat-with-your-data/

Here's an exploration of its core components:

Document Loading

The first stage is where the system ingests your data. The input can be quite diverse: RAG systems can handle anything from traditional documents such as PDFs and Word files to sources like Notion, YouTube, or GitHub.
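To make the loading stage concrete, here is a minimal sketch that reads plain-text files from a folder into records that keep track of where each text came from. The function name load_documents, the record fields, and the list of suffixes are illustrative assumptions; formats such as PDF, Word, or Notion exports would need dedicated loaders that are not shown here.

```python
from pathlib import Path


def load_documents(folder, suffixes=(".txt", ".md")):
    """Read every matching file under `folder` into a {source, text} record.

    Hypothetical helper for illustration: it only handles plain-text formats.
    """
    documents = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            documents.append(
                {"source": str(path), "text": path.read_text(encoding="utf-8")}
            )
    return documents
```

Keeping the source path next to the text makes it possible to point users back to the original document later on.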

Splitting

Because an LLM can only accept a limited amount of text at once (its context window), large documents are split into smaller chunks. This lets the system pass the model only the segments relevant to a question instead of whole documents.
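One simple way to split a document is into fixed-size character windows that overlap slightly, so a sentence cut at a boundary still appears whole in one of the chunks. The sketch below assumes this strategy; the function name split_text and the default sizes are illustrative, and production systems often split on sentences, paragraphs, or tokens instead.

```python
def split_text(text, chunk_size=500, overlap=50):
    """Split `text` into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters (illustrative defaults).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_text("abcdefghij", chunk_size=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```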

Storage

In the storage phase, the system converts the chunks of text into numerical vectors using an embedding model. This step is essential because similarity search operates on numbers rather than raw text: an embedding model places chunks with similar meaning close together in vector space. These vectors are then stored in a vector database (also known as a vector store or simply an index), which is what the retrieval step searches.
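The sketch below illustrates the idea with a toy setup. The embed function is a stand-in that hashes words into a fixed-size vector; it is an assumption made so the example runs without external services, and it captures word overlap rather than meaning, unlike a real embedding model. VectorStore is likewise a hypothetical in-memory list, not a real vector database.

```python
import math
import re
import zlib


def embed(text, dim=64):
    """Toy embedding (assumption): hashed bag-of-words, scaled to unit length."""
    vector = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is stable across runs, unlike Python's built-in hash().
        vector[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


class VectorStore:
    """Hypothetical in-memory store keeping each chunk next to its vector."""

    def __init__(self):
        self.records = []

    def add(self, chunk, source=None):
        self.records.append(
            {"text": chunk, "source": source, "vector": embed(chunk)}
        )


store = VectorStore()
store.add("RAG retrieves relevant chunks before generation.", source="intro.txt")
print(len(store.records[0]["vector"]))  # 64
```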

Retrieval

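When a query is entered, it is converted into a vector by the same embedding model. The system then searches the vector database for the most relevant text chunks, typically those whose vectors lie closest to the query vector (measured by cosine similarity, for example), effectively matching the question to the stored data.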
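As a rough illustration of the search itself, the sketch below ranks stored chunks by cosine similarity to a query vector and returns the top k. It assumes the vectors have already been computed and uses hand-written three-dimensional vectors for readability; the names retrieve and records are illustrative. Real vector databases usually replace this exhaustive scan with approximate nearest-neighbour indexes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vector, records, k=3):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vector, record["vector"]), record["text"])
        for record in records
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hand-made vectors standing in for real embeddings (assumption).
records = [
    {"text": "contract termination clauses", "vector": [1.0, 0.0, 0.0]},
    {"text": "quarterly earnings summary", "vector": [0.0, 1.0, 0.0]},
    {"text": "notice periods in contracts", "vector": [0.9, 0.1, 0.0]},
]
for score, text in retrieve([1.0, 0.0, 0.0], records, k=2):
    print(round(score, 3), text)
```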

Generation

The final step feeds the retrieved chunks, together with the original query, into the LLM. This is where RAG shines: the model combines the supplied context with its learned capabilities to generate a precise and contextual response.
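In practice, "feeding" the chunks to the model usually means placing them in the prompt alongside the question. The sketch below shows one possible prompt layout; the wording of the instructions is an assumption, and generate is a hypothetical callable standing in for whichever LLM client is used, since no specific model API is implied here.

```python
def build_prompt(question, chunks):
    """Combine retrieved chunks and the user question into a single prompt."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question, chunks, generate):
    """`generate` is a hypothetical function: prompt string in, reply string out."""
    return generate(build_prompt(question, chunks))


print(build_prompt("What is the notice period?", ["Notice is 30 days.", "Fees apply."]))
```

Numbering the chunks in the prompt makes it easier to ask the model to cite which passage supports its answer.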

While this overview presents a streamlined version of RAG, actual implementations can be much more complex, particularly in production settings.

Real-World Applications of RAG

Legal Sector Collaboration

At Predli, we are collaborating with a Swedish legal firm to enhance access to over 100,000 legal documents. Our aim is to develop a chatbot that provides accurate legal advice by leveraging the vast information contained within these documents.

Financial Analysis

Another application is in the financial domain, where we are assisting a client in building automated stock analysis and a stock-news Twitter bot. The service aims to deliver valuable insights into market trends and stock movements.

Conclusion

In summary, the true value of RAG lies in its ability to grant LLMs access to previously unseen internal datasets. This access is pivotal for organizations that need to utilize their proprietary data for enhanced decision-making. By integrating RAG, LLMs can generate responses that are not only accurate but also tailored to the specific context and knowledge base of a business.

Predli is at the forefront of implementing RAG technology for practical, real-world applications. If your organization is looking to understand how RAG can improve your data utilization, we're here to help. Contact us to explore the capabilities of RAG for your business needs.
