Want to make sure your AI chatbot knows what it's talking about? Read this and learn how to set up RAG in Keboola
The biggest issue with chatbot implementations powered by generative AI is the accuracy and reliability of the output. Models can give erroneous or inaccurate answers due to hallucinations [1] or simply because they lack information specific to a given business case, as many of them don’t have access to new data outside of pretraining.
Retrieval-Augmented Generation (RAG) is a technique designed to address this limitation by integrating an external retrieval mechanism with a generative model. This allows the language model to query an external database or knowledge base to retrieve relevant information, which is then used to generate a more accurate and contextually appropriate response.
Other advantages of using a RAG chatbot include:
In this article, we will look at the advantages of this technique and explain how a RAG chatbot can be set up from scratch using Keboola.
At a high level, a RAG-powered chatbot operates in the following sequence:
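As a rough illustration (not Keboola's internal implementation), the retrieve-then-generate loop can be sketched in a few lines of Python. The embedding vectors and the `generate` callable here are deterministic stand-ins for a real embedding model and LLM:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, index, top_k=2):
    # Rank stored (vector, text) pairs by similarity to the query vector.
    scored = sorted(index, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in scored[:top_k]]

def answer(query_vec, index, generate):
    # Augment the prompt with retrieved context before calling the model.
    context = "\n".join(retrieve(query_vec, index))
    return generate(f"Context:\n{context}\n\nQuestion: ...")

# Toy index: in practice these vectors come from an embedding model.
index = [([1.0, 0.0], "Invoices are due in 30 days."),
         ([0.0, 1.0], "Support hours are 9-5 CET.")]
print(retrieve([0.9, 0.1], index, top_k=1))  # most similar snippet first
```

The key point is that the model never answers from its weights alone: the retrieved context is injected into the prompt on every call.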
Keboola’s core features are aligned with RAG's requirements; it’s a self-service data operations platform that supports efficient data indexing and retrieval and has a full toolkit of features that help build a great AI chatbot:
As a data stack as a service platform, Keboola focuses on simplification and automation, closely following the principles of data mesh.
And now, to practice…
Let’s say your goal is to build a RAG chatbot that helps users quickly retrieve relevant information from past conversations or logs. Keboola’s extensive range of data source components offers a flexible foundation for RAG retrieval across many data sources.
Here is the data architecture you need to build:
Go to Keboola > Flows > Create Flow
Keboola offers a broad selection of data extractors, enabling you to connect to a variety of sources including Slack, Jira, and other supported knowledge bases, support platforms, and more. For this specific example, we’ll demonstrate how to set up a Slack extractor, but you can follow similar steps for other data sources.
Slack Extractor
To extract data from Slack, follow the instructions in this documentation and complete the authorization steps. As part of the process, define your query to optimize extraction speed.
After defining your query, save your configuration and run the extractor.
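Conceptually, the extractor lands Slack messages as flat table rows. Here is a hedged sketch of that flattening step, assuming the standard fields (`ts`, `text`, `user`) that Slack's `conversations.history` API returns; the helper name is illustrative, not part of the extractor:

```python
def flatten_messages(payload):
    # Turn a raw conversations.history-style response into flat rows,
    # keeping the fields used later for embeddings and metadata.
    return [
        {"ts": m["ts"], "text": m["text"], "user": m.get("user", "")}
        for m in payload.get("messages", [])
        if m.get("text")  # skip empty/system messages
    ]

sample = {"messages": [
    {"ts": "1714000000.000100", "text": "Deploy finished", "user": "U123"},
    {"ts": "1714000001.000200", "text": ""},  # dropped: no text
]}
rows = flatten_messages(sample)
print(rows)
```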
For more information on other data sources, explore Keboola’s available extractors here.
Before you save the extracted data to your vector database, you first need to transform it from text to embeddings. In Keboola, you can do this with our embeddings component, which transforms and prepares your data for accurate semantic search.
The Embeddings Component offers several enhancements for improved efficiency, scalability, and support for various vector databases and embedding providers.
These improvements make it easier to integrate and scale retrieval-augmented generation (RAG) workflows in Keboola for faster and more reliable data indexing for chatbot applications.
While RAG is one of the most well-known applications of vector embeddings, they have a wide range of practical use cases:
Content Clustering & Topic Modeling: You can use embeddings to group similar articles, research papers, or customer reviews, making it easier to organize large datasets.
Add your configuration settings for OpenAI’s API. (While this example will use OpenAI, the same approach can be applied to our other available embedding providers.)
Choose OpenAI’s "text-embedding-3-small" model to generate vector representations of text. This model balances efficiency and performance, making it well-suited for applications requiring fast and scalable embedding generation.
We will generate embeddings of the “Text” column, use “ts” as our unique ID column, and include a couple of the other columns as metadata. Metadata can provide additional context for filtering and retrieval in downstream applications.
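To make the shape of the output concrete, here is a plain-Python sketch of that configuration. `fake_embed` is a deterministic stand-in for the `text-embedding-3-small` call (which returns 1536-dimensional vectors in production), and the lowercase `text`/`ts` keys mirror the Slack columns described above:

```python
def fake_embed(text):
    # Stand-in for OpenAI's text-embedding-3-small; returns a tiny toy vector.
    return [float(len(text)), float(text.count(" "))]

def rows_to_records(rows, embed_fn):
    # Embed the "text" column, use "ts" as the unique ID,
    # and carry the remaining columns along as metadata.
    records = []
    for row in rows:
        records.append({
            "id": row["ts"],
            "values": embed_fn(row["text"]),
            "metadata": {k: v for k, v in row.items() if k not in ("ts", "text")},
        })
    return records

rows = [{"ts": "1714000000.000100", "text": "Deploy finished", "user": "U123"}]
records = rows_to_records(rows, fake_embed)
print(records[0]["id"], records[0]["metadata"])
```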
Using Metadata for hybrid search and filtering
Metadata allows for more refined searches by combining vector similarity with structured filters. When doing retrieval, for example, you can filter results by category, timestamp, or other attributes before ranking by semantic similarity.
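A minimal sketch of that filter-then-rank pattern; the record shape and two-dimensional vectors are simplified stand-ins for what a real vector store holds:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_search(query_vec, records, metadata_filter, top_k=3):
    # 1) Pre-filter on metadata, 2) rank the survivors by vector similarity.
    candidates = [r for r in records
                  if all(r["metadata"].get(k) == v for k, v in metadata_filter.items())]
    candidates.sort(key=lambda r: cosine(query_vec, r["values"]), reverse=True)
    return candidates[:top_k]

records = [
    {"id": "1", "values": [1.0, 0.0], "metadata": {"channel": "support"}},
    {"id": "2", "values": [0.9, 0.1], "metadata": {"channel": "random"}},
]
hits = hybrid_search([1.0, 0.0], records, {"channel": "support"}, top_k=1)
print([h["id"] for h in hits])
```

Production vector databases such as Pinecone apply the metadata filter inside the index rather than in application code, but the semantics are the same.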
Once the component completes, you should see a populated index in your Keboola storage tables.
Keboola already has a pre-built, customizable chatbot Data App (available here). You just need to connect it to your data.
Go to Components > Data App > Add Data App > Create New Data App. Next, add the GitHub chatbot link to the Project URL field.
Set up environment secrets to align with the ones used earlier.
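The secret names below are examples rather than a fixed contract; inside the data app, secrets surface as environment variables, so a small guard that fails fast on a missing one is worth having:

```python
import os

# Example secret name; match whatever you defined in the data app config.
# The placeholder value is set here only so the sketch runs standalone.
os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder")

def get_secret(name):
    # Fail fast with a clear message if a required secret is missing.
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required secret: {name}")
    return value

print(get_secret("OPENAI_API_KEY")[:3])  # prints: sk-
```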
Save your configuration and click “Deploy Data App” in the top-right corner. Once this finishes processing click “Open Data App” and enjoy your new RAG Chatbot hosted on Keboola!
To store these embeddings in Pinecone instead, we only have to change a few settings.
Create a new Pinecone index
Use the corresponding embedding dimensions for the previously selected model. In this case, "text-embedding-3-small" has a dimension of 1536. Ensure the index configuration matches this dimension to store and retrieve embeddings correctly.
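A quick way to guard against a dimension mismatch before creating the index. The sizes below are the published output dimensions for OpenAI's current embedding models; the commented-out call is a sketch based on the `pinecone` client's `create_index` API, with an assumed index name and region:

```python
# Published output dimensions for OpenAI embedding models.
EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def index_dimension(model):
    # Look up the dimension the Pinecone index must be created with.
    if model not in EMBEDDING_DIMS:
        raise ValueError(f"Unknown embedding model: {model}")
    return EMBEDDING_DIMS[model]

dim = index_dimension("text-embedding-3-small")
print(dim)  # -> 1536

# Sketch of the matching index creation (requires the pinecone package and an API key):
# from pinecone import Pinecone, ServerlessSpec
# pc = Pinecone(api_key="...")
# pc.create_index(name="slack-rag", dimension=dim, metric="cosine",
#                 spec=ServerlessSpec(cloud="aws", region="us-east-1"))
```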
Re-configure the output settings for the vector database
Set the Output Configuration to Vector Database, then select Pinecone as the storage provider. Update the API Key, Index Name, and Environment to match the variables found in the Pinecone Dashboard.
Update the environment secrets to match the new Pinecone configuration.
You can now store your vector embeddings on external platforms, making it easier to integrate with your existing data pipelines and infrastructure. Our RAG Demo app comes pre-configured for both Pinecone and Keboola Storage, allowing you to generate and test RAG queries across both platforms right away.
For longer texts, consider increasing the batch size to improve processing efficiency. Additionally, adjust the chunking strategy by splitting text into semantically meaningful segments like paragraphs or sentences, which can help preserve context and improve embedding quality. This may take some adjustment, so experiment with different chunk sizes and overlap lengths to balance information retention and model performance.
For our Slack application, set the Batch Size to 100, enable Chunking, and change the Chunking Strategy to words. These settings help optimize processing efficiency and ensure that text is embedded in manageable segments.
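A hedged sketch of what word-based chunking with overlap does (the component's exact parameter names may differ; the defaults here are illustrative):

```python
def chunk_words(text, chunk_size=50, overlap=10):
    # Split text into chunks of chunk_size words, overlapping by `overlap`
    # words so context isn't lost at chunk boundaries.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

text = " ".join(f"w{i}" for i in range(120))
chunks = chunk_words(text, chunk_size=50, overlap=10)
print(len(chunks))  # -> 3
```

The overlap means the last few words of each chunk reappear at the start of the next, so a sentence that straddles a boundary is still embedded in one piece.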
Every business case is different. If you want to see how Keboola can help you solve a particular problem you’re facing, if you have questions after reading this guide, or if you simply need a second opinion regarding your project — reach out to us.
We’ll be happy to sit down and talk to you. There is nothing we love more than a chat about data.
References:
[1] Magesh, Surani, et al., “Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools,” Stanford University, preprint, 2024.