Building a Private RAG Project
How will you go about it?
Installation: Installing Pathway and required libraries.
Data Loading: Loading documents for answer retrieval.
Embedding Model Selection: Choosing an open-source embedding model.
Local LLM Deployment: Deploying a local LLM using Ollama.
LLM Initialization: Setting up the LLM instance.
Vector Document Index Creation: Building an index for efficient document retrieval.
Retriever Setup: Defining the context retrieval strategy.
Pipeline Execution: Running the Private RAG pipeline.
Should you wish to check out the notebook directly, you can visit the link below. It combines this setup with the Adaptive RAG technique.
Install Pathway into a Python 3.10+ Linux runtime with a simple pip command:
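```bash
pip install -U pathway
```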
Next, install LiteLLM, a library of lightweight Python wrappers for calling LLM APIs:
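```bash
pip install litellm
```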
Lastly, install Sentence-Transformers for embedding the chunked texts:
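```bash
pip install sentence-transformers
```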
Start by testing with a static sample of knowledge data. Download the sample:
Import necessary libraries:
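The sketch below assumes the LiteLLM wrapper, sentence-transformer embedder, vector index, and adaptive-RAG helper from Pathway's LLM xpack; exact import paths can vary between Pathway versions.

```python
import pathway as pw

# Helpers from Pathway's LLM xpack; paths may differ slightly across versions.
from pathway.stdlib.indexing import VectorDocumentIndex
from pathway.xpacks.llm.embedders import SentenceTransformerEmbedder
from pathway.xpacks.llm.llms import LiteLLMChat
from pathway.xpacks.llm.question_answering import (
    answer_with_geometric_rag_strategy_from_index,
)
```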
Load documents in which answers will be searched:
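A minimal sketch, assuming the downloaded sample is a JSON Lines file with one text passage per row; the file name and the doc field are placeholders for your own data:

```python
class InputSchema(pw.Schema):
    doc: str  # one text passage per row


documents = pw.io.fs.read(
    "knowledge-sample.jsonl",  # placeholder: path to the sample downloaded above
    format="json",
    schema=InputSchema,
    mode="static",  # read the static sample once for this test
)
```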
Create a table with example questions:
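For a quick static test, the question table can be built in-memory; the questions below are placeholders, so substitute ones your documents can actually answer:

```python
import pandas as pd

query = pw.debug.table_from_pandas(
    pd.DataFrame(
        {
            "query": [
                "When is a RAG pipeline considered private?",
                "Which embedding model does this setup use?",
            ]
        }
    )
)
```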
Use the pathway.xpacks.llm.embedders module to load open-source embedding models from the Hugging Face model library. For this showcase, use the avsolatorio/GIST-small-Embedding-v0 model:
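A sketch of the embedder setup; the 384 dimensions below match the GIST-small model and are reused later when sizing the vector index:

```python
embedding_dimension = 384  # GIST-small-Embedding-v0 produces 384-dimensional vectors

embedder = SentenceTransformerEmbedder(
    model="avsolatorio/GIST-small-Embedding-v0",
)
```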
Run the Mistral 7B model locally, deployed as a service using Ollama:
Download Ollama from ollama.com/download.
In your terminal, run ollama serve.
In another terminal, run ollama run mistral.
You can test it with the following:
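```bash
# Ollama listens on port 11434 by default
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Answer in one sentence: why is the sky blue?"
}'
```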
Initialize the LLM instance to call your local model:
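A minimal sketch using the LiteLLMChat wrapper pointed at the local Ollama server; the temperature setting is an illustrative choice:

```python
model = LiteLLMChat(
    model="ollama/mistral",             # LiteLLM's provider/model syntax for Ollama
    api_base="http://localhost:11434",  # the local Ollama server started earlier
    temperature=0,                      # assumption: deterministic answers for testing
)
```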
Specify the index with documents and embedding model:
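A sketch assuming the VectorDocumentIndex helper; the argument order (data column, table, embedder, dimensionality) follows Pathway's adaptive RAG example and may differ in your Pathway version:

```python
index = VectorDocumentIndex(
    documents.doc,                     # column with the text to index
    documents,                         # table the column belongs to
    embedder,                          # embedding model defined above
    n_dimensions=embedding_dimension,  # must match the embedder's output size
)
```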
Specify how to retrieve relevant context from the vector index for a user query. Use Adaptive RAG to adaptively add more documents as needed:
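A sketch of the adaptive strategy call; the parameter values (start with 2 documents, double the count each round, up to 4 rounds) are illustrative:

```python
result = query.select(
    question=query.query,
    result=answer_with_geometric_rag_strategy_from_index(
        query.query,
        index,
        documents.doc,
        model,
        n_starting_documents=2,  # start with a small context
        factor=2,                # grow the context geometrically if the answer is unsure
        max_iterations=4,        # stop after four expansion rounds
    ),
)
```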
Run the pipeline once and print the results table with pw.debug.compute_and_print:
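```python
pw.debug.compute_and_print(result)
```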
Example answers based on the questions provided:
Now you have a fully private RAG set up with Pathway and Ollama. All your data remains safe on your system. The setup is also optimized for speed, thanks to how Ollama runs the LLM and to Pathway's adaptive retrieval mechanism, which reduces token consumption while preserving the accuracy of the RAG.
You can now build and deploy your RAG application in production with Pathway, keeping it in constant connection with your data sources and serving the endpoints 24/7. All the code logic built so far can be reused directly!