Building a RAG System with DeepSeek R1, Ollama and LangChain
Overview
A step-by-step guide to setting up a local Retrieval-Augmented Generation (RAG) system using DeepSeek R1 as the LLM, Ollama as the model server, and LangChain for retrieval.
RAG (Retrieval-Augmented Generation) enhances LLMs by integrating a document retrieval mechanism, allowing them to generate more accurate and context-aware responses. In this guide, we will:
- Load DeepSeek R1 using Ollama.
- Process and store document embeddings.
- Retrieve relevant documents based on user queries.
- Generate responses using retrieved context.
Step 1: Install Required Dependencies
Before setting up the system, install the necessary dependencies:
pip install langchain langchain-community chromadb pypdf streamlit ollama
- LangChain: Framework for retrieval-based LLM applications.
- ChromaDB: Vector database for storing and searching embeddings.
- PyPDF: Used for loading and parsing PDF documents.
- Ollama: Python client for the local Ollama server, which runs the DeepSeek R1 model.
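If you prefer pinning these dependencies in the requirements.txt file shown in the project structure below, a minimal (unpinned) version could look like this:
langchain
langchain-community
chromadb
pypdf
streamlit
ollama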
Installing DeepSeek R1 in Ollama
With the Ollama server installed and running, run the following command to download DeepSeek R1 to your machine:
ollama pull deepseek-r1
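To confirm the download succeeded, you can list your local models and send DeepSeek R1 a quick test prompt (both are standard Ollama CLI commands):
ollama list
ollama run deepseek-r1 "Reply with OK if you can read this."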
Step 2: Project Structure
Below is the recommended project structure:
rag-system/
│── embeddings/
│   ├── __init__.py
│   ├── text_splitter.py    # Splits documents into smaller chunks
│   ├── vector_store.py     # Handles embeddings and storage
│── ollama_model/
│   ├── __init__.py
│   ├── deepseek_r1.py      # Loads DeepSeek R1 with Ollama
│── app/
│   ├── __init__.py
│   ├── retriever.py        # Retrieves relevant document chunks
│   ├── rag_chain.py        # Generates final response
│   ├── streamlit_app.py    # Web UI for interaction
│── data/
│   ├── sample.pdf          # Example document for testing
│── requirements.txt        # Required dependencies
│── .env                    # API keys (if needed)
│── main.py                 # Main entry point
Step 3: Load and Process Documents
To ensure efficient retrieval, we need to split large documents into small chunks before storing embeddings.
File: “embeddings/text_splitter.py”
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def split_text(file_path):
    # Load the PDF and split its text into overlapping chunks ready for embedding.
    loader = PyPDFLoader(file_path)
    documents = loader.load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    return splitter.split_documents(documents)
This script reads a PDF file, extracts its text, and splits it into chunks of roughly 500 characters with a 50-character overlap between consecutive chunks.
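As a quick sanity check, you can run the splitter against the sample document in the data/ folder; this snippet is only an illustrative sketch and assumes it is executed from the project root:
from embeddings.text_splitter import split_text

chunks = split_text("data/sample.pdf")
print(f"Created {len(chunks)} chunks")
print(chunks[0].page_content[:200])  # preview the start of the first chunk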
Step 4: Generate and Store Embeddings
Now, we need to convert the text chunks into embeddings and store them in a vector database.
File: “embeddings/vector_store.py”
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings

def store_embeddings(chunks):
    # Embed each chunk with DeepSeek R1 (via Ollama) and store the vectors in ChromaDB.
    embeddings = OllamaEmbeddings(model="deepseek-r1")
    vector_store = Chroma.from_documents(chunks, embeddings, persist_directory="./vector_db")
    vector_store.persist()  # write the collection to disk under ./vector_db
The chunk embeddings are generated by DeepSeek R1 via Ollama and stored in a ChromaDB collection persisted to the ./vector_db directory.
Step 5: Retrieve Relevant Information
When a user asks a question, we retrieve the most relevant text chunks from the vector database.
File: “app/retriever.py”
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings

def retrieve_chunks(query):
    # Reopen the persisted store with the same embedding model used at indexing time,
    # so the query can be embedded and compared against the stored chunks.
    embeddings = OllamaEmbeddings(model="deepseek-r1")
    vector_store = Chroma(persist_directory="./vector_db", embedding_function=embeddings)
    return vector_store.similarity_search(query, k=3)
Runs a vector similarity search against ChromaDB and returns the three most relevant chunks (k=3).
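To test retrieval in isolation (assuming embeddings have already been stored in ./vector_db), a short sketch like the following can be run from the project root; the question is only a placeholder:
from app.retriever import retrieve_chunks

results = retrieve_chunks("What is this document about?")
for i, doc in enumerate(results, 1):
    print(f"--- Chunk {i} ---")
    print(doc.page_content[:200])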
Step 6: Load DeepSeek R1 in Ollama
To process user queries, we need to load the DeepSeek R1 model using Ollama.
File: “ollama_model/deepseek_r1.py”
from langchain_community.llms import Ollama

def load_llm():
    # DeepSeek R1 served locally by Ollama, wrapped as a LangChain LLM.
    return Ollama(model="deepseek-r1")
Initializes DeepSeek R1 as the primary language model.
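Before wiring the model into the RAG chain, you can give it a quick smoke test from a Python shell; this sketch assumes the Ollama server is running locally with deepseek-r1 pulled:
from ollama_model.deepseek_r1 import load_llm

llm = load_llm()
print(llm.invoke("In one sentence, what is retrieval-augmented generation?"))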
Step 7: RAG Chain – Combining Retrieval with LLM
Once we retrieve the relevant chunks, we pass them to the LLM to generate a response.
File: “app/rag_chain.py”
from ollama_model.deepseek_r1 import load_llm
from app.retriever import retrieve_chunks

def get_rag_response(query):
    # Retrieve the most relevant chunks and pass them to DeepSeek R1 as grounding context.
    retrieved_chunks = retrieve_chunks(query)
    context = "\n".join([chunk.page_content for chunk in retrieved_chunks])
    llm = load_llm()
    response = llm.invoke(f"Use the following context to answer:\n{context}\n\nQuestion: {query}")
    return response
This function retrieves relevant text chunks and uses them as context for DeepSeek R1 to generate a response.
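You can exercise the full chain from a Python shell before adding the UI; the question below is only a placeholder and assumes the embeddings from Step 4 already exist:
from app.rag_chain import get_rag_response

answer = get_rag_response("Summarize the key points of the document.")
print(answer)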
Step 8: Create a Web UI with Streamlit
To allow users to interact with the system, we use Streamlit for a simple web interface.
File: “app/streamlit_app.py”
import streamlit as st
from app.rag_chain import get_rag_response

st.title("RAG System with DeepSeek R1")

query = st.text_input("Ask a question:")
if query:
    response = get_rag_response(query)
    st.write("### Response:")
    st.write(response)
The app provides a text input for user queries and displays responses.
Run the UI:
streamlit run app/streamlit_app.py
Step 9: Running the Complete RAG System
File: “main.py”
from embeddings.text_splitter import split_text
from embeddings.vector_store import store_embeddings

def main():
    print("[1/2] Splitting and processing documents...")
    chunks = split_text("data/sample.pdf")
    print("[2/2] Generating and storing embeddings...")
    store_embeddings(chunks)
    print("Embeddings stored. You can now run the Streamlit app with:\n")
    print("    streamlit run app/streamlit_app.py")

if __name__ == "__main__":
    main()

Once all components are ready, follow these steps to run the full system.
Start Ollama and Ensure DeepSeek R1 is Available
ollama pull deepseek-r1
Run the Main Pipeline
python main.py
Launch the Web UI
streamlit run app/streamlit_app.py
System Requirements
- CPU: 8-core processor (Intel/AMD)
- RAM: 16GB+
- GPU: NVIDIA RTX 3090+ (for faster inference)
- Disk Space: 20GB+ (for model and embeddings)
- OS: Ubuntu 20.04 / 22.04
Summary
- Documents are split into smaller chunks.
- Embeddings are stored using ChromaDB.
- User queries retrieve relevant document chunks.
- DeepSeek R1 generates answers grounded in the retrieved context.
- A Streamlit UI enables user interaction.
This completes the setup of a RAG system with DeepSeek R1 using Ollama and LangChain.