Building a RAG System with DeepSeek R1, Ollama and LangChain
Overview
A step-by-step guide to setting up a local Retrieval-Augmented Generation (RAG) system using DeepSeek R1 as the LLM, Ollama as the model server, and LangChain for retrieval.
RAG (Retrieval-Augmented Generation) enhances LLMs by integrating a document retrieval mechanism, allowing them to generate more accurate and context-aware responses. In this guide, we will:
- Load DeepSeek R1 using Ollama.
- Process and store document embeddings.
- Retrieve relevant documents based on user queries.
- Generate responses using retrieved context.
Step 1: Install Required Dependencies
Before setting up the system, install the necessary dependencies:
pip install langchain langchain-community chromadb pypdf streamlit ollama
- LangChain: Framework for retrieval-based LLM applications.
- ChromaDB: Vector database for storing and searching embeddings.
- PyPDF: Used for loading and parsing PDF documents.
- Ollama: Python client for the local Ollama server, which runs the DeepSeek R1 model.
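If you prefer pinning these dependencies in the requirements.txt file shown in the project structure below, a minimal (unpinned) version could look like this:
langchain
langchain-community
chromadb
pypdf
streamlit
ollama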
Installing DeepSeek R1 in Ollama
With the Ollama server installed and running, run the following command to download DeepSeek R1 to your machine:
ollama pull deepseek-r1
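To confirm the download succeeded, you can list your local models and send DeepSeek R1 a quick test prompt (both are standard Ollama CLI commands):
ollama list
ollama run deepseek-r1 "Reply with OK if you can read this."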
Step 2: Project Structure
Below is the recommended project structure:
rag-system/
│── embeddings/
│   ├── __init__.py
│   ├── text_splitter.py    # Splits documents into smaller chunks
│   ├── vector_store.py     # Handles embeddings and storage
│── ollama_model/
│   ├── __init__.py
│   ├── deepseek_r1.py      # Loads DeepSeek R1 with Ollama
│── app/
│   ├── __init__.py
│   ├── retriever.py        # Retrieves relevant document chunks
│   ├── rag_chain.py        # Generates final response
│   ├── streamlit_app.py    # Web UI for interaction
│── data/
│   ├── sample.pdf          # Example document for testing
│── requirements.txt        # Required dependencies
│── .env                    # API keys (if needed)
│── main.py                 # Main entry point
Step 3: Load and Process Documents
To ensure efficient retrieval, we need to split large documents into small chunks before storing embeddings.
File: “embeddings/text_splitter.py”
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def split_text(file_path):
    # Load the PDF and split its text into overlapping chunks ready for embedding.
    loader = PyPDFLoader(file_path)
    documents = loader.load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    return splitter.split_documents(documents)
This script reads a PDF file, extracts its text, and splits it into chunks of roughly 500 characters with a 50-character overlap between consecutive chunks.
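As a quick sanity check, you can run the splitter against the sample document in the data/ folder; this snippet is only an illustrative sketch and assumes it is executed from the project root:
from embeddings.text_splitter import split_text

chunks = split_text("data/sample.pdf")
print(f"Created {len(chunks)} chunks")
print(chunks[0].page_content[:200])  # preview the start of the first chunk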
Step 4: Generate and Store Embeddings
Now, we need to convert the text chunks into embeddings and store them in a vector database.
File: “embeddings/vector_store.py”
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings

def store_embeddings(chunks):
    # Embed each chunk with DeepSeek R1 (via Ollama) and store the vectors in ChromaDB.
    embeddings = OllamaEmbeddings(model="deepseek-r1")
    vector_store = Chroma.from_documents(chunks, embeddings, persist_directory="./vector_db")
    vector_store.persist()  # write the collection to disk under ./vector_db
The chunk embeddings are generated by DeepSeek R1 via Ollama and stored in a ChromaDB collection persisted to the ./vector_db directory.
Step 5: Retrieve Relevant Information
When a user asks a question, we retrieve the most relevant text chunks from the vector database.
File: “app/retriever.py”
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings

def retrieve_chunks(query):
    # Reopen the persisted store with the same embedding model used at indexing time,
    # so the query can be embedded and compared against the stored chunks.
    embeddings = OllamaEmbeddings(model="deepseek-r1")
    vector_store = Chroma(persist_directory="./vector_db", embedding_function=embeddings)
    return vector_store.similarity_search(query, k=3)
Runs a vector similarity search against ChromaDB and returns the three most relevant chunks (k=3).
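To test retrieval in isolation (assuming embeddings have already been stored in ./vector_db), a short sketch like the following can be run from the project root; the question is only a placeholder:
from app.retriever import retrieve_chunks

results = retrieve_chunks("What is this document about?")
for i, doc in enumerate(results, 1):
    print(f"--- Chunk {i} ---")
    print(doc.page_content[:200])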
Step 6: Load DeepSeek R1 in Ollama
To process user queries, we need to load the DeepSeek R1 model using Ollama.
File: “ollama_model/deepseek_r1.py”
from langchain_community.llms import Ollama

def load_llm():
    # DeepSeek R1 served locally by Ollama, wrapped as a LangChain LLM.
    return Ollama(model="deepseek-r1")
Initializes DeepSeek R1 as the primary language model.
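Before wiring the model into the RAG chain, you can give it a quick smoke test from a Python shell; this sketch assumes the Ollama server is running locally with deepseek-r1 pulled:
from ollama_model.deepseek_r1 import load_llm

llm = load_llm()
print(llm.invoke("In one sentence, what is retrieval-augmented generation?"))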
Step 7: RAG Chain – Combining Retrieval with LLM
Once we retrieve the relevant chunks, we pass them to the LLM to generate a response.
File: “app/rag_chain.py”
from ollama_model.deepseek_r1 import load_llm
from app.retriever import retrieve_chunks

def get_rag_response(query):
    # Retrieve the most relevant chunks and pass them to DeepSeek R1 as grounding context.
    retrieved_chunks = retrieve_chunks(query)
    context = "\n".join([chunk.page_content for chunk in retrieved_chunks])
    llm = load_llm()
    response = llm.invoke(f"Use the following context to answer:\n{context}\n\nQuestion: {query}")
    return response
This function retrieves relevant text chunks and uses them as context for DeepSeek R1 to generate a response.
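You can exercise the full chain from a Python shell before adding the UI; the question below is only a placeholder and assumes the embeddings from Step 4 already exist:
from app.rag_chain import get_rag_response

answer = get_rag_response("Summarize the key points of the document.")
print(answer)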
Step 8: Create a Web UI with Streamlit
To allow users to interact with the system, we use Streamlit for a simple web interface.
File: “app/streamlit_app.py”
import streamlit as st
from app.rag_chain import get_rag_response

st.title("RAG System with DeepSeek R1")

query = st.text_input("Ask a question:")
if query:
    response = get_rag_response(query)
    st.write("### Response:")
    st.write(response)
The app provides a text input for user queries and displays responses.
Run the UI:
streamlit run app/streamlit_app.py
Step 9: Running the Complete RAG System
File: “main.py”
from embeddings.text_splitter import split_text
from embeddings.vector_store import store_embeddings

def main():
    print("[1/2] Splitting and processing documents...")
    chunks = split_text("data/sample.pdf")
    print("[2/2] Generating and storing embeddings...")
    store_embeddings(chunks)
    print("Embeddings stored. You can now run the Streamlit app with:\n")
    print("    streamlit run app/streamlit_app.py")

if __name__ == "__main__":
    main()

Once all components are ready, follow these steps to run the full system.
Start Ollama and Ensure DeepSeek R1 is Available
ollama pull deepseek-r1
Run the Main Pipeline
python main.py
Launch the Web UI
streamlit run app/streamlit_app.py
System Requirements
- CPU: 8-core processor (Intel/AMD)
- RAM: 16GB+
- GPU: NVIDIA RTX 3090+ (for faster inference)
- Disk Space: 20GB+ (for model and embeddings)
- OS: Ubuntu 20.04 / 22.04
Summary
- Documents are split into smaller chunks.
- Embeddings are stored using ChromaDB.
- User queries retrieve relevant document chunks.
- DeepSeek R1 generates answers grounded in the retrieved context.
- A Streamlit UI enables user interaction.
This completes the setup of a RAG system with DeepSeek R1 using Ollama and LangChain.