How RAG Works: A Simple Analogy
Imagine you're writing a school report:
1. 🕵️ Research Phase (Retrieval): Visit the library to find relevant books
2. ✍️ Writing Phase (Generation): Use those books to write your report
RAG systems work similarly but use AI for both steps!
Setting Up: Tools Explained
Why These Libraries?
- `transformers`: Provides pre-trained AI models (like GPT-2)
- `faiss`: Facebook's library for fast similarity search
- `sentence-transformers`: Converts text to number vectors (embeddings)
- `wikipedia`: Our free knowledge source
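If you're following along, all four install with pip. The exact package names below are my assumption (notably, FAISS ships on PyPI as faiss-cpu for machines without a GPU):

pip install transformers faiss-cpu sentence-transformers wikipedia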
Building Blocks of RAG
Step 1: Creating the Knowledge Base
import wikipedia

# We use Wikipedia as our "library"
def get_wikipedia_content(topic, sentences=10):
    try:
        # Get a simplified summary (like CliffsNotes)
        summary = wikipedia.summary(topic, sentences=sentences)
        return summary
    except wikipedia.exceptions.DisambiguationError as e:
        # Handle ambiguous topics (e.g., "Java" could be the island or the coffee)
        return wikipedia.summary(e.options[0], sentences=sentences)
What Happens Here:
- We fetch a concise summary about our topic
- 10 sentences keeps it manageable for beginners
- Error handling prevents crashes on ambiguous terms
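A quick way to try it (the topic here is just an example; any Wikipedia page title works). This also builds the knowledge base the next steps use:

# Fetch a 10-sentence summary as our mini knowledge base
knowledge_base = get_wikipedia_content("Machine learning")
print(knowledge_base[:200])  # Peek at the first 200 characters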
Step 2: Understanding Vector Embeddings
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

# Convert text to numbers (vectors)
model = SentenceTransformer('all-MiniLM-L6-v2')
sentences = [s for s in knowledge_base.split('. ') if s]
embeddings = model.encode(sentences)  # Convert sentences to vectors

# Create a search index (like a library catalog)
dimension = embeddings.shape[1]  # 384 for all-MiniLM-L6-v2
index = faiss.IndexFlatL2(dimension)  # L2 = Euclidean distance
index.add(np.array(embeddings).astype('float32'))
Key Concepts:
- Vectors are numerical representations of text meaning
- FAISS helps quickly find similar vectors
- `all-MiniLM-L6-v2` is a lightweight embedding model
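To build intuition for what "similar vectors" means, here is a small sketch with made-up sentences, using the same Euclidean distance that IndexFlatL2 relies on:

# Similar meanings land close together; unrelated meanings land far apart
a, b, c = model.encode([
    "I love pizza",
    "Pizza is my favorite food",
    "The stock market fell today",
])
print(np.linalg.norm(a - b))  # smaller distance: similar meaning
print(np.linalg.norm(a - c))  # larger distance: unrelated topics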
Step 3: Finding Relevant Information
def retrieve_info(query, k=3):
    query_embedding = model.encode([query])  # Convert the question to a vector
    # Find the k closest matches (k=3 by default)
    distances, indices = index.search(query_embedding, k)
    return [sentences[i] for i in indices[0]]
What This Does:
- Your question becomes a "search vector"
- FAISS finds the most similar content vectors
- Returns the top 3 matches (like the best book passages)
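For example (a hypothetical question; what comes back depends on the summary you fetched in Step 1):

# Ask a question and inspect the top-3 retrieved sentences
for passage in retrieve_info("How does machine learning work?"):
    print("-", passage)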
Step 4: Generating the Answer
from transformers import pipeline

generator = pipeline('text-generation', model='gpt2')

def generate_answer(question, context):
    prompt = f"Question: {question}\nContext: {context}\nAnswer:"
    # GPT-2 continues the prompt, writing an answer from the context
    result = generator(prompt, max_length=200, num_return_sequences=1)
    return result[0]['generated_text']
Important Notes:
- GPT-2 is our "writer" AI
- The prompt combines the question and the retrieved context
- `max_length=200` caps the total length in tokens (prompt plus answer, not the answer alone)
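Putting retrieval and generation together, a small wrapper (the name rag_answer is my own) turns the whole pipeline into a single call:

def rag_answer(question, k=3):
    # Retrieve the k best passages, then let GPT-2 write from them
    context = " ".join(retrieve_info(question, k))
    return generate_answer(question, context)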
See It in Action
Example 1: Simple Question
question = "What is machine learning?"
context = retrieve_info(question)
# Retrieved context might contain:
# "Machine learning is the study of computer algorithms that improve automatically..."
Example 2: Comparison Question
question = "Difference between AI and machine learning?"
context = retrieve_info(question)
# Might retrieve passages explaining:
# "AI is the broader concept, while ML focuses on data-driven algorithms..."