RAG Patterns
definition
Retrieval-Augmented Generation (RAG) patterns address how agents dynamically retrieve relevant information from external knowledge sources and inject it into the model's context before generating a response. The core pattern has two phases. At indexing time, documents are split into chunks, each chunk is embedded as a vector, and the vectors are stored in a vector database. At query time, the chunks most semantically similar to the query are retrieved and included in the prompt.

Advanced RAG patterns include multi-step retrieval (using the results of an initial retrieval to refine the query before retrieving again), hybrid search (combining semantic similarity with keyword matching), re-ranking (using a second model to score the relevance of retrieved chunks and reorder them), and agentic RAG (letting the agent decide when to retrieve and what to retrieve).

Understanding RAG is essential because it is the primary mechanism for giving agents access to knowledge that does not fit in the context window: your codebase, documentation, or domain knowledge. This concept connects to vector databases and embedding models (the infrastructure layer), retrieval-augmented generation (the foundational concept), and graph-vs-vector RAG (for understanding when graph-based knowledge retrieval outperforms vector retrieval).
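A minimal, self-contained sketch of the core pipeline (chunk, embed, store, retrieve, inject into the prompt) plus hybrid search might look like the following. Everything in it is an illustrative assumption rather than any specific library's API: `embed` is a toy hashed bag-of-words vector standing in for a real embedding model, `VectorStore` is an in-memory brute-force stand-in for a vector database, the keyword score is raw term frequency rather than something like BM25, and the semantic and keyword rankings are merged with reciprocal rank fusion, which is one common fusion method but not one the text prescribes. All function names, the chunk size, the overlap, and the `k=60` constant are hypothetical choices.

```python
# Illustrative RAG sketch; all names here are hypothetical, not a library API.
import math
import re
import zlib
from collections import Counter

DIM = 256  # size of the toy embedding space (arbitrary choice)


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: a unit-length hashed
    bag-of-words vector. A real system would call a learned model here."""
    vec = [0.0] * DIM
    for tok in tokenize(text):
        vec[zlib.crc32(tok.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class VectorStore:
    """In-memory stand-in for a vector database, using brute-force search."""

    def __init__(self) -> None:
        self.chunks: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, text: str) -> None:
        # Indexing phase: chunk the document, embed each chunk, store both.
        for piece in chunk(text):
            self.chunks.append(piece)
            self.vectors.append(embed(piece))

    def semantic_rank(self, query: str) -> list[int]:
        # Vectors are unit length, so the dot product is cosine similarity.
        q = embed(query)
        scores = [sum(a * b for a, b in zip(q, v)) for v in self.vectors]
        return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    def keyword_rank(self, query: str) -> list[int]:
        # Crude term-frequency match; production systems typically use BM25.
        terms = set(tokenize(query))
        scores = [sum(n for t, n in Counter(tokenize(c)).items() if t in terms)
                  for c in self.chunks]
        return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rrf(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Reciprocal rank fusion: sum 1 / (k + rank) over rankings, rank from 1."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, idx in enumerate(ranking, start=1):
            scores[idx] = scores.get(idx, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda i: scores[i], reverse=True)


def retrieve(store: VectorStore, query: str, top_k: int = 2,
             hybrid: bool = True) -> list[str]:
    """Query phase: rank chunks semantically, optionally fuse with keywords."""
    semantic = store.semantic_rank(query)
    order = rrf([semantic, store.keyword_rank(query)]) if hybrid else semantic
    return [store.chunks[i] for i in order[:top_k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Inject the retrieved chunks into the prompt ahead of the question."""
    sources = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return f"Answer using only these sources.\n\n{sources}\n\nQuestion: {query}"


if __name__ == "__main__":
    store = VectorStore()
    for doc in [
        "The payment service retries failed charges three times before alerting on-call.",
        "Our frontend uses React components styled with Tailwind CSS.",
        "Database migrations are applied with Alembic during each deploy.",
    ]:
        store.add(doc)
    question = "How many times does the payment service retry failed charges?"
    print(build_prompt(question, retrieve(store, question, top_k=1)))
```

Re-ranking would slot in after `retrieve`, rescoring each candidate chunk against the query with a second model before the prompt is built; multi-step and agentic RAG would wrap calls to `retrieve` in a loop whose queries (and whether to retrieve at all) are chosen by the model. Neither is shown here.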