The standard RAG pipeline has four stages: query embedding, vector retrieval, prompt augmentation, and LLM response generation.
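A minimal sketch of those four stages, assuming a toy bag-of-words "embedding" and a stubbed LLM in place of real models (all function names here are hypothetical):

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words "embedding"; a real system would call an embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, index, k=2):
    # Rank stored chunks by similarity to the query embedding, keep top-k.
    scored = sorted(index, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["text"] for c in scored[:k]]

def answer(query, index, llm):
    # 1) embed the query, 2) retrieve chunks, 3) augment the prompt, 4) generate.
    chunks = retrieve(embed(query), index)
    prompt = "Context:\n" + "\n".join(chunks) + f"\n\nQuestion: {query}"
    return llm(prompt)

corpus = ["the capital of France is Paris", "Go compiles to native code"]
index = [{"text": t, "vec": embed(t)} for t in corpus]
# The lambda stands in for an LLM call; it just echoes the augmented prompt.
result = answer("what is the capital of France", index, llm=lambda p: p)
```

The key structural point is that the LLM never sees the index directly; it only sees whatever the retriever placed into the prompt.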
For RAG / question-answering / knowledge-grounded generation papers and engineering blog posts.
Insert a re-ranking stage between vector retrieval and prompt construction. The re-ranker (a cross-encoder) scores each retrieved chunk jointly with the query, reorders the chunks, and keeps only the top-k′.
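A sketch of that re-ranking stage. A real cross-encoder (e.g. a BERT-style model) would score the concatenated (query, chunk) pair; here token overlap stands in for the model score, and `k_prime` is the top-k′ cutoff from the text:

```python
def cross_encoder_score(query, chunk):
    # Stand-in for a cross-encoder score: fraction of query tokens in the chunk.
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q)

def rerank(query, retrieved, k_prime=2):
    # Re-score every retrieved chunk against the query, reorder, keep top-k'.
    scored = sorted(retrieved,
                    key=lambda ch: cross_encoder_score(query, ch),
                    reverse=True)
    return scored[:k_prime]

candidates = [
    "cats are popular pets",
    "Paris is the capital of France",
    "a short note about European geography",
]
top = rerank("capital of France", candidates, k_prime=1)
```

The design rationale: the first-stage retriever is cheap but scores query and chunk independently, while the cross-encoder is expensive but attends over the pair, so running it only on the small retrieved set buys precision without scanning the whole corpus.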
Replace the single vector retriever with two parallel retrievers, sparse BM25 retrieval and dense embedding retrieval, and merge their result lists via reciprocal rank fusion (RRF) before prompt construction.
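A sketch of the fusion step, assuming each retriever has already produced a ranked list of document IDs. RRF scores each document as a sum of 1/(k + rank) over the rankings it appears in; k = 60 is the constant from the original RRF paper, not a tuned value:

```python
def reciprocal_rank_fusion(rankings, k=60):
    # score(d) = sum over rankers of 1 / (k + rank_i(d)), with ranks starting at 1.
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["d1", "d3", "d2"]    # from the sparse (BM25) retriever
dense_ranking = ["d2", "d1", "d4"]   # from the dense embedding retriever
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

Because RRF uses only ranks, not raw scores, it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.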