CS MSc Thesis Presentation 18 May 2026
One Computer Science MSc thesis to be presented on 18 May
On Monday, 18 May, there will be a master's thesis presentation in Computer Science at Lund University, Faculty of Engineering.
The presentation will take place in E:4130 (Lucas).
Note to potential opponents: Register as an opponent for the presentation of your choice by sending an email to the examiner for that presentation (firstname [dot] lastname [at] cs [dot] lth [dot] se). Do not forget to specify which presentation you are registering for! Note that the number of opponents may be limited (often to two), so you might be forced to choose another presentation if you register too late. Registrations are individual, just as the oppositions are! More instructions for opponents are available on the LTH thesis project page.
14:15-15:00 in E:4130 (Lucas). N.B. No more opponents are accepted for this presentation.
- Presenter: Erik Lundberg
- Title: When Does Graph RAG Pay Off?
- Examiner: Elin A. Topp
- Supervisor: Pierre Nugues (LTH)
Prior work on RAG component ablations has largely evaluated retrieval strategies in isolation or on open-domain benchmarks, leaving unanswered which combinations actually improve answer quality and when graph-based retrieval justifies its complexity over simpler chunk-based alternatives. We ablate nine pipelines: six chunk-based (Dense, Hybrid, Query Expansion, HyDE, Reranking, Combined) and three LightRAG variants that progressively layer dense search and reranking on graph retrieval. The pipelines are compared across two GraphRAG-Bench domains and a Swedish water treatment corpus from Svenskt Vatten (109 expert-validated questions), and evaluated with LLM-judge scoring. Across all three corpora, cross-encoder reranking is the single most impactful component; stacking further enhancements yields diminishing or negative returns. LightRAG outperforms chunk-based pipelines on creative and relational tasks but underperforms on factoid questions, and naively combining graph and dense retrieval degrades performance unless a reranker filters the merged pool. On the water treatment corpus, LightRAG+Mix+Rerank achieves the highest overall correctness and leads on multi-hop abstract reasoning, but at a cost: the Combined chunk-based pipeline trails by only 0.25 points at k = 5 while using roughly a quarter of the input tokens. These findings give practitioners concrete guidance: prioritize reranking, reserve LightRAG for relational domains where its token overhead pays off, and combine graph and dense retrieval only with a reranker in place.
About the event
Location:
E:4130 (Lucas)
Contact:
birger [dot] swahn [at] cs [dot] lth [dot] se