CS MSc Thesis Presentation 3 March 2026
One Computer Science MSc thesis to be presented on 3 March
On Tuesday, 3 March, there will be a master's thesis presentation in Computer Science at Lund University, Faculty of Engineering.
The presentation will take place in E:4130 (Lucas).
Note to potential opponents: Register as an opponent for the presentation of your choice by sending an email to the examiner for that presentation (firstname [dot] lastname [at] cs [dot] lth [dot] se). Do not forget to specify which presentation you are registering for! The number of opponents may be limited (often to two), so you might have to choose another presentation if you register too late. Registrations are individual, just as the oppositions are! More instructions for opponents are available on the LTH thesis project page.
11:15-12:00 in E:4130 (Lucas)
- Presenters: Jonathan Giegold, Jona Waldfogel
- Title: Connecting Language Models to Data Systems in the Energy Sector
- Examiner: Jacek Malec
- Supervisors: Marcus Klang (LTH), Christer Friberg (E.ON)
Energy networks are critical infrastructure, and recent advances in AI can support efficient access to their operational data. Conversational agents can streamline this access, but their reliability depends on how they use enterprise tools. This thesis evaluates the Model Context Protocol (MCP), a standard for connecting language models to external systems, in an energy-data setting at E.ON. We implement three MCP servers (metadata, grid infrastructure, metering) and benchmark Claude Sonnet 4 and GPT-4o-mini, along with a locally hosted Phi-4-mini. Using mirrored private (E.ON) and public (UK Power Networks) datasets, we adapt a protocol-aware benchmark that combines rule-based checks of tool invocations with rubric-based judging of task completion, grounding, and planning by o4-mini. We extend the benchmark with a deterministic check comparing expected vs generated answers. Generated tasks require either one server (1-server) or cross-server coordination (2-server). In 1-server tasks, the cloud models show more than 0.99 schema understanding and high task completion (0.958 for Claude Sonnet 4 vs 0.940 for GPT-4o-mini), with Claude Sonnet 4 producing more factually correct answers, as reflected in a higher expected-vs-generated score (0.968 vs 0.907). The cloud models arrive at similar answers but use different strategies: Claude Sonnet 4 plans more efficiently, whereas GPT-4o-mini is faster and 16 times cheaper per task but runs more rounds. In the 2-server setting, task completion drops to around 0.52 for both cloud models and tool calls increase, resulting in higher execution time and token consumption. The local Phi-4-mini is unreliable beyond lookups, with low schema understanding (0.776 and 0.628) and task completion (0.355 and 0.138) in the 1-server and 2-server settings, respectively. Overall, domain-based MCP servers proved easy to extend with additional tools. Still, our results show that 2-server workflows induce substantial coordination overhead, making server boundaries and tool discovery critical design choices.
About the event
Location:
E:4130 (Lucas)
Contact:
birger [dot] swahn [at] cs [dot] lth [dot] se