BEGIN:VCALENDAR
PRODID:-//eluceo/ical//2.0/EN
VERSION:2.0
CALSCALE:GREGORIAN
BEGIN:VEVENT
UID:2c4a5367af58b76de868c45738c2762c
DTSTAMP:20260309T063727Z
SUMMARY:CS MSc Thesis Presentation 3 March 2026
DESCRIPTION:Contact: birger.swahn@cs.lth.se\n\nOn Tuesday\, 3 March\,
  there will be a master's thesis presentation in Computer Science at
  Lund University\, Faculty of Engineering. The presentation will take
  place in E:4130 (Lucas).\n\n
 Note to potential opponents: Register as an opponent to the presentation o
 f your choice by sending an email to the examiner for that presentation (f
 irstname.lastname@cs.lth.se). Do not forget to specify the presentation yo
 u register for! Note that the number of opponents may be limited (often to
  two)\, so you might be forced to choose another presentation if you regis
 ter too late. Registrations are individual\, just as the oppositions are! 
 More instructions for opponents are found on the LTH thesis project
  page.\n\n11:15-12:00 in E:4130 (Lucas)\nPresenters: Jonathan Giegold\,
  Jona Waldfogel\nTitle: Connecting Language Models to Data Systems in
  the Energy Sector\nExaminer: Jacek Malec\nSupervisors: Marcus Klang
  (LTH)\, Christer Friberg (E.ON)\n\nEnergy networks are critical
  infrastructure\, and leveraging recent adv
 ances in AI can support efficient access to operational data. Conversation
 al agents can streamline access to data\, but reliability depends on how t
 hey use enterprise tools. This thesis evaluates the Model Context Protocol
 \, a standard for connecting language models to external systems\, in an e
 nergy-data setting at E.ON. We implement three MCP servers (metadata\, gri
 d infrastructure\, metering) and benchmark Claude Sonnet 4\, GPT-4o-mini\,
  along with a locally hosted Phi-4-mini. Using mirrored private (E.ON) and
  public (UK Power Networks) datasets\, we adapt a protocol-aware benchmark
  combining rule-based checks of tool invocations with rubric-based judging
  of task completion\, grounding\, and planning by o4-mini. We extend the
  benchmark with a deterministic check comparing expected vs generated answ
 ers. Generated tasks require either one server (1-server) or cross-server 
 coordination (2-server). In 1-server tasks\, the cloud models show more th
 an 0.99 schema understanding and high task completion (0.958 for Claude So
 nnet 4 vs 0.940 for GPT-4o-mini)\, with Claude Sonnet 4 producing more fac
 tually correct answers with a higher Expected vs generated score (0.968 vs
  0.907). The cloud models arrive at similar answers\, but use different st
 rategies. Claude Sonnet 4 plans more efficiently\, and GPT-4o-mini is fast
 er but runs more rounds and is 16 times cheaper per task. The 2-server set
 ting decreased task completion to around 0.52 for both cloud models\, and 
 tool calls increased\, resulting in higher execution time and token consum
 ption. The local Phi-4-mini is unreliable beyond lookups with low schema u
 nderstanding (0.776 and 0.628) and task completion (0.355 and 0.138) in 1-
 server and 2-server settings\, respectively. Overall\, domain-based MCP se
 rvers proved easy to extend with additional tools. Still\, our results sho
 w that 2-server workflows induce substantial coordination overhead\, makin
 g server boundaries and tool discovery critical design choices.\n\n
 More information about the event:
  https://www.cs.lth.se/evenemang/cs-msc-thesis-presentation-3-march-2026
DTSTART:20260303T101500Z
DTEND:20260303T110000Z
LOCATION:E:4130 (Lucas)
END:VEVENT
END:VCALENDAR
