BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Date iCal//NONSGML kigkonsult.se iCalcreator 2.20.2//
METHOD:PUBLISH
X-WR-CALNAME;VALUE=TEXT:DIAG Events
BEGIN:VTIMEZONE
TZID:Europe/Paris
BEGIN:STANDARD
DTSTART:20241027T030000
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20250330T020000
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:calendar.28660.field_data.0@diag.uniroma1.it
DTSTAMP:20260404T212015Z
CREATED:20250226T092337Z
DESCRIPTION:Abstract: Policy Mirror Descent (PMD) is a powerful and theo
 retically sound methodology for sequential decision-making. However\, i
 t is not directly applicable to Reinforcement Learning (RL) due to the i
 naccessibility of explicit action-value functions. We address this cha
 llenge by introducing a novel approach based on learning a world model o
 f the environment using conditional mean embeddings. Leveraging tools f
 rom operator theory\, we derive a closed-form expression of the action-v
 alue function in terms of the world model via simple matrix operations. C
 ombining these estimators with PMD leads to POWR\, a new RL algorithm f
 or which we prove convergence rates to the global optimum. Preliminary e
 xperiments in finite and infinite state settings support the effectiven
 ess of our method.\n\nPietro Novelli is a physicist and a postdoctoral r
 esearcher at Istituto Italiano di Tecnologia\, within the Computational S
 tatistics & ML unit. He is currently working on machine learning for dy
 namical systems\, reinforcement learning\, machine learning for science
 \, statistical learning theory & optimization. Pietro's work has been p
 resented at NeurIPS 2024.
DTSTART;TZID=Europe/Paris:20250306T150000
DTEND;TZID=Europe/Paris:20250306T150000
LAST-MODIFIED:20250226T104057Z
LOCATION:Room A5 DIAG
SUMMARY:Operator World Models for Reinforcement Learning - Pietro Novelli
URL:http://diag.uniroma1.it/node/28660
END:VEVENT
END:VCALENDAR
