SSP Forum: Max Lamparth on abstracting human preferences
The
Symbolic Systems Forum
(community sessions of SYMSYS 280 - Symbolic Systems Research Seminar)
presents
Robust Abstractions of Human Preferences
Max Lamparth
Stanford Intelligent Systems Laboratory, Hoover Institution
Monday, February 9, 2026
12:30-1:20 pm PT
Computing and Data Science Building (CoDa), Room E160
(New room for 2026)
In-person event, not recorded
Note: Lunch is provided, if pre-ordered, only for members of SYMSYS 280, but others are welcome to bring a lunch and eat during the presentation.
Abstract:
Abstracting human preferences into computational objectives is essential for aligning AI systems, yet fundamentally challenging due to the complexity and context-dependence of human values. This talk examines how preferences are captured through human annotation and translated into reward models for reinforcement learning from human feedback. While reward models enable state-of-the-art chatbots, I'll present evidence that they exhibit (novel) systematic biases and discuss mitigation approaches. Finally, I'll explore alternative methods for learning from preferences and outline key directions for future research.
Max is a Research Fellow at the Hoover Institution’s Technology Policy Accelerator and a member of the Stanford Intelligent Systems Laboratory and the Stanford Center for AI Safety. His research focuses on the security and safety of language models through mechanistic interpretability, reward modeling, and robust evaluation. Prior to his current position, Max was a postdoctoral fellow at Stanford and received his Ph.D. from the Technical University of Munich.