SSP Forum: Max Lamparth on abstracting human preferences

Monday, February 9, 2026
CoDa E160
Max Lamparth

The Symbolic Systems Forum (community sessions of SYMSYS 280 - Symbolic Systems Research Seminar) presents

Robust Abstractions of Human Preferences

Max Lamparth
Stanford Intelligent Systems Laboratory, Hoover Institution

Monday, February 9, 2026
12:30-1:20 pm PT
Computing and Data Science Building (CoDa), Room E160
(New room for 2026)
In-person event, not recorded

Note: Lunch is provided, if pre-ordered, only for members of SYMSYS 280, but others are welcome to bring a lunch and eat during the presentation.

Abstract:

Abstracting human preferences into computational objectives is essential for aligning AI systems, yet fundamentally challenging due to the complexity and context-dependence of human values. This talk examines how preferences are captured through human annotation and translated into reward models for reinforcement learning from human feedback. While reward models enable state-of-the-art chatbots, I'll present evidence that they exhibit (novel) systematic biases and discuss mitigation approaches. Finally, I'll explore alternative methods for learning from preferences and outline key directions for future research.

Max is a Research Fellow at the Hoover Institution’s Technology Policy Accelerator and a member of the Stanford Intelligent Systems Laboratory and the Stanford Center for AI Safety. His research focuses on the security and safety of language models through mechanistic interpretability, reward modeling, and robust evaluation. Prior to his current position, Max was a postdoctoral fellow at Stanford and received his Ph.D. from the Technical University of Munich.