Martín Soto
I’m a research scientist on AI Alignment
at the UK government’s AI Security Institute.
This means I develop new research agendas to mitigate catastrophic risks from future AI. My current focus is automated Alignment.

Before that, I worked with Owain Evans on AI self-awareness,
did AI risk modelling with the Center on Long-term Risk,
and published research in Mathematical Logic and Decision Theory.
Email | CV | LinkedIn | Alignment Forum | Send me feedback
Publications in AI
- Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs. Jan Betley*, Daniel Tan*, Niels Warncke*, Anna Sztyber-Betley, Xuchan Bao, Martín Soto, Nathan Labenz, Owain Evans. Oral at ICML 2025. [Website, Paper, Tweets]
- Tell me about yourself: LLMs are aware of their learned behaviors. Jan Betley*, Xuchan Bao*, Martín Soto*, Anna Sztyber-Betley, James Chua, Owain Evans. Spotlight at ICLR 2025. [Paper, Blog, Tweets]
- Backfire risks from advances in commitment technology. Duncan McClements, Urvi Gaur, Martín Soto†.
- The marginal impact of AGI interventions. Martín Soto, Tristan Cook. Technical report.
* Equal contribution, † Mentorship role
Publications in Math
- Type-2 Feedback Computability. Juan Aguilera, Martín Soto. Forthcoming in Computability.
- Interactive proofs in Bounded Arithmetics. Martín Soto. Master’s Thesis, presentation.
- The real numbers in inner models of set theory. Martín Soto. Bachelor’s Thesis.
- Logically Updateless Decision-Making. Martín Soto, Abram Demski. Technical report, PIBBSS Symposium 2023.
Talks