Martín Soto

I’m a research scientist on AI Alignment
at the UK government’s AI Security Institute.

This means I develop new research agendas to mitigate catastrophic risks from future AI. My current focus is automated Alignment.

Before that, I worked with Owain Evans on AI self-awareness,
did AI risk modelling with the Center on Long-term Risk,
and published research in Mathematical Logic and Decision Theory.

Email | CV | LinkedIn | Alignment Forum | Send me feedback

Publications in AI

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs. Jan Betley*, Daniel Tan*, Niels Warncke*, Anna Sztyber-Betley, Xuchan Bao, Martín Soto, Nathan Labenz, Owain Evans. Oral at ICML 2025. [Website, Paper, Tweets]
Tell me about yourself: LLMs are aware of their learned behaviors. Jan Betley*, Xuchan Bao*, Martín Soto*, Anna Sztyber-Betley, James Chua, Owain Evans. Spotlight at ICLR 2025. [Paper, Blog, Tweets]
Backfire risks from advances in commitment technology. Duncan McClements, Urvi Gaur, Martín Soto†.
The marginal impact of AGI interventions. Martín Soto, Tristan Cook. Technical report.

* Equal contribution, † Mentorship role

Publications in Math

Type-2 Feedback Computability. Juan Aguilera, Martín Soto. Forthcoming in Computability.
Interactive proofs in Bounded Arithmetics. Martín Soto. Master’s Thesis, presentation.
The real numbers in inner models of set theory. Martín Soto. Bachelor’s Thesis.
Logically Updateless Decision-Making. Martín Soto, Abram Demski. Technical report, PIBBSS Symposium 2023.

Talks

Evaluating AI Control mitigations, at the Singapore Alignment Workshop. Recording.
How can we effectively govern AI?, at Adevinta Barcelona. Post.
Open-source Game Theory, at the Barcelona Logic Seminar. Recording.
Logic & Music, at the Polytechnic University of Barcelona. Recording.