Gonçalo Santos Paulo

Interpretability Researcher
Lisbon, Portugal

About

Interpretability Researcher at EleutherAI. PhD in Theoretical and Applied Mechanics.

Work

EleutherAI

Interpretability Researcher

Summary

Leads research in AI interpretability, developing methods to understand the behavior of complex models.

Highlights

Developed interpretability methods based on Sparse Autoencoders (SAEs) to improve the transparency and reliability of complex AI models.

Developed automated interpretability tools, contributing to AI safety and responsible AI development.

Authored and co-authored multiple publications (3 in 2025, 2 in 2024) on SAE interpretability, LLM feature interpretation, and the transferability of interpretability methods.

Sapienza University of Rome

Postdoctoral Researcher

Rome, Lazio, Italy

Summary

Conducted research on memristive behavior, focusing on nanophysics and materials science.

Highlights

Led experimental and computational research on memristive behavior, investigating hydrophobic gating, nanofluidics, and water intrusion in hydrophobic materials.

Authored and co-authored research findings, including a publication in Nature Communications on hydrophobically gated memristive nanopores for neuromorphic applications.

Contributed to the understanding of material interactions at the nanoscale, with relevance to new technological applications.

Sapienza University of Rome

PhD Researcher (Theoretical and Applied Mechanics)

Rome, Lazio, Italy

Summary

Completed a PhD in Theoretical and Applied Mechanics, focusing on computational approaches to materials science.

Highlights

Awarded PhD summa cum laude for a thesis on computational approaches to the study of intrusion in hydrophobic materials.

Developed and implemented computational models to simulate and analyze water intrusion in hydrophobic materials.

Published research in Communications Physics on the impact of secondary channels on wetting properties, contributing to the understanding of nanoscale fluid dynamics.

Collaborated with Professor Alberto Giacomello, applying theoretical mechanics to problems in materials science.

Education

Sapienza University of Rome
Rome, Lazio, Italy

PhD

Theoretical and Applied Mechanics

Grade: Summa cum laude

Faculty of Science, University of Lisbon
Lisbon, Lisbon, Portugal

Master's Degree

Physics

Faculty of Science, University of Lisbon
Lisbon, Lisbon, Portugal

Bachelor's Degree

Physics

Publications

Evaluating SAE interpretability without explanations

Published by

arXiv

Transcoders Beat Sparse Autoencoders for Interpretability

Published by

arXiv

Sparse Autoencoders Trained on the Same Data Learn Different Features

Published by

arXiv

Automatically Interpreting Millions of Features in Large Language Models

Published by

ICML

Does Transformer Interpretability Transfer to RNNs?

Published by

AAAI

Hydrophobically gated memristive nanopores for neuromorphic applications

Published by

Nature Communications

The impact of secondary channels on the wetting properties of interconnected hydrophobic nanopores

Published by

Communications Physics

Languages

Portuguese
English
Italian

Skills

AI Interpretability

Mechanistic Interpretability, Sparse Autoencoders, Automated Interpretability Tools, AI Safety, Large Language Models (LLMs), Transformer Interpretability, RNN Interpretability.

Computational Physics & Materials Science

Nanofluidics, Memristive Behavior, Hydrophobic Gating, Theoretical Modeling, Scientific Computing, Data Analysis, Simulation, Materials Science.

Research & Development

Scientific Research, Experimental Design, Problem-Solving, Peer Review, Academic Writing, Publication, Collaborative Research.

Programming & Tools

Python, Computational Modeling, LaTeX.