Thiago D. Simão

Office MF 7.092

MetaForum

I am an Assistant Professor in the Department of Mathematics and Computer Science at TU/e. Previously, I was a Ph.D. candidate in the Algorithmics Group at Delft University of Technology, advised by Dr. Matthijs Spaan. After that, I was a postdoctoral researcher in the Department of Software Science (SWS) at Radboud University Nijmegen, advised by Dr. Nils Jansen. For more details, check out my biography or my CV.

Research Interests:

The motivation for my research revolves around making AI techniques more reliable, to enable their deployment in real-world applications. I focus on developing AI algorithms for scenarios with constrained interactions with an unknown environment. I am currently interested in safe reinforcement learning, a research topic concerned with problems where a minimum performance must be guaranteed and catastrophic events must be avoided.

Academic Service:

Organizing committee of the BeNeRL Workshop (2025 and 2018).
Local organizing committee of the 28th ICAPS.
Area Chair for ICLR26, NeurIPS25, NeurIPS24.
SPC for AAMAS24.
PC for AAAI26, RLC25, ICLR25, AAAI25, ICLR24, AAAI24, NeurIPS23, ICML23, AISTATS23, ICAPS23, NeurIPS22, ICML22, ICAPS22, AAAI21.
Reviewer for JAIR, AIJ, JAAMAS, ICRA, AAAI and BRACIS.

news

2026

April

Our paper “Missingness-MDPs: Bridging the Theory of Missing Data and POMDPs” has been accepted at IJCAI-ECAI-26.

March

Invited lecture on Reliable Offline RL at TCS Seminar at the University of Antwerp.
I am now serving as an Action Editor for the Transactions on Machine Learning Research.

2025

August

I am serving as an Area Chair for ICLR-26.

July

I am serving as a PC member for the main track and the AI alignment track of AAAI-26.
Our paper “Pessimistic Iterative Planning with RNNs for Robust POMDPs” has been accepted at ECAI-25.

April

Invited talk at the RUB AI Lecture Series.

March

I am serving as an Area Chair for NeurIPS-25.
I am serving as a Senior Reviewer for RLC-25.

February

We are organizing the 2025 edition of the BeNeRL workshop at TU/e.

January

Our papers “Safety-Prioritizing Curricula for Constrained Reinforcement Learning” and “Robust Transfer of Safety-Constrained Reinforcement Learning Agents” have been accepted at ICLR-25.
Our paper “Tighter Value-Function Approximations for POMDPs” has been accepted at AAMAS-25.

2024

November

Hiring. I am looking for a (fully paid) PhD student to work on safe RL under partial observability.

June

Invited talk about “New Safe Practices in Reinforcement Learning” at the Belgium-Netherlands workshop on Reinforcement Learning (BeNeRL) 2024.

May

Back to the University of Verona to teach a mini-series of lectures on designing reliable RL agents.
Our paper “Scalable Safe Policy Improvement for Factored Multi-Agent MDPs” has been accepted at ICML-24.

March

I am serving as an Area Chair for NeurIPS-24.

2023

December

Our papers “Robust Active Measuring under Model Uncertainty” and “Factored Online Planning in Many-Agent POMDPs” have been accepted at AAAI-24.

November

I was recognized as a Top Reviewer for NeurIPS-23.

October

New job! I am now an assistant professor in the Data and AI cluster at Eindhoven University of Technology.

September

The ORLEANS project on Offline Reinforcement Learning for Sustainable Transportation at Sea has received an IPR voucher.
I am serving as a senior PC member for AAMAS-24.
I am serving as a PC member for AAAI-24.

August

I am serving as a PC member for ICLR-24.
Invited talk at the Safe RL workshop at IJCAI 2023.

July

Our paper “Reinforcement Learning by Guided Safe Exploration” has been accepted at ECAI-23.

May

Our paper “Risk-aware Curriculum Generation for Heavy-tailed Task Distributions” has been accepted at UAI-23.

April

Our paper “Scalable Safe Policy Improvement via Monte Carlo Tree Search” has been accepted at ICML-23.
Our papers “Recursive Small-Step Multi-Agent A* for Dec-POMDPs” and “More for Less: Safe Policy Improvement with Stronger Performance Guarantees” have been accepted at IJCAI-23.
Presenting our work on SPI in factored environments at the TiCSA 2023 workshop.
Invited talk at the LiVe 2023 workshop.

March

I am serving as a PC member for NeurIPS-23.

February

Our paper “Act-Then-Measure: Reinforcement Learning for Partially Observable Environments with Active Measuring” has been accepted at ICAPS-23.
I am serving as a PC member for ICML 2023.

January

Our paper “Safe Reinforcement Learning From Pixels Using a Stochastic Latent Representation” has been accepted at ICLR-23.
I successfully defended my PhD thesis. A big thanks to my promotor team and the thesis committee.

2022

December

Invited to teach three lectures in the Reinforcement Learning course at the University of Verona.
Our paper “Targeted Adversarial Attacks on Deep Reinforcement Learning Policies via Model Checking” has been accepted at ICAART-23.

November

Our paper “Safe Policy Improvement for POMDPs via Finite-State Controllers” has been accepted at AAAI-23.
Two talks at the AAAI 2022 Fall Symposium.

October

I am serving as a PC member for AISTATS 2023.

September

Our paper “Robust Anytime Learning of Markov Decision Processes” has been accepted at NeurIPS-22.

August

I am serving as a PC member for ICAPS 2023.

July

I am serving as a PC member for NeurIPS 2022.

June

Our paper “Safety-constrained reinforcement learning with a distributional safety critic” has been published in the Machine Learning journal.

May

Two papers presented at the ALA 2022 workshop: one on Safe Transfer in RL and one on Solving Hidden Parameter MDPs with Hindsight.

April

Invited talk for the Oden Institute seminar at UT Austin.
Talk at the LiVe-22 workshop about Safe Transfer in Reinforcement Learning.

March

Talk at the ADML meetup about Ensuring Safety for Reinforcement Learning.

January

I am serving as a PC member for ICML 2022.

2021

December

Talk at the iVerif workshop on Safety Abstractions.

October

I am serving as a PC member for the Planning and Learning track at ICAPS 2022.

August

Talk at the PRL workshop.
At ICAPS-21 attending the mentoring program.

June

Invited talk at the Center for Artificial Intelligence.

May

At AAMAS-21 presenting the AlwaysSafe paper.

March

Talk at the LiVe-21 workshop about AlwaysSafe.
Guest lecture on Safe RL at the Algorithms for Intelligent Decision Making course.

February

Invited talk at the SWS-seminar about our AAMAS paper.

2020

December

Our paper “AlwaysSafe: Reinforcement Learning Without Safety Constraint Violations During Training” has been accepted at AAMAS-21.
Our paper “WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning” has been accepted at AAAI-21.

September

I am serving as a PC member for AAAI-21.

May

At AAMAS-20 presenting the paper “Safe Policy Improvement with an Estimated Baseline Policy.”
Released gym-factored, a collection of factored environments that are OpenAI Gym compliant.

2019

August

At IJCAI-19 presenting our paper on structure learning for safe RL.
At IJCAI-19 participating in the doctoral consortium.

May

Attending the conference RLDM-19.
Starting my internship at MSR Montreal with Romain Laroche and Remi Tachet des Combes.
I got the prize for Best Poster in our department’s poster session.

March

In Hilversum, presenting our work on reinforcement learning at the ICT.Open-19.

January

At AAAI-19 presenting our paper on safe policy improvement in factored environments.

2018

November

I am co-organizing the Belgium Netherlands Workshop on Reinforcement Learning (BeNeRL-18).

October

I am attending the 14th European Workshop on Reinforcement Learning (EWRL-18).

July

I gave a contributed talk at the ICML-18 Workshop on Planning and Learning.

June

I presented a poster at ICAPS-18.
I am helping the local organizing committee of ICAPS-18 in Delft.
Attending the ICAPS-18 summer school at Noordwijk.

2017

November

I presented a poster at the Energy Event promoted by the PowerWeb Institute.

October

Presenting a poster at the EEMCS PhD Event.
I attended the ACAI Summer School on Reinforcement Learning.

August

I attended the 19th European Agent Systems Summer School.

selected publications

ICLR
Safety-Prioritizing Curricula for Constrained Reinforcement Learning

Koprulu, Cevahir, Simão, Thiago D., Jansen, Nils, and Topcu, Ufuk

In ICLR 2025

Curriculum learning aims to accelerate reinforcement learning (RL) by generating curricula, i.e., sequences of tasks of increasing difficulty. Although existing curriculum generation approaches provide benefits in sample efficiency, they overlook safety-critical settings where an RL agent must adhere to safety constraints. Thus, these approaches may generate tasks that cause RL agents to violate safety constraints during training and behave suboptimally after. We develop a safe curriculum generation approach (SCG) that aligns the objectives of constrained RL and curriculum learning: improving safety during training and boosting sample efficiency. SCG generates sequences of tasks where the RL agent can be safe and performant by initially generating tasks with minimum safety violations over high-reward ones. We empirically show that compared to the state-of-the-art curriculum learning approaches and their naively modified safe versions, SCG achieves optimal performance and the lowest amount of constraint violations during training.
@inproceedings{Koprulu2025safetyPrioritizing,
  title = {Safety-Prioritizing Curricula for Constrained Reinforcement Learning},
  author = {Koprulu, Cevahir and Sim{\~a}o, Thiago D. and Jansen, Nils and Topcu, Ufuk},
  booktitle = {ICLR},
  year = {2025}
}
ECAI
Reinforcement Learning by Guided Safe Exploration

Yang, Qisong, Simão, Thiago D., Jansen, Nils, Tindemans, Simon H., and Spaan, Matthijs T. J.

In ECAI 2023

Safety is critical to broadening the application of reinforcement learning (RL). Often, we train RL agents in a controlled environment, such as a laboratory, before deploying them in the real world. However, the real-world target task might be unknown prior to deployment. Reward-free RL trains an agent without the reward to adapt quickly once the reward is revealed. We consider the constrained reward-free setting, where an agent (the guide) learns to explore safely without the reward signal. This agent is trained in a controlled environment, which allows unsafe interactions and still provides the safety signal. After the target task is revealed, safety violations are not allowed anymore. Thus, the guide is leveraged to compose a safe behavior policy. Drawing from transfer learning, we also regularize a target policy (the student) towards the guide while the student is unreliable and gradually eliminate the influence from the guide as training progresses. The empirical analysis shows that this method can achieve safe transfer learning and helps the student solve the target task faster.
@inproceedings{Yang2023reinforcement,
  title = {Reinforcement Learning by Guided Safe Exploration},
  author = {Yang, Qisong and Sim{\~a}o, Thiago D. and Jansen, Nils and Tindemans, Simon H. and Spaan, Matthijs T. J.},
  booktitle = {ECAI},
  year = {2023},
  pages = {2858--2865}
}
AAAI
Safe Policy Improvement for POMDPs via Finite-State Controllers

Simão, Thiago D., Suilen, Marnix, and Jansen, Nils

In AAAI 2023

We study safe policy improvement (SPI) for partially observable Markov decision processes (POMDPs). SPI is an offline reinforcement learning (RL) problem that assumes access to (1) historical data about an environment, and (2) the so-called behavior policy that previously generated this data by interacting with the environment. SPI methods neither require access to a model nor the environment itself, and aim to reliably improve the behavior policy in an offline manner. Existing methods make the strong assumption that the environment is fully observable. In our novel approach to the SPI problem for POMDPs, we assume that a finite-state controller (FSC) represents the behavior policy and that finite memory is sufficient to derive optimal policies. This assumption allows us to map the POMDP to a finite-state fully observable MDP, the history MDP. We estimate this MDP by combining the historical data and the memory of the FSC, and compute an improved policy using an off-the-shelf SPI algorithm. The underlying SPI method constrains the policy-space according to the available data, such that the newly computed policy only differs from the behavior policy when sufficient data was available. We show that this new policy, converted into a new FSC for the (unknown) POMDP, outperforms the behavior policy with high probability. Experimental results on several well-established benchmarks show the applicability of the approach, even in cases where finite memory is not sufficient.
@inproceedings{Simao2023safe,
  title = {Safe Policy Improvement for {POMDP}s via Finite-State Controllers},
  author = {Sim{\~a}o, Thiago D. and Suilen, Marnix and Jansen, Nils},
  booktitle = {AAAI},
  year = {2023},
  publisher = {{AAAI} Press},
  pages = {15109--15117}
}