### Seyed Kamyar Seyed Ghasemipour

#### Vector Institute

I'm a graduate student in the Machine Learning Group at the University of Toronto and the Vector Institute. My supervisor is Rich Zemel. Broadly, my areas of interest lie at the intersection of Reinforcement Learning and Probabilistic methods. More specifically, the types of problems I enjoy thinking about are motivated by two (often incompatible) directions: developing and understanding algorithms towards practical impact, and building AGI (building Ironman's Jarvis is the reason I got into A.I.). In past lives I used to do research in Computer Vision and Generative Models.

You can find my CV here.

## Announcements

• July 22, 2020 — New Preprint

Our preprint "EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL" (with Dale Schuurmans and Shane Gu) is up on arXiv.

• November 1, 2019 — Best Paper Award @ CoRL 2019!!!! :D

Our paper "A Divergence Minimization Perspective on Imitation Learning Methods" (with Richard Zemel and Shane Gu) received the Best Paper Award at the Conference on Robot Learning (CoRL) 2019!

• Earlier Announcements
• September 30, 2019 — Research Internship @ Google Brain Robotics

This semester I am interning with Corey Lynch and Pierre Sermanet at Google Brain Robotics in Mountain View.

• September 7, 2019 — CoRL Paper (Oral! :D)

Our paper "A Divergence Minimization Perspective on Imitation Learning Methods" (with Richard Zemel and Shane Gu) was accepted as an oral at CoRL 2019!

• September 4, 2019 — NeurIPS Paper

Our paper "SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies" (with Shane Gu and Richard Zemel) was accepted as a poster at NeurIPS 2019!

• June 1, 2019 — ICML Workshop Oral Presentation

Our paper "SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies" (with Shane Gu and Richard Zemel) was accepted as an oral presentation to the Imitation, Intent, and Interaction (I3) Workshop at ICML 2019!

• June 1, 2019 — ICML Workshop Poster

Our paper "Interpreting Imitation Learning Methods Under a Divergence Minimization Perspective" (with Shane Gu and Richard Zemel) was accepted to the Imitation, Intent, and Interaction (I3) Workshop at ICML 2019!

• April 20, 2019 — ICLR Workshop Poster

Our paper "Interpreting Imitation Learning Methods Under a Divergence Minimization Perspective" (with Shane Gu and Richard Zemel) was accepted to the Deep Generative Models for Highly Structured Data Workshop at ICLR 2019!

## Papers

### Preprints / Under Review

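• "EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL" (with Dale Schuurmans and Shane Gu). Preprint, Under Review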

abstract

Off-policy reinforcement learning (RL) holds the promise of sample-efficient learning of decision-making policies by leveraging past experience. However, in the offline RL setting -- where a fixed collection of interactions is provided and no further interactions are allowed -- it has been shown that standard off-policy RL methods can significantly underperform. Recently proposed methods aim to address this shortcoming by regularizing learned policies to remain close to the given dataset of interactions. However, these methods involve several configurable components such as learning a separate policy network on top of a behavior cloning actor, and explicitly constraining action spaces through clipping or reward penalties. Striving for simultaneous simplicity and performance, in this work we present a novel backup operator, Expected-Max Q-Learning (EMaQ), which naturally restricts learned policies to remain within the support of the offline dataset *without any explicit regularization*, while retaining desirable theoretical properties such as contraction. We demonstrate that EMaQ is competitive with Soft Actor Critic (SAC) in online RL, and surpasses SAC in the deployment-efficient setting. In the offline RL setting -- the main focus of this work -- through EMaQ we are able to make important observations regarding key components of offline RL, and the nature of standard benchmark tasks. Lastly but importantly, we observe that EMaQ achieves state-of-the-art performance with fewer moving parts such as one less function approximation, making it a strong, yet easy to implement baseline for future work.

arXiv
### Conference Publications

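• "A Divergence Minimization Perspective on Imitation Learning Methods" (with Richard Zemel and Shane Gu). Best Paper Award, Oral Presentation, CoRL 2019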

abstract

In many settings, it is desirable to learn decision-making and control policies through learning or bootstrapping from expert demonstrations. The most common approaches under this Imitation Learning (IL) framework are Behavioural Cloning (BC), and Inverse Reinforcement Learning (IRL). Recent methods for IRL have demonstrated the capacity to learn effective policies with access to a very limited set of demonstrations, a scenario in which BC methods often fail. Unfortunately, due to multiple factors of variation, directly comparing these methods does not provide adequate intuition for understanding this difference in performance. In this work, we present a unified probabilistic perspective on IL algorithms based on divergence minimization. We present $f$-MAX, an $f$-divergence generalization of AIRL [Fu et al., 2018], a state-of-the-art IRL method. $f$-MAX enables us to relate prior IRL methods such as GAIL [Ho & Ermon, 2016] and AIRL [Fu et al., 2018], and understand their algorithmic properties. Through the lens of divergence minimization we tease apart the differences between BC and successful IRL approaches, and empirically evaluate these nuances on simulated high-dimensional continuous control domains. Our findings conclusively identify that IRL's state-marginal matching objective contributes most to its superior performance. Lastly, we apply our new understanding of IL methods to the problem of state-marginal matching, where we demonstrate that in simulated arm pushing environments we can teach agents a diverse range of behaviours using simply hand-specified state distributions and no reward functions or expert demonstrations. For datasets and reproducing results please refer to https://github.com/KamyarGh/rl_swiss/blob/master/reproducing/fmax_paper.md .

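• "SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies" (with Shane Gu and Richard Zemel). NeurIPS 2019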

abstract

Imitation Learning (IL) has been successfully applied to complex sequential decision-making problems where standard Reinforcement Learning (RL) algorithms fail. A number of recent methods extend IL to few-shot learning scenarios, where a meta-trained policy learns to quickly master new tasks using limited demonstrations. However, although Inverse Reinforcement Learning (IRL) often outperforms Behavioral Cloning (BC) in terms of imitation quality, most of these approaches build on BC due to its simple optimization objective. In this work, we propose SMILe, a scalable framework for Meta Inverse Reinforcement Learning (Meta-IRL) based on maximum entropy IRL, which can learn high-quality policies from few demonstrations. We examine the efficacy of our method on a variety of high-dimensional simulated continuous control tasks and observe that SMILe significantly outperforms Meta-BC. Furthermore, we observe that SMILe performs comparably or outperforms Meta-DAgger, while being applicable in the state-only setting and not requiring online experts. To our knowledge, our approach is the first efficient method for Meta-IRL that scales to the function approximator setting. For datasets and reproducing results please refer to https://github.com/KamyarGh/rl_swiss/blob/master/reproducing/smile_paper.md .

### Unpublished Submissions

• Submitted to ICCV 2017
## Slides

SMILe Oral Presentation, Imitation, Intent, and Interaction (I3) Workshop at ICML 2019

## Videos

SMILe (link coming soon) 15 min, June 15, 2019, Oral Presentation, Imitation, Intent, and Interaction (I3) Workshop at ICML 2019

Summer 2015 Research Video 1st place undergraduate research video competition

Summer 2014 Research Video 1st place undergraduate research video competition