
Seyed Kamyar Seyed Ghasemipour

kamyar (at) cs {dot} toronto [dot] edu

University of Toronto

Vector Institute

I'm a graduate student in the Machine Learning Group at the University of Toronto and the Vector Institute. My supervisor is Rich Zemel. Broadly, my areas of interest lie at the intersection of Reinforcement Learning and Probabilistic methods. More specifically, the types of problems I enjoy thinking about are motivated by two, often incompatible, directions: developing and understanding algorithms towards practical impact, and building AGI (building Iron Man's Jarvis is the reason I got into A.I.). In past lives I did research in Computer Vision and Generative Models.

You can find my CV here.

Announcements

  • September 7, 2019 — CoRL Paper (Oral! :D)

    Our paper "A Divergence Minimization Perspective on Imitation Learning Methods" (with Richard Zemel and Shane Gu) was accepted as an oral at CoRL 2019!

  • September 4, 2019 — NeurIPS Paper

    Our paper "SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies" (with Shane Gu and Richard Zemel) was accepted as a poster at NeurIPS 2019!

Earlier Announcements
  • June 1, 2019 — ICML Workshop Oral Presentation

    Our paper "SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies" (with Shane Gu and Richard Zemel) was accepted as an oral presentation to the Imitation, Intent, and Interaction (I3) Workshop at ICML 2019!

  • June 1, 2019 — ICML Workshop Poster

    Our paper "Interpreting Imitation Learning Methods Under a Divergence Minimization Perspective" (with Shane Gu and Richard Zemel) was accepted to the Imitation, Intent, and Interaction (I3) Workshop at ICML 2019!

  • April 20, 2019 — ICLR Workshop Poster

    Our paper "Interpreting Imitation Learning Methods Under a Divergence Minimization Perspective" (with Shane Gu and Richard Zemel) was accepted to the Deep Generative Models for Highly Structured Data Workshop at ICLR 2019!

Papers

    Preprints / Under Review


    Conference Publications

  • A Divergence Minimization Perspective on Imitation Learning Methods
    Seyed Kamyar Seyed Ghasemipour, Richard Zemel, Shane Gu
    CoRL 2019 (Oral Presentation)

    abstract

    In many settings, it is desirable to learn decision-making and control policies through learning or bootstrapping from expert demonstrations. The most common approaches under this Imitation Learning (IL) framework are Behavioural Cloning (BC), and Inverse Reinforcement Learning (IRL). Recent methods for IRL have demonstrated the capacity to learn effective policies with access to a very limited set of demonstrations, a scenario in which BC methods often fail. Unfortunately, due to multiple factors of variation, directly comparing these methods does not provide adequate intuition for understanding this difference in performance. In this work, we present a unified probabilistic perspective on IL algorithms based on divergence minimization. We present $f$-MAX, an $f$-divergence generalization of AIRL [Fu et al., 2018], a state-of-the-art IRL method. $f$-MAX enables us to relate prior IRL methods such as GAIL [Ho & Ermon, 2016] and AIRL [Fu et al., 2018], and understand their algorithmic properties. Through the lens of divergence minimization we tease apart the differences between BC and successful IRL approaches, and empirically evaluate these nuances on simulated high-dimensional continuous control domains. Our findings conclusively identify that IRL's state-marginal matching objective contributes most to its superior performance. Lastly, we apply our new understanding of IL methods to the problem of state-marginal matching, where we demonstrate that in simulated arm pushing environments we can teach agents a diverse range of behaviours using simply hand-specified state distributions and no reward functions or expert demonstrations. For datasets and reproducing results please refer to https://github.com/KamyarGh/rl_swiss/blob/master/reproducing/fmax_paper.md .

    / camera-ready pdf / code
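
    To give a rough sense of the divergence-minimization view above (a sketch in my own notation, not an excerpt from the paper): writing $\rho_{\text{exp}}(s,a)$ for the expert's state-action marginal and $\rho_{\pi}(s,a)$ for the imitator's, $f$-MAX casts imitation as

    $$\min_{\pi} \; D_f\big(\rho_{\text{exp}}(s,a) \,\|\, \rho_{\pi}(s,a)\big),$$

    where different choices of the $f$-divergence recover existing methods (roughly, GAIL corresponds to the Jensen-Shannon divergence and AIRL to the reverse KL), whereas BC only matches the conditional action distributions $\pi(a \mid s)$ on expert states; this marginal-versus-conditional distinction is the gap the paper teases apart.
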
  • SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies
    Seyed Kamyar Seyed Ghasemipour, Shane Gu, Richard Zemel
    NeurIPS 2019

    abstract

    Imitation Learning (IL) has been successfully applied to complex sequential decision-making problems where standard Reinforcement Learning (RL) algorithms fail. A number of recent methods extend IL to few-shot learning scenarios, where a meta-trained policy learns to quickly master new tasks using limited demonstrations. However, although Inverse Reinforcement Learning (IRL) often outperforms Behavioral Cloning (BC) in terms of imitation quality, most of these approaches build on BC due to its simple optimization objective. In this work, we propose SMILe, a scalable framework for Meta Inverse Reinforcement Learning (Meta-IRL) based on maximum entropy IRL, which can learn high-quality policies from few demonstrations. We examine the efficacy of our method on a variety of high-dimensional simulated continuous control tasks and observe that SMILe significantly outperforms Meta-BC. Furthermore, we observe that SMILe performs comparably or outperforms Meta-DAgger, while being applicable in the state-only setting and not requiring online experts. To our knowledge, our approach is the first efficient method for Meta-IRL that scales to the function approximator setting. For datasets and reproducing results please refer to https://github.com/KamyarGh/rl_swiss/blob/master/reproducing/smile_paper.md .

    / camera-ready pdf / code
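
    As a schematic of the meta-IRL setup described above (my own notation and reading of the abstract, not the paper's exact formulation): demonstrations $\mathcal{D}_T$ for a task $T$ are summarized into a context $c_T = E_{\phi}(\mathcal{D}_T)$, and a single context-conditional policy $\pi(a \mid s, c_T)$ is trained in the maximum-entropy IRL style against a learned task-conditioned reward,

    $$\max_{\pi} \; \mathbb{E}_{T}\Big[\, \mathbb{E}_{\pi(\cdot \mid \cdot,\, c_T)}\big[ r_{\psi}(s, a, c_T) \big] \;+\; \mathcal{H}\big(\pi(\cdot \mid \cdot,\, c_T)\big) \,\Big],$$

    so that at meta-test time a handful of demonstrations from a new task can simply be encoded into $c_T$ and imitated by conditioning the policy on it.
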
Workshop Publications

  • SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies
    Seyed Kamyar Seyed Ghasemipour, Shane Gu, Richard Zemel
    Imitation, Intent, and Interaction (I3) Workshop, ICML 2019 (Oral Presentation)

    abstract

    Imitation Learning (IL) has been successfully applied to complex sequential decision-making problems where standard Reinforcement Learning (RL) algorithms fail. A number of recent methods extend IL to few-shot learning scenarios, where a meta-trained policy learns to quickly master new tasks using limited demonstrations. However, although Inverse Reinforcement Learning (IRL) often outperforms Behavioral Cloning (BC) in terms of imitation quality, most of these approaches build on BC due to its simple optimization objective. In this work, we propose SMILe, a scalable framework for Meta Inverse Reinforcement Learning (Meta-IRL) based on maximum entropy IRL, which can learn high-quality policies from few demonstrations. We examine the efficacy of our method on a variety of high-dimensional simulated continuous control tasks and observe that SMILe significantly outperforms Meta-BC. To our knowledge, our approach is the first efficient method for Meta-IRL that scales to the intractable function approximator setting.

    / pdf / code (coming soon) / poster
  • Interpreting Imitation Learning Methods Under a Divergence Minimization Perspective
    Seyed Kamyar Seyed Ghasemipour, Richard Zemel, Shane Gu
    Imitation, Intent, and Interaction (I3) Workshop, ICML 2019
    Deep Generative Models for Highly Structured Data Workshop, ICLR 2019

    abstract

    In many settings, it is desirable to learn decision-making and control policies through learning or bootstrapping from expert demonstrations. The most common approaches under this framework are Behavioural Cloning (BC), and Inverse Reinforcement Learning (IRL). Recent methods for IRL have demonstrated the capacity to learn effective policies with access to a very limited set of demonstrations, a scenario in which BC methods often fail. Unfortunately, directly comparing the algorithms for these methods does not provide adequate intuition for understanding this difference in performance. This is the motivating factor for our work. We begin by presenting $f$-MAX, a generalization of AIRL (Fu et al., 2018), a state-of-the-art IRL method. $f$-MAX provides grounds for more directly comparing the objectives for Learning from Demonstrations (LfD). We demonstrate that $f$-MAX, and by inheritance AIRL, is a subset of the cost-regularized IRL framework laid out by Ho & Ermon (2016). We conclude by empirically evaluating the factors of difference between various LfD objectives in the continuous control domain.

    / pdf / code (coming soon) / poster
  • Gradient-Based Optimization of Neural Network Architecture
    Will Grathwohl*, Elliot Creager*, Seyed Kamyar Seyed Ghasemipour*, Richard Zemel
    Workshop, ICLR 2018

    abstract

    Neural networks can learn relevant features from data, but their predictive accuracy and propensity to overfit are sensitive to the values of the discrete hyperparameters that specify the network architecture (number of hidden layers, number of units per layer, etc.). Previous work optimized these hyperparameters via grid search, random search, and black box optimization techniques such as Bayesian optimization. Bolstered by recent advances in gradient-based optimization of discrete stochastic objectives, we instead propose to directly model a distribution over possible architectures and use variational optimization to jointly optimize the network architecture and weights in one training pass. We discuss an implementation of this approach that estimates gradients via the Concrete relaxation, and show that it finds compact and accurate architectures for convolutional neural networks applied to the CIFAR10 and CIFAR100 datasets.

    / pdf
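
    As a rough rendering of the objective described above (my notation, not the paper's): with $q_{\phi}(\alpha)$ a distribution over discrete architecture choices $\alpha$ (e.g. layer widths) and $\theta$ the network weights, the variational-optimization objective is

    $$\min_{\theta,\,\phi} \; \mathbb{E}_{q_{\phi}(\alpha)}\big[ \mathcal{L}(\theta, \alpha) \big],$$

    where the gradient with respect to $\phi$ is made tractable by replacing discrete samples of $\alpha$ with their Concrete (Gumbel-Softmax) relaxations, so architecture and weights are optimized jointly in a single training pass.
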
Unpublished Submissions

  • Semi-Supervised Structured Prediction with the Use of Generative Adversarial Networks
    Seyed Kamyar Seyed Ghasemipour, Yujia Li, Jackson Wang, Richard Zemel
    Submitted to ICCV 2017
Slides

    SMILe Oral Presentation, Imitation, Intent, and Interaction (I3) Workshop at ICML 2019

    Videos

SMILe (link coming soon), 15 min, June 15, 2019, Oral Presentation, Imitation, Intent, and Interaction (I3) Workshop at ICML 2019

Summer 2015 Research Video (1st place, undergraduate research video competition)

Summer 2014 Research Video (1st place, undergraduate research video competition)