Seyed Kamyar Seyed Ghasemipour

kamyar (at) cs {dot} toronto [dot] edu

University of Toronto

Vector Institute

I'm a graduate student in the Machine Learning Group at the University of Toronto and the Vector Institute. I'm also a student research collaborator with the Google Brain Robotics group. My supervisor is Rich Zemel. Broadly, my areas of interest lie at the intersection of reinforcement learning and probabilistic methods (generative modeling), with an eye towards applications in robot learning. More specifically, the types of problems I enjoy thinking about are motivated by two — often incompatible — directions: developing and understanding algorithms towards practical impact, and building AGI (building Ironman's Jarvis is the reason I got into A.I.). In past lives I used to do research in Computer Vision and Generative Models.

You can find my CV here.

Announcements

  • Dec, 2022 — Outstanding Paper Award @ NeurIPS 2022

    The Imagen text-to-image generative model won Outstanding Paper Award at NeurIPS 2022!

  • Dec, 2022 — NeurIPS 2022

    Our offline RL work "Why so pessimistic?" will be presented at NeurIPS 2022!

  • Dec, 2022 — NeurIPS 2022

    The Imagen text-to-image generative model will be presented at NeurIPS 2022!

  • Oct, 2022 — Talk @ Autodesk Research

    Presenting our line of work on robotic magnetic assembly as well as Imagen text-to-image generative models.

  • June, 2022 — ICML 2022

    Our work "Blocks Assemble!" wil be presented at ICML 2022!

  • March, 2022 — Sim2Real Work on Arxiv

    Our effort on Sim2Real transfer of bimanual magnetic assembly policies is available on Arxiv!

  • Earlier Announcements
  • Dec 14, 2021 — Offline RL Workshop NeurIPS 2021

    We are very excited about our work "Why so pessimistic? Estimating uncertainties for offline RL through ensembles, and why their independence matters." (with Shane Gu and Ofir Nachum) which was accepted at the Offline RL Workshop at NeurIPS 2021.

  • May 8, 2021 — ICML 2021

    Our work "EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL" (with Dale Schuurmans and Shane Gu) was accepted at ICML 2021.

  • November 1, 2019 — Best Paper Award @ CoRL 2019!!!! :D

    Our paper "A Divergence Minimization Perspective on Imitation Learning Methods" (with Richard Zemel and Shane Gu) received the Best Paper Award at the Conference on Robot Learning (CoRL) 2019!

  • September 30, 2019 — Research Internship @ Google Brain Robotics

    This semester I am interning with Corey Lynch and Pierre Sermanet at Google Brain Robotics in Mountain View.

  • September 7, 2019 — CoRL Paper (Oral! :D)

    Our paper "A Divergence Minimization Perspective on Imitation Learning Methods" (with Richard Zemel and Shane Gu) was accepted as an oral at CoRL 2019!

  • September 4, 2019 — NeurIPS Paper

    Our paper "SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies" (with Shane Gu and Richard Zemel) was accepted as a poster at NeurIPS 2019!

  • June 1, 2019 — ICML Workshop Oral Presentation

    Our paper "SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies" (with Shane Gu and Richard Zemel) was accepted as an oral presentation to the Imitation, Intent, and Interaction (I3) Workshop at ICML 2019!

  • June 1, 2019 — ICML Workshop Poster

    Our paper "Interpreting Imitation Learning Methods Under a Divergence Minimization Perspective" (with Shane Gu and Richard Zemel) was accepted to the Imitation, Intent, and Interaction (I3) Workshop at ICML 2019!

  • April 20, 2019 — ICLR Workshop Poster

    Our paper "Interpreting Imitation Learning Methods Under a Divergence Minimization Perspective" (with Shane Gu and Richard Zemel) was accepted to the Deep Generative Models for Highly Structured Data Workshop at ICLR 2019!

Research

    Last updated: January 5, 2023. For the most up-to-date information, please refer to my Google Scholar page.

    Preprints / Under Review

  • Bi-Manual Manipulation and Attachment via Sim-to-Real Reinforcement Learning
    Satoshi Kataoka, Seyed Kamyar Seyed Ghasemipour, Daniel Freeman, Igor Mordatch
    Under Review at ICRA 2023

    abstract

    Assembly of multi-part physical structures is both a valuable end product for autonomous robotics, as well as a valuable diagnostic task for open-ended training of embodied intelligent agents. We introduce a naturalistic physics-based environment with a set of connectable magnet blocks inspired by children's toy kits. The objective is to assemble blocks into a succession of target blueprints. Despite the simplicity of this objective, the compositional nature of building diverse blueprints from a set of blocks leads to an explosion of complexity in structures that agents encounter. Furthermore, assembly stresses agents' multi-step planning, physical reasoning, and bimanual coordination. We find that the combination of large-scale reinforcement learning and graph-based policies -- surprisingly without any additional complexity -- is an effective recipe for training agents that not only generalize to complex unseen blueprints in a zero-shot manner, but even operate in a reset-free setting without being trained to do so. Through extensive experiments, we highlight the importance of large-scale training, structured representations, contributions of multi-task vs. single-task learning, as well as the effects of curriculums, and discuss qualitative behaviors of trained agents.

    / pdf / website
  • Braxlines: Fast and Interactive Toolkit for RL-driven Behavior Engineering beyond Reward Maximization
    Shixiang Shane Gu, Manfred Diaz, Daniel C. Freeman, Hiroki Furuta, Seyed Kamyar Seyed Ghasemipour, Anton Raichuk, Byron David, Erik Frey, Erwin Coumans, Olivier Bachem
    Preprint

    abstract

    The goal of continuous control is to synthesize desired behaviors. In reinforcement learning (RL)-driven approaches, this is often accomplished through careful task reward engineering for efficient exploration and running an off-the-shelf RL algorithm. While reward maximization is at the core of RL, reward engineering is not the only -- sometimes nor the easiest -- way for specifying complex behaviors. In this paper, we introduce braxlines, a toolkit for fast and interactive RL-driven behavior generation beyond simple reward maximization that includes composer, a programmatic API for generating continuous control environments, and a set of stable and well-tested baselines for two families of algorithms -- mutual information maximization (mimax) and divergence minimization (dmin) -- supporting unsupervised skill learning and distribution sketching as other modes of behavior specification. In addition, we discuss how to standardize metrics for evaluating these algorithms, which can no longer rely on simple reward maximization. Our implementations build on a hardware-accelerated Brax simulator in Jax with minimal modifications, enabling behavior synthesis within minutes of training. We hope braxlines can serve as an interactive toolkit for rapid creation and testing of environments and behaviors, empowering explosions of future benchmark designs and new modes of RL-driven behavior generation and their algorithmic research.

    / arxiv
    Conference Publications

  • Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
    Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi
    Outstanding Paper Award, NeurIPS 2022

    abstract

    We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Imagen builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation. Our key discovery is that generic large language models (e.g. T5), pretrained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen boosts both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model. Imagen achieves a new state-of-the-art FID score of 7.27 on the COCO dataset, without ever training on COCO, and human raters find Imagen samples to be on par with the COCO data itself in image-text alignment. To assess text-to-image models in greater depth, we introduce DrawBench, a comprehensive and challenging benchmark for text-to-image models. With DrawBench, we compare Imagen with recent methods including VQ-GAN+CLIP, Latent Diffusion Models, and DALL-E 2, and find that human raters prefer Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment. See https://imagen.research.google/ for an overview of the results.

    / pdf / website
  • Why so pessimistic? Estimating uncertainties for offline RL through ensembles, and why their independence matters.
    Seyed Kamyar Seyed Ghasemipour, Shane Gu, Ofir Nachum
    NeurIPS 2022

    abstract

    Motivated by the success of ensembles for uncertainty estimation in supervised learning, we take a renewed look at how ensembles of Q-functions can be leveraged as the primary source of pessimism for offline reinforcement learning (RL). We begin by identifying a critical flaw in a popular algorithmic choice used by many ensemble-based RL algorithms, namely the use of shared pessimistic target values when computing each ensemble member's Bellman error. Through theoretical analyses and construction of examples in toy MDPs, we demonstrate that shared pessimistic targets can paradoxically lead to value estimates that are effectively optimistic. Given this result, we propose MSG, a practical offline RL algorithm that trains an ensemble of Q-functions with independently computed targets based on completely separate networks, and optimizes a policy with respect to the lower confidence bound of predicted action values. Our experiments on the popular D4RL and RL Unplugged offline RL benchmarks demonstrate that on challenging domains such as antmazes, MSG with deep ensembles surpasses highly well-tuned state-of-the-art methods by a wide margin. Additionally, through ablations on benchmark domains, we verify the critical significance of using independently trained Q-functions, and study the role of ensemble size. Finally, as using separate networks per ensemble member can become computationally costly with larger neural network architectures, we investigate whether efficient ensemble approximations developed for supervised learning can be similarly effective, and demonstrate that they do not match the performance and robustness of MSG with separate networks, highlighting the need for new efforts into efficient uncertainty estimation directed at RL.

    / pdf / code
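
    The central algorithmic point above is compact enough to sketch in code: every ensemble member bootstraps from its own target network rather than from a single shared pessimistic target, and the policy is scored against a lower confidence bound (ensemble mean minus a multiple of the ensemble standard deviation). Below is a minimal numpy sketch of just those two pieces; the function names, array shapes, and the beta coefficient are illustrative placeholders, not the released MSG code.

      # Minimal sketch: MSG-style independent Bellman targets vs. the shared
      # pessimistic targets the paper argues against, plus an LCB policy score.
      import numpy as np

      def independent_targets(rewards, discounts, next_q_per_member):
          # next_q_per_member[i] comes from ensemble member i's OWN target net.
          # rewards, discounts: [batch]; next_q_per_member: [members, batch]
          return rewards[None, :] + discounts[None, :] * next_q_per_member

      def shared_pessimistic_targets(rewards, discounts, next_q_per_member):
          # The common alternative: all members regress to one min-over-members
          # target, which the paper shows can behave optimistically.
          shared = next_q_per_member.min(axis=0)                 # [batch]
          return rewards[None, :] + discounts[None, :] * shared[None, :]

      def lcb_policy_score(q_per_member, beta=4.0):
          # Lower confidence bound of the ensemble's action-value predictions,
          # averaged over the batch; the policy is updated to maximize this.
          return (q_per_member.mean(axis=0) - beta * q_per_member.std(axis=0)).mean()

      rng = np.random.default_rng(0)
      members, batch = 8, 32
      next_q = rng.normal(size=(members, batch))
      targets = independent_targets(rng.normal(size=batch), 0.99 * np.ones(batch), next_q)
      print(targets.shape, lcb_policy_score(next_q))
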
  • Blocks Assemble! Learning to Assemble with Large-Scale Structured Reinforcement Learning
    Seyed Kamyar Seyed Ghasemipour, Daniel Freeman, Byron David, Shixiang Shane Gu, Satoshi Kataoka, Igor Mordatch
    ICML 2022

    abstract

    Assembly of multi-part physical structures is both a valuable end product for autonomous robotics, as well as a valuable diagnostic task for open-ended training of embodied intelligent agents. We introduce a naturalistic physics-based environment with a set of connectable magnet blocks inspired by children's toy kits. The objective is to assemble blocks into a succession of target blueprints. Despite the simplicity of this objective, the compositional nature of building diverse blueprints from a set of blocks leads to an explosion of complexity in structures that agents encounter. Furthermore, assembly stresses agents' multi-step planning, physical reasoning, and bimanual coordination. We find that the combination of large-scale reinforcement learning and graph-based policies -- surprisingly without any additional complexity -- is an effective recipe for training agents that not only generalize to complex unseen blueprints in a zero-shot manner, but even operate in a reset-free setting without being trained to do so. Through extensive experiments, we highlight the importance of large-scale training, structured representations, contributions of multi-task vs. single-task learning, as well as the effects of curriculums, and discuss qualitative behaviors of trained agents.

    / pdf / website
  • EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
    Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, Shane Gu
    ICML 2021

    abstract

    Off-policy reinforcement learning (RL) holds the promise of sample-efficient learning of decision-making policies by leveraging past experience. However, in the offline RL setting -- where a fixed collection of interactions are provided and no further interactions are allowed -- it has been shown that standard off-policy RL methods can significantly underperform. Recently proposed methods aim to address this shortcoming by regularizing learned policies to remain close to the given dataset of interactions. However, these methods involve several configurable components such as learning a separate policy network on top of a behavior cloning actor, and explicitly constraining action spaces through clipping or reward penalties. Striving for simultaneous simplicity and performance, in this work we present a novel backup operator, Expected-Max Q-Learning (EMaQ), which naturally restricts learned policies to remain within the support of the offline dataset without any explicit regularization, while retaining desirable theoretical properties such as contraction. We demonstrate that EMaQ is competitive with Soft Actor Critic (SAC) in online RL, and surpasses SAC in the deployment-efficient setting. In the offline RL setting -- the main focus of this work -- through EMaQ we are able to make important observations regarding key components of offline RL, and the nature of standard benchmark tasks. Lastly but importantly, we observe that EMaQ achieves state-of-the-art performance with fewer moving parts such as one less function approximation, making it a strong, yet easy to implement baseline for future work.

    / arxiv
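
    The backup operator itself is simple to write down: instead of maximizing over the whole action space or over a separately regularized policy, the target maximizes Q over N candidate actions drawn from a generative model of the behavior policy, which keeps the learned values within the support of the dataset. The following numpy sketch illustrates that backup; the names (q_fn, sample_behavior_actions, the dummy stand-ins) are my own placeholders, not the authors' implementation.

      # Toy sketch of the Expected-Max Q-Learning (EMaQ) target:
      #   target = r + gamma * max over N actions a' ~ mu(.|s') of Q(s', a'),
      # where mu is a generative model fit to the behavior policy.
      import numpy as np

      def emaq_targets(rewards, discounts, next_states, q_fn,
                       sample_behavior_actions, num_samples=10):
          # rewards, discounts: [batch]; next_states: [batch, state_dim]
          # sample_behavior_actions(states, n) -> [n, batch, action_dim]
          candidates = sample_behavior_actions(next_states, num_samples)
          q_candidates = np.stack(
              [q_fn(next_states, candidates[i]) for i in range(num_samples)], axis=0)
          # Max over the N sampled, in-support actions only.
          return rewards + discounts * q_candidates.max(axis=0)

      # Dummy stand-ins so the sketch runs end to end.
      rng = np.random.default_rng(0)
      batch, state_dim, action_dim = 16, 4, 2
      dummy_q = lambda s, a: s.sum(axis=1) + a.sum(axis=1)
      dummy_sampler = lambda s, n: rng.normal(size=(n, s.shape[0], action_dim))
      targets = emaq_targets(rng.normal(size=batch), 0.99 * np.ones(batch),
                             rng.normal(size=(batch, state_dim)), dummy_q, dummy_sampler)
      print(targets.shape)  # (16,)
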
  • A Divergence Minimization Perspective on Imitation Learning Methods
    Seyed Kamyar Seyed Ghasemipour, Richard Zemel, Shane Gu
    Best Paper Award, Oral Presentation, CoRL 2019

    abstract

    In many settings, it is desirable to learn decision-making and control policies through learning or bootstrapping from expert demonstrations. The most common approaches under this Imitation Learning (IL) framework are Behavioural Cloning (BC), and Inverse Reinforcement Learning (IRL). Recent methods for IRL have demonstrated the capacity to learn effective policies with access to a very limited set of demonstrations, a scenario in which BC methods often fail. Unfortunately, due to multiple factors of variation, directly comparing these methods does not provide adequate intuition for understanding this difference in performance. In this work, we present a unified probabilistic perspective on IL algorithms based on divergence minimization. We present $f$-MAX, an $f$-divergence generalization of AIRL [Fu et al., 2018], a state-of-the-art IRL method. $f$-MAX enables us to relate prior IRL methods such as GAIL [Ho & Ermon, 2016] and AIRL [Fu et al., 2018], and understand their algorithmic properties. Through the lens of divergence minimization we tease apart the differences between BC and successful IRL approaches, and empirically evaluate these nuances on simulated high-dimensional continuous control domains. Our findings conclusively identify that IRL's state-marginal matching objective contributes most to its superior performance. Lastly, we apply our new understanding of IL methods to the problem of state-marginal matching, where we demonstrate that in simulated arm pushing environments we can teach agents a diverse range of behaviours using simply hand-specified state distributions and no reward functions or expert demonstrations. For datasets and reproducing results please refer to https://github.com/KamyarGh/rl_swiss/blob/master/reproducing/fmax_paper.md .

    / camera-ready pdf / code
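
    For readers skimming, the divergence-minimization framing can be summarized in one objective: imitation methods are distinguished by which statistical divergence between the expert's and the agent's state-action occupancy measures they minimize. In notation loosely following the paper (the symbols below are mine), f-MAX optimizes

        $\min_{\theta}\; D_f\!\left(\rho^{\mathrm{exp}}(s, a)\,\|\,\rho^{\pi_\theta}(s, a)\right),$

    where particular choices of the function $f$ recover AIRL (a reverse KL divergence) and relate to GAIL (a Jensen-Shannon divergence), whereas behavioural cloning instead minimizes a KL divergence between conditional action distributions under the expert's state distribution; the gap between these two kinds of objectives is what the paper's experiments probe.
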
  • SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies
    Seyed Kamyar Seyed Ghasemipour, Shane Gu, Richard Zemel
    NeurIPS 2019

    abstract

    Imitation Learning (IL) has been successfully applied to complex sequential decision-making problems where standard Reinforcement Learning (RL) algorithms fail. A number of recent methods extend IL to few-shot learning scenarios, where a meta-trained policy learns to quickly master new tasks using limited demonstrations. However, although Inverse Reinforcement Learning (IRL) often outperforms Behavioral Cloning (BC) in terms of imitation quality, most of these approaches build on BC due to its simple optimization objective. In this work, we propose SMILe, a scalable framework for Meta Inverse Reinforcement Learning (Meta-IRL) based on maximum entropy IRL, which can learn high-quality policies from few demonstrations. We examine the efficacy of our method on a variety of high-dimensional simulated continuous control tasks and observe that SMILe significantly outperforms Meta-BC. Furthermore, we observe that SMILe performs comparably or outperforms Meta-DAgger, while being applicable in the state-only setting and not requiring online experts. To our knowledge, our approach is the first efficient method for Meta-IRL that scales to the function approximator setting. For datasets and reproducing results please refer to https://github.com/KamyarGh/rl_swiss/blob/master/reproducing/smile_paper.md .

    / camera-ready pdf / code
    Whitepapers

  • Acme: A Research Framework for Distributed Reinforcement Learning
    Matthew W. Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Nikola Momchev, Danila Sinopalnikov, Piotr Stańczyk, Sabela Ramos, Anton Raichuk, Damien Vincent, Léonard Hussenot, Robert Dadashi, Gabriel Dulac-Arnold, Manu Orsini, Alexis Jacq, Johan Ferret, Nino Vieillard, Seyed Kamyar Seyed Ghasemipour, Sertan Girgin, Olivier Pietquin, Feryal Behbahani, Tamara Norman, Abbas Abdolmaleki, Albin Cassirer, Fan Yang, Kate Baumli, Sarah Henderson, Abe Friesen, Ruba Haroun, Alex Novikov, Sergio Gómez Colmenarejo, Serkan Cabi, Caglar Gulcehre, Tom Le Paine, Srivatsan Srinivasan, Andrew Cowie, Ziyu Wang, Bilal Piot, Nando de Freitas
    Whitepaper

    abstract

    Deep reinforcement learning (RL) has led to many recent and groundbreaking advances. However, these advances have often come at the cost of both increased scale in the underlying architectures being trained as well as increased complexity of the RL algorithms used to train them. These increases have in turn made it more difficult for researchers to rapidly prototype new ideas or reproduce published RL algorithms. To address these concerns this work describes Acme, a framework for constructing novel RL algorithms that is specifically designed to enable agents that are built using simple, modular components that can be used at various scales of execution. While the primary goal of Acme is to provide a framework for algorithm development, a secondary goal is to provide simple reference implementations of important or state-of-the-art algorithms. These implementations serve both as a validation of our design decisions as well as an important contribution to reproducibility in RL research. In this work we describe the major design decisions made within Acme and give further details as to how its components can be used to implement various algorithms. Our experiments provide baselines for a number of common and state-of-the-art algorithms as well as showing how these algorithms can be scaled up for much larger and more complex environments. This highlights one of the primary advantages of Acme, namely that it can be used to implement large, distributed RL algorithms that can run at massive scales while still maintaining the inherent readability of that implementation. This work presents a second version of the paper which coincides with an increase in modularity, additional emphasis on offline, imitation and learning from demonstrations algorithms, as well as various new agents implemented as part of Acme.

    / pdf
    Workshop Publications

  • Why so pessimistic? Estimating uncertainties for offline RL through ensembles, and why their independence matters.
    Seyed Kamyar Seyed Ghasemipour, Shane Gu, Ofir Nachum
    Offline RL Workshop at NeurIPS 2021, also Under Review at ICLR 2022

    abstract

    In offline/batch reinforcement learning (RL), the predominant class of approaches with most success have been "support constraint" methods, where trained policies are encouraged to remain within the support of the provided offline dataset. However, support constraints correspond to an overly pessimistic assumption that actions outside the provided data may lead to worst-case outcomes. In this work, we aim to relax this assumption by obtaining uncertainty estimates for predicted action values, and acting conservatively with respect to a lower-confidence bound (LCB) on these estimates. Motivated by the success of ensembles for uncertainty estimation in supervised learning, we propose MSG, an offline RL method that employs an ensemble of independently updated Q-functions. First, theoretically, by referring to the literature on infinite-width neural networks, we demonstrate the crucial dependence of the quality of derived uncertainties on the manner in which ensembling is performed, a phenomenon that arises due to the dynamic programming nature of RL and is overlooked by existing offline RL methods. Our theoretical predictions are corroborated by pedagogical examples on toy MDPs, as well as empirical comparisons in benchmark continuous control domains. In the significantly more challenging antmaze domains of the D4RL benchmark, MSG with deep ensembles by a wide margin surpasses highly well-tuned state-of-the-art methods. Consequently, we investigate whether efficient approximations can be similarly effective. We demonstrate that while some very efficient variants also outperform current state-of-the-art, they do not match the performance and robustness of MSG with deep ensembles. We hope that the significant impact of our less pessimistic approach engenders increased focus into uncertainty estimation techniques directed at RL, and engenders new efforts from the community of deep network uncertainty estimation researchers who thus far have not employed offline reinforcement learning domains as a testbed for validating modern uncertainty estimation techniques.

    / pdf
  • ABC Problem: An Investigation of Offline RL for Vision-Based Dynamic Manipulation
    Seyed Kamyar Seyed Ghasemipour, Igor Mordatch, Shane Gu
    Embodied Multi-Modal Learning Workshop at ICLR 2021

    abstract

    In recent years, reinforcement learning has had significant successes in the domain of robotics. However, most such successes have either been in robotic locomotion domains or robotic manipulation settings such as pick-and-place, block-stacking, and other quasi-static tasks, both of which lack a requirement of fine-grained geometric reasoning. In this work, we ask the following question: Given current established methodologies in RL, can we obtain effective vision-based policies that solve tasks requiring significant geometric reasoning, and how well do such policies generalize? As a secondary question, we pursue our investigations under the offline/batch RL setting. We study these questions in a simplified simulated rendition of the "ABC Problem" proposed by Prof. Tenenbaum. In the ABC problem, in each episode an agent is initialized with two random objects to use as its hands (A and B), and the objective is to lift a third randomly selected object (C) from the ground. Due to the varying geometries of the sampled objects, a trained agent must learn to reason about the most effective procedure for lifting the objects. Our empirical results demonstrate that indeed, by training on a limited subset of available objects, vision-based policies obtained through offline RL can significantly improve upon the policies generating the offline datasets, and can transfer to a diversity of objects outside the training distribution. Additionally, we demonstrate that learned policies exhibit novel characteristics not seen in the offline datasets, and we provide evidence that points towards investing efforts in attention architectures for vision-based control policies. Videos can be found in supplementary materials.

    / pdf
  • EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
    Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, Shane Gu
    Offline RL Workshop at NeurIPS 2021

    abstract

    Off-policy reinforcement learning (RL) holds the promise of sample-efficient learning of decision-making policies by leveraging past experience. However, in the offline RL setting -- where a fixed collection of interactions are provided and no further interactions are allowed -- it has been shown that standard off-policy RL methods can significantly underperform. Recently proposed methods aim to address this shortcoming by regularizing learned policies to remain close to the given dataset of interactions. However, these methods involve several configurable components such as learning a separate policy network on top of a behavior cloning actor, and explicitly constraining action spaces through clipping or reward penalties. Striving for simultaneous simplicity and performance, in this work we present a novel backup operator, Expected-Max Q-Learning (EMaQ), which naturally restricts learned policies to remain within the support of the offline dataset without any explicit regularization, while retaining desirable theoretical properties such as contraction. We demonstrate that EMaQ is competitive with Soft Actor Critic (SAC) in online RL, and surpasses SAC in the deployment-efficient setting. In the offline RL setting -- the main focus of this work -- through EMaQ we are able to make important observations regarding key components of offline RL, and the nature of standard benchmark tasks. Lastly but importantly, we observe that EMaQ achieves state-of-the-art performance with fewer moving parts such as one less function approximation, making it a strong, yet easy to implement baseline for future work.

    / arxiv
  • SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies
    Seyed Kamyar Seyed Ghasemipour, Shane Gu, Richard Zemel
    Imitation, Intent, and Interaction (I3) Workshop, ICML 2019 (Oral Presentation)

    abstract

    Imitation Learning (IL) has been successfully applied to complex sequential decision-making problems where standard Reinforcement Learning (RL) algorithms fail. A number of recent methods extend IL to few-shot learning scenarios, where a meta-trained policy learns to quickly master new tasks using limited demonstrations. However, although Inverse Reinforcement Learning (IRL) often outperforms Behavioral Cloning (BC) in terms of imitation quality, most of these approaches build on BC due to its simple optimization objective. In this work, we propose SMILe, a scalable framework for Meta Inverse Reinforcement Learning (Meta-IRL) based on maximum entropy IRL, which can learn high-quality policies from few demonstrations. We examine the efficacy of our method on a variety of high-dimensional simulated continuous control tasks and observe that SMILe significantly outperforms Meta-BC. To our knowledge, our approach is the first efficient method for Meta-IRL that scales to the intractable function approximator setting.

    / pdf / code (coming soon) / poster
  • Interpreting Imitation Learning Methods Under a Divergence Minimization Perspective
    Seyed Kamyar Seyed Ghasemipour, Richard Zemel, Shane Gu
    Imitation, Intent, and Interaction (I3) Workshop, ICML 2019
    Deep Generative Models for Highly Structured Data Workshop, ICLR 2019

    abstract

    In many settings, it is desirable to learn decision-making and control policies through learning or from expert demonstrations. The most common approaches under this framework are Behaviour Cloning (BC), and Inverse Reinforcement Learning (IRL). Recent methods for IRL have demonstrated the capacity to learn effective policies with access to a very limited set of demonstrations, a scenario in which BC methods often fail. Unfortunately, directly comparing the algorithms for these methods does not provide adequate intuition for understanding this difference in performance. This is the motivating factor for our work. We begin by presenting $f$-MAX, a generalization of AIRL (Fu et al., 2018), a state-of-the-art IRL method. $f$-MAX provides grounds for more directly comparing the objectives for LfD. We demonstrate that $f$-MAX, and by inheritance AIRL, is a subset of the cost-regularized IRL framework laid out by Ho & Ermon (2016). We conclude by empirically evaluating the factors of difference between various LfD objectives in the continuous control domain.

    / pdf / code (coming soon) / poster
  • Gradient-Based Optimization of Neural Network Architecture
    Will Grathwohl*, Elliot Creager*, Seyed Kamyar Seyed Ghasemipour*, Richard Zemel
    Workshop, ICLR 2018

    abstract

    Neural networks can learn relevant features from data, but their predictive accuracy and propensity to overfit are sensitive to the values of the discrete hyperparameters that specify the network architecture (number of hidden layers, number of units per layer, etc.). Previous work optimized these hyperparameters via grid search, random search, and black box optimization techniques such as Bayesian optimization. Bolstered by recent advances in gradient-based optimization of discrete stochastic objectives, we instead propose to directly model a distribution over possible architectures and use variational optimization to jointly optimize the network architecture and weights in one training pass. We discuss an implementation of this approach that estimates gradients via the Concrete relaxation, and show that it finds compact and accurate architectures for convolutional neural networks applied to the CIFAR10 and CIFAR100 datasets.

    / pdf
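
    The key mechanism, relaxing a discrete architectural choice so that it can be trained with ordinary gradients, can be illustrated with a Concrete (Gumbel-Softmax) sample over a small set of candidate layer widths. The snippet below is only a generic illustration of that relaxation; the candidate widths, temperature, and variable names are invented for the example and do not come from the paper.

      # Generic illustration of a Concrete (Gumbel-Softmax) relaxation of a
      # discrete architecture choice, e.g. "how many units in this layer?".
      import numpy as np

      def concrete_sample(logits, temperature, rng):
          # Differentiable relaxation of a categorical sample over `logits`:
          # softmax((logits + Gumbel noise) / temperature).
          gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
          scores = (logits + gumbel) / temperature
          scores -= scores.max()                      # numerical stability
          weights = np.exp(scores)
          return weights / weights.sum()              # soft one-hot over choices

      rng = np.random.default_rng(0)
      candidate_widths = np.array([32, 64, 128, 256])
      logits = np.zeros(4)           # learned architecture parameters (made up)
      weights = concrete_sample(logits, temperature=0.5, rng=rng)
      # During training, candidate sub-layers are mixed with these soft weights,
      # so gradients flow back into `logits` along with the network weights.
      print(weights.round(3), float((weights * candidate_widths).sum()))
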
    Unpublished Submissions

  • Semi-Supervised Structured Prediction with the Use of Generative Adversarial Networks
    Seyed Kamyar Seyed Ghasemipour, Yujia Li, Jackson Wang, Richard Zemel
    Submitted to ICCV 2017
    Slides

    SMILe Oral Presentation, Imitation, Intent, and Interaction (I3) Workshop at ICML 2019

    Videos

    Why so pessimistic? Estimating uncertainties for offline RL through ensembles, and why their independence matters. Offline RL Workshop at NeurIPS 2021

    SMILe (link coming soon) 15 min, June 15, 2019, Oral Presentation, Imitation, Intent, and Interaction (I3) Workshop at ICML 2019

    Summer 2015 Research Video (1st place, undergraduate research video competition)

    Summer 2014 Research Video (1st place, undergraduate research video competition)