Inference and Distillation for Option Learning

Igl, Maximilian, Wendelin Boehmer, Andrew Gambardella, Philip H. S. Torr, Nantas Nardelli, N. Siddharth, and Shimon Whiteson. "Inference and Distillation for Option Learning." In Workshop on Probabilistic Reinforcement Learning and Structured Control @ NeurIPS 2018: Infer to Control, 2018.

We present Inference and Distillation for Option Learning (IDOL), a multitask option-learning framework based on Planning-as-Inference. IDOL employs a hierarchical prior and variational-posterior factorisation to learn temporally extended options that allow the higher-level master policy to make decisions at a lower frequency, speeding up training on new tasks. IDOL autonomously learns the temporal extension of each option and avoids suboptimal solutions in which multiple options learn similar behaviour. We demonstrate that this improves performance on new tasks compared to both strong hierarchical and flat transfer-learning baselines.
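As a rough sketch of the kind of factorisation such a hierarchical prior implies (the notation below, including the binary termination variable $b_t$, master policy $\pi^{H}$, and option policy $\pi^{L}$, is illustrative and not taken from the paper):

```latex
\[
\pi(a_t, z_t, b_t \mid s_t, z_{t-1})
  = \underbrace{\pi^{L}(a_t \mid s_t, z_t)}_{\text{option policy}}
    \, \underbrace{p(z_t \mid s_t, z_{t-1}, b_t)}_{\text{option choice}}
    \, \underbrace{p(b_t \mid s_t, z_{t-1})}_{\text{termination}},
\qquad
p(z_t \mid s_t, z_{t-1}, b_t) =
\begin{cases}
\delta_{z_t = z_{t-1}} & \text{if } b_t = 0,\\[2pt]
\pi^{H}(z_t \mid s_t)  & \text{if } b_t = 1.
\end{cases}
\]
```

Under a factorisation of this shape, the active option persists whenever $b_t = 0$, so the master policy $\pi^{H}$ is consulted only at option boundaries; this is what allows the higher level to act at a lower frequency, and learning the termination distribution $p(b_t \mid \cdot)$ is how the temporal extension of each option can be determined autonomously.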