Bayesian Coresets and Edward
Modern datasets often contain a large number of redundant samples, making data storage and model training expensive. Coreset computation is an approach that reduces the number of samples by selecting (and weighting) informative samples and discarding redundant ones.
In Bayesian statistics, Bayesian coresets are designed to retain only a small number of samples while guaranteeing that the posterior learned from the coreset is close to the posterior learned from the whole dataset. An excellent explanation of Bayesian coresets and their applications is offered in this video from ICML 2018.
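Concretely, writing $\mathcal{L}_n(\theta)$ for the log-likelihood of the $n$-th data point, the articles cited below cast coreset construction as a sparse approximation of the full-data log-likelihood. A sketch of that formulation (notation mine, following the cited papers):

```latex
% Full-data log-likelihood versus its weighted coreset approximation.
\mathcal{L}(\theta) = \sum_{n=1}^{N} \mathcal{L}_n(\theta)
\qquad
\mathcal{L}(w, \theta) = \sum_{n=1}^{N} w_n \, \mathcal{L}_n(\theta)

% Coreset construction: find nonnegative weights w that minimize the
% approximation error under a sparsity budget of at most M nonzero entries.
\min_{w \ge 0} \; \big\| \mathcal{L} - \mathcal{L}(w) \big\|^{2}
\quad \text{s.t.} \quad \|w\|_0 \le M
```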
In this notebook, we run a tutorial exploring the computation of Bayesian coresets as proposed in this article and this article. In particular, we integrate the original code with the Edward framework, thus exploiting the functions offered by probabilistic programming to further automate the computation of coresets.
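To make the idea concrete before bringing Edward in, here is a minimal, self-contained NumPy sketch of the general recipe: project each data point's log-likelihood onto a finite set of parameter samples, then greedily build sparse weights whose weighted sum approximates the full-data log-likelihood vector. The greedy rule below is a simplified matching-pursuit stand-in, not the exact Frank-Wolfe/GIGA constructions from the papers, and all names (`loglik_projections`, `greedy_coreset`) are hypothetical:

```python
import numpy as np

def loglik_projections(log_lik_fn, data, thetas):
    """Row n holds log p(x_n | theta_j) for J parameter samples theta_j,
    a finite-dimensional stand-in for the log-likelihood function of x_n."""
    return np.array([[log_lik_fn(x, th) for th in thetas] for x in data])

def greedy_coreset(V, M):
    """Toy greedy (matching-pursuit style) construction: run M passes, each
    adding weight to the point best aligned with the remaining residual, so
    that sum_n w_n * V[n] approximates the full sum L = V.sum(0)."""
    N, _ = V.shape
    L = V.sum(axis=0)                        # full-data log-likelihood vector
    w = np.zeros(N)                          # coreset weights (mostly zero)
    norms = np.linalg.norm(V, axis=1) + 1e-12
    for _ in range(M):
        r = L - V.T @ w                      # residual still to be explained
        n = int(np.argmax((V @ r) / norms))  # best-aligned data point
        gamma = max(0.0, (V[n] @ r) / (V[n] @ V[n]))  # 1-d line search
        w[n] += gamma
    return w

# Tiny usage example: Gaussian mean model with unit variance.
rng = np.random.default_rng(0)
data = rng.normal(1.0, 1.0, size=200)
thetas = rng.normal(1.0, 0.5, size=50)       # crude posterior-ish samples
V = loglik_projections(lambda x, th: -0.5 * (x - th) ** 2, data, thetas)
w = greedy_coreset(V, M=10)
print("points in coreset:", np.flatnonzero(w).size)
```

The hand-written `log_lik_fn` above is exactly the piece that a probabilistic programming framework can supply automatically: once a model is declared in Edward, its per-data-point log-probabilities can be evaluated programmatically, which is the integration explored in the rest of this notebook.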