Stein Thinning is a tool for post-processing the output of a sampling procedure, such as Markov chain Monte Carlo (MCMC). It aims to minimise a Stein discrepancy by selecting a subsequence of samples that best represents the distributional target.
The user provides two arrays: one containing the samples and another containing the corresponding gradients of the log-target. Stein Thinning returns a vector of indices, indicating which samples were selected.
In favourable circumstances, Stein Thinning is able to:
Implementations of Stein Thinning are available for Python, R, and MATLAB:
First, it is important to parametrise the distributional target so that it has a positive and differentiable density on $\mathbb{R}^d$.
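As an illustration of this requirement, consider a target supported only on the positive half-line, such as a Gamma distribution with shape `a` and rate `b`. The sketch below moves it to the whole real line through the change of variables $y = \log x$, whose density satisfies $\log p_Y(y) = a y - b e^{y} + \text{const}$ once the Jacobian term $e^{y}$ is included. The function names and the Gamma example are assumptions made for illustration, and exact Gamma draws stand in for MCMC output.

```python
import numpy as np

def to_unconstrained(x):
    """Map positive values x > 0 to the real line via y = log(x)."""
    return np.log(x)

def grad_log_target_unconstrained(y, a, b):
    """Gradient of log p_Y(y) = a*y - b*exp(y) + const.

    This is the Gamma(shape=a, rate=b) density expressed in y = log(x),
    including the Jacobian factor exp(y) of the change of variables.
    """
    return a - b * np.exp(y)

# Hypothetical example values; exact draws stand in for sampler output.
rng = np.random.default_rng(1)
a, b = 3.0, 2.0
x = rng.gamma(shape=a, scale=1.0 / b, size=(500, 1))

# Arrays in the layout Stein Thinning expects: one row per sample.
samples = to_unconstrained(x)                                  # shape (500, 1)
gradients = grad_log_target_unconstrained(samples, a, b)      # shape (500, 1)
```

The gradients are taken with respect to the transformed variable `y`, not the original `x`, so that both arrays refer to the same parametrisation.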
indices = thin(samples, gradients, m)
samples is an array with $n$ rows and $d$ columns, whose rows are the samples produced by a sampling method, such as MCMC,
gradients is an array with $n$ rows and $d$ columns, whose rows contain the gradients $\nabla \log p(x)$, where $x$ is the corresponding row of samples,
m is an integer, specifying the number of representative samples required,
indices is a vector of length $m$, whose elements are integers in $\{1, \dots, n\}$ (or $\{0, \dots, n-1\}$ in zero-indexed languages such as Python), indicating which samples were selected.
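The inputs and output described above can be illustrated with a small, self-contained sketch of greedy Stein discrepancy minimisation. It is not the code of the published packages: the name `thin_sketch`, the helper `stein_kernel_row`, and the scalar squared length scale `ell2` are assumptions made here, and the base kernel is taken to be an inverse multiquadric with identity preconditioner. Indices are zero-based and may repeat, since each step picks the sample that most reduces the discrepancy.

```python
import numpy as np

def stein_kernel_row(x, s, X, S, ell2):
    """Langevin Stein kernel k_p(x, X[i]) for every row i of X.

    Base kernel (an assumption of this sketch): inverse multiquadric
    k(x, y) = (1 + ||x - y||^2 / ell2) ** (-1/2).
    s and S hold the gradients of the log-target at x and at the rows of X.
    """
    d = X.shape[1]
    r = x - X                              # differences x - X[i], shape (n, d)
    r2 = np.sum(r ** 2, axis=1)            # squared distances, shape (n,)
    q = 1.0 + r2 / ell2
    trace_term = d / ell2 / q ** 1.5
    curvature_term = -3.0 * r2 / ell2 ** 2 / q ** 2.5
    cross_term = np.sum(r * (s - S), axis=1) / ell2 / q ** 1.5
    score_term = (S @ s) / q ** 0.5
    return trace_term + curvature_term + cross_term + score_term

def thin_sketch(samples, gradients, m, ell2=1.0):
    """Greedily select m row indices that reduce a kernel Stein discrepancy."""
    n, d = samples.shape
    # k_p(x_i, x_i) for the kernel above.
    k_diag = d / ell2 + np.sum(gradients ** 2, axis=1)
    # running[i] accumulates k_p(x_j, x_i) over the points j chosen so far.
    running = np.zeros(n)
    indices = np.empty(m, dtype=int)
    for j in range(m):
        i = int(np.argmin(0.5 * k_diag + running))
        indices[j] = i
        running += stein_kernel_row(samples[i], gradients[i],
                                    samples, gradients, ell2)
    return indices

# Usage on a standard Gaussian target, for which grad log p(x) = -x.
rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 2))
gradients = -samples
idx = thin_sketch(samples, gradients, 20)
thinned = samples[idx]                     # the 20 selected samples
```

Each iteration costs one kernel row, so the sketch runs in time proportional to `n * m * d` and never forms the full `n` by `n` kernel matrix. A fixed `ell2` keeps the example short; the appropriate length scale or preconditioner depends on the scale of the target.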
Stein Thinning can be used to post-process the output directly from the Stan family of probabilistic programming languages:
Riabiz M, Chen WY, Cockayne J, Swietach P, Niederer SA, Mackey L, Oates CJ (2021) Optimal Thinning of MCMC Output. Journal of the Royal Statistical Society, Series B, to appear. arXiv
Teymur O, Gorham J, Riabiz M, Oates CJ (2021) Optimal Quantisation of Probability Measures Using Maximum Mean Discrepancy. International Conference on Artificial Intelligence and Statistics (AISTATS 2021).
Chopin N, Ducrocq G (2021) Fast compression of MCMC output. arXiv
South LF, Riabiz M, Teymur O, Oates CJ (2021) Post-Processing of MCMC. Annual Review of Statistics and Its Application, to appear. arXiv