stein_thinning.thinning.thin_gf

stein_thinning.thinning.thin_gf(sample: ndarray, log_p: ndarray, log_q: ndarray, gradient_q: ndarray, n_points: int, standardize: bool = True, preconditioner: str = 'id', range_cap: float | None = None) -> ndarray

Optimally select m points from n > m samples generated from a target distribution of d dimensions.

This function is based on the gradient-free kernel Stein discrepancy, which uses an auxiliary distribution q as a proxy for the target distribution. This is useful when the gradient of the target distribution is difficult to obtain, so instead the gradient of the proxy distribution is used.

Parameters

sample: np.ndarray

n x d array where each row is a sample point.

log_p: np.ndarray

n x 1 array of log-pdf values for the target distribution corresponding to points in sample.

log_q: np.ndarray

n x 1 array of log-pdf values for the proxy distribution corresponding to points in sample.

gradient_q: np.ndarray

n x d array of gradients of the proxy distribution evaluated at the points in sample.
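
As an illustration of how log_q and gradient_q might be prepared, the sketch below evaluates an isotropic Gaussian proxy at each sample point. It assumes that gradient_q means the gradient of the log-density of q (the score), which is the quantity kernel Stein discrepancies are usually built from; the text above does not state this explicitly. The helper name gaussian_proxy and its parameters mu and sigma are hypothetical and not part of stein_thinning.

```python
import numpy as np


def gaussian_proxy(sample, mu, sigma):
    """Log-density and score of N(mu, sigma^2 I) at each row of sample.

    Hypothetical helper for illustration; not part of stein_thinning.
    """
    n, d = sample.shape
    diff = sample - mu  # shape (n, d)
    log_q = (-0.5 * np.sum(diff**2, axis=1) / sigma**2
             - 0.5 * d * np.log(2.0 * np.pi * sigma**2))  # shape (n,)
    # Assumption: gradient_q is the gradient of log q with respect to x.
    gradient_q = -diff / sigma**2  # shape (n, d)
    return log_q, gradient_q


rng = np.random.default_rng(0)
sample = rng.normal(size=(200, 2))
log_q, gradient_q = gaussian_proxy(sample, mu=np.zeros(2), sigma=1.5)
```

Here log_q is one-dimensional with length n; if a column of shape n x 1 is required, log_q[:, None] gives that layout.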

n_points: int

integer specifying the desired number of points, m.

standardize: bool

optional logical, either True (default) or False, indicating whether to standardise the columns of sample by centring each column on its mean and scaling it by its mean absolute deviation from the mean.
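
The standardisation described above can be sketched in a few lines of NumPy. This is a minimal illustration of the stated rule (centre on the column mean, scale by the mean absolute deviation from that mean), not the library's own code; the function name standardize_columns is hypothetical. Whether and how gradient_q is rescaled alongside the sample is not stated here and is not shown.

```python
import numpy as np


def standardize_columns(sample):
    """Centre each column on its mean and scale by its mean absolute deviation.

    Hypothetical helper for illustration; not part of stein_thinning.
    """
    loc = sample.mean(axis=0)                      # column means, shape (d,)
    scale = np.mean(np.abs(sample - loc), axis=0)  # mean absolute deviations, shape (d,)
    return (sample - loc) / scale


rng = np.random.default_rng(1)
# Two columns on very different scales.
sample = rng.normal(loc=[5.0, -2.0], scale=[10.0, 0.1], size=(500, 2))
z = standardize_columns(sample)
```

After this transformation every column of z has mean zero and unit mean absolute deviation, so no single coordinate dominates distance computations purely because of its units.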

preconditioner: str

optional string, either ‘id’ (default), ‘med’, ‘sclmed’, or ‘smpcov’, specifying the preconditioner to be used. Alternatively, a numeric string can be passed as the single length-scale parameter of an isotropic kernel.

range_cap: Optional[float]

if provided, the values of log_q - log_p are clipped from above so that their resulting range (maximum minus minimum) is at most range_cap.
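
One way to read the clipping rule is sketched below: the upper bound is set to the smallest value of log_q - log_p plus range_cap, which guarantees that the clipped values span at most range_cap. This is an interpretation of the description above, not the library's implementation, and the function name cap_log_ratio is hypothetical.

```python
import numpy as np


def cap_log_ratio(log_p, log_q, range_cap=None):
    """Return log_q - log_p, clipped from above so its range is at most range_cap.

    Hypothetical helper for illustration; not part of stein_thinning.
    """
    diff = log_q - log_p
    if range_cap is not None:
        # Assumption: the cap is measured from the smallest difference.
        upper = diff.min() + range_cap
        diff = np.minimum(diff, upper)
    return diff


log_p = np.array([-1.0, -2.0, -30.0, -4.0])
log_q = np.array([-1.5, -2.0, -3.0, -3.5])
capped = cap_log_ratio(log_p, log_q, range_cap=5.0)
# Unclipped differences are [-0.5, 0.0, 27.0, 0.5]; only the third exceeds
# the upper bound of -0.5 + 5.0 = 4.5, so capped is [-0.5, 0.0, 4.5, 0.5].
```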

Returns

np.ndarray

array of shape (m,) containing the row indices in sample (and gradient_q) of the selected points.
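
Because the return value holds row indices rather than the points themselves, the thinned sample is obtained by indexing the original arrays. The sketch below uses a hand-written index array as a stand-in for the output of thin_gf, so it runs without the library; the data and index values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
sample = rng.normal(size=(100, 3))
gradient_q = -sample  # score of a standard normal proxy, for illustration only

# Stand-in for the array returned by thin_gf(..., n_points=5).
idx = np.array([12, 47, 3, 88, 61])

# Select the same rows from every per-point array.
thinned_sample = sample[idx]          # shape (5, 3)
thinned_gradient = gradient_q[idx]    # shape (5, 3)
```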