Flash-SD-KDE reframes the empirical score and KDE computations to map cleanly onto Tensor Core-accelerated GEMMs, delivering large speedups while keeping the estimator exact.
Details and benchmarks are in "Flash-SD-KDE: Accelerating SD-KDE with Tensor Cores".
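
To make the GEMM reformulation concrete, here is a minimal NumPy sketch of how a Gaussian KDE and its empirical score can be written as matrix products. It is an illustration under stated assumptions, not the Flash-SD-KDE implementation: the Gaussian kernel, the bandwidth parameter `h`, and the function names `gaussian_kernel_matrix` and `kde_and_score` are all hypothetical choices made for this example, and the code runs on the CPU rather than on Tensor Cores. The point is only that both quantities reduce to two dense products (`queries @ data.T` and `K @ data`) with no approximation.

```python
import numpy as np


def gaussian_kernel_matrix(queries, data, h):
    """Unnormalized Gaussian kernel values K[i, j] = exp(-||q_i - x_j||^2 / (2 h^2)).

    The squared distances are expanded as ||q||^2 + ||x||^2 - 2 q.x so that
    the only O(m * n * d) work is a single matrix product (a GEMM).
    """
    q_sq = np.sum(queries**2, axis=1)[:, None]   # shape (m, 1)
    x_sq = np.sum(data**2, axis=1)[None, :]      # shape (1, n)
    cross = queries @ data.T                     # GEMM, shape (m, n)
    # Clamp tiny negative values caused by floating-point cancellation.
    sq_dist = np.maximum(q_sq + x_sq - 2.0 * cross, 0.0)
    return np.exp(-sq_dist / (2.0 * h**2))


def kde_and_score(queries, data, h):
    """Gaussian KDE density and empirical score at each query point.

    score(q) = grad_q log p_hat(q) = (kernel-weighted mean of data - q) / h^2
    """
    n, d = data.shape
    K = gaussian_kernel_matrix(queries, data, h)
    mass = K.sum(axis=1)                         # row sums, shape (m,)
    density = mass / (n * (2.0 * np.pi * h**2) ** (d / 2))
    weighted_mean = (K @ data) / mass[:, None]   # second GEMM, shape (m, d)
    score = (weighted_mean - queries) / h**2
    return density, score
```

Calling `kde_and_score(queries, data, h)` with `queries` of shape `(m, d)` and `data` of shape `(n, d)` returns a length-`m` density vector and an `(m, d)` score matrix. In exact arithmetic the results match a direct pairwise-loop evaluation, which is the sense in which a reformulation of this kind leaves the estimator unchanged; in floating point they agree up to rounding.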