Flash-SD-KDE on Tensor Cores

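Flash-SD-KDE reformulates the two core computations of score-debiased kernel density estimation (SD-KDE), the empirical score and the KDE itself, so that they map cleanly onto Tensor Core-accelerated matrix multiplications (GEMMs). This delivers large speedups while keeping the estimator exact: the reformulation changes how the sums are computed, not what is computed.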
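The core idea of turning pairwise kernel sums into GEMMs can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Flash-SD-KDE kernels: it assumes an isotropic Gaussian kernel with bandwidth h, uses NumPy's matmul as a stand-in for a Tensor Core GEMM, and omits the debiasing step itself. The function names pairwise_sq_dists and gaussian_kde_and_score are hypothetical. Squared distances are expanded as ||q||^2 + ||x||^2 - 2 q·x so the cross term becomes one GEMM, and the score's weighted sum over samples becomes a second GEMM (K @ X).

```python
import numpy as np


def pairwise_sq_dists(Q, X):
    """Squared Euclidean distances via ||q||^2 + ||x||^2 - 2 q.x.

    The cross term Q @ X.T is a single GEMM; NumPy's matmul stands in
    here for the Tensor Core GEMM an optimized kernel would use.
    """
    q_norm = np.sum(Q * Q, axis=1)[:, None]   # (m, 1) query norms
    x_norm = np.sum(X * X, axis=1)[None, :]   # (1, n) sample norms
    d2 = q_norm + x_norm - 2.0 * (Q @ X.T)    # (m, n) via one GEMM
    return np.maximum(d2, 0.0)                # clamp small negatives from rounding


def gaussian_kde_and_score(Q, X, h):
    """Gaussian KDE density and its score (gradient of log-density) at Q.

    Assumption: isotropic Gaussian kernel with bandwidth h, chosen for
    illustration; not necessarily the kernel Flash-SD-KDE uses.
    """
    n, d = X.shape
    K = np.exp(-pairwise_sq_dists(Q, X) / (2.0 * h * h))  # (m, n) kernel matrix
    norm = n * (2.0 * np.pi * h * h) ** (d / 2.0)         # Gaussian normalizer
    S = K.sum(axis=1)                                     # (m,) row sums
    density = S / norm
    # grad log p(q) = (sum_j K_qj x_j / sum_j K_qj - q) / h^2;
    # the weighted sum over samples is a second GEMM: K @ X.
    score = ((K @ X) / S[:, None] - Q) / (h * h)
    return density, score


# Usage: evaluate density and score for a few query points.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))   # samples
Q = rng.normal(size=(7, 3))    # query points
density, score = gaussian_kde_and_score(Q, X, h=0.8)
```

Both outputs are exact up to floating-point rounding. The clamp at zero guards against small negative squared distances produced by cancellation in the expanded form, a hazard of this trick that grows at the reduced precisions Tensor Cores favor.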

Details and benchmarks are given in "Flash-SD-KDE: Accelerating SD-KDE with Tensor Cores."