TL;DR
- The paper applies an unsupervised pipeline to LZ S2 waveforms recorded in January 2022 during SR1.
- The pipeline is Fourier subsampling -> UMAP -> HDBSCAN.
- One cluster in the correlation-metric UMAP embedding contains most of the events with unphysical drift times (used as a proxy for accidental coincidences).
Problem setting
LUX-ZEPLIN (LZ) is a dual-phase xenon time projection chamber (TPC) read out by hundreds of photomultiplier tubes (PMTs), so each event produces a complex set of waveforms. The paper argues that relying only on reduced scalar features can hide meaningful waveform structure.
Key idea
Treat the waveform shapes themselves as the primary object of analysis: embed and cluster them without supervision, then interpret the resulting clusters after the fact using standard physics features.
Method (high level)
- Fourier-domain subsampling of waveforms to keep low-frequency shape information.
- UMAP for nonlinear dimensionality reduction.
- HDBSCAN for clustering, with cluster interpretation via reduced quantities.
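
The three steps above can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the function names, `n_keep`, `min_cluster_size`, the choice to concatenate real and imaginary parts, and the synthetic Gaussian pulses are all assumptions made for the example. The embedding step relies on the third-party `umap-learn` and `hdbscan` packages.

```python
import numpy as np


def fourier_subsample(waveforms, n_keep=32):
    """Keep the n_keep lowest-frequency Fourier coefficients of each waveform.

    waveforms: array of shape (n_events, n_samples), one waveform per row.
    Returns a real array of shape (n_events, 2 * n_keep): real parts first,
    then imaginary parts, so amplitude and phase are both preserved.
    (Hypothetical feature layout; the paper's exact choice is not given here.)
    """
    waveforms = np.asarray(waveforms, dtype=float)
    spectrum = np.fft.rfft(waveforms, axis=-1)  # bins ordered from DC upward
    low = spectrum[..., :n_keep]                # drop the high-frequency bins
    return np.concatenate([low.real, low.imag], axis=-1)


def embed_and_cluster(features, metric="correlation", min_cluster_size=50, seed=0):
    """Embed features with UMAP, then cluster the embedding with HDBSCAN.

    Requires umap-learn and hdbscan; they are imported here so that the
    Fourier step above can be used without them. Parameter values are
    illustrative assumptions.
    """
    import umap
    import hdbscan

    embedding = umap.UMAP(
        n_components=2, metric=metric, random_state=seed
    ).fit_transform(features)
    labels = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(embedding)
    return embedding, labels  # label -1 marks points HDBSCAN treats as noise


# Synthetic stand-in data (not LZ data): Gaussian pulses of varying width.
rng = np.random.default_rng(0)
t = np.arange(256)
widths = rng.uniform(5.0, 30.0, size=200)
pulses = np.exp(-0.5 * ((t - 128) / widths[:, None]) ** 2)
pulses += 0.01 * rng.standard_normal(pulses.shape)

features = fourier_subsample(pulses, n_keep=32)
print(features.shape)  # (200, 64)
```

With both optional packages installed, `embedding, labels = embed_and_cluster(features)` would yield a 2D embedding and one integer label per event, which can then be compared against reduced quantities such as drift time.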


Evidence
The paper reports:
- Distinct clusters in UMAP space that correlate with known detector features.
- A correlation-metric UMAP embedding that isolates a cluster containing a high fraction of the events with unphysical drift times.
- A Fisher's exact test indicating that the separation from WIMP search candidates is unlikely to be due to chance (as reported).
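
To make the last point concrete, a one-sided Fisher's exact test on a 2x2 table can be written with the standard library alone. The table layout (cluster membership against event category) and the counts below are hypothetical, chosen only to illustrate the calculation; they are not the paper's numbers, and the paper may have set up its test differently.

```python
from math import comb


def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability, under the hypergeometric null with all margins
    fixed, of a top-left count at least as large as the observed a.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    total = comb(n, col1)
    max_a = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, max_a + 1))
    return tail / total


# Textbook check: the table [[3, 1], [1, 3]] gives p = 17/70, about 0.243.
print(fisher_exact_one_sided(3, 1, 1, 3))

# Hypothetical counts (NOT from the paper):
# rows = inside / outside the cluster, columns = flagged / not flagged.
p_value = fisher_exact_one_sided(40, 10, 5, 945)
print(p_value)
```

A very small p-value here would mean that such a concentration of flagged events in one cluster is hard to explain by random assignment alone.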

Limitations and caveats
- Cluster separation depends strongly on UMAP metric choice.
- Unphysical drift time is only a proxy for accidental coincidences.
- HDBSCAN may label some events as noise, leaving them outside every cluster.
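
One way to see why the metric choice matters: correlation distance (one minus the Pearson correlation) ignores overall scale and offset, while Euclidean distance does not. The short sketch below uses a made-up six-sample pulse to show this; the helper names and values are illustrative assumptions, not taken from the paper.

```python
from math import sqrt


def correlation_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return 1.0 - sum(p * q for p, q in zip(dx, dy)) / denom


def euclidean_distance(x, y):
    """Return the ordinary Euclidean distance between two sequences."""
    return sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


# Made-up pulse and a copy with three times the amplitude plus a baseline shift.
pulse = [0.0, 1.0, 4.0, 2.0, 1.0, 0.0]
rescaled = [3.0 * v + 2.0 for v in pulse]

print(correlation_distance(pulse, rescaled))  # essentially zero: same shape
print(euclidean_distance(pulse, rescaled))    # large: different amplitude
```

Under a correlation metric, two pulses with the same shape but different sizes sit close together, so the embedding groups events by shape rather than by amplitude.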
Takeaway
The work shows that unsupervised waveform embeddings can surface meaningful structure in LZ data and isolate anomalous populations for follow-up.