Unsupervised Learning for Anomalous LZ Waveform Data

TL;DR

Problem setting

LUX-ZEPLIN (LZ) is a dual-phase xenon time projection chamber (TPC) instrumented with hundreds of photomultiplier tubes (PMTs). Each event produces a complex set of waveforms. The paper argues that relying only on reduced scalar features can hide meaningful waveform structure.
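As a toy illustration of how a scalar feature can hide shape information, the sketch below builds two synthetic pulses with the same integrated area but clearly different shapes. The pulse shapes, the sample grid, and all names (`gaussian`, `single`, `double`) are invented for this example and are not taken from LZ data or from the paper.

```python
import numpy as np

# Hypothetical sample grid; units are arbitrary.
t = np.arange(200, dtype=float)


def gaussian(center, sigma):
    """Unnormalized Gaussian pulse sampled on the grid t."""
    return np.exp(-0.5 * ((t - center) / sigma) ** 2)


# One broad pulse versus two narrow, well-separated pulses.
single = gaussian(100, 10)
double = gaussian(80, 5) + gaussian(120, 5)

# Normalize both to unit area, so an "area" feature cannot tell them apart.
single /= single.sum()
double /= double.sum()

area_single = single.sum()
area_double = double.sum()

# A simple shape comparison (L1 distance between the sampled waveforms).
shape_distance = np.abs(single - double).sum()

print(f"areas: {area_single:.3f} vs {area_double:.3f}")
print(f"L1 shape distance: {shape_distance:.3f}")
```

The two areas agree while the sample-by-sample distance is far from zero, which is the kind of structure a purely scalar summary would miss.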

Key idea

Treat the waveform shapes themselves as the primary object of analysis, and apply unsupervised embedding followed by clustering. Only afterwards, interpret the resulting clusters using standard physics features.
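The after-the-fact interpretation step can be sketched as a per-cluster summary of scalar features. This is a minimal sketch, not the paper's procedure: the function `summarize_clusters`, the feature names `pulse_area` and `pulse_width`, and the numbers are all hypothetical. The label `-1` follows the HDBSCAN convention for points assigned to no cluster.

```python
import numpy as np


def summarize_clusters(labels, features):
    """Return the event count and per-feature median for each cluster label.

    labels:   integer cluster label per event (-1 = noise, HDBSCAN convention).
    features: dict mapping a feature name to a per-event array.
    """
    labels = np.asarray(labels)
    summary = {}
    for lab in np.unique(labels):
        mask = labels == lab
        row = {"n_events": int(mask.sum())}
        for name, values in features.items():
            row[name] = float(np.median(np.asarray(values)[mask]))
        summary[int(lab)] = row
    return summary


# Hypothetical labels and scalar features for six events.
labels = np.array([0, 0, 1, 1, 1, -1])
features = {
    "pulse_area": np.array([10.0, 12.0, 100.0, 110.0, 90.0, 5000.0]),
    "pulse_width": np.array([1.0, 1.2, 3.0, 3.1, 2.9, 0.2]),
}

summary = summarize_clusters(labels, features)
for lab, row in summary.items():
    print(lab, row)
```

Keeping the scalar features out of the clustering itself and using them only for this summary mirrors the ordering described above: structure is found from shapes first, then explained with familiar quantities.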

Method (high level)

  1. Fourier-domain subsampling of waveforms to keep low-frequency shape information.
  2. UMAP for nonlinear dimensionality reduction.
  3. HDBSCAN for density-based clustering, with clusters interpreted afterwards via reduced quantities (RQs).
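The three steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation. "Fourier-domain subsampling" is read here as keeping the lowest `n_keep` real-FFT coefficients of each waveform, which is one plausible interpretation. The function names, the synthetic pulses, and every hyperparameter value are placeholders. The second function relies on the third-party `umap-learn` and `hdbscan` packages, which it imports lazily so the Fourier step can be run on its own.

```python
import numpy as np


def fourier_features(waveforms, n_keep=32):
    """Keep the n_keep lowest-frequency Fourier coefficients per waveform.

    waveforms: array of shape (n_events, n_samples).
    Returns a real array with real and imaginary parts stacked side by side,
    shape (n_events, 2 * n_keep) when n_keep <= n_samples // 2 + 1.
    """
    waveforms = np.asarray(waveforms, dtype=float)
    spectrum = np.fft.rfft(waveforms, axis=1)
    low = spectrum[:, :n_keep]  # low frequencies carry the coarse pulse shape
    return np.concatenate([low.real, low.imag], axis=1)


def embed_and_cluster(features, n_components=2, min_cluster_size=50, seed=0):
    """UMAP embedding followed by HDBSCAN clustering (values are placeholders)."""
    import umap      # from the umap-learn package
    import hdbscan   # from the hdbscan package

    reducer = umap.UMAP(n_components=n_components, random_state=seed)
    embedding = reducer.fit_transform(features)
    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size)
    labels = clusterer.fit_predict(embedding)  # -1 marks unclustered points
    return embedding, labels


# Synthetic stand-in for waveforms: Gaussian pulses of varying width.
rng = np.random.default_rng(0)
t = np.arange(1024)
widths = rng.uniform(5, 50, size=(200, 1))
pulses = np.exp(-0.5 * ((t - 512) / widths) ** 2)

features = fourier_features(pulses, n_keep=32)
print(features.shape)
```

With both packages installed, `embedding, labels = embed_and_cluster(features)` would complete the chain. Events labelled `-1` and small isolated clusters are the natural candidates for the follow-up inspection described in the text.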

[Figure: HDBSCAN dendrogram]

[Figure: Cosine-embedding overview]

Evidence

The paper reports:

[Figure: UDT vs RQ visualization]

Limitations and caveats

Takeaway

The work shows that unsupervised waveform embeddings can surface meaningful structure in LZ data and isolate anomalous populations for follow-up.