Matrix Sketching for Online Analysis of LCLS Imaging Datasets

TL;DR

Problem setting

At the Linac Coherent Light Source (LCLS), detectors produce shot-to-shot image data used for instrument diagnostics and scientific analysis. The paper highlights two main constraints:

  1. Throughput: detectors can run at roughly 120 frames per second.
  2. Dimensionality: beam-profile images can be multi-megapixel, which makes direct analysis expensive.

These pressures motivate compact, mergeable summaries that preserve structure while keeping memory bounded.

Key idea

Use matrix sketching to compress a large batch of images into a smaller summary matrix that preserves the dominant structure. Then apply PCA, UMAP, and clustering downstream, as outlined in the method below.

The core twist is to make the sketch rank-adaptive: its size is driven by a user-specified error tolerance rather than fixed in advance.
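This summary does not spell out the paper's exact adaptive rule. One simple way to turn an error tolerance into a rank, assumed here purely for illustration, is to keep the smallest number of singular directions whose discarded energy is below a fraction `tol` of the total. `choose_rank` is a hypothetical helper, and the paper's criterion may differ.

```python
import numpy as np

def choose_rank(M, tol):
    """Hypothetical rule: smallest k with ||M - M_k||_F^2 <= tol * ||M||_F^2."""
    s = np.linalg.svd(M, compute_uv=False)  # singular values, descending
    energy = s ** 2
    # tail[k] = energy left out when keeping only the top-k directions;
    # the final entry is exactly 0.0 (keep everything).
    tail = np.append(np.cumsum(energy[::-1])[::-1], 0.0)
    total = tail[0]
    # First k whose discarded energy is within the tolerance.
    return int(np.argmax(tail <= tol * total))

# Singular values 3, 2, 1 carry energies 9, 4, 1 (total 14).
print(choose_rank(np.diag([3.0, 2.0, 1.0]), 0.1))  # 2
```

Applied to a sketch B rather than the full data, the same rule would let the retained rank grow or shrink with the structure of the stream instead of being fixed up front.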

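The sketching objective can be summarized as preserving the covariance structure, i.e. keeping the following error small, where A stacks the flattened images as rows and B is the much smaller sketch:

$$\left\|A^\top A - B^\top B\right\|$$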
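Frequent Directions, the sketch this summary names in the scaling section, bounds this quantity deterministically. A minimal NumPy sketch of the textbook streaming variant with a 2ℓ-row buffer follows; it is an illustration, not the paper's code, and the function names are hypothetical. Each shrink subtracts the (ℓ+1)-th squared singular value δ from all squared singular values, which removes at least (ℓ+1)δ of Frobenius mass and adds at most δ to the spectral error, so the final error is at most ‖A‖²_F/(ℓ+1).

```python
import numpy as np

def _shrink(buf, ell):
    """FD shrink: subtract the (ell+1)-th squared singular value; zeroes rows ell.. of the result."""
    _, s, Vt = np.linalg.svd(buf, full_matrices=False)
    delta = s[ell] ** 2 if s.size > ell else 0.0
    out = np.zeros_like(buf)
    out[: s.size] = np.sqrt(np.maximum(s ** 2 - delta, 0.0))[:, None] * Vt
    return out

def frequent_directions(A, ell):
    """Stream the rows of A (n x d) into an ell x d sketch B."""
    buf = np.zeros((2 * ell, A.shape[1]))
    filled = 0
    for row in A:
        if filled == 2 * ell:          # buffer full: shrink frees rows ell..2*ell-1
            buf = _shrink(buf, ell)
            filled = ell
        buf[filled] = row
        filled += 1
    return _shrink(buf, ell)[:ell]     # rows ell.. are zero after the final shrink

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 40))     # synthetic stand-in for 500 flattened frames
B = frequent_directions(A, 10)
err = np.linalg.norm(A.T @ A - B.T @ B, 2)
print(B.shape)                          # (10, 40)
print(err <= np.linalg.norm(A) ** 2 / 10)  # True
```

The 2ℓ buffer amortizes the SVD cost: one SVD per ℓ incoming rows instead of one per row.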

Method (high level)

The paper proposes an end-to-end pipeline:

  1. Sketch + PCA: build a compact sketch, then compute a PCA projection.
  2. UMAP to 2D: obtain a visualization suitable for monitoring.
  3. Clustering and outliers: use OPTICS (or related methods) to surface structure.
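A minimal sketch of step 1, assuming NumPy only. To keep it short, a Gaussian random-projection sketch (the hypothetical helper `gaussian_sketch`) stands in for the Frequent Directions sketch the paper uses; PCA only needs some small B whose top right singular vectors approximate the principal axes of the mean-centered data. The synthetic data and all function names are illustrative.

```python
import numpy as np

def gaussian_sketch(A, ell, seed=0):
    """Stand-in sketch (not FD): B = S @ A with S an ell x n Gaussian matrix / sqrt(ell)."""
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((ell, A.shape[0])) / np.sqrt(ell)
    return S @ A

def pca_basis_from_sketch(B, k):
    """Top-k right singular vectors of the sketch approximate the top-k principal axes."""
    _, _, Vt = np.linalg.svd(B, full_matrices=False)
    return Vt[:k]                       # k x d, orthonormal rows

def project(X, mean, basis):
    """Project frames (rows of X) onto the k-dimensional PCA subspace."""
    return (X - mean) @ basis.T

rng = np.random.default_rng(1)
# Synthetic "frames": 1000 rows of 64 pixels with 2 dominant modes plus small noise.
modes = rng.standard_normal((2, 64))
X = rng.standard_normal((1000, 2)) @ (10 * modes) + 0.01 * rng.standard_normal((1000, 64))
mean = X.mean(axis=0)
B = gaussian_sketch(X - mean, ell=20)
basis = pca_basis_from_sketch(B, k=2)
Z = project(X, mean, basis)
print(Z.shape)                          # (1000, 2)
```

In practice k would be larger than 2, and the k-dimensional projections would then feed steps 2 and 3, for example via the umap-learn package's `umap.UMAP` and scikit-learn's `sklearn.cluster.OPTICS`; those calls are omitted so the block runs with NumPy alone. In a streaming setting the mean would be accumulated incrementally rather than computed from all of X at once.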

[Figure: Streaming pipeline overview]

[Figure: Parallel merge scheme]

Scaling: tree-merge sketches

Frequent Directions sketches are mergeable. The paper uses a tree-merge strategy that combines per-core sketches pairwise, so p per-core sketches are reduced to one in about log2(p) merge rounds; this avoids a serial bottleneck at scale.
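A minimal sketch of pairwise tree merging, assuming NumPy. Merging two FD sketches here means stacking them and applying one FD-style shrink back to ℓ rows (`fd_reduce`); each simulated core's sketch is likewise produced by a single shrink of its chunk, a simplification of streaming FD. Function names and the 8-core split are hypothetical. Because every shrink only removes a PSD piece bounded by its δ, the errors add, and the merged sketch keeps the same ‖A‖²_F/(ℓ+1) bound.

```python
import numpy as np

def fd_reduce(M, ell):
    """One FD-style shrink of M down to ell rows (used both for local sketches and merges)."""
    _, s, Vt = np.linalg.svd(M, full_matrices=False)
    delta = s[ell] ** 2 if s.size > ell else 0.0
    s_shrunk = np.sqrt(np.maximum(s[:ell] ** 2 - delta, 0.0))
    B = np.zeros((ell, M.shape[1]))
    B[: s_shrunk.size] = s_shrunk[:, None] * Vt[:ell]
    return B

def tree_merge(sketches, ell):
    """Merge sketches pairwise, level by level; returns (final sketch, number of rounds)."""
    level = list(sketches)
    rounds = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(fd_reduce(np.vstack([level[i], level[i + 1]]), ell))
            else:
                nxt.append(level[i])    # odd sketch out is carried to the next round
        level = nxt
        rounds += 1
    return level[0], rounds

rng = np.random.default_rng(2)
A = rng.standard_normal((800, 30))      # synthetic data split across 8 simulated cores
local = [fd_reduce(chunk, 8) for chunk in np.array_split(A, 8)]
B, rounds = tree_merge(local, 8)
print(rounds)                           # 3
```

Each round's merges are independent, so on a real cluster they could run concurrently; only the ceil(log2(p)) rounds are sequential.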

Evidence in the paper

The paper reports:

[Figure: Embedding visualization]

Limitations and open points

Takeaway

The main contribution is a practical, scalable sketching pipeline that makes online analysis feasible for LCLS-scale imaging streams.