Comparing Machine Learning Strategies for Quantum Noise Reduction

This is currently a work in progress. The TODOs will be removed once training has been run.

1. Abstract

Running experiments on real quantum hardware remains extremely costly. Current cloud platforms charge between $0.03 and $0.05 per measurement, and practical algorithms often require tens of thousands of shots. Even modest studies can therefore accumulate thousands of dollars in runtime fees, placing a hard barrier on iterative design and experimentation. Noise mitigation offers a way to extract higher-quality results from the same number of measurements, reducing both cost and hardware requirements. This work presents a systematic comparison of four density-matrix denoising strategies that pair two model classes, a convolutional autoencoder and a transformer autoencoder, with two loss formulations, a fidelity-oriented reconstruction loss and a physics-informed structural loss. We compare how closely each configuration can recover the pre-noise density matrix from the corresponding noisy density matrix.

1.1. TODO Once results come in, add a sentence or two discussing the empirical gains

2. Methods

2.1. Convolutional Autoencoder

The convolutional autoencoder follows the architecture introduced by Kendre (2025) for density-matrix denoising. The model treats each 32×32 complex-valued density matrix as a two-channel image (real and imaginary components), enabling the network to exploit the local spatial patterns created by different quantum noise channels. The encoder consists of a sequence of convolutional blocks with ReLU activations and 2×2 max-pooling, progressively reducing spatial resolution while expanding the channel dimension (2→32→64). The decoder mirrors the encoder through nearest-neighbor upsampling and transposed convolutions, reconstructing the full-resolution density matrix. A final sigmoid activation constrains the reconstructed values to a normalized range.
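
A minimal PyTorch sketch of this encoder–decoder layout is shown below. The channel progression, pooling, upsampling, and output activation follow the description above; the kernel sizes, strides, and padding are assumptions rather than the settings of the original architecture.

```python
import torch
import torch.nn as nn

class ConvDenoiser(nn.Module):
    """Sketch of the convolutional autoencoder; layer hyperparameters are illustrative."""

    def __init__(self):
        super().__init__()
        # Encoder: two-channel (real/imag) 32x32 input -> 64 x 8 x 8 feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                    # 32 -> 16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                    # 16 -> 8
        )
        # Decoder mirrors the encoder with nearest-neighbor upsampling
        # and transposed convolutions back to the full 32x32 resolution.
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),        # 8 -> 16
            nn.ConvTranspose2d(64, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="nearest"),        # 16 -> 32
            nn.ConvTranspose2d(32, 2, kernel_size=3, padding=1),
            nn.Sigmoid(),                                       # constrain outputs to a normalized range
        )

    def forward(self, x):  # x: (batch, 2, 32, 32) noisy density matrices
        return self.decoder(self.encoder(x))
```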

2.2. Transformer Autoencoder

The Transformer autoencoder models each 32×32 complex-valued density matrix as a sequence of token embeddings, where each token corresponds to a row or column vectorized from the matrix’s real and imaginary parts. This architecture allows every element in the input to attend globally to every other element, preserving the non-local structure characteristic of entangled quantum states. The encoder consists of a stack of Transformer encoder layers with multi-head self-attention, enabling the model to capture global correlations across the entire state. The encoded sequence is compressed through a symmetric bottleneck module that projects the embedding dimension down and back up, serving as a latent representation. The decoder comprises Transformer decoder layers that apply both self-attention within the output sequence and cross-attention to the encoder memory. A linear projection and sigmoid activation generate the reconstructed sequence, which is reshaped into a 2D matrix.
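
A minimal PyTorch sketch of this design follows, assuming one token per matrix row with the real and imaginary parts concatenated; the embedding width, number of heads, depth, and bottleneck size are placeholder values, not the settings used in our experiments.

```python
import torch
import torch.nn as nn

class TransformerDenoiser(nn.Module):
    """Sketch of the Transformer autoencoder; widths and depths are illustrative."""

    def __init__(self, n=32, d_model=128, nhead=4, depth=2, d_latent=32):
        super().__init__()
        self.n = n
        self.embed = nn.Linear(2 * n, d_model)   # one token per row: 32 real + 32 imaginary values
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), depth)
        # Symmetric bottleneck: project the embedding dimension down and back up.
        self.bottleneck = nn.Sequential(
            nn.Linear(d_model, d_latent), nn.ReLU(), nn.Linear(d_latent, d_model))
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), depth)
        self.out = nn.Sequential(nn.Linear(d_model, 2 * n), nn.Sigmoid())

    def forward(self, x):                                 # x: (batch, 2, 32, 32)
        tokens = torch.cat([x[:, 0], x[:, 1]], dim=-1)    # (batch, 32, 64)
        memory = self.encoder(self.embed(tokens))         # global self-attention over all rows
        latent = self.bottleneck(memory)                  # compressed latent representation
        decoded = self.decoder(latent, memory)            # self-attention + cross-attention to encoder memory
        out = self.out(decoded)                           # (batch, 32, 64), values in a normalized range
        real, imag = out.split(self.n, dim=-1)
        return torch.stack([real, imag], dim=1)           # reshape back to (batch, 2, 32, 32)
```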

2.3. Uhlmann fidelity-oriented reconstruction loss

The reconstruction loss incorporates the Uhlmann fidelity, the standard measure of similarity between mixed quantum states. For each predicted–target pair of density matrices (\(\hat{\rho}\)) and (\(\rho\)), the fidelity is computed by diagonalizing (\(\hat{\rho}\)), forming its matrix square root, and evaluating the trace of the matrix square root of \(\sqrt{\hat{\rho}}\,\rho\,\sqrt{\hat{\rho}}\). This procedure captures discrepancies across eigenvalues and coherences, making it substantially more informative than elementwise losses when evaluating density-matrix reconstructions.

The Uhlmann fidelity is defined as \[ F(\hat{\rho}, \rho) = \left( \mathrm{Tr}\sqrt{\sqrt{\hat{\rho}}\,\rho\,\sqrt{\hat{\rho}}} \right)^2 . \]
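
A possible PyTorch implementation of this loss is sketched below, assuming batched complex-valued density matrices; the eigenvalue clamping and the small epsilon are numerical-stability choices that are not part of the definition.

```python
import torch

def uhlmann_fidelity(rho_hat, rho, eps=1e-12):
    """Per-sample Uhlmann fidelity for batched complex density matrices."""
    # Matrix square root of the prediction via its eigendecomposition.
    evals, evecs = torch.linalg.eigh(rho_hat)
    evals = evals.clamp_min(0.0)                      # guard against small negative eigenvalues
    sqrt_rho_hat = evecs @ torch.diag_embed(evals.sqrt().to(evecs.dtype)) @ evecs.conj().transpose(-2, -1)

    # Trace of the square root of sqrt(rho_hat) * rho * sqrt(rho_hat), then squared.
    inner = sqrt_rho_hat @ rho @ sqrt_rho_hat
    inner_evals = torch.linalg.eigvalsh(inner).clamp_min(0.0)
    return ((inner_evals + eps).sqrt().sum(dim=-1)) ** 2

def fidelity_loss(rho_hat, rho):
    """Reconstruction loss: one minus the mean Uhlmann fidelity over the batch."""
    return (1.0 - uhlmann_fidelity(rho_hat, rho)).mean()
```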

2.4. Physics-informed loss

The physics-informed loss enforces the basic structural axioms of a valid density matrix during reconstruction. Given a predicted state (\(\hat{\rho}\)), the loss penalizes violations of Hermiticity, trace normalization, and positive semidefiniteness, which are the three core constraints defining legitimate quantum states. Hermiticity is enforced by measuring the deviation between (\(\hat{\rho}\)) and its conjugate transpose, trace preservation is encouraged by penalizing deviations of (\(\mathrm{Tr}(\hat{\rho})\)) from unity, and positivity is promoted by penalizing negative eigenvalues of (\(\hat{\rho}\)). Together, these terms guide the model toward producing physically consistent outputs, irrespective of the noise model or architecture.

Formally, the loss takes the form \[ \mathcal{L}_{\mathrm{phys}}(\hat{\rho}) = \lambda_{\mathrm{herm}} \left\|\hat{\rho} - \hat{\rho}^\dagger\right\|_2^2 + \lambda_{\mathrm{trace}}\,\big(\mathrm{Tr}(\hat{\rho}) - 1\big)^2 + \lambda_{\mathrm{psd}}\,\big\| \min\big(0,\, \lambda_i(\hat{\rho})\big) \big\|_2^2 , \]

where (\(\lambda_i(\hat{\rho})\)) are the eigenvalues of (\(\hat{\rho}\)).
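
One way to implement this loss in PyTorch is sketched below. The penalty weights are left as arguments since their values are not specified here, and the eigenvalues are taken from the Hermitian part of \(\hat{\rho}\) so that the positivity term stays well defined and differentiable, which is an implementation choice rather than part of the definition.

```python
import torch

def physics_informed_loss(rho_hat, l_herm=1.0, l_trace=1.0, l_psd=1.0):
    """Structural penalties for Hermiticity, unit trace, and positive semidefiniteness."""
    # Hermiticity: squared Frobenius norm of rho_hat - rho_hat^dagger.
    herm = (rho_hat - rho_hat.conj().transpose(-2, -1)).abs().pow(2).sum(dim=(-2, -1))

    # Trace normalization: squared deviation of Tr(rho_hat) from one.
    trace = (torch.diagonal(rho_hat, dim1=-2, dim2=-1).sum(dim=-1).real - 1.0) ** 2

    # Positive semidefiniteness: penalize negative eigenvalues of the Hermitian part.
    herm_part = 0.5 * (rho_hat + rho_hat.conj().transpose(-2, -1))
    evals = torch.linalg.eigvalsh(herm_part)
    psd = evals.clamp_max(0.0).pow(2).sum(dim=-1)

    return (l_herm * herm + l_trace * trace + l_psd * psd).mean()
```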

2.5. Dataset

Clean states are generated by sampling random pure states and evolving them under randomly drawn single- and two-qubit unitaries. Each state is then corrupted by a stochastic noise channel that mixes depolarizing, amplitude damping, and phase damping noise with randomly sampled strengths. This produces paired datasets of noisy inputs and clean targets for supervised training. The full dataset contains one million simulated examples drawn uniformly from twenty noise-type and noise-level combinations. Each of the four model and loss configurations receives the same training budget of 100,000 circuits, with 80% used for training and the remaining 20% held out for testing.
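
As an illustration of how a single noisy/clean pair is constructed, the sketch below samples a random single-qubit pure state and applies an amplitude-damping Kraus channel with a randomly drawn strength. The actual generator (see the TODO below) operates on the full 32×32 states and mixes depolarizing, amplitude-damping, and phase-damping channels.

```python
import numpy as np

def random_pure_state(dim=2, rng=None):
    """Clean target: density matrix |psi><psi| of a random pure state."""
    if rng is None:
        rng = np.random.default_rng()
    psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    psi /= np.linalg.norm(psi)
    return np.outer(psi, psi.conj())

def amplitude_damping_kraus(gamma):
    """Standard single-qubit amplitude-damping Kraus operators."""
    k0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - gamma)]], dtype=complex)
    k1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]], dtype=complex)
    return [k0, k1]

def apply_channel(rho, kraus_ops):
    """Apply a Kraus-operator channel: rho -> sum_k K rho K^dagger."""
    return sum(k @ rho @ k.conj().T for k in kraus_ops)

rng = np.random.default_rng(0)
clean = random_pure_state(rng=rng)
noisy = apply_channel(clean, amplitude_damping_kraus(gamma=rng.uniform(0.0, 0.3)))
```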

2.6. TODO Add link to circuit generator

3. Experimental Design

We evaluate the four denoising configurations under a controlled and reproducible simulation pipeline designed to isolate the effects of architecture and loss function. All models are trained to map noisy density matrices to their clean counterparts using identical optimization settings. Training uses the Adam optimizer with a learning rate of 1e-4, batch size 64, and a fixed budget of 50 epochs. Weight initialization follows PyTorch defaults, and a validation split of ten percent of the training set is used for early stopping based on validation fidelity. Each model is trained with five different random seeds to account for stochastic variation, and reported results are averaged across these runs.
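
The skeleton below summarizes this shared setup, assuming a PyTorch dataset of (noisy, clean) tensor pairs in the two-channel format, a generic loss_fn(prediction, target) built from one of the losses above, and the uhlmann_fidelity function sketched earlier; the early-stopping patience value is an assumption.

```python
import torch
from torch.utils.data import DataLoader, random_split

def to_complex(x):                      # (batch, 2, n, n) two-channel tensor -> complex (batch, n, n)
    return torch.complex(x[:, 0], x[:, 1])

def train(model, train_set, loss_fn, seed, epochs=50, patience=5):
    torch.manual_seed(seed)                                     # one run per random seed
    n_val = int(0.1 * len(train_set))                           # 10% validation split
    train_split, val_split = random_split(train_set, [len(train_set) - n_val, n_val])
    train_loader = DataLoader(train_split, batch_size=64, shuffle=True)
    val_loader = DataLoader(val_split, batch_size=64)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)

    best_fid, stale = -1.0, 0
    for _ in range(epochs):
        model.train()
        for noisy, clean in train_loader:
            opt.zero_grad()
            loss_fn(to_complex(model(noisy)), to_complex(clean)).backward()
            opt.step()

        # Early stopping on validation Uhlmann fidelity.
        model.eval()
        with torch.no_grad():
            fid = torch.cat([uhlmann_fidelity(to_complex(model(noisy)), to_complex(clean))
                             for noisy, clean in val_loader]).mean().item()
        if fid > best_fid:
            best_fid, stale = fid, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return model
```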

Evaluation is performed on the held-out test set using multiple metrics: the Uhlmann fidelity between the reconstructed and target states, the Frobenius reconstruction error, trace deviation, Hermiticity deviation, and the magnitude of any negative eigenvalues. This combination of metrics allows us to assess both denoising performance and physical validity. Model capacities are kept comparable across architectures by matching parameter counts to within ten percent.
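
A minimal sketch of these metrics for a single complex reconstruction is given below; the Uhlmann fidelity itself reuses the function sketched earlier, and symmetrizing before the eigenvalue check is an implementation choice.

```python
import torch

def evaluation_metrics(rho_hat, rho):
    """Reconstruction and validity metrics for one complex density-matrix pair."""
    frob = lambda m: m.abs().pow(2).sum().sqrt()                 # Frobenius norm
    herm_part = 0.5 * (rho_hat + rho_hat.conj().T)
    evals = torch.linalg.eigvalsh(herm_part)
    return {
        "frobenius_error": frob(rho_hat - rho).item(),
        "trace_deviation": (torch.trace(rho_hat).real - 1.0).abs().item(),
        "hermiticity_deviation": frob(rho_hat - rho_hat.conj().T).item(),
        "negative_eigenvalue_magnitude": evals.clamp_max(0.0).abs().sum().item(),
    }
```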

3.1. TODO Organize all hyperparameters (weight decay, dropout rates, transformer depth + heads, CNN kernel sizes and strides, etc.) into a table

3.2. TODO Add details on compute used

4. Results

4.1. TODO Run training + evals

5. Limitations

6. Conclusion

6.1. TODO Add conclusion

7. References

Kendre, K. (2025). Machine Learning for Quantum Noise Reduction.

Author: Amitav Krishna

Created: 2025-11-15 Sat 02:25
