Removing Reflections from RAW Photos
CVPR 2025 (Oral Presentation)

Figure 1. Results from our reflection removal system. We accept RAW images as input, and generate new RAW images as output. Note that results in this blog and from the product may differ from those in the paper. The RAW training data described in the paper were used to train the production models and yield state-of-the-art results, but the production models themselves differ from those described in the paper.

Abstract

We describe a system to remove real-world reflections from images for consumer photography. Our system operates on linear (RAW) photos, and accepts an optional contextual photo looking in the opposite direction (e.g., the "selfie" camera on a mobile device). This optional photo disambiguates what should be considered the reflection. The system is trained solely on synthetic mixtures of real RAW photos, which we combine using a reflection simulation that is photometrically and geometrically accurate. Our system comprises a base model that accepts the captured photo and optional context photo as input, and runs at 256p, followed by an up-sampling model that transforms 256p images to full resolution. The system produces preview images at 1K in 4.5-6.5s on a MacBook or iPhone 14 Pro. We show state-of-the-art results on RAW photos that were captured in the field to represent typical consumer photos, and show that training on RAW simulation data improves performance more than the architectural variations among prior works.

How do we do it?

We use RAW images to simulate reflections with photometric and geometric accuracy. This allows us to create training data that are realistic enough to train models without capturing real reflections with ground truth, removing a scaling bottleneck so that models can be trained from scratch on millions of images. Prior works instead require captured ground-truth reflections, typically mixed with simulated data at roughly a 10:1 simulated-to-real ratio, but ground-truth capture is difficult, time-consuming, and severely restricts the types of scenes that can be used for training.

Examples of our simulated reflections are shown below, shuffled together with real reflections. Can you spot the simulated ones?

(Figure 2 image grid: Chess A, Chess B, Shop A, Shop B, Art A, Art B.)

Figure 2. Both simulated and real reflections are shown. Can you spot the simulated reflections? Within each (A, B) pair, one reflection is simulated. Real images were not captured to match known simulation examples; these qualitative matches exist because the dataset size exceeds 1M images. Answer:

Even-numbered images are synthetic.

What Makes A Simulated Reflection Realistic?

1. Photometric Properties

Reflections are mixtures of two light fields: the first transmits through the glass, and the second reflects off the surface of the glass from the scene behind you. Photons from the two fields accumulate on the sensor in equal proportion; the sensor cannot tell them apart. The camera then exposes the image and white balances the color (so white looks white). This linear process gives reflections specific characteristics (aka photometric priors), which our models leverage.

Linear Mixing: Reflections are powerful in the shadows, and weak in the highlights.

Post White Balancing: Reflections mix before white balancing. That makes outdoor scenes blueish, and indoor lights yellow (among other things).

(Each comparison in Figure 3 shows: Our RAW Simulation, 8-bit Simulation, Transmission Source, Reflection Source.)
Figure 3. Simulating in RAW vs. 8-bit. Using RAW simulation allows us to mix images in linear units, and apply white balance after mixing. This produces simulated reflections that are more realistic than can be achieved by linearly mixing 8-bit images.
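To make the distinction concrete, here is a minimal numpy sketch of the two pipelines (an illustration, not the production code), assuming a toy ISP with per-channel white-balance gains and a simple gamma curve: the RAW simulation mixes in linear units and white balances afterward, while the 8-bit simulation mixes two already-processed photos.

```python
import numpy as np

def process(raw, wb_gains, gamma=2.2):
    """Toy ISP: white balance in linear space, then gamma-encode for display."""
    img = np.clip(raw * wb_gains, 0.0, 1.0)
    return img ** (1.0 / gamma)

def simulate_raw(t_raw, r_raw, alpha, wb_t):
    """Mix in linear RAW units, then process with the transmission's white
    balance: the reflection keeps its own color cast and stays strong in
    shadows but weak in highlights."""
    return process(t_raw + alpha * r_raw, wb_t)

def simulate_8bit(t_raw, r_raw, alpha, wb_t, wb_r):
    """Mix two finished photos, each already white balanced for its own scene:
    a common approximation that loses the photometric priors above."""
    return np.clip(process(t_raw, wb_t) + alpha * process(r_raw, wb_r), 0.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 4x4 linear RGB "RAW" images (already demosaicked).
    t_raw = rng.uniform(0.0, 0.5, size=(4, 4, 3))
    r_raw = rng.uniform(0.0, 0.5, size=(4, 4, 3))
    wb_t = np.array([2.0, 1.0, 1.6])   # example daylight white-balance gains
    wb_r = np.array([1.3, 1.0, 2.2])   # example tungsten white-balance gains
    raw_sim = simulate_raw(t_raw, r_raw, alpha=0.06, wb_t=wb_t)
    bit8_sim = simulate_8bit(t_raw, r_raw, alpha=0.06, wb_t=wb_t, wb_r=wb_r)
    print("mean absolute difference:", np.abs(raw_sim - bit8_sim).mean())
```

Because the gamma curve and white balance are applied after mixing, the RAW simulation naturally reproduces the shadow/highlight behavior and the color casts described above.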

2. Geometric Properties

Both the reflection and the transmission light fields are geometrically transformed before they exit the glass surface. The most important transformation is the Fresnel effect, which dramatically attenuates the radiance of the reflected light. This attenuation is why you can typically see through glass: reflections are present on every pane of glass (and at every point!), but they are usually too weak to block your view. Reflections can be stronger or weaker depending on the angle of the glass surface, which we simulate as well. Fresnel attenuation is strongest, about -4 stops (6% exposure), where camera rays strike the glass perpendicularly, and it gradually weakens to about -1 stop (50% exposure) for rays that glancingly strike the glass at 83°.
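As a quick sanity check on those numbers, the sketch below (not the paper's code) evaluates the exact Fresnel equations for unpolarized light at a single air-glass interface, assuming a refractive index of 1.5; the precise percentages depend on the refractive index and on how many pane surfaces are modeled, so they will not match the quoted figures exactly.

```python
import numpy as np

def fresnel_reflectance(theta_deg, n1=1.0, n2=1.5):
    """Unpolarized Fresnel reflectance at a single air-glass interface."""
    ti = np.radians(theta_deg)
    st = np.clip(n1 * np.sin(ti) / n2, -1.0, 1.0)   # Snell's law
    tt = np.arcsin(st)
    ci, ct = np.cos(ti), np.cos(tt)
    rs = ((n1 * ci - n2 * ct) / (n1 * ci + n2 * ct)) ** 2   # s-polarized
    rp = ((n1 * ct - n2 * ci) / (n1 * ct + n2 * ci)) ** 2   # p-polarized
    return 0.5 * (rs + rp)

if __name__ == "__main__":
    for angle in [0.0, 30.0, 60.0, 83.0]:
        r = fresnel_reflectance(angle)
        print(f"{angle:5.1f} deg: {100 * r:5.1f}% reflected "
              f"({np.log2(r):+.1f} stops)")
```

At normal incidence this single-interface model reflects roughly 4% of the light, and at 83° it reflects roughly half, consistent with the several-stop swing described above.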

We further simulate double reflections, and a physically calibrated amount of defocus blur. These three key geometric properties are illustrated at extreme values below. Note that most reflections are not blurry.

(Figure 4 panels: Fresnel Effect, Double Reflection, Defocus Blur.)

Figure 4. Geometric properties of reflections. We simulate Fresnel attenuation, double reflections, and defocus blur. These are shown above at extreme values to illustrate how significantly they can affect the reflection appearance.
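For illustration, here is one way the three geometric effects could be composited into a reflection layer before mixing; the parameters (fresnel, double_gain, double_shift, blur_sigma) and their values are hypothetical and stand in for the physically calibrated quantities used in the actual simulation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def reflection_layer(r_raw, fresnel=0.06, double_gain=0.3,
                     double_shift=5, blur_sigma=0.0):
    """Apply the three geometric effects to a linear reflection image."""
    # Primary bounce off the front surface, scaled by the Fresnel term
    # for this viewing angle.
    primary = fresnel * r_raw
    # Secondary bounce off the back surface of the pane: a dimmer copy,
    # displaced by the pane's thickness (here a simple vertical pixel
    # shift; np.roll wraps around, which is fine for a toy example).
    secondary = double_gain * np.roll(primary, double_shift, axis=0)
    layer = primary + secondary
    # Optional defocus blur; most real reflections are sharp, so this
    # is usually left at zero.
    if blur_sigma > 0:
        layer = gaussian_filter(layer, sigma=(blur_sigma, blur_sigma, 0))
    return layer

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r_raw = rng.uniform(0.0, 0.5, size=(64, 64, 3))   # toy linear reflection image
    # Add the layer to the transmission in linear RAW units before the ISP,
    # e.g. mixed_raw = t_raw + reflection_layer(r_raw, blur_sigma=1.5)
    print("reflection layer mean:", reflection_layer(r_raw, blur_sigma=1.5).mean())
```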

3. Large-Scale Search

You can simulate reflections by putting together the photometric and geometric properties above: just pick any random pair of RAW images. But you'll quickly notice that most of the reflections you synthesize aren't particularly useful. Many are invisible. Many others are so complicated that you can't even tell what you're looking at. These are photometrically accurate reflections, but they don't correspond to ones that photographers would want to remove. We therefore search more than 100M random pairs of RAW images and simulation parameters to find reflections that are visible and visually interpretable. This search uncovers additional characteristics of reflections (more photometric priors): for example, bathrooms rarely produce reflections on top of beaches, because they are too dark. By searching over many examples, we produce large-scale datasets that can be used to train models from scratch, which yields state-of-the-art results.
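To give a flavor of what "visible and visually interpretable" could mean in practice, here is a hypothetical filter sketched in numpy; the scoring and thresholds are illustrative assumptions, not the criteria used in the paper.

```python
import numpy as np

def reflection_visibility(mixed_raw, t_raw, thresh=0.02):
    """Fraction of pixels where the reflection noticeably changes the image."""
    diff = np.abs(mixed_raw - t_raw).mean(axis=-1)
    return float((diff > thresh).mean())

def keep_candidate(mixed_raw, t_raw, lo=0.05, hi=0.60):
    """Accept a simulated pair only if the reflection is visible (> lo) but
    still leaves the transmitted scene interpretable (< hi). Thresholds are
    illustrative, not the paper's."""
    v = reflection_visibility(mixed_raw, t_raw)
    return lo < v < hi

# During dataset construction, one would draw random (transmission, reflection)
# RAW pairs plus random simulation parameters, simulate the mixture, and keep
# only candidates that pass filters like keep_candidate().
```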

Is That All There Is to It?

In a word, no. This work is largely a systems paper whose major contribution is to show that training on large datasets of RAW-simulated reflections improves models significantly, more than the architectural variations among prior works do. In the paper we also introduce novel models for dereflection at low resolution, along with methods to upsample the results to arbitrarily high resolutions. We hope you'll check out the paper, and try it for yourself.

Citation

Acknowledgements

We thank Eric Chan for his helpful explanations of Adobe Camera RAW and Florian Kainz for his help to capture reflection images with ground truth. Test images were additionally contributed for the paper and blog post by Calista Chandler, Daichi Ito, Lars Jebe (the Husky), and Cecilia Zhang.

This website template is due to Michaël Gharbi and was copied from John Barron's GitHub.