Digital Healthcare, Augmented Reality, Machine Learning, Cloud Computing and more! Andreas Jakl is a professor @ St. Pölten University of Applied Sciences, ex-Microsoft MVP for Windows Development and Amazon AWS Educate Cloud Ambassador & Community Builder.
Realistically merging virtual objects with the real world in Augmented Reality poses a few challenges. The most important ones:
Realistic positioning, scale and rotation
Lighting and shadows that match the real-world illumination
Occlusion with real-world objects
The first works very well in today’s AR systems. The third – occlusion – works OK on the Microsoft HoloLens, and it’s soon coming to ARCore as well (a private preview is currently running through the ARCore Depth API – which is probably based on the research by Flynn et al.).
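The core idea behind depth-based occlusion is a simple per-pixel test: wherever the real world is closer to the camera than the virtual object, the camera pixel wins. A minimal NumPy sketch of that compositing step (the array names are illustrative, not part of the ARCore API):

```python
import numpy as np

# Hypothetical inputs: camera frame, rendered virtual layer, and
# per-pixel depth maps in meters.
camera_rgb = np.zeros((4, 4, 3)); camera_rgb[...] = [0.2, 0.5, 0.8]
virtual_rgb = np.ones((4, 4, 3))        # white virtual object
real_depth = np.full((4, 4), 2.0)       # wall 2 m away
real_depth[:, :2] = 0.5                 # a close obstacle on the left
virtual_depth = np.full((4, 4), 1.0)    # virtual object placed at 1 m

# Per-pixel depth test: show the virtual object only where it is
# in front of the real geometry.
occluded = real_depth < virtual_depth
composite = np.where(occluded[..., None], camera_rgb, virtual_rgb)
```

In the left columns the real obstacle (0.5 m) hides the virtual object (1 m), so the camera pixels remain; everywhere else the virtual object is drawn on top.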
But what about the second item? Google put a lot of effort into this recently. So, let’s look behind the scenes. How does ARCore estimate HDR (high dynamic range) lighting and reflections from the camera image?
Remember that ARCore needs to scale to a variety of smartphones; thus, a requirement is that it also works on phones that only have a single RGB camera – like the Google Pixel 2.
The goal is simple: realistic HDR lighting for virtual objects. Ideally, this should work from low dynamic range source images – as the real-time camera stream that feeds into smartphone-based AR systems can’t capture HDR lighting. The less source material the algorithm requires to predict the lighting, the better; a single frame would of course be ideal. Is that possible?
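The reason a single camera frame can’t capture HDR lighting is easy to demonstrate: bright light sources saturate the sensor, so their true relative intensities are lost and must be inferred by the algorithm. A small sketch (the radiance values are made up for illustration):

```python
import numpy as np

# A hypothetical scene radiance in linear units: a dim wall (0.2),
# a lamp (5.0), and the sun through a window (500.0).
radiance = np.array([0.2, 5.0, 500.0])

# A low-dynamic-range camera clips everything above its maximum
# exposure value (normalized to 1.0 here) - lamp and sun both
# saturate to the same white pixel.
ldr = np.clip(radiance, 0.0, 1.0)
```

After clipping, the lamp and the sun are indistinguishable, even though one is 100× brighter – exactly the information an HDR lighting estimator has to reconstruct.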
In the publication Learning to predict indoor illumination from a single image by Gardner et al. , they showed some impressive results estimating the light source from a normal photo and applying a similar lighting situation to the virtual objects. This affects both the location of lights as well as their intensity. The underlying algorithm includes a deep convolutional neural network.
Several cues in a photo encode the lighting situation:
Specular highlights: shiny spots on surfaces that move with the viewer’s position
Shadows: reveal the scene layout and where the lights come from
Shading: the interplay of surface orientation and light determines the reflected brightness
Reflection: mirror-like surfaces show the colors of their surroundings
All these properties directly influence the color and brightness of each pixel in an image. On the one hand, the AR engine needs to work from these inputs to estimate the light and material properties. On the other hand, similar settings then have to be applied to the virtual objects.
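The cues above can be illustrated with a classic Lambert + Phong shading model, where a pixel’s color depends on the surface normal, the light direction, the material, and the viewer’s position. This is a textbook model for illustration, not what ARCore uses internally:

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def shade(normal, light_dir, view_dir, albedo, shininess=32):
    """Lambertian diffuse + Phong specular for a single surface point."""
    n, l, v = normalize(normal), normalize(light_dir), normalize(view_dir)
    diffuse = albedo * max(np.dot(n, l), 0.0)       # shading: surface vs. light
    r = 2 * np.dot(n, l) * n - l                    # mirror reflection of the light
    specular = max(np.dot(r, v), 0.0) ** shininess  # highlight depends on viewer
    return diffuse + specular

n = np.array([0.0, 1.0, 0.0])   # surface facing up
l = np.array([0.0, 1.0, 0.0])   # light from straight above
albedo = 0.5

# Only the specular term changes with the view direction - the
# highlight "moves" with the viewer, while diffuse shading stays put.
head_on = shade(n, l, np.array([0.0, 1.0, 0.0]), albedo)
grazing = shade(n, l, np.array([1.0, 0.2, 0.0]), albedo)
```

Viewed head-on, the mirror direction of the light hits the eye and a bright highlight appears; at a grazing angle the specular term nearly vanishes and only the diffuse shading remains.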
Human Perception & Lighting
A key fact to keep in mind is that we as humans usually only indirectly perceive the light field. We can only see object appearances and use our experience to infer the source.
The question is: how well do we do this? Several studies tried to find out. te Pas et al. asked humans to judge images of two spheres to answer three questions:
Are they the same material?
Are they illuminated the same way?
Is illumination or material the same?
The spheres were either photographs or computer-generated. This shows an example of what users had to judge in the experiment:
In the image above, both spheres are from real photos and not computer generated. However, they differ both in illumination and material. Not easy to judge, right? It gets a bit clearer once you look at the images of the test set with a little more context, as seen here:
What were the results of the study? They recommend the following:
To make correct material perception possible, include higher-order aspects of the light field and apply a realistic 3D texture (meso-scale texture).
To make a correct perception of the light field possible, you need to put emphasis on the realism of global light properties (in particular its mean direction & diffuseness).
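Those two global properties – mean light direction and diffuseness – can be estimated from sampled environment lighting as an intensity-weighted average of the sample directions: the longer the resulting mean vector, the more directional the light; a vector near zero means diffuse illumination. A simplified sketch (my own illustration, not ARCore’s algorithm):

```python
import numpy as np

def light_stats(directions, intensities):
    """Intensity-weighted mean light vector over sampled directions.

    Returns (mean_direction, diffuseness): diffuseness is 1 minus the
    length of the weighted mean vector, so 0 = one hard light source
    and 1 = perfectly uniform (overcast) illumination.
    """
    w = intensities / intensities.sum()
    mean_vec = (directions * w[:, None]).sum(axis=0)
    strength = np.linalg.norm(mean_vec)
    direction = mean_vec / strength if strength > 0 else mean_vec
    return direction, 1.0 - strength

# Six axis-aligned sample directions (a tiny stand-in for an env map).
dirs = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                 [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)

# Hard light from above: all intensity in one direction.
_, d_hard = light_stats(dirs, np.array([0.0, 0, 10, 0, 0, 0]))
# Overcast sky: equal intensity from every direction.
_, d_soft = light_stats(dirs, np.array([1.0, 1, 1, 1, 1, 1]))
```

The hard light yields a diffuseness near 0, the uniform sky a diffuseness near 1 – the two extremes the study says our perception is most sensitive to.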
How is it possible to computationally perceive the light field of a scene? This will be covered in the second part of the article series. Finally, in the third part, I’ll show an example of how you can visualize ARCore’s reflection map in a Unity scene.
S. F. te Pas and S. C. Pont, “A comparison of material and illumination discrimination performance for real rough, real smooth and computer generated smooth spheres,” in Proceedings of the 2nd symposium on Applied perception in graphics and visualization, in APGV ’05. A Coroña, Spain: Association for Computing Machinery, Aug. 2005, pp. 75–81. doi: 10.1145/1080402.1080415.
M.-A. Gardner et al., “Learning to predict indoor illumination from a single image,” ACM Trans. Graph., vol. 36, no. 6, pp. 176:1–176:14, Nov. 2017, doi: 10.1145/3130800.3130891.