Categories
Android AR / VR Image Processing

Understand and Apply Stereo Rectification for Depth Maps (Part 2)

In part 1 of the article series, we identified the key steps to create a depth map. We captured a scene from two distinct positions and loaded both images with Python and OpenCV. However, the images don’t line up perfectly. A process called stereo rectification is crucial to easily compare pixels in both images and triangulate the scene’s depth!

For triangulation, we need to match each pixel from one image with the same pixel in another image. When the camera rotates or moves forward / backward, the pixels don’t just move left or right; they could also be found further up or down in the image. That makes matching difficult.

Warping Images for Stereo Rectification

Image rectification warps both images. The result is that they appear as if they had been taken with only a horizontal displacement. This simplifies calculating the disparity of each pixel!

With smartphone-based AR like in ARCore, the user can freely move the camera in the real world. The depth map algorithm only has the freedom to choose two distinct keyframes from the live camera stream. As such, the stereo rectification needs to be very intelligent in matching & warping the images!

Stereo Rectification: reprojecting images to make calculating depth maps easier.

In more technical terms, this means that after stereo rectification, all epipolar lines are parallel to the horizontal axis of the image.
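To make the statement about horizontal epipolar lines concrete, here is a minimal sketch in plain Python. It assumes an idealized, already rectified pair: normalized image coordinates (identity camera intrinsics) and a pure translation along the x-axis, for which the fundamental matrix takes the simple form below. The names `F_RECTIFIED` and `epipolar_line` are made up for this illustration.

```python
# Fundamental matrix of an ideal rectified pair (assumption: identity
# intrinsics and a pure translation along the x-axis, t = (1, 0, 0)).
F_RECTIFIED = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, -1.0],
    [0.0, 1.0, 0.0],
]


def epipolar_line(F, point):
    """Return the line (a, b, c) with a*x + b*y + c = 0 in the second image
    on which the match for `point` from the first image must lie."""
    u, v = point
    x = (u, v, 1.0)  # homogeneous coordinates
    return tuple(sum(F[r][c] * x[c] for c in range(3)) for r in range(3))


a, b, c = epipolar_line(F_RECTIFIED, (120.0, 45.0))
# a is 0, so the line has no x-dependence: it is horizontal.
# Solving b*y + c = 0 gives the row on which the match must lie.
matching_row = -c / b
print(a, matching_row)
```

The pixel at row 45 in the first image can only have its match on row 45 of the second image, which is exactly why the search for correspondences becomes one-dimensional after rectification.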

To perform stereo rectification, we need to perform three important tasks:

  1. Detect keypoints in each image.
  2. Match the keypoints and keep only the best ones, where we are sure they are matched correctly in both images, to calculate the reprojection matrices.
  3. Using these, we can rectify the images to a common image plane. Matching keypoints are on the same horizontal epipolar line in both images. This enables efficient pixel / block comparison to calculate the disparity map (= how much offset the same block has between both images) for all regions of the image (not just the keypoints!).
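The block comparison mentioned in the last step can be sketched for a single image row. This is a simplified illustration in plain Python using the sum of absolute differences (SAD) as the matching cost; it is not the algorithm used by ARCore or by OpenCV's stereo matchers, and the function name and parameters are hypothetical.

```python
def row_disparity(left_row, right_row, block=3, max_disp=4):
    """Estimate the disparity for each pixel of one rectified image row.

    For every block in the left row, search along the same row of the
    right image and keep the offset with the lowest SAD cost.
    Returns a dict mapping x position -> disparity in pixels.
    """
    half = block // 2
    result = {}
    for x in range(half, len(left_row) - half):
        patch = left_row[x - half:x + half + 1]
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            xr = x - d  # candidate position in the right image
            if xr - half < 0:
                break  # block would leave the image
            candidate = right_row[xr - half:xr + half + 1]
            cost = sum(abs(a - b) for a, b in zip(patch, candidate))
            if cost < best_cost:
                best_cost, best_d = cost, d
        result[x] = best_d
    return result


# Synthetic example: the same intensity pattern, shifted by 2 pixels.
right = [0, 0, 10, 50, 90, 20, 0, 0, 0, 0]
left = [0, 0, 0, 0, 10, 50, 90, 20, 0, 0]
disparities = row_disparity(left, right)
print(disparities[5])
```

Because the rows are already aligned, the search runs along one dimension only. In flat regions without texture (the zeros in this example), several offsets produce the same cost, so the result there is not meaningful.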

Google’s research builds on the work of Pollefeys et al. and additionally addresses issues that can arise especially in mobile scenarios.

Categories
Android AR / VR Image Processing

Easily Create a Depth Map with Smartphone AR (Part 1)

For a realistic Augmented Reality (AR) scene, a depth map of the environment is crucial: if a real, physical object doesn’t occlude a virtual object, it immediately breaks the immersion.

Of course, some devices already include specialized active hardware to create real-time environmental depth maps – e.g., the Microsoft HoloLens or the current high-end iPhones with a Lidar sensor. However, Google decided to go in a different direction: its aim is to bring depth estimation to the mass market, enabling it even for cheaper smartphones that only have a single RGB camera.

In this article series, we’ll look at how it works by analyzing the related scientific papers published by Google. I’ll also show a Python demo based on commonly used comparable algorithms which are present in OpenCV. In the last step, we’ll create a sample Unity project to see depth maps in action. The full Unity example is available on GitHub.

Quick Overview: ARCore Depth Map API

How do Depth Maps with ARCore work? The smartphone saves previous images from the live camera feed and estimates the phone’s motion between these captures. Then, it selects two images that show the same scene from different positions. Based on the parallax effect (objects nearer to you move faster than those farther away – e.g., trees close to a train track move fast versus the mountain in the background moving only very slowly), the algorithm then calculates the distance of each area in the image.

This has the advantage that a single color (RGB) camera is enough to estimate the depth. However, this approach needs structured surfaces to detect the movement of unique features in the image. For example, you couldn’t get many insights from two images of a plain white wall, shot from two positions 20 cm apart. Additionally, it’s problematic if the scene isn’t static and objects move around.

As such, given that you have a well-structured and static scene, the algorithm developed by Google works best in a range between 0.5 and 5 meters.
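The parallax effect described above can be expressed with the standard relation for a rectified stereo pair: depth = focal length × baseline / disparity. The following plain Python sketch uses illustrative numbers only (a focal length of 500 pixels is an assumption, the 20 cm baseline is borrowed from the wall example); it doesn't claim to reproduce Google's actual implementation.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth in meters for a rectified stereo pair (pinhole camera model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Assumed values: 500 px focal length, 0.2 m between the two camera positions.
near = depth_from_disparity(500, 0.2, 200)  # large shift between the images
far = depth_from_disparity(500, 0.2, 20)    # small shift between the images
print(near, far)
```

A point that shifts by 200 pixels is about 0.5 m away, one that shifts by only 20 pixels about 5 m. The farther the object, the smaller the disparity, so small matching errors have a much bigger impact on the depth estimate at long range.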

Categories
Cloud Events Speech Assistants

AWS IMAGINE: Accelerating Transformation in Education

As an unexpected catalyst, the COVID-19 pandemic drove rapid change in global education, including improved accessibility for some, affordability, and curricula aligned with job skills needed for the modern world.

Of course, cloud technologies play a fundamental role in the new world of teaching. In the global panel session by AWS, I’ll share insights about:

  • How remote students can solve a real-life wellbeing problem with a working prototype in just 10 days.
  • How learning and retention can be improved through the open-source Voice Learning Alexa skill.
Categories
Speech Assistants

Alexa Development with Voiceflow for Newcomers

Speech assistants are one of the most important ways to access services in the future. They are usable without further instructions even by children and the elderly. And they’re hands-free. These advantages are reflected in their growing adoption: according to voicebot.ai, already one third of American households have a smart speaker.

Amazon’s Alexa is leading the market, followed by Google Assistant. Baidu, Alibaba, Xiaomi and Apple Siri are also important players. Strategy Analytics publishes regular reports on market share data. Obviously, usage differs considerably by market. For example, Baidu, Alibaba and Xiaomi are stronger in Asian markets. But overall, Amazon Alexa together with its Echo smart speaker ecosystem is the perfect place to start if you want to reach as many people as possible, globally.

Developing for Amazon Alexa

When you decide to create a “Skill” for Amazon Alexa, you have two basic options:

  • Alexa Skills Kit: Use Amazon’s developer tools directly. This gives you all features but is also the most complex to start. You need to write at least a bit of JavaScript (through Node.js) or Python code. The Alexa-hosted option is easy to set up. You can edit the code right from the browser. No need to provision any other services anymore.
  • 3rd Party Tools: for example, Voiceflow or the Microsoft Bot Framework. While you still need to create the Alexa skill in Amazon’s frontend (so that it is also discoverable by Alexa-powered devices), the skill design & development then mostly happens in these tools. Often, their editors are easier to use and/or even offer cross-platform support.
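To give an impression of the "bit of Python code" needed for the Alexa Skills Kit route, here is a minimal sketch of a skill backend as a plain AWS Lambda style handler that builds the JSON response by hand. Real skills usually use the ASK SDK instead; the intent name `HelloIntent` and the spoken texts are made up for this example.

```python
def build_response(text, end_session):
    """Wrap plain text in the JSON structure an Alexa skill returns."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context):
    """Entry point: react to the request type sent by Alexa."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        # The user opened the skill without a specific question.
        return build_response("Welcome! What would you like to do?", False)
    if request["type"] == "IntentRequest":
        # "HelloIntent" is a hypothetical intent defined in the skill model.
        if request["intent"]["name"] == "HelloIntent":
            return build_response("Hello from your first skill!", True)
    return build_response("Sorry, I didn't understand that.", True)


# Local check with a hand-written launch request
reply = lambda_handler({"request": {"type": "LaunchRequest"}}, None)
print(reply["response"]["outputSpeech"]["text"])
```

Third-party tools like Voiceflow generate this kind of request handling for you, which is where much of their ease of use comes from.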

Especially for people with little experience in JavaScript development or if your skill is simple, 3rd party tools are often the better choice. If you want deep integration into the platform or want to use the latest features (like Alexa Conversations or the Motion Sensor APIs), go with the Alexa Skills Kit.

Categories
Artificial Intelligence Image Processing

Hands-On “Deep Learning” Videos: Now on YouTube

Every new product or service claims to use deep learning or neural networks. But: how do they really work? What can machine learning do? How complicated is it to get started?

In the 4-part video series “Deep Learning Hands-On with TensorFlow 2 & Python”, you’ll learn what many of the buzzwords are about and how they relate to the problems you want to solve.

In the short videos, your journey starts with the background of neural networks, which are the basis of deep learning. Then, two practical examples show how you can use neural networks to perform classification with TensorFlow:

  • Breast cancer classification: based on numerical / categorical data
  • Handwritten digit classification: the classic MNIST dataset based on small images

In the last part, we’ll look at one of the most important specialized variants of neural networks: convolutional neural networks (CNNs), which are especially well-suited for image classification.

Watching all four videos gives you a thorough understanding of how deep learning works and the guidance to get started!

Categories
Android AR / VR

Environmental HDR Lighting & Reflections in ARCore: Implementation in Unity 3D (Part 3)

How can real-time HDR lighting and reflections work on a smartphone? Given the unique properties of human perception and the challenges of capturing the world’s state and applying it to virtual objects, is it even possible?

Google found an interesting approach, which is based on using Artificial Intelligence to fill in the gaps. In this article, we’ll take a look at how ARCore handles this. The practical implementation of this research is available in the ARCore SDK for Unity. Based on this, a short hands-on guide demonstrates how to create a sphere that reflects the real world – even though the smartphone only captures a fraction of it.

Google ARCore Approach to Environmental HDR Lighting

To make environmental HDR lighting possible in real time on smartphones, Google uses an innovative approach, which they also published as a scientific paper. Here, I’ll give you a short, high-level overview of their approach:

First, Google captured a massive amount of training data. The video feed of the smartphone camera captured both the environment and three different spheres. The setup is shown in the image below.

Categories
Android AR / VR

Environmental HDR Lighting & Reflections in ARCore: Virtual Lighting (Part 2)

In part 1, we looked at how humans perceive lighting and reflections – vital basic knowledge to estimate how realistic these cues need to be. The most important goal is that the scene looks natural to human viewers. Therefore, the virtual lighting needs to be closely aligned with real lighting.

But how do you measure lighting in the real world, and how do you apply it to virtual objects?

Virtual Lighting

How do you need to set up virtual lighting to satisfy the criteria mentioned in part 1? Humans recognize if an object doesn’t fit in:

The left image shows a simple scene setup, where the shadow direction is wrong. The virtual object doesn't fit in.
In the ideal case on the right, the shadow and shading are correct.
Comparing a simple scene setup to environmental HDR lighting. Image adapted from the Google Developer documentation.

The image above from the Google Developer Documentation shows both extremes. Even though you might still recognize that the rocket is a virtual object in the right image, you’ll need to look a lot harder. The image on the left is clearly wrong, especially due to the misplaced shadow.

Categories
Android AR / VR

Environmental HDR Lighting & Reflections in ARCore: Human Perception (Part 1)

Realistically merging virtual objects with the real world in Augmented Reality has a few challenges. The most important:

  1. Realistic positioning, scale and rotation
  2. Lighting and shadows that match the real-world illumination
  3. Occlusion with real-world objects

The first is working very well in today’s AR systems. Number 3, occlusion, is working OK on the Microsoft HoloLens; and it’s soon also coming to ARCore (a private preview is currently running through the ARCore Depth API – which is probably based on the research by Flynn et al.).

But what about the second item? Google put a lot of effort into this recently. So, let’s look behind the scenes. How does ARCore estimate HDR (high dynamic range) lighting and reflections from the camera image?

Remember that ARCore needs to scale to a variety of smartphones; thus, a requirement is that it also works on phones that only have a single RGB camera – like the Google Pixel 2.

Categories
Digital Healthcare Events Speech Assistants

Alexa for Wellbeing Online Challenge

In the near future, we will primarily interact with technology through voice. Especially for older generations and kids, voice has the lowest entry barrier – compared to the complexity of computers or even smartphones. Simply start talking to speech assistants like Amazon Alexa, and they will help immediately.

To make the most of it, I’ve been co-organizing the “Alexa for Wellbeing Online Challenge” over the last few weeks. Together with AWS Educate and Hilfswerk Lower Austria, we’ll host a 10-day online hackathon, open to everyone.

Categories
AR / VR

Using Amazon Sumerian in Trainings and Classrooms with AWS IAM

In this article, we’ll configure AWS Identity and Access Management (IAM) to easily use Amazon Sumerian with multiple users. This is especially important for classrooms or trainings. You often don’t want to lose time by having attendees set up and activate their own AWS accounts, including their personal credit cards.

Instead, by setting up sub-users in your account beforehand, you have complete control over the experience and can get started right away. Additionally, it helps with troubleshooting for exercises.

Right now, no ready-made AWS Educate classrooms are available that support Amazon Sumerian. If that changes, the classrooms would be a good alternative, as they give students their own free AWS credits instead of billing everything to a central account.

Securing Your Account

The first step is making sure your own root account is properly secured. A major part is enabling Multi-Factor Authentication (MFA) for your root account. Especially when working in teams and with source control, it’s an easy-to-make mistake to upload your credentials somewhere; you don’t want others to have full control over your whole AWS account, as this can incur major charges to your credit card. Therefore, it’s best to enable MFA before you continue.