
Spotlight on Augmented Reality: Reinhard Köhn

By Jennifer Glen | 10 April 2018

Introducing Reinhard Köhn, Principal Architect on the CTO Team at HERE, based in Berlin, Germany.

Tell me about yourself.

I joined HERE five years ago. The challenge that attracted me was handling the enormous amounts of data this company processes.

Technology and electronics fascinated me even as a kid. I started programming at age twelve, in the eighties, on a Sinclair ZX Spectrum, one of those computers that still used an ordinary tape recorder to load and save data. I studied computer engineering at TU Berlin. Later I worked at Siemens, where I contributed to the standards for 3G mobile networks (I even wrote part of the standards documents) and led a team developing software and hardware for one of the first 3G base stations.

After nearly ten years in the corporate world, I decided to try something completely different and joined a Berlin startup as founder and CTO to build a social virtual world and gaming product. I learned a lot about big-data storage and processing, because online games generate a huge number of events from users interacting with the game. Storing and processing these efficiently at low cost was a challenging problem, and surprisingly close to the problems we solve at HERE when we receive and process billions of traffic and sensor events in our Open Location Platform.

What do you do at HERE?

Anything related to the complex software systems the company needs. In the past I’ve worked on architectures to publish and consume maps efficiently while they constantly change, and on some components for map data processing that are now part of the Open Location Platform. Since last year I've been working on topics around augmented reality.

What’s an example of a problem you’re trying to solve with augmented reality?

In our work the term augmented reality has a very wide scope. Before computers can augment the reality around us, they first have to understand it. And that’s what we are looking at: teaching computers to understand where they are, how they move, what they see. Then we can tell them to change parts of this reality and show it to us as augmented reality.

We’re trying to put an additional layer on top of today’s state-of-the-art AR, like ARKit and simultaneous localization and mapping (SLAM) algorithms, building on our core competency of making maps. Can we run additional algorithms to recognize objects in the real world (a clock over there, a TV set, a window), record them, and build a map autonomously? Can we build a generic system with today’s algorithms that will then evolve quickly as the algorithms improve?

What’s SLAM?

It’s basically what the iPhone does during an augmented-reality session. You move the iPhone and it keeps track of certain points around the room; it simultaneously builds a map of where these points are and tracks how the phone moves relative to them. The movement is measured internally with accelerometers, the camera shows how the points move in the image, and with these two inputs it can build a 3D model. Then it can track against that model (that’s the localization) and add new points to it (that’s the mapping).
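
As a rough illustration of the loop Reinhard describes, here is a minimal, hypothetical sketch of the predict/correct/map cycle. The class, fields, and input format are invented for illustration; this is not how ARKit or HERE implement SLAM.

```python
import numpy as np

class TinySlam:
    """Toy SLAM loop: predict motion from inertial data, correct the pose
    against known landmarks seen by the camera (localization), and add
    newly seen points to the map (mapping)."""

    def __init__(self):
        self.pose = np.zeros(3)   # estimated device position (x, y, z), world frame
        self.landmarks = {}       # landmark id -> estimated 3D position, world frame

    def predict(self, imu_translation):
        # Dead-reckon the pose from integrated accelerometer data.
        self.pose = self.pose + np.asarray(imu_translation, dtype=float)

    def correct(self, observations, weight=0.5):
        # observations: landmark id -> position of the landmark relative to the
        # camera, as triangulated from the image (hypothetical input format).
        errors = []
        for lid, relative in observations.items():
            if lid in self.landmarks:
                expected = self.landmarks[lid] - self.pose   # where it should appear
                errors.append(expected - np.asarray(relative, dtype=float))
        if errors:
            # Nudge the pose to reduce the average disagreement (localization).
            self.pose = self.pose + weight * np.mean(errors, axis=0)

    def add_new_landmarks(self, observations):
        # Insert previously unseen points into the map (mapping).
        for lid, relative in observations.items():
            if lid not in self.landmarks:
                self.landmarks[lid] = self.pose + np.asarray(relative, dtype=float)
```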

We’re still talking pretty conceptually here.

Sounds a bit fuzzy, huh? Let’s take a delivery use case. The delivery driver has a street address, they park, but now they have to find the door to ring the bell and hand over the package. That can be surprisingly complex, especially in urban and commercial areas.

Assume you give those drivers a device that does this AR capturing while they deliver. The first time, the driver has to ask. They’ll circle an area three times before they find the door. But we can track them the whole time, because there’s a camera and the SLAM algorithms build a map while they move. We can detect key elements like signs at the entrance, the house numbers, and characteristic objects like fire extinguishers or company logos, and put those in the map, so that later, in case we lose tracking, we can find where we are by relocalizing against those objects.

So, you give the device that builds the AR map to a delivery person for a few weeks, you generate the map out of that data, and the next time a colleague has to visit the same place, when they leave the car they already know the direct way to the door. Now we can start to use AR with real rendering on a user device, maybe smart glasses, to guide the person from the car to the door. So, the real AR application is only in the last step, where we guide someone. But we can only enable this application if we also automate all the map generation, and update the map automatically when things change, because maybe a new wall has been constructed or the door has moved. All of these things have to be tracked. If you go to this level of map detail, you have to automate everything.
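
One way to picture what such a capture session could produce is a small data structure holding semantic landmarks and the walked path; the names and fields below are assumptions for illustration, not HERE’s actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Landmark:
    label: str                              # e.g. "house_number", "fire_extinguisher", "company_logo"
    position: Tuple[float, float, float]    # (x, y, z) in the local map frame
    descriptor: bytes = b""                 # visual signature used later for relocalization

@dataclass
class LastMeterMap:
    place_id: str
    landmarks: List[Landmark] = field(default_factory=list)
    path_to_door: List[Tuple[float, float, float]] = field(default_factory=list)  # ordered waypoints

    def add_landmark(self, landmark: Landmark) -> None:
        self.landmarks.append(landmark)

    def relocalize_candidates(self, detected_label: str) -> List[Landmark]:
        # After tracking is lost, objects recognized in the camera image can be
        # matched against stored landmarks to recover the position.
        return [lm for lm in self.landmarks if lm.label == detected_label]
```

In this picture, the first driver’s session would populate the landmarks and the path to the door; later visits would reuse that path for guidance.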

Right now we are building a proof of concept that combines several algorithms. [Starts a video on his laptop.] What we have are tools to visualize the data we get from different algorithms. This is a 3D point cloud of our office area recorded using an iPhone, so this is commodity technology. On the video images we do object detection (computer keyboards, plants, televisions), and then we map the detected objects back into the 3D scene: because we know the geometry and where the camera has been looking, we can figure out the real 3D location of these objects. The proof of concept will cover not just a room like the one you’re seeing here, but the transition from an outdoor location to an indoor one.
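
The step of mapping a 2D detection back into the 3D scene can be sketched, assuming a simple pinhole camera model and a known camera pose from SLAM; the function and parameter names here are illustrative, not the actual proof-of-concept code.

```python
import numpy as np

def pixel_to_world(u, v, depth, fx, fy, cx, cy, cam_rotation, cam_translation):
    """Back-project a detected object's pixel (u, v) at an estimated depth into
    world coordinates, using camera intrinsics and the SLAM-estimated pose."""
    # Pixel -> 3D point in the camera frame (pinhole camera model).
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    point_cam = np.array([x, y, depth])
    # Camera frame -> world frame using the rotation and translation from SLAM.
    return np.asarray(cam_rotation) @ point_cam + np.asarray(cam_translation)

# Example: a detector reports a "keyboard" centered at pixel (640, 360) with an
# estimated depth of 1.5 m; the returned 3D point is stored in the map with its label.
keyboard_position = pixel_to_world(640, 360, 1.5,
                                   fx=1500, fy=1500, cx=960, cy=540,
                                   cam_rotation=np.eye(3),
                                   cam_translation=np.zeros(3))
```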

What have been the biggest challenges?

So far, the biggest challenge has been to narrow down how to approach the end-to-end view of augmented reality and break it down into smaller pieces. Fortunately, for the visualization we could build on the long experience of our colleagues from the map rendering team, whose work has been part of our mobile SDK for some years. But in other areas we had to start from scratch. What are the newest research algorithms for recognizing the world? How can we leverage machine learning? We discussed these topics with our researchers, and with the teams already using machine learning technologies, like Steve O’Hara, who presented his work in the last edition of Spotlight.

What have been the biggest breakthroughs so far?

Every month there is a new paper relevant to what we do that pushes forward the limits of the algorithms we use. By integrating these algorithms in a smart way, we can leverage the individual improvements across the entire system, and the gains compound. If we have three algorithms and all three get improved by 20 percent, we get a total improvement of 50 percent or more end-to-end, because it accumulates.
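
As a back-of-the-envelope check of that compounding effect, with illustrative numbers only:

```python
# Three chained algorithms, each improved by 20 percent: the gains multiply
# along the pipeline rather than simply adding up.
per_stage_gain = 1.20
end_to_end = per_stage_gain ** 3            # 1.2 * 1.2 * 1.2 ≈ 1.73
print(f"End-to-end improvement: {(end_to_end - 1) * 100:.0f}%")   # ≈ 73%, i.e. well over 50%
```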

What do you know now that you wish you knew when you started working on this problem?

How hot this topic is! Of course, it’s fun and very interesting, but it’s also an emerging technology where recent breakthroughs in research, as well as the constant increase of compute power on mobile devices, will enable applications in the near future that nobody has thought of before.

Do you see any other applications for what you’ve been doing with augmented reality inside or outside of HERE? 

If you put these technologies on a drone or a robot, you can use them for localization independent of GPS. There are now GPS jammers, technologies that can disable GPS across an entire region. So if a safety-critical device like a drone has only GPS to figure out its position, that’s just not good enough for the future.

What do you see as the next big thing in augmented reality?

Today the applications may be about guiding humans, but with the advances in robotics, the next thing will be guiding, say, a cleaning robot, or building this map while a cleaning robot works inside a mall. Because we can recognize objects semantically (fire extinguishers, for example), the cleaning robot can automatically scan and map the mall, and we can capture the locations of all the fire extinguishers. And if a fire extinguisher is missing, that can be detected automatically from this augmented-reality representation, which is computer readable, so you can implement safety measures to replace the missing ones.

You’re talking about using the cleaning robot to map the mall, but I was thinking of it the other way: that this technology would allow the robot to move around the mall and do its job.

It goes both ways. There’s an iteration between building the map, giving guidance to the robot with the map, and then also getting feedback from the robot and updating the map. This is in principle very similar to what we see with autonomous cars. It’s just being applied where the security and safety requirements are maybe lower, because if the robot moves very slowly or is very light…

It’s not going to kill someone like a car? [Interjection made before interviewer watched the Metalhead episode of Black Mirror]

Exactly. Also, the environment is maybe much more controlled: no rain, no snow. In many ways it’s much easier, but these environments can also be more varied.

Any big misconceptions about augmented reality out there that you’d like to clear up?

Augmented reality is more than Pokémon Go. When most people talk about augmented reality, they think about glasses or applications like Facebook filters that put ears on people. We’re not working at the application level, but rather on what we call spatial-temporal augmented reality: looking at how we can augment our maps and data sets with more generic 3D data.

Are you working on any technology-related side projects?

My newest side project is building an FPV (first-person view) racing drone. I just ordered all the components, so I have a few weekends of soldering and fixing electronics ahead of me. I’m looking forward to flying it in summer with my kids.

What resources would you recommend for someone just starting out with augmented reality?

Maybe a Coursera course, and a nice intro to object detection: building a raccoon detector.

Learn more about Reinhard and connect with him here.