Spotlight on Machine Learning: Oliver Monson

By Jennifer Glen | 08 November 2017

“Have humans focus on things that humans are really good at and machines can’t replace. That’s the transition.” – Oliver Monson

Image credit: “Unicorn Crossing” by rumpleteaser is licensed under CC BY 2.0

The transcript of this interview has been condensed and lightly edited for clarity.

You're a Data Operations Manager in our Highly Automated Driving division at HERE. What do you do?

When I first started with HERE, I was focused on dealing with the data from the [HERE True] vehicles and making it useful to the rest of the organization. Two years ago, we transitioned to focusing on the highly automated driving use case. I’m making sure the data I dealt with back then is useful to the rest of the org, specifically for machine learning and algorithm development. So, taking that imagery and the Lidar data we collect with these vehicles—petabytes of data sitting in our cloud—and converting it into useful data that we can build automated feature extractors from.

Traditionally, we have people look at that data and make a 2D map out of it. That doesn’t scale for making a really detailed 3D map that’s required for autonomous driving. So, my role supports the teams that are trying to automate that map creation process.

What’s a more specific example of a problem you have to solve?

Detecting all the signs in the world. Signage is very useful to mapmaking. The attributes that are on the sign: the speed limits, the stop signs, the warning indicators—all that signage tells the driver something about the environment, the rules of the road. There are thousands of different types of signs, and every country has its own variations, languages, and symbols. We can look at imagery and label a sign or label the lanes and make a map that way, but that requires a human. So now the question is: can we replace that human with a machine that detects signs automatically?

I support building that vast library of training data so that we can detect those signs automatically and classify them. In order to develop these algorithms, we leverage our internal workforce and crowdsourcing options to human-label this data as input.
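One unit of that human-labeled training data can be pictured as a bounding box plus a classification and its source. This is a hypothetical schema for illustration only; HERE's internal label format is not public, and the field names here are my own invention.

```python
from dataclasses import dataclass, asdict

@dataclass
class SignLabel:
    """One drawn bounding box and its classification for an image."""
    image_id: str
    x: int            # top-left pixel of the box
    y: int
    width: int
    height: int
    sign_class: str   # e.g. "speed_limit_50", "stop", "frog_crossing"
    source: str       # who produced it: "internal", "crowd", or "machine"

label = SignLabel("img_0001", 412, 96, 38, 38, "speed_limit_50", "crowd")
print(asdict(label)["sign_class"])  # speed_limit_50
```

Recording the source alongside each label is what later makes it possible to compare human and machine answers on the same sign.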

What’s the strangest sign you’ve ever seen?

The variation of animal crossing signs around the world is amazing and so very specific. Kangaroo signs in Australia—sure, but frog crossing signs in Germany and France are something you don’t see very often here in the United States.

So how do you detect signs?

The brute-force way is to have a bunch of people sit at a computer, go through every single image, draw a box around the sign, and write down what that sign means. That’s how it’s been done for years. Now the shift is to take those resources and, instead of labeling everything, label only enough data to train a machine to do the same thing.

The most cutting-edge research around deep learning algorithms has shown they’re very successful at this type of extraction from imagery, given enough training data. The difference isn’t the algorithms anymore (there are so many available off the shelf), it’s really the training data you feed into those algorithms that determines how successful you are at detecting and classifying those features. The goal is to iterate as fast as possible with as much data as you can get every time, and you see significant improvements in the performance of those algorithms as you teach them all the edge cases.
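That iterate-and-add-edge-cases loop can be sketched with a toy "model" that simply memorizes exact feature-to-class pairs, standing in for a real detector. This is purely illustrative and not HERE's pipeline; the sign names are made up.

```python
# Toy illustration: performance improves as labeled edge cases are added
# to the training set.

def train(labeled):
    return dict(labeled)                      # feature -> class lookup

def accuracy(model, data):
    hits = sum(model.get(f) == c for f, c in data)
    return hits / len(data)

# World of (feature, class) pairs; the last two are rare edge cases.
world = [("red_octagon", "stop"), ("white_circle_50", "speed_limit_50"),
         ("yellow_kangaroo", "animal_crossing"), ("green_frog", "animal_crossing")]

labeled = world[:2]                           # start with common signs
print(accuracy(train(labeled), world))        # 0.5

labeled += world[2:]                          # humans label the edge cases
print(accuracy(train(labeled), world))        # 1.0
```

The point of the sketch is only the shape of the curve: each batch of labeled edge cases closes a gap the previous training set could not cover.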

Whether you’re labeling the data to draw a map manually or to train a machine to draw the map, there’s still a lot of human effort involved. What analytics do you use to optimize the process?

Metrics focused on getting training data are super important because we’re spending thousands of hours labeling this data. It’s very expensive to collect the amount of data needed to create these really high-performing algorithms. You need to understand your process and see how small tweaks matter: reducing even one mouse click can have huge effects on how much manual work there is and on the throughput of creating training data or building the map. All of these things need to be constantly evaluated.
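The effect of shaving a single click can be put in rough numbers. The figures below are illustrative assumptions, not HERE's actual measurements.

```python
def labels_per_hour(seconds_per_click, clicks_per_label):
    """Throughput of one labeler, assuming clicks dominate task time."""
    return 3600 / (seconds_per_click * clicks_per_label)

before = labels_per_hour(seconds_per_click=2.0, clicks_per_label=5)  # 360.0
after = labels_per_hour(seconds_per_click=2.0, clicks_per_label=4)   # 450.0
print(f"{(after - before) / before:.0%} more labels per hour")       # 25% more labels per hour
```

Multiplied across thousands of labeling hours, a one-click tweak like this is the kind of small change the metrics are meant to surface.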

Let’s talk about the human side of this. How do you motivate someone to label data accurately if the person perceives doing so as a threat to their job?

It’s not about automating people out of a job; it’s about leveraging them differently. We’re actually scaling. Instead of brute-forcing the map creation process and having humans look at every single thing, we have humans focus on things that they are really good at and machines can’t replace. That’s the transition. Focus their efforts on labeling edge cases and difficult environments, things that we can’t generalize easily. Have them focus on that 10% that’s really the most difficult, that we might have oversimplified or not even approached.

What’s an example of that 10%?

We didn’t label every single highway information sign that points you to a business down the road or a unique vista. Some signage is only relevant and pertinent to that one location in the world, and we would not have bothered converting that information into the map.

When we first started, we were focused on what signage is needed to make a navigable map, like speed limits. Now the map is a 3D map that’s useful to machines. Can the machine know with really high confidence where it is in this world? GPS by itself is not good enough. The machine needs to understand the reference map and understand its location in the world by comparing what it sees to the map, and position itself based on that within a meter of accuracy. So, going from twenty meters of potential error to within a meter, and ultimately to centimeters, really is the goal.

What kind of sign would help a machine position itself?

Every single sign. Anything that’s static, anything that you can feel confident will usually be there next time you drive by is a useful data point.

What have been the biggest challenges?

Marshalling a huge amount of human labeling power and having them focus on the things that are hard for machines. That is the core of what I do: identifying what is hard.

So how do you identify what’s hard?

We have a lot of algorithms that do similar things, that detect similar features or overlapping features. Where they conflict, or where a human has labeled one thing and the machine says it’s another, those are the conflict points. Whether it’s machine versus machine or human versus machine, I try to build processes that find those conflict points as efficiently as possible.
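Finding those conflict points reduces to a set comparison between label sources. This is a hypothetical helper sketched for illustration, not HERE's actual tooling, and the sign IDs are made up.

```python
def conflict_points(human_labels, machine_labels):
    """Return items where both sources produced a label but disagree.

    Both arguments map an item id (e.g. a sign id) to a class name.
    Items labeled by only one source are not conflicts and are skipped.
    """
    shared = human_labels.keys() & machine_labels.keys()
    return {k: (human_labels[k], machine_labels[k])
            for k in shared if human_labels[k] != machine_labels[k]}

human = {"sign_1": "stop", "sign_2": "yield", "sign_3": "speed_limit_30"}
machine = {"sign_1": "stop", "sign_2": "speed_limit_50", "sign_4": "yield"}
print(conflict_points(human, machine))  # {'sign_2': ('yield', 'speed_limit_50')}
```

The same comparison works machine-versus-machine: feed it the outputs of two overlapping detectors instead of a human label set.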

How do you measure success?

We’re going into new countries and expanding the scope of the problem, so the finish line is always moving. Let me throw out this other idea that might be useful. We have thousands of people who are experts at making the map. Can we leverage non-expert mapmakers? If you can simplify the things that we need humans to help inform us, then that labor is unlocked and allows us to scale completely differently than we have in the past.

What do you know now that you wish you knew when you started working on this problem?

Ensuring you’re asking the crowd the right questions and doing the right kinds of checks so you can be confident in the answers you get back. Arabic text on a sign in Saudi Arabia is not something you want to ask a person in Colombia to translate.

What’s the next big thing in Machine Learning?

Finding ways that you don’t have to have as much data, where you can supplement your existing data sets with smaller and smaller amounts of human-labeled data to create the same results. Right now, there’s a focus on getting a lot of high-quality data, but because that’s expensive, there’s a lot of research into how we can create that data synthetically and automatically. Can we show that we can reach the same level of performance with 1,000 human-labeled images and 100,000 synthetic ones as we would have with a million human-labeled images?
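The idea reduces to composing a training set under a fixed human-labeling budget and filling the rest with cheap synthetic samples. A minimal sketch, with made-up counts; whether the mixed set actually matches all-human performance is the open research question the interview describes.

```python
def compose_training_set(real, synthetic, real_budget):
    """Use at most `real_budget` expensive human-labeled samples and
    fill out the training set with cheap synthetic ones."""
    return real[:real_budget] + synthetic

real = [f"human_img_{i}" for i in range(5000)]
synthetic = [f"sim_img_{i}" for i in range(100_000)]

training = compose_training_set(real, synthetic, real_budget=1000)
print(len(training))  # 101000
```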

What’s an example of synthetic data creation?

Using video game engines to create a 3D world that looks very much like our real world, with all the weather conditions you can simulate. You can create a virtual world that is close to real life, but you also enable manipulation of that world in a way you can’t manipulate the real world. Can we make a snowstorm in San Francisco? That is only possible in a virtual environment.

Do you see any other applications for what you’ve been doing with Machine Learning inside or outside of HERE?

Applying feature extraction of signs to other things in the world that are useful for a map. Can you take a stream of data from a sensor and automatically identify things that are useful like available parking or real-time gas prices?

How can developers tap into what HERE is doing with Machine Learning?

We’re starting to use our Open Location Platform (OLP) to provide infrastructure for our internal processes. Parts of the data stream that we extract and put into the HD Live Map will be available in OLP, and will draw information from OLP in turn. That will be another layer of information we can use to cross-reference our machine learning algorithms, and likewise an external developer could use it to cross-reference their own work. [Note: these data processing capabilities will open to external developers in 2018]

What resources would you recommend for someone just starting out with Machine Learning?

Udacity and Khan Academy. They’re not focused on machine learning, but the amount of material that’s within the computer science field there is amazing. For algorithms, published academic research is the cutting edge. There’s this big conference, CVPR [The Conference on Computer Vision and Pattern Recognition] that is one of the premier conferences to stay up to date on research in this field.

You were an anthropology major at Cal. As an anthropologist, what do you think about Machine Learning and automating the world?

It’s going to be really exciting. Our world is going to look so different in twenty or thirty years than it does today because of all this work on automation. It opens up so many possibilities for the way we use our time, the way we use our space, the way we get stuff. Almost every aspect of our lives will be affected by this in some way. It’s hard to think of another revolution, beyond the printing press and what it enabled, that’s comparable.

Learn more about Oliver and connect with him here.