Hands on

Adding Alexa Voice Control to HERE Maps

By Michael Palermo | 19 January 2018

The Developer Relations team at HERE had a busy week at CES, with a positive outcome for developers. Among the many offerings we had on display, I was proud to have produced a demo showcasing voice control over our HERE Maps API using Amazon Alexa. As attendees approached, we would ask where they were from and instruct Alexa, via an Amazon Echo, to show their city or country on the map. We also used voice commands to zoom in and out of the map.

When developers approached, we often talked about the technology that made the demonstration possible. Given the number of questions asked and the interest shown, I promised to produce a technical blog post to put it all in perspective. This post will 'speak' to the major components of the demo and how they connect to one another. To fully benefit from this post, you will need an Amazon developer account (for the Alexa skill), an AWS account (for Lambda, AWS IoT, and Amazon Cognito), and HERE Maps API credentials (an app ID and app code).

This post will assume you have a general understanding of the developer resources available from Amazon, and a degree of experience developing skills for Amazon Alexa. For more information, please visit Getting Started with the Alexa Skills Kit.

Defining the Alexa Voice Interface

Developers create skills to extend the capabilities of Amazon Alexa. An important aspect of building a skill is defining the voice interface. The interface consists of sample utterances, intents, and slot types. To put this quickly in perspective, consider the following sample utterances used in the context of the demo featured at CES and the focus of this post:

"Show me Phoenix, Arizona "
"Go to Palermo, Sicily "
"Display Las Vegas, Nevada "

In each utterance above, the leading verb ("Show me", "Go to", "Display") represents the intent of the user. The location that follows is the slot of data we are acquiring from the user. In this demo, I defined ShowPlace and Zoom as intents, as shown in the interactive model designer.

[Image: the ShowPlace and Zoom intents in the Alexa interactive model designer]

Users of the skill may ask to 'show' a location on the map in a variety of ways. Likewise, there may be a few ways to express zooming behavior. Fortunately, the voice interface supports synonyms to represent these variations, as seen in the 'types' section of the interactive model schema:
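Rather than reproduce the full schema, here is a minimal sketch of what such a 'types' section can look like. The ZoomType name, the value words, and the synonym lists are illustrative choices; the PLUS and MINUS IDs are the ones the lambda code relies on later:

```json
{
  "languageModel": {
    "types": [
      {
        "name": "ZoomType",
        "values": [
          {
            "id": "PLUS",
            "name": { "value": "in", "synonyms": ["closer", "down"] }
          },
          {
            "id": "MINUS",
            "name": { "value": "out", "synonyms": ["farther", "away"] }
          }
        ]
      }
    ]
  }
}
```

Each value pairs an ID with the word a user is likely to say and its synonyms; the ID is what the skill code will ultimately care about.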

The types defined above can now be used in sample utterances as shown in the 'intents' section of the schema:
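Again as a sketch rather than the exact schema, the 'intents' section declares the two intents, their slots, and the sample utterances that reference those slots. The slot names (Place, Region, Direction) and the built-in slot types chosen for Place and Region are illustrative:

```json
{
  "languageModel": {
    "intents": [
      {
        "name": "ShowPlace",
        "slots": [
          { "name": "Place", "type": "AMAZON.City" },
          { "name": "Region", "type": "AMAZON.Region" }
        ],
        "samples": [
          "show me {Place} {Region}",
          "go to {Place} {Region}",
          "display {Place} {Region}"
        ]
      },
      {
        "name": "Zoom",
        "slots": [
          { "name": "Direction", "type": "ZoomType" }
        ],
        "samples": [
          "zoom {Direction}",
          "zoom {Direction} on the map"
        ]
      }
    ]
  }
}
```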

To understand the above from a developer's perspective, consider the ShowPlace intent in the schema. Essentially, we want to map it in code to a function like showPlace(place, region). While there is not a direct mapping, you will see how this is accomplished soon. Likewise, for the Zoom intent, a functional code equivalent could be setZoom(behavior), where behavior indicates 'in' or 'out'. Once the design of the skill is complete, it is time to code the skill.

Coding the Alexa Skill

Amazon allows custom skill code to reside where you prefer. In this example, I chose to host the code for the skill in an AWS Lambda. The lambda used in this post is connected to the following resources:

[Image: the Lambda designer showing the Alexa Skills Kit trigger, Amazon CloudWatch Logs, and AWS IoT]

The Alexa Skills Kit is used as a trigger to call this lambda. Amazon CloudWatch Logs is a great resource for logging and troubleshooting. AWS IoT provides the communication channel between the skill code and the client code it will interact with. Now look at the starting code of the lambda:
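What follows is a minimal sketch of that opening code, assuming a Node.js lambda and the aws-sdk. The "your endpoint" placeholder, the 'location-point' topic, and the postMQTT(place, zoom) signature are described below; the '|' delimiter used to concatenate the payload and the small buildResponse helper are illustrative additions, and the handler that routes incoming intents to the functions shown later is omitted:

```javascript
'use strict';

const AWS = require('aws-sdk');

// Replace "your endpoint" with the AWS IoT endpoint of your account
// (found via the 'View endpoint' option described below).
const iotData = new AWS.IotData({ endpoint: 'your endpoint' });

// Publish the place and zoom values to the 'location-point' topic so
// that any subscribing MQTT client (the map page) receives them.
function postMQTT(place, zoom) {
  const payload = place + '|' + zoom;   // the two values concatenated
  return iotData.publish({
    topic: 'location-point',
    payload: payload,
    qos: 0
  }).promise();
}

// Minimal helper (an illustrative addition, not part of the original
// walkthrough) to build a plain-text Alexa response.
function buildResponse(text) {
  return {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text: text },
      shouldEndSession: true
    }
  };
}
```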

Let's first draw attention to the postMQTT function. This function will be used later in the lambda to post data retrieved from the voice interaction to any subscribing clients. In my demo, the subscribing client was a simple HTML page, which we will examine a little later. The function takes two parameters, 'place' and 'zoom', and concatenates these values into the 'payload' it publishes. Notice the value of 'topic' is 'location-point'. This is a resource previously created in AWS IoT, and it is associated with an endpoint required near the top of the code. The "your endpoint" value will need to be replaced by the endpoint provided after creating a thing in AWS IoT. In this post, the thing created is called 'location-point' (the same name used as the topic in the code above), as shown here:

[Image: the 'location-point' thing in AWS IoT]

Creating a thing is simple. All I did was give it a name, 'location-point'. Once it is created, you can determine your endpoint by navigating to the test section and selecting the 'View endpoint' option from the small dropdown arrow in the upper right part of the screen as shown here:

[Image: the 'View endpoint' option in the AWS IoT test section]

Once you have your endpoint, replace the "your endpoint" placeholder in the preceding code with that value. Now let's look at the function in the lambda that is invoked when the user's intent is to show a place on the map:
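Here is a hedged sketch of such a handler rather than the exact code. The Place and Region slot names match the interaction model sketch above, buildResponse is the illustrative helper from the earlier snippet, and callback is the lambda callback used to return the response to Alexa:

```javascript
// Handles the ShowPlace intent: read the Place and Region slots,
// publish the location over MQTT, and have Alexa confirm it.
function showPlace(intent, callback) {
  const place = intent.slots.Place ? intent.slots.Place.value : null;
  // Region is optional, so fall back to an empty string.
  const region = (intent.slots.Region && intent.slots.Region.value)
    ? intent.slots.Region.value
    : '';

  if (!place) {
    callback(null, buildResponse('Please tell me a place to show on the map.'));
    return;
  }

  const location = region ? place + ', ' + region : place;
  postMQTT(location, '')
    .then(() => callback(null, buildResponse('Showing ' + location)))
    .catch(() => callback(null, buildResponse('Sorry, I could not update the map.')));
}
```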

The handler collects the data from the user via the 'intent' parameter passed in by the Alexa service. The 'Region' value is optional, so it falls back to an empty string if it is not provided by the user. After a series of validations, the postMQTT function is invoked with the place value. Let's also look at the function for managing zoom requests:
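A sketch of the zoom handler follows the same pattern; the Direction slot name is an illustrative choice, and the important part is reading the slot ID from entity resolution rather than the spoken word:

```javascript
// Handles the Zoom intent: resolve the spoken word ("in", "closer",
// "out", ...) to the slot ID (PLUS or MINUS) and publish that ID.
function setZoom(intent, callback) {
  const slot = intent.slots.Direction;   // slot name is illustrative
  let behavior = null;

  // Entity resolution carries the ID defined in the interaction model,
  // regardless of which synonym the user actually said.
  if (slot && slot.resolutions &&
      slot.resolutions.resolutionsPerAuthority[0].status.code === 'ER_SUCCESS_MATCH') {
    behavior = slot.resolutions.resolutionsPerAuthority[0].values[0].value.id; // 'PLUS' or 'MINUS'
  }

  if (!behavior) {
    callback(null, buildResponse('You can say zoom in or zoom out.'));
    return;
  }

  postMQTT('', behavior)
    .then(() => callback(null, buildResponse('Zooming ' + (behavior === 'PLUS' ? 'in' : 'out'))))
    .catch(() => callback(null, buildResponse('Sorry, I could not update the map.')));
}
```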

The function above is very similar to the previous one. It obtains the value provided by the user, performs validations, and then invokes the postMQTT function. One interesting difference is the use of the slot ID. Recall from the 'types' section of the interactive model schema the IDs provided for the two ZoomType values: PLUS and MINUS. This is handy when I want to know the intended behavior in code, not necessarily the word spoken by the user. For example, whether the user says "Zoom in", "Zoom down", or "Zoom closer", the value of the ID in each case is PLUS. This is what is passed to the subscribing MQTT client.

Coding the Client

As mentioned before, the client application of the demo is an HTML page. It is configured to use the HERE Maps API and subscribes to the AWS IoT thing called 'location-point' via the MQTT protocol. Here is the relevant code used by the web page, with inline comments:
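The page itself is longer than what follows; this condensed sketch covers only the map setup and the message handler. The variable names, the starting view, and the '|' delimiter (matching the lambda sketch) are illustrative, and the code that obtains Cognito credentials, signs the WebSocket URL, and connects the Paho MQTT client to the AWS IoT endpoint is omitted:

```javascript
// Assumes the HERE Maps JS API (mapsjs-core, mapsjs-service) and the
// Paho MQTT client are loaded on the page; app_id / app_code are your
// HERE credentials.
const platform = new H.service.Platform({
  app_id: 'your app id',
  app_code: 'your app code'
});
const defaultLayers = platform.createDefaultLayers();
const map = new H.Map(
  document.getElementById('map'),
  defaultLayers.normal.map,
  { zoom: 4, center: { lat: 36.12, lng: -115.17 } }  // start on Las Vegas for CES
);
const geocoder = platform.getGeocodingService();

// Assigned as client.onMessageArrived when the Paho MQTT connection
// is created (connection code not shown). Called for every message
// published to the 'location-point' topic by the lambda.
function onMessageArrived(message) {
  // The lambda concatenates the two values, e.g. "Phoenix, Arizona|"
  // or "|PLUS"; split them back apart.
  const parts = message.payloadString.split('|');
  const place = parts[0];
  const zoom = parts[1];

  if (place) {
    // Geocode the spoken place and recenter the map on the result.
    geocoder.geocode({ searchText: place }, (result) => {
      const view = result.Response.View[0];
      if (view) {
        const pos = view.Result[0].Location.DisplayPosition;
        map.setCenter({ lat: pos.Latitude, lng: pos.Longitude });
      }
    }, (error) => console.error(error));
  }

  if (zoom === 'PLUS') {
    map.setZoom(map.getZoom() + 1);
  } else if (zoom === 'MINUS') {
    map.setZoom(map.getZoom() - 1);
  }
}
```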

A few of the variables used above are declared elsewhere on the page. Of significance to what we have seen so far is the message handler. Notice the message parameter contains a payloadString property. This contains the values sent from the lambda associated with the Alexa skill.

Most of the code is self-describing, but two values deserve attention. The MQTT connection refers to an endpoint - the same endpoint value referenced earlier in the post - so you will need to replace that value with yours. The page also uses a "your identity pool id" value that will need to be replaced. This is obtained from Amazon Cognito, a resource available once you are logged into AWS with developer credentials. You can obtain the pool id as shown below:

[Image: identity pool details in Amazon Cognito]

You can either get the identity pool id by selecting the 'Edit identity pool' link or take it from the Get AWS Credentials code sample on the screen.

Summary

In this post, we showed how voice control can provide a different way to interact with HERE Maps. We examined briefly how to construct the voice interface for an Alexa skill, how to manage functionality in the associated lambda, and how to wire up the lambda and client to communicate via AWS IoT. Happy coding!