The following guidelines help optimize performance when building applications that upload and retrieve data from the Data API:
Avoid designing interactions where an application must make multiple calls to the Data API (each of which returns a small amount of data). Instead, combine several related operations into a single request to reduce the number of round trips and resource locking.
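As a minimal sketch of this idea, the snippet below groups many small lookups into a few larger requests. The `fetch_batch` callable is hypothetical: it stands in for whatever single request your application uses to retrieve several items at once, and is not a Data API function.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def fetch_in_batches(
    ids: Sequence[str],
    fetch_batch: Callable[[List[str]], Dict[str, str]],  # hypothetical batch call
    batch_size: int = 100,
) -> Dict[str, str]:
    """Issue one request per batch instead of one request per id."""
    results: Dict[str, str] = {}
    for batch in chunked(ids, batch_size):
        results.update(fetch_batch(batch))
    return results
```

With 1,000 ids and a batch size of 100, this issues 10 round trips instead of 1,000.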
Additionally, select lower zoom levels when producing map tiles, strive for a higher data-to-metadata ratio, and apply the adaptive leveling feature of the Data Processing Library.
The Data API is a large distributed system. To help take advantage of its scale, we encourage you to horizontally scale parallel requests to the Data API service endpoints. For high-throughput transfer applications, you should use multiple connections to retrieve or upload data in parallel. If your application issues requests directly to the Data API using the REST API, we recommend using a pool of HTTP connections and reusing each connection for multiple requests. Avoiding per-request connection setup removes the need to perform TCP slow-start and TLS/SSL handshakes on each request.
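The following sketch shows one way to combine a fixed pool of reusable connections with parallel requests, using only the Python standard library. `ConnectionPool` and `fetch_all` are illustrative names, not part of any HERE SDK, and the connection objects are assumed to behave like `http.client.HTTPSConnection` (a `request` method followed by `getresponse`).

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool that hands out already-created connections."""

    def __init__(self, factory, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks until a connection is free
        try:
            yield conn
        finally:
            # Simplification: a failed connection is returned as-is.
            # A production pool would replace it with a fresh one.
            self._idle.put(conn)


def fetch_all(pool, paths, workers):
    """GET every path in parallel, reusing pooled connections."""

    def fetch(path):
        with pool.connection() as conn:
            conn.request("GET", path)
            response = conn.getresponse()
            # The body must be read fully before the connection is reused.
            return response.status, response.read()

    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(fetch, paths))


# Example wiring (placeholder host, not a real endpoint):
# import http.client
# pool = ConnectionPool(
#     lambda: http.client.HTTPSConnection("data.example.com", timeout=10), size=8)
# results = fetch_all(pool, ["/a", "/b"], workers=8)
```

Because each connection stays open between requests, only the first request on each connection pays the connection setup and handshake cost.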
Do performance profiling and load testing during development, as part of test routines, and before final release to ensure the application performs and scales as required. When optimizing performance, look at network throughput, CPU, and RAM requirements.
Measuring performance is important when you tune the number of requests to issue to the Data API concurrently. Measure the network bandwidth achieved by a single request and the use of other resources that your application consumes while processing the data. You can then identify the bottleneck resource (that is, the resource with the highest usage), and hence the number of requests that are likely to be useful. Even a small number of concurrent requests (for example, 20 concurrent requests at 50-80 MB/s each) can saturate a 10 Gb/s network interface card (NIC). Too little parallelism leaves resources underutilized, while too much parallelism causes resource congestion.
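The arithmetic behind that estimate can be sketched as follows. The helper name is illustrative, and it assumes decimal units (1 Gb/s = 125 MB/s) and that the network is the only bottleneck.

```python
import math


def requests_to_saturate(nic_gbit_per_s: float, per_request_mb_per_s: float) -> int:
    """Estimate how many concurrent requests fill the given NIC bandwidth."""
    nic_mb_per_s = nic_gbit_per_s * 1000 / 8  # 10 Gb/s -> 1250 MB/s
    return math.ceil(nic_mb_per_s / per_request_mb_per_s)


print(requests_to_saturate(10, 80))  # prints 16
print(requests_to_saturate(10, 50))  # prints 25
```

At 50-80 MB/s per request, roughly 16 to 25 concurrent requests are enough to fill a 10 Gb/s link, which is why going far beyond that range rarely helps on such a host.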
There are certain situations where an application receives a response from the Data API indicating that a retry is necessary. Responses with HTTP status codes 408, 429, 500, 502, 503, and 504 are retriable. If an application generates high request rates, it might receive such responses. If these errors occur, the HERE Data SDK for Java & Scala automatically retries the request using exponential backoff. If you are not using the HERE Data SDK for Java & Scala, implement similar retry logic when receiving one of these errors.
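For applications that do not use the SDK, retry logic along these lines could look like the sketch below. It is not the SDK's implementation: the `send` callable (returning a status code and body), the default delays, and the use of random jitter are all assumptions made for illustration.

```python
import random
import time

# Status codes listed above as retriable.
RETRIABLE_STATUS = frozenset({408, 429, 500, 502, 503, 504})


def call_with_retries(send, max_retries=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-retriable status or retries run out.

    `send` is a hypothetical callable returning a (status, body) tuple.
    """
    if max_retries < 0:
        raise ValueError("max_retries must not be negative")
    for attempt in range(max_retries + 1):
        status, body = send()
        if status not in RETRIABLE_STATUS or attempt == max_retries:
            return status, body
        # Exponential backoff: the upper bound doubles on every attempt.
        upper_bound = min(max_delay, base_delay * 2 ** attempt)
        # Random jitter (an addition here) spreads out simultaneous retries.
        sleep(random.uniform(0, upper_bound))
```

Raising `max_retries` or `max_delay` makes the loop more tolerant of long error spikes; lowering them makes it fail faster.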
The Data API automatically scales in response to sustained new request rates, dynamically optimizing performance. While the Data API is internally optimizing for a new request rate, you will temporarily receive HTTP error responses until the optimization completes.
For batch processing, we recommend using longer retry times or increasing the maximum number of retries so that intermittent network errors or spikes of HTTP errors do not affect multi-hour batch processing jobs.
For latency-sensitive applications, it is advisable to use shorter timeouts and retry slow operations. When you retry a request, we recommend using a new connection to the Data API and potentially performing a fresh DNS lookup.
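A minimal sketch of the latency-sensitive case, assuming Python's standard `http.client`: each attempt uses a short timeout and a brand-new connection. The function name, the default timeout, and the attempt count are illustrative. Opening a new connection resolves the host name again through the operating system, which may itself cache DNS results.

```python
import http.client


def get_with_fresh_connection(host, path, timeout_s=2.0, attempts=3,
                              connect=http.client.HTTPSConnection):
    """GET `path`, opening a new connection for every attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        conn = connect(host, timeout=timeout_s)  # never reuse a slow connection
        try:
            conn.request("GET", path)
            response = conn.getresponse()
            return response.status, response.read()
        except (OSError, http.client.HTTPException) as error:
            # Timeouts and connection failures are OSError subclasses.
            last_error = error
        finally:
            conn.close()
    raise last_error
```

For long-running batch jobs the same function could be called with a larger `timeout_s` and more `attempts`.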
The largest volume of data in an application is often the HTTP responses to client requests generated by the application and passed over the network. Minimizing the response size reduces the load on the network and optimizes both storage size and transfer I/O. Enabling layer compression can considerably reduce response sizes.
You cannot update the compression attribute once the layer is created.
If you are using the HERE Data SDK for Java & Scala to read or write data from a compressed layer in the Data API, compression and decompression are handled automatically.
Some formats, especially textual formats such as plain text, XML, JSON, and GeoJSON, compress very well. Other data formats, such as JPEG or PNG images, are already compressed, so compressing them again with gzip will not reduce their size and will often increase the size of the payload. For general-purpose binary formats such as protobuf, compression ratios depend on the actual content and message size. Layer compression should not be used for Parquet, as it breaks random access to blob data, which is necessary to read Parquet data efficiently.
Data compression can reduce the volume of data transmitted and minimize transfer time and costs. However, the compression and decompression processes incur overhead. Compression should only be used when there is a demonstrable gain in performance.
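One way to check whether compression pays off for your payloads is to measure the size ratio on representative samples before enabling it. The sketch below uses Python's standard `gzip` module; the sample payloads are invented, and random bytes are used only as a stand-in for data that is already compressed.

```python
import gzip
import json
import os


def gzip_ratio(payload: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    if not payload:
        raise ValueError("payload must not be empty")
    return len(gzip.compress(payload)) / len(payload)


# Invented sample: repetitive JSON, typical of textual formats.
text_sample = json.dumps(
    [{"type": "Feature", "id": i, "name": "segment"} for i in range(500)]
).encode("utf-8")

# Random bytes approximate an already-compressed blob such as a JPEG.
dense_sample = os.urandom(4096)

print(f"JSON ratio:  {gzip_ratio(text_sample):.2f}")
print(f"dense ratio: {gzip_ratio(dense_sample):.2f}")
```

A ratio well below 1.0 indicates a size gain, while a ratio at or above 1.0 means compression only adds overhead. Size is one factor only, so CPU time for compressing and decompressing should be measured as well.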
Many applications that store data in the Data API work with location-centric or geospatial data, usually serving “hot areas” (city centers, industrial areas, and so on). These hot areas are repeatedly requested by users and are the best candidates for caching. Applications that use caching also send fewer direct requests to the Data API, which can help reduce transfer I/O costs.
Applications working with the Data API should also respect the Cache-Control HTTP Header, which contains directives (instructions) for caching in both requests and responses.
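A simple in-memory cache that honors the `max-age` directive might look like the sketch below. `TileCache` and `max_age_seconds` are illustrative names, the parser handles only a few common directives, and treating `no-cache` as "do not cache" is a conservative simplification rather than full HTTP caching semantics.

```python
import re
import time


def max_age_seconds(cache_control: str):
    """Return the max-age value in seconds, or None if caching is not allowed."""
    directives = [d.strip().lower() for d in cache_control.split(",")]
    if "no-store" in directives or "no-cache" in directives:
        return None
    for directive in directives:
        match = re.fullmatch(r"max-age=(\d+)", directive)
        if match:
            return int(match.group(1))
    return None


class TileCache:
    """Keeps responses only for as long as Cache-Control allows."""

    def __init__(self, clock=time.monotonic):
        self._entries = {}
        self._clock = clock

    def put(self, key, value, cache_control):
        ttl = max_age_seconds(cache_control)
        if ttl is not None and ttl > 0:
            self._entries[key] = (self._clock() + ttl, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: caller must fetch again
            return None
        return value
```

An application would call `get` before issuing a request and `put` after receiving a response, passing the response's Cache-Control header value.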
You can improve the upload experience for larger data blobs (50 MB or more) by using the Data API multipart upload feature, which uploads separate parts of a large blob independently, in any order, and in parallel.
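The splitting and parallel dispatch can be sketched as follows. The `upload_part` callable is hypothetical: it stands in for whatever request uploads a single part, and nothing here reflects the actual multipart endpoints or their part-size rules.

```python
from concurrent.futures import ThreadPoolExecutor


def part_ranges(total_size: int, part_size: int):
    """Return (start, end) byte offsets covering the blob; end is exclusive."""
    if part_size < 1:
        raise ValueError("part_size must be at least 1")
    return [(start, min(start + part_size, total_size))
            for start in range(0, total_size, part_size)]


def upload_in_parts(blob: bytes, part_size: int, upload_part, workers: int = 4):
    """Upload each part in parallel and return the results in part order.

    `upload_part(number, data)` is a hypothetical callable; part numbers
    start at 1.
    """
    ranges = part_ranges(len(blob), part_size)

    def upload(numbered_range):
        number, (start, end) = numbered_range
        return upload_part(number, blob[start:end])

    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(upload, enumerate(ranges, start=1)))
```

Because each part is independent, a failed part can be retried on its own without resending the rest of the blob.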
The Data API supports retrieving data or metadata using the Range HTTP header where appropriate. You can fetch a byte range from an object, transferring only the specified portion. Using the Range HTTP header allows your application to improve retry times when requests are interrupted, because only the missing bytes need to be requested again.
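As an illustration, the sketch below resumes an interrupted download by requesting only the bytes that are still missing. The `fetch_range` callable is hypothetical: it is assumed to send a GET request with the given headers and return whatever bytes arrived before the transfer ended.

```python
def range_header(bytes_received: int) -> dict:
    """Build a Range header asking for everything after the bytes we have."""
    return {"Range": f"bytes={bytes_received}-"}


def download_with_resume(fetch_range, total_size: int, max_attempts: int = 5) -> bytes:
    """Download `total_size` bytes, resuming after each interruption.

    `fetch_range(headers)` is a hypothetical callable that may return fewer
    bytes than requested when the transfer is cut short.
    """
    received = bytearray()
    for _ in range(max_attempts):
        if len(received) >= total_size:
            break
        received += fetch_range(range_header(len(received)))
    if len(received) < total_size:
        raise IOError(f"incomplete download: {len(received)} of {total_size} bytes")
    return bytes(received)
```

Each retry transfers only the remainder of the object, so an interruption near the end of a large blob costs very little.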
The HERE Data SDK for Java & Scala provides built-in support for many of the recommended guidelines for optimizing Data API performance.
The HERE Data SDK for Java & Scala provides a simpler API for taking advantage of the Data API from within an application, and is regularly updated to follow the latest best practices. For example, the Data SDK includes logic to automatically retry requests on intermittent network issues and HTTP 5xx errors. It also provides functionality that automates horizontal scaling of connections to achieve thousands of requests per second, using byte-range requests where appropriate. It is important to use the latest version of the HERE Data SDK for Java & Scala to obtain the latest performance optimization features.
You can also optimize performance when you are using HTTP REST API requests. When using the REST API, follow the same best practices that are outlined in this section.