Batch Processing

The HERE platform supports the Apache Spark framework for running batch pipelines. We offer two modules for this purpose: the Spark Connector and Spark Support.

Note

HERE strongly recommends using the Spark Connector whenever possible, as it lets you use the full power of the Apache Spark framework.

Spark Connector

The Spark Connector implements the standard Spark interfaces that allow you to read from a catalog into a DataFrame[Row] and write a DataFrame back to a catalog.

As a result, you can use all standard Spark APIs and functions such as select, filter, map, and collect to work with the data.

This means your business logic does not need to contain any HERE-specific function calls.
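To illustrate, here is a minimal, self-contained sketch. How the DataFrame is obtained from a catalog is abstracted behind an in-memory stand-in, since the exact reader API is covered in the Spark Connector documentation; everything after that point is plain Spark SQL API, with invented column names and values.

```scala
import org.apache.spark.sql.SparkSession

object SparkConnectorSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-connector-sketch")
      .master("local[*]") // local mode so the sketch runs standalone
      .getOrCreate()
    import spark.implicits._

    // Stand-in for a DataFrame that the Spark Connector would read
    // from a catalog layer; schema and values are invented for the example.
    val df = Seq(
      ("tile-1", 42.0),
      ("tile-2", -7.5),
      ("tile-3", 13.2)
    ).toDF("partitionId", "value")

    // From here on, only standard Spark APIs are used -- no HERE-specific calls.
    val positives = df
      .select("partitionId", "value")
      .filter($"value" > 0)

    positives.show()
    println(s"rows kept: ${positives.count()}")

    spark.stop()
  }
}
```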

For a detailed explanation of the Spark Connector, see Spark Connector.

Spark Support

Spark Support is a HERE-proprietary implementation for using data from HERE platform catalogs and layers with the Apache Spark framework. Spark distributes the processing jobs to workers, but the data model is a HERE-proprietary format. There is no SQL-like interface, so you cannot select and filter data using an RSQL query.
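The sketch below illustrates this style of processing under stated assumptions: fetchPartitionBytes and decodeCompactRecords are hypothetical stand-ins for the HERE-specific calls, and the partition identifiers are placeholders. Spark distributes the decoding work, but all selection logic is plain Scala on an RDD rather than an SQL-style query.

```scala
import org.apache.spark.sql.SparkSession

object SparkSupportSketch {
  final case class Record(id: String, value: Double)

  // Hypothetical stand-ins for HERE-specific calls that fetch a raw
  // partition and decode its proprietary, compact payload.
  def fetchPartitionBytes(partitionId: String): Array[Byte] =
    partitionId.getBytes("UTF-8")

  def decodeCompactRecords(bytes: Array[Byte]): Seq[Record] =
    Seq(Record(new String(bytes, "UTF-8"), 1.0))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-support-sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    val partitionIds = Seq("p1", "p2", "p3") // placeholder partition names

    // Spark distributes the fetch/decode work across workers, but the data
    // stays in a proprietary format rather than a schema-bound DataFrame.
    val records = sc.parallelize(partitionIds)
      .flatMap(id => decodeCompactRecords(fetchPartitionBytes(id)))

    // No SQL-like interface: filtering is plain Scala code on an RDD,
    // not an RSQL query.
    println(s"records kept: ${records.filter(_.value > 0.0).count()}")

    spark.stop()
  }
}
```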

There are only a few specific use cases where this module should be used, for example when you need full control over the data format because you want to use a very compact format. The Data Processing Library, for example, follows this principle. If you want to implement batch pipelines optimized for maximum performance, we suggest using the Data Processing Library.
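As an illustration of the "full control over the data format" use case, the following sketch hand-rolls a fixed-width binary layout. The layout is invented for this example and is not a HERE format; it merely shows how a custom encoding can be made very compact.

```scala
import java.nio.ByteBuffer

object CompactFormatSketch {
  // A fixed-width layout: a 4-byte Int id followed by an 8-byte Double value.
  def encode(id: Int, value: Double): Array[Byte] =
    ByteBuffer.allocate(12).putInt(id).putDouble(value).array()

  def decode(bytes: Array[Byte]): (Int, Double) = {
    val buf = ByteBuffer.wrap(bytes)
    (buf.getInt, buf.getDouble)
  }

  def main(args: Array[String]): Unit = {
    val payload = encode(7, 3.14)
    println(s"record size: ${payload.length} bytes") // 12 bytes per record
    println(decode(payload))                          // (7,3.14)
  }
}
```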

There are plans, though no defined dates yet, to retire the Spark Support module in the long term.
