CSV to SDII-Sensoris Converter

Overview

The CSV to SDII-Sensoris Converter allows users to create SDII (MessageList) or SENSORIS data from CSV probe data.

Structure

The CSV to SDII-Sensoris Converter operates as a batch pipeline that monitors an input versioned layer and runs whenever it changes. The output SDII or SENSORIS data is stored in your choice of either a versioned or an index layer.

Figure 1. Application Flow Diagram (with legend)

See the separate README file in the utils folder for details on uploading CSV files into the input versioned layer.

Prerequisites

While running the Wizard Deployer, you will be asked to provide the following information. These values are needed to properly configure and deploy the pipeline, so have them on hand before executing the Wizard script.

  • Group you would like the pipeline to be shared with. Make sure that all input and output catalogs are also shared with this Group.
  • Pipeline prefix: the name of your deployed pipeline will begin with the string you enter here
  • Input catalog and layer
  • Output catalog and layer
  • Output tile level
  • Date format and/or date pattern (see the example after this list)
  • Speed units
  • Schema
  • Submitter
  • Version to use
  • Number of workers (parallel processes)
  • Cores per worker (CPU/memory allocated to each process)
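
The date format/pattern describes how timestamps in your CSV are written, typically as a Java-style pattern such as yyyy-MM-dd HH:mm:ss (matching values like 2020-06-15 13:45:00). As a minimal sketch only (the pattern and the sample value are assumptions, not defaults of this template), the equivalent parsing in Python would be:

# Minimal sketch, assuming a Java-style date pattern of "yyyy-MM-dd HH:mm:ss",
# which corresponds to "%Y-%m-%d %H:%M:%S" in Python. The sample value is hypothetical.
from datetime import datetime, timezone

raw = "2020-06-15 13:45:00"
parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
epoch_ms = int(parsed.timestamp() * 1000)  # epoch milliseconds, as commonly used in probe messages
print(epoch_ms)  # 1592228700000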

The input CSV data must contain at least the four required fields: ID, latitude, longitude, and timestamp. Speed and heading may be provided as optional additional fields.
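
For illustration, a row of input probe data laid out as in the dat.config example below might look like this (all values hypothetical):

probe-0001,52.5200,13.4050,2020-06-15 13:45:00,63.5,270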

Additionally, you will need to modify the dat.config file to match your CSV data layout by mapping each field to its column index and specifying the delimiter.

For example:

# Your input data can contain any number of columns in any order, but must somewhere
# have the below required elements. Add integer value (assuming first column = 0) after
# the '=' in each row below to specify column in which that element resides. For dat.delim
# you must choose one of the following standard delimiter choices (enter the word, not the character):
# comma, tab, space, pipe (|), percent (%), equals (=), pound (#), bang (!), ampersand (&), colon (:), semicolon (;), custom (single character, enter below)
# NOTE: repeated delimiter characters in your data will be treated as separate delimiters
dat.delim=comma
# NOTE: if you have used dat.delim=custom, enter the character below
dat.char=+
dat.id=0
dat.latitude=1
dat.longitude=2
dat.timestamp=3
# If your data includes speed information, give column number below
dat.speed=4
# If your data includes heading information, give column number below
dat.heading=5

For the delimiter option, specify the delimiter by name, e.g. dat.delim=comma to use the actual comma character as the delimiter. Valid names are: comma, tab, space, pipe (|), percent (%), equals (=), pound (#), bang (!), ampersand (&), colon (:), semicolon (;).

The application also lets you specify a custom delimiter by setting dat.delim=custom and providing the actual delimiter character via dat.char, e.g. dat.char=+ (where + is an example delimiter).
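
To make the field mapping concrete, the following minimal Python sketch (illustrative only, not part of the pipeline) shows how a dat.config-style mapping resolves a delimited row into named fields:

# Minimal sketch, illustrative only: resolve a delimited row into named
# fields using a dat.config-style mapping (keys as in the example above).
DELIMS = {"comma": ",", "tab": "\t", "space": " ", "pipe": "|",
          "percent": "%", "equals": "=", "pound": "#", "bang": "!",
          "ampersand": "&", "colon": ":", "semicolon": ";"}

def load_config(path="dat.config"):
    cfg = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                cfg[key.strip()] = value.strip()
    return cfg

def parse_row(row, cfg):
    # dat.delim=custom falls back to the character given in dat.char
    delim = cfg["dat.char"] if cfg["dat.delim"] == "custom" else DELIMS[cfg["dat.delim"]]
    cols = row.split(delim)
    fields = {}
    for name in ("id", "latitude", "longitude", "timestamp", "speed", "heading"):
        idx = cfg.get("dat." + name)
        if idx is not None:
            fields[name] = cols[int(idx)]
    return fields

cfg = {"dat.delim": "comma", "dat.id": "0", "dat.latitude": "1",
       "dat.longitude": "2", "dat.timestamp": "3", "dat.speed": "4",
       "dat.heading": "5"}
print(parse_row("probe-0001,52.5200,13.4050,2020-06-15 13:45:00,63.5,270", cfg))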

Execution

Running on the Platform

Prior to running the Wizard script, you will need to set up your input versioned layer and decide where you will store the input CSV files containing your probe data.

Set Up Input Layer

A dedicated versioned layer is needed before deploying your pipeline template. See the Data User Guide for details on creating and configuring catalogs and layers. Make sure to share your catalog with the same Group you plan to share the conversion pipeline with.

Before a CSV file can be processed, it must reside in a versioned layer on the platform and be accessible to the pipeline, i.e. shared with the same Group.

You are now ready to deploy your pipeline using the Wizard Deployer. Make sure that you have followed the Wizard installation and configuration instructions.

Output Catalog

This pipeline template requires a dedicated output catalog and layer, which can be created manually or via the Wizard. If you create the output layer manually, make sure the answers you give the Wizard match that layer's configuration.

The layer schema should be either SDII MessageList (hrn:here:schema::olp-here:com.here.sdii:sdii_message_list_v3:4.0.1) or SENSORIS (hrn:here:schema::olp-here:org.sensoris:sensoris-specification_v1_0_0:1.0.0).

The layer tile level should match the tile level you specify in the Wizard; otherwise the data cannot be visualized correctly.

If the output layer is an index layer, both the timewindow and heretile index attributes must be configured when creating the layer, and the content type should be either protobuf or parquet.
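
As a rough sketch only (the exact JSON shape, attribute names, and values below are assumptions; consult the Data User Guide for the authoritative index layer configuration format), the intent is an index definition along these lines:

{
  "indexDefinitions": [
    { "name": "ingestionTime", "type": "timewindow", "duration": 3600000 },
    { "name": "tileId", "type": "heretile", "zoomLevel": 12 }
  ],
  "contentType": "application/x-protobuf"
}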

Input and output catalogs should be different.

Verification

In the Platform Portal, select the Pipelines tab, where you should see your pipeline deployed and running. After your pipeline finishes and your data is published, you can find the output catalog under the Data tab and inspect your data visually, or query/retrieve it programmatically.
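
As one hedged example of programmatic inspection, assuming the output layer uses the SDII MessageList schema and that a partition blob has already been downloaded (the module name sdii_message_list_pb2 stands in for Python bindings generated from the schema and is hypothetical):

# Hedged sketch: decode one downloaded SDII MessageList partition.
# "sdii_message_list_pb2" is a hypothetical name for Python protobuf
# bindings generated from the SDII MessageList schema; the repeated
# field name "message" is likewise assumed from that schema.
import sdii_message_list_pb2

with open("partition.bin", "rb") as f:  # assumed: blob downloaded beforehand
    message_list = sdii_message_list_pb2.MessageList()
    message_list.ParseFromString(f.read())

print("messages in partition:", len(message_list.message))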

Scalability and Processing Time

The following table shows processing times observed for this pipeline on sample input data. Use it to estimate the number of workers and cores per worker you will need based on the characteristics of your own input data.

Input data size    Number of workers    Cores per worker    Processing time
500 MB             5                    2                   ~12 min.
1 GB               5                    3                   ~20 min.
5 GB               5                    3                   ~22 min.
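
Purely as an illustration of how to turn these observations into a starting configuration (the figures above were observed on sample data and are not guarantees):

# Purely illustrative: suggest a starting worker/core configuration from
# the observations in the table above. Tune for your own data.
OBSERVATIONS = [  # (max input size in GB, workers, cores per worker)
    (0.5, 5, 2),
    (1.0, 5, 3),
    (5.0, 5, 3),
]

def suggest_resources(input_gb):
    for max_gb, workers, cores in OBSERVATIONS:
        if input_gb <= max_gb:
            return workers, cores
    # beyond the largest observation: start from the last row and scale up
    return OBSERVATIONS[-1][1], OBSERVATIONS[-1][2]

print(suggest_resources(2.0))  # -> (5, 3)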

Cost Estimation

Running this pipeline template will incur platform usage costs associated with:

  • Storage
  • Data IO
  • Compute Core
  • Compute RAM
  • Log Search IO

The options you choose when configuring the pipeline and your usage patterns will determine the overall expense. Below are some tips for cost-effective usage:

  • Review the Data User Guide information regarding storage costs to make the appropriate choices for your input/output layers
  • Monitor your Usage Metrics and adjust (re-deploy) your pipeline if warranted

Support

If you need support with this pipeline template, please contact us.
