The CSV to SDII-Sensoris Converter allows users to create SDII (MessageList) or SENSORIS data from CSV probe data.
The CSV to SDII-Sensoris Converter operates as a batch pipeline which monitors an input versioned layer and runs on changes. Output SDII or SENSORIS data will be stored in your choice of either a versioned or index layer.
See the separate README file in the utils folder for details on uploading a CSV file into the input versioned layer.
While running the Wizard Deployer, you will be asked to provide the following information. These values are needed in order to properly configure and deploy the pipeline. You should have this information on hand prior to executing the Wizard script.
The input CSV data must contain at least four fields: ID, latitude, longitude, and timestamp.
Additionally, you will need to modify the dat.config file to match your CSV data layout by mapping each field to a column index and specifying the delimiter.
```
# Your input data can contain any number of columns in any order, but must somewhere
# have the below required elements. Add integer value (assuming first column = 0) after
# the '=' in each row below to specify column in which that element resides. For dat.delim
# you must choose one of the following standard delimiter choices (enter the word, not the character):
# comma, tab, space, pipe (|), percent (%), equals (=), pound (#), bang (!), ampersand (&), colon (:), semicolon (;), custom (single character, enter below)
# NOTE: repeated delimiter characters in your data will be treated as separate delimiters
dat.delim=comma
# NOTE: if you have used dat.delim=custom, uncomment the property (dat.char) below and enter the character that is used as a delimiter
#dat.char=
dat.id=0
dat.latitude=1
dat.longitude=2
dat.timestamp=3
# If your data includes heading information, uncomment the property (dat.heading) below and provide the column number
#dat.heading=4
# If your data includes speed information, uncomment the property (dat.speed) below and provide the column number
#dat.speed=5
```
For the delimiter option, specify its name, e.g. dat.delim=comma to use the actual comma character as the delimiter, or any other name from this list:
comma, tab, space, pipe (|), percent (%), equals (=), pound (#), bang (!), ampersand (&), colon (:), semicolon (;).
The application also allows you to specify a custom delimiter, using the option dat.delim=custom and providing the actual delimiter character via the dat.char property (e.g. dat.char=+ to use the plus sign as the delimiter).
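To make the mapping concrete, here is a minimal sketch in Python of how a dat.config-style file could drive the parsing of one CSV row. The property names (dat.delim, dat.char, dat.id, etc.) follow the sample configuration above; the helper functions themselves are illustrative and not part of the pipeline's actual implementation.

```python
# Sketch: applying a dat.config-style field mapping to one CSV row.
# Delimiter names follow the list given in the sample config above.

DELIMITERS = {
    "comma": ",", "tab": "\t", "space": " ", "pipe": "|",
    "percent": "%", "equals": "=", "pound": "#", "bang": "!",
    "ampersand": "&", "colon": ":", "semicolon": ";",
}

def load_config(lines):
    """Parse key=value lines, skipping comments and blank lines."""
    cfg = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip()
    return cfg

def split_row(row, cfg):
    """Split a raw CSV row and pick out the configured required fields."""
    name = cfg.get("dat.delim", "comma")
    delim = cfg["dat.char"] if name == "custom" else DELIMITERS[name]
    # Note: str.split treats each repeated delimiter as a separate
    # delimiter (yielding empty fields), matching the NOTE in dat.config.
    cols = row.split(delim)
    record = {
        "id": cols[int(cfg["dat.id"])],
        "latitude": float(cols[int(cfg["dat.latitude"])]),
        "longitude": float(cols[int(cfg["dat.longitude"])]),
        "timestamp": cols[int(cfg["dat.timestamp"])],
    }
    if "dat.heading" in cfg:  # optional column
        record["heading"] = float(cols[int(cfg["dat.heading"])])
    if "dat.speed" in cfg:    # optional column
        record["speed"] = float(cols[int(cfg["dat.speed"])])
    return record

cfg = load_config(["dat.delim=comma", "dat.id=0", "dat.latitude=1",
                   "dat.longitude=2", "dat.timestamp=3"])
print(split_row("probe42,52.5310,13.3847,1559745600000", cfg))
```

The same split_row call works for a custom delimiter by setting dat.delim=custom and dat.char in the configuration dictionary.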
Prior to running the Wizard script, you will need to set up your input versioned layer and also put thought into where you will store your input CSV files with probe data.
A dedicated versioned layer is needed before deploying your pipeline template. See the Data User Guide for details on creating and configuring catalogs and layers. Make sure to share your catalog with the same Group you plan to share the conversion pipeline with.
To be processed, a CSV file must first reside in a versioned layer in the platform and be accessible by the pipeline, i.e. shared with the same Group.
You are now ready to deploy your pipeline using the Wizard Deployer. Make sure that you have followed the Wizard installation and configuration instructions.
This pipeline template requires a dedicated output catalog and layer, which can be created manually or via the Wizard. If you create the output layer manually, make sure you provide the corresponding configuration parameters when answering the Wizard's prompts.
The layer schema should be either SDII MessageList (hrn:here:schema::olp-here:com.here.sdii:sdii_message_list_v3:4.0.1) or SENSORIS.
The layer tile level should match the tile level you will specify in the Wizard, so that the data can be visualized.
If the output layer is an index layer, both the timewindow and heretile attributes must be configured when creating the layer. Also, the content type should be either protobuf or parquet.
Input and output catalogs should be different.
In the Platform Portal, select the Pipelines tab, where you should be able to see your pipeline deployed and running. After your pipeline finishes and your data is published, you can find your output catalog under the Data tab and inspect your data visually, or query/retrieve it programmatically using one of the following options:
For the given input data, the following table provides processing times and limitations observed for this pipeline. Use this information to estimate the number of workers and cores per worker you will need based on the characteristics of your input data.
| Input data size | Number of workers | Cores per worker | Processing time |
|-----------------|-------------------|------------------|-----------------|
| 500 MB          | 5                 | 2                | ~ 12 min.       |
| 1 GB            | 5                 | 3                | ~ 20 min.       |
| 5 GB            | 5                 | 3                | ~ 22 min.       |
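As a back-of-envelope aid for sizing, the observed runs above can be converted into core-minutes per GB. This is a rough extrapolation from three data points, not a guarantee; your actual throughput will depend on your data and configuration.

```python
# Rough capacity estimate from the benchmark table above.
# Each tuple is (input size in GB, workers, cores per worker, minutes),
# taken directly from the observed runs in the table.

runs = [(0.5, 5, 2, 12), (1.0, 5, 3, 20), (5.0, 5, 3, 22)]

for size_gb, workers, cores, minutes in runs:
    core_minutes = workers * cores * minutes
    print(f"{size_gb:>4} GB -> {core_minutes} core-minutes "
          f"({core_minutes / size_gb:.0f} core-minutes/GB)")
```

Note that cost per GB drops sharply for the 5 GB run, so batching many small CSV files into fewer, larger publications is likely to be more cost-effective than processing them one by one.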
Running this pipeline template will incur platform usage costs associated with:
The options you choose when configuring the pipeline and your usage patterns will determine the overall expense. Below are some tips for cost-effective usage:
If you need support with this pipeline template, please contact us.