Bring your data into the HERE platform using an object store layer

Objectives: Understand how to bring your own data into the HERE platform using an object store layer

Complexity: Easy

Time to complete: 30 min

Dependencies: Organize your work in projects

Source code: Download

The example in this tutorial demonstrates how you can bring your own data into the HERE platform using the object store layer. The object store layer is a distributed, highly durable key/value store that additionally supports listing keys. You can find more information on the various layer types in the Data Service Documentation.

This tutorial includes the following steps:

  1. Creating a catalog with an object store layer
  2. Uploading one file from the local file system to the object store using the OLP Command Line Interface (CLI)
  3. Uploading multiple files from the local file system to the object store layer using Apache Hadoop

In preparation, you will need to create a catalog containing an object store layer.

Create catalog

You will need to create a catalog. You can do this with the OLP Command Line Interface (CLI) by following the steps outlined in the Organize your work in projects tutorial.

Create a file called bring-your-data.json with the contents below, replacing {{YOUR_CATALOG_ID}} with an identifier of your choice.

{
  "id": "{{YOUR_CATALOG_ID}}",
  "name": "Tutorial for copying your data from local file system to the Object Store layer",
  "summary": "Tutorial for copying your data from local file system to the Object Store layer",
  "description": "Tutorial for copying your data from local file system to the Object Store layer",
  "tags": ["Hadoop FS Support", "Object store"],
  "layers": [
    {
      "id": "bring-your-data-layer",
      "name": "bring-your-data-layer",
      "summary": "Simulated data.",
      "description": "Simulated data to demonstrate usability of Object store layer",
      "tags": ["Hadoop FS Support", "Object store"],
      "layerType": "objectstore",
      "volume": {
        "volumeType": "durable"
      }
    }
  ]
}

Replace {{YOUR_CATALOG_ID}} below with your own identifier and then run the following command:

olp catalog create {{YOUR_CATALOG_ID}} \
    "Tutorial for copying data from local file system to the object store layer" \
    --config bring-your-data.json
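
The command output includes the HRN (HERE Resource Name) of the new catalog; keep it at hand, as the remaining steps use it in place of {{YOUR_CATALOG_HRN}}. Optionally, you can display the catalog details to confirm that the layer was created; this assumes the olp catalog show command is available in your CLI version.

olp catalog show {{YOUR_CATALOG_HRN}}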

Upload a single file to the object store layer using the OLP CLI

You can upload a single file from your local file system to the object store layer using the OLP Command Line Interface (CLI).
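
If you are not using the sample data included in the tutorial's source code download, you can create a small file to upload first; the path matches the commands in this section, and the contents below are only an illustration.

mkdir -p test-data-cli
echo "hello object store" > test-data-cli/test-file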

1. Upload file

Replace {{YOUR_CATALOG_HRN}} in the following command with the HRN you received from the catalog creation step, and then run the command:

olp catalog layer object put {{YOUR_CATALOG_HRN}} bring-your-data-layer --key test-file --data test-data-cli/test-file

The above command uploads your local file test-data-cli/test-file to the object store layer under the key test-file.

2. List uploaded file

You can verify the upload by listing the objects in the layer with the OLP CLI. Replace {{YOUR_CATALOG_HRN}} with the HRN you received from the catalog creation step and run the following command:

olp catalog layer object list {{YOUR_CATALOG_HRN}} bring-your-data-layer

3. Get contents of uploaded file

To retrieve the data you uploaded to the object store layer, replace {{YOUR_CATALOG_HRN}} in the following command with the HRN you received from the catalog creation step and then run it:

olp catalog layer object get {{YOUR_CATALOG_HRN}} bring-your-data-layer --key test-file --data test-data-cli/test-file

Upload multiple files to the object store layer using Apache Hadoop

You can upload multiple files from your local file system to the object store in parallel in a distributed manner using Apache Hadoop.
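
If you are not using the sample data included in the tutorial's source code download, you can create the sample directory that the commands below expect; the file names match the commands in this section, and the contents are only an illustration.

mkdir -p test-data-hadoop
echo "sample data 1" > test-data-hadoop/test-file-1
echo "sample data 2" > test-data-hadoop/test-file-2
echo "sample data 3" > test-data-hadoop/test-file-3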

You will need to export the JAVA_HOME variable in your environment before running Apache Hadoop commands.
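
For example, on a typical Linux installation you could set it as follows; the path is an assumption and depends on where your JDK is installed.

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64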

You can follow the steps below to upload multiple files using Apache Hadoop:

1. Export HADOOP_VERSION variable

You will need to export the HADOOP_VERSION variable in your environment. For this tutorial, run the following command:

export HADOOP_VERSION=2.7.3

2. Download Apache Hadoop

You will need to download Apache Hadoop. You can run the following command to download the appropriate version:

wget -c https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz

3. Extract Apache Hadoop

You will need to extract the tarball downloaded in the previous step. You can run the following command:

tar xzf hadoop-${HADOOP_VERSION}.tar.gz

4. Download Hadoop FS Support Jar

You will need to download the Hadoop FS Support assembly JAR provided by the Data Client Library. You can run the following command:

mvn dependency:copy -Dartifact=com.here.platform.data.client:hadoop-fs-support_2.11:LATEST:jar:assembly -DoutputDirectory=hadoop-${HADOOP_VERSION}/share/hadoop/common/lib/
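
Optionally, you can check that the assembly JAR landed in Hadoop's common library directory; this is only a quick sanity check.

ls hadoop-${HADOOP_VERSION}/share/hadoop/common/lib/ | grep hadoop-fs-support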

5. Upload data using Apache Hadoop

Replace {{YOUR_CATALOG_HRN}} in the following command with the catalog HRN you received from the catalog creation step above, and then run it:

./hadoop-${HADOOP_VERSION}/bin/hadoop distcp test-data-hadoop blobfs://{{YOUR_CATALOG_HRN}}:bring-your-data-layer/test-data-hadoop

The above command uploads the directory test-data-hadoop, containing the three files test-file-1, test-file-2, and test-file-3, to the object store layer using Apache Hadoop.

6. List the uploaded files using Apache Hadoop

You can verify the upload by listing the uploaded files with Apache Hadoop. Replace {{YOUR_CATALOG_HRN}} with the HRN you received from the catalog creation step and run the following command:

./hadoop-${HADOOP_VERSION}/bin/hadoop fs -ls blobfs://{{YOUR_CATALOG_HRN}}:bring-your-data-layer/test-data-hadoop/

The above command lists all the files in the test-data-hadoop directory in the object store layer.
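
You can also print the contents of one of the uploaded files with the standard hadoop fs -cat command; the file name below assumes the sample data uploaded in the previous steps.

./hadoop-${HADOOP_VERSION}/bin/hadoop fs -cat blobfs://{{YOUR_CATALOG_HRN}}:bring-your-data-layer/test-data-hadoop/test-file-1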

Copying data from other storages to the object store layer

You can copy your data from any Apache Hadoop compatible storage, such as AWS S3 or Azure Blob Storage, to the object store layer using the steps described above. You only need to change the source path from the local file system to the remote storage.
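
For example, a copy from an S3 bucket might look like the following sketch. The bucket and path are placeholders, and using the s3a scheme assumes that the hadoop-aws connector and your AWS credentials are configured.

./hadoop-${HADOOP_VERSION}/bin/hadoop distcp \
    s3a://your-bucket/your-data \
    blobfs://{{YOUR_CATALOG_HRN}}:bring-your-data-layer/your-data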

You can find more details on how to use Apache Hadoop to copy data from one storage to another in DistCp and Object Stores.

Further Information

For more details on the topics covered in this tutorial, you can refer to the following sources:
