Objectives: Understand how to bring your own data into HERE platform using an object store layer
Complexity: Easy
Time to complete: 30 min
Dependencies: Organize your work in projects
Source code: Download
The example in this tutorial demonstrates how you can bring your data into the HERE platform using the object store layer. The object store layer is a distributed, highly durable key/value store with the additional capability of listing keys. You can find more information on the various layer types here: Data Service Documentation
This tutorial includes the following steps:
- Creating a catalog with an object store layer
- Uploading one file from the local file system to the object store using the OLP Command Line Interface (CLI)
- Uploading multiple files from the local file system to the object store layer using Apache Hadoop
In preparation, you will need to create a catalog containing an object store layer.
Create catalog
You will need to create a catalog. You can do this by following the steps outlined in Organize your work in projects, using the OLP Command Line Interface (CLI).
Create a file called bring-your-data.json with the contents below, replacing {{YOUR_CATALOG_ID}} with an identifier of your choice.
{
  "id": "{{YOUR_CATALOG_ID}}",
  "name": "Tutorial for copying your data from local file system to the Object Store layer",
  "summary": "Tutorial for copying your data from local file system to the Object Store layer",
  "description": "Tutorial for copying your data from local file system to the Object Store layer",
  "tags": ["Hadoop FS Support", "Object store"],
  "layers": [
    {
      "id": "bring-your-data-layer",
      "name": "bring-your-data-layer",
      "summary": "Simulated data.",
      "description": "Simulated data to demonstrate usability of Object store layer",
      "tags": ["Hadoop FS Support", "Object store"],
      "layerType": "objectstore",
      "volume": {
        "volumeType": "durable"
      }
    }
  ]
}
Replace {{YOUR_CATALOG_ID}} below with your own identifier and then run the following command:
olp catalog create {{YOUR_CATALOG_ID}} \
"Tutorial for copying data from local file system to the object store layer" \
--config bring-your-data.json
Upload single file to the object store layer using OLP CLI
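This step assumes a local file test-data-cli/test-file exists on your machine. If you need sample data, you can create it first (the contents below are illustrative only):

```shell
# Create a local directory and a sample file to upload;
# the path matches the one used by the olp commands in this step.
mkdir -p test-data-cli
echo "sample payload" > test-data-cli/test-file
```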
You can upload a single file from your local file system to the object store layer using the OLP Command Line Interface (CLI). Replace {{YOUR_CATALOG_HRN}} in the following command with the catalog HRN you received from the create catalog step, and then run it:
olp catalog layer object put {{YOUR_CATALOG_HRN}} bring-your-data-layer --key test-file --data test-data-cli/test-file
The command above uploads your local file test-data-cli/test-file to the object store layer. To retrieve the data back from the object store layer, replace {{YOUR_CATALOG_HRN}} in the following command with the same catalog HRN and run it:
olp catalog layer object get {{YOUR_CATALOG_HRN}} bring-your-data-layer --key test-file --data test-data-cli/test-file
Upload multiple files to the object store layer using Apache Hadoop
You can upload multiple files from your local file system to the object store in parallel in a distributed manner using Apache Hadoop.
You will need to export the JAVA_HOME variable in your environment before running Apache Hadoop commands.
You can follow the steps below to upload multiple files using Apache Hadoop:
1. Export HADOOP_VERSION variable
You will need to export the HADOOP_VERSION variable in your environment. For this tutorial, run the following command:
export HADOOP_VERSION=2.7.3
2. Download Apache Hadoop
You will need to download Apache Hadoop. Run the following command to download the appropriate version:
wget -c https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz
3. Extract Apache Hadoop
You will need to extract the tarball downloaded in the previous step. Run the following command:
tar xzf hadoop-${HADOOP_VERSION}.tar.gz
4. Download Hadoop FS Support Jar
You will need to download the Hadoop FS Support assembly jar provided by the Data Client Library. Run the following command:
mvn dependency:copy -Dartifact=com.here.platform.data.client:hadoop-fs-support_2.11:LATEST:jar:assembly -DoutputDirectory=hadoop-${HADOOP_VERSION}/share/hadoop/common/lib/
5. Upload data using Apache Hadoop
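This step assumes a local directory test-data-hadoop containing the files to upload. If you need sample data, you can create a few files first (the file names and contents below are illustrative only):

```shell
# Create a sample local directory with several files to upload;
# the directory name matches the one used by the distcp command below.
mkdir -p test-data-hadoop
for i in 1 2 3; do
  echo "sample record $i" > "test-data-hadoop/part-$i"
done
```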
Replace {{YOUR_CATALOG_HRN}} in the following script with the catalog HRN you received from the create catalog step above, and then run it:
./hadoop-${HADOOP_VERSION}/bin/hadoop distcp test-data-hadoop blobfs://{{YOUR_CATALOG_HRN}}:bring-your-data-layer/test-data-hadoop
You can list the files that were uploaded using Apache Hadoop with the following command, after replacing {{YOUR_CATALOG_HRN}} with the HRN you received from the create catalog step:
./hadoop-${HADOOP_VERSION}/bin/hadoop fs -ls blobfs://{{YOUR_CATALOG_HRN}}:bring-your-data-layer/test-data-hadoop/
Copying data from other storages to the object store layer
You can copy your data from any Apache Hadoop-compatible storage, such as AWS S3 or Azure Blob Storage, to the object store layer using the steps above. You will only need to change the source path from the local file system to the remote storage.
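For example, assuming your Hadoop installation has the S3A connector configured with valid credentials, a copy from Amazon S3 might look like the following. The bucket and path names here are hypothetical placeholders:

```shell
# Hypothetical example: copy from an S3 bucket (s3a://) instead of the
# local file system; replace the bucket and paths with your own.
./hadoop-${HADOOP_VERSION}/bin/hadoop distcp \
  s3a://your-bucket/your-data \
  blobfs://{{YOUR_CATALOG_HRN}}:bring-your-data-layer/your-data
```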
You can find more details on how to use Apache Hadoop to copy data from one storage to another here: DistCp and Object Stores
List files using OLP CLI
You can also list the files that were copied earlier using the OLP CLI. Replace {{YOUR_CATALOG_HRN}} with the HRN you received from the create catalog step and run the following command:
olp catalog layer object list {{YOUR_CATALOG_HRN}} bring-your-data-layer
For more details on the topics covered in this tutorial, you can refer to the following sources: