Local Spark (Linux/MacOS)
In the local Spark approach, all Spark jobs are executed on the user's machine.
The Sparkmagic extension uses Livy to execute all user code. Livy uses the local installation of Spark on the user's machine.
You must install the following software tools in the specified versions:
- Java 8+
- Hadoop 2.7.3
- Spark 2.4.0
- Livy 0.5.0-incubating
Follow our example installation steps described in the Installation Steps for Spark Tools guide.
Hadoop, Spark, and Livy must be able to communicate with each other. Configure them as follows:
Specify the Hadoop installation folder by adding the following environment variables to the ~/.bashrc file if you are using Linux, or to the ~/.bash_profile file if you are using MacOS:
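The variable list itself is not reproduced here. As a minimal sketch, assuming Hadoop is unpacked under /usr/local/hadoop (a hypothetical location chosen to mirror the /usr/local/spark path used later in this guide) and that the conventional HADOOP_HOME and HADOOP_CONF_DIR variables are the ones needed, the entries might look like this:

```shell
# Assumption: Hadoop 2.7.3 is installed in /usr/local/hadoop; adjust to your setup.
export HADOOP_HOME="/usr/local/hadoop"
# Hadoop 2.x keeps its configuration files under etc/hadoop.
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
# Make the hadoop command available in new shells.
export PATH="$PATH:$HADOOP_HOME/bin"
```

Verify the folder against your own installation before saving the file.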
Load the environment variables. For Linux, execute:
source ~/.bashrc
For MacOS, execute:
source ~/.bash_profile
Download the following dependencies:
curl -o "/tmp/scala-java8-compat_2.11-0.8.0.jar" \
curl -o "/tmp/json4s-native_2.11-3.5.3.jar" \
curl -o "/tmp/protobuf-java-3.10.0.jar" \
Copy them inside the Spark installation folder:
sudo cp "/tmp/scala-java8-compat_2.11-0.8.0.jar" "/usr/local/spark/jars/"
sudo cp "/tmp/json4s-native_2.11-3.5.3.jar" "/usr/local/spark/jars/"
sudo cp "/tmp/protobuf-java-3.10.0.jar" "/usr/local/spark/jars/"
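To confirm that the copy step worked, a small check can help. The following is a sketch only: the function name count_missing_jars is hypothetical, and the default folder is the /usr/local/spark/jars path used in the commands above.

```shell
# Report which of the three downloaded jars are absent from a Spark jars folder.
# Usage: count_missing_jars [JARS_DIR]; prints the number of missing jars.
count_missing_jars() {
  jars_dir="${1:-/usr/local/spark/jars}"
  missing=0
  for jar in \
    scala-java8-compat_2.11-0.8.0.jar \
    json4s-native_2.11-3.5.3.jar \
    protobuf-java-3.10.0.jar
  do
    if [ ! -f "$jars_dir/$jar" ]; then
      # Name each missing file on stderr so stdout carries only the count.
      echo "missing: $jars_dir/$jar" >&2
      missing=$((missing + 1))
    fi
  done
  echo "$missing"
}

count_missing_jars "/usr/local/spark/jars"
```

A printed count of 0 means all three jars are in place; any other value is accompanied by the names of the missing files.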
Specify the Spark installation folder by adding the following environment variables to the ~/.bashrc file if you are using Linux, or to the ~/.bash_profile file if you are using MacOS:
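The variable list itself is not reproduced here. As a minimal sketch, assuming Spark lives in /usr/local/spark (the folder used by the copy commands above) and that the conventional SPARK_HOME variable is the one intended, the entries might look like this:

```shell
# Assumption: Spark 2.4.0 is installed in /usr/local/spark; adjust to your setup.
export SPARK_HOME="/usr/local/spark"
# Make spark-submit and the other Spark launchers available in new shells.
export PATH="$PATH:$SPARK_HOME/bin"
```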
Load the environment variables. For Linux, execute:
source ~/.bashrc
For MacOS, execute:
source ~/.bash_profile
Locate the folder where the olp-sdk-for-python-1.12-env conda environment is installed by running the command conda env list:
$ conda env list
# conda environments:
olp-sdk-for-python-1.12-env * /home/user/miniconda3/envs/olp-sdk-for-python-1.12-env
In the example above, the environment folder is /home/user/miniconda3/envs/olp-sdk-for-python-1.12-env. Spark must be configured to use the Python binary located inside this folder. To do this, create the Spark environment file from the template in the conf folder of your Spark installation:
sudo cp spark-env.sh.template spark-env.sh
sudo vi spark-env.sh
Add the following environment variables with the actual location of the olp-sdk-for-python-1.12-env conda environment plus the suffix
/bin/python3 to the previously created
In the example above, the environment path is /home/user/miniconda3/envs/olp-sdk-for-python-1.12-env. Verify your own environment path before adding the corresponding values, and append the suffix /bin/python3 to it so that the variables point to the correct Python binary.
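As a minimal sketch, assuming the environment path from the example output and assuming that PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON (the standard Spark variables for selecting the Python interpreter) are the variables intended here, the lines in spark-env.sh might look like this. CONDA_ENV_DIR is a hypothetical helper variable used only to avoid repeating the path.

```shell
# Example path taken from the conda env list output; replace it with your own.
CONDA_ENV_DIR="/home/user/miniconda3/envs/olp-sdk-for-python-1.12-env"
# Interpreter used by Spark executors.
export PYSPARK_PYTHON="$CONDA_ENV_DIR/bin/python3"
# Interpreter used by the Spark driver; kept identical to avoid version mismatches.
export PYSPARK_DRIVER_PYTHON="$CONDA_ENV_DIR/bin/python3"
```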
Configure the Livy connection timeout and create the Livy logging configuration from its template:
echo "livy.rsc.server.connect.timeout: 1800s" > ~/livy/conf/livy-client.conf
cp ~/livy/conf/log4j.properties.template ~/livy/conf/log4j.properties
- These steps work only if you are not connected to the HERE VPN. A known issue causes the Livy server to be configured with a wrong URL when the VPN is active. Before starting the Livy server, verify that you are disconnected from the HERE VPN.
Everything you need to execute Spark jobs with the Sparkmagic extension is now configured.
You can start the Livy server using this command:
The Livy server runs by default on localhost:8998. You can stop it by running:
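The start and stop commands are not reproduced above. As a hedged sketch, assuming Livy is installed in ~/livy (the location implied by the ~/livy/conf paths used earlier) and is controlled through its bin/livy-server launcher script, starting, stopping, and a reachability check could be wrapped as follows. LIVY_DIR, LIVY_URL, and the function names are hypothetical helpers introduced for illustration.

```shell
# Hypothetical install location, mirroring the ~/livy/conf paths used above.
LIVY_DIR="${LIVY_DIR:-$HOME/livy}"
# Default address the Livy server listens on.
LIVY_URL="${LIVY_URL:-http://localhost:8998}"

# Start and stop the server through Livy's launcher script.
livy_start() { "$LIVY_DIR/bin/livy-server" start; }
livy_stop()  { "$LIVY_DIR/bin/livy-server" stop; }

# Succeeds only if the REST endpoint that lists sessions answers within 5 seconds.
livy_is_up() { curl -s -f -m 5 "$LIVY_URL/sessions" >/dev/null 2>&1; }

if livy_is_up; then
  echo "Livy is reachable at $LIVY_URL"
else
  echo "Livy is not reachable at $LIVY_URL"
fi
```

Call livy_start before opening the notebooks and livy_stop when you are done; livy_is_up can be rerun at any time to confirm the server state.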
The tutorial notebooks for Spark are located in the folder:
You can start with the Getting Started notebook located at
$HOME/olp-sdk-for-python-1.12/tutorial-notebooks/GettingStarted.ipynb to get an overview of all tutorial notebooks.
Follow these steps to check that your local Spark environment is properly configured.
Start Jupyter and Livy services
Start the Livy server using this command (assuming that your Livy installation is in
Activate the SDK conda environment:
conda activate olp-sdk-for-python-1.12-env
Go to your home directory and start Jupyter:
jupyter notebook --NotebookApp.iopub_data_rate_limit=1000000000 --ip=0.0.0.0
Execute the Health Check notebook
Open the tutorial notebook
$HOME/olp-sdk-for-python-1.12/tutorial-notebooks/spark/spark_ProcessDataLocally_scala.ipynb and execute all its paragraphs.
If all the paragraphs run successfully, your local Spark environment is properly configured.