FAQ
Are there Tutorial Notebooks for Data SDK for Python?
Yes. After installing the SDK, you can find the getting-started tutorial notebook at:
For Linux/macOS: $HOME/olp-sdk-for-python-1.12/tutorial-notebooks/GettingStarted.ipynb
For Windows: %USERPROFILE%\olp-sdk-for-python-1.12\tutorial-notebooks\GettingStarted.ipynb
Where do I specify Maven dependencies?
Specify your Maven dependencies in the following file:
For Linux/macOS: $HOME/.sparkmagic/config.json
For Windows: %USERPROFILE%\.sparkmagic\config.json
Dependencies
Put the dependencies in the JSON field session_configs -> conf -> spark.jars.packages, using the format "group:artifact:[classifier:]version" and separating entries with commas, for example: "org.apache.spark:spark-core_2.12:2.4.1,org.apache.spark:spark-sql_2.12:jar:2.4.1"
Exclusions
For exclusions, use the JSON field session_configs -> conf -> spark.jars.excludes, using the format "group:artifact" and separating entries with commas, for example: "org.apache.spark:spark-*,com.fasterxml.jackson.core:jackson-databind"
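Putting both fields together, the relevant part of config.json might look like the following sketch. The artifact coordinates are the examples from above; the rest of the file (kernel settings, credentials, and so on) is unchanged and omitted here.

```json
{
  "session_configs": {
    "conf": {
      "spark.jars.packages": "org.apache.spark:spark-core_2.12:2.4.1,org.apache.spark:spark-sql_2.12:jar:2.4.1",
      "spark.jars.excludes": "org.apache.spark:spark-*,com.fasterxml.jackson.core:jackson-databind"
    }
  }
}
```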
Can I switch the notebook execution between Spark local and EMR Spark cluster?
Yes. Follow the same approach used in the EMR tutorial notebooks at $HOME/olp-sdk-for-python-1.12/tutorial-notebooks/emr/emr_ProcessDataRemotely_pySpark.ipynb or $HOME/olp-sdk-for-python-1.12/tutorial-notebooks/emr/emr_ProcessDataRemotely_scala.ipynb
From a Python 3 notebook, set the configuration under the %%spark config magic to indicate where the ivy.settings.xml file is located.
- For EMR Spark, the property should be: "spark.jars.ivySettings": "/var/lib/spark/.here/ivy.settings.xml"
- For local Spark, for example: "spark.jars.ivySettings": "/home/cesar/.here/ivy.settings.xml"
Replace /home/cesar/ with your home directory; ivy.settings.xml is located in your home directory at .here/ivy.settings.xml. Provide the explicit file path: it cannot contain $HOME or ~ to indicate your home directory.
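As a sketch, a %%spark config cell for local Spark could look like the following. The /home/cesar/ path is an example from above and must be replaced with your own home directory; any other conf entries you need would go in the same JSON object.

```
%%spark config
{
  "conf": {
    "spark.jars.ivySettings": "/home/cesar/.here/ivy.settings.xml"
  }
}
```

To target EMR instead, change the path to /var/lib/spark/.here/ivy.settings.xml as shown above.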
Use the magic %spark add -s scala-spark -l scala -u <PUT YOUR LIVY ENDPOINT HERE> -k to tell the notebook where to find the Livy server.
- For Livy running on local:
%spark add -s pyspark -l python -u http://localhost:8998 -k
- For Livy running on EMR, for example:
%spark add -s pyspark -l python -u http://ec2-3-16-25-189.us-east-2.compute.amazonaws.com:8998 -k
Restart the notebook kernel after switching endpoints to force a fresh Livy session.