FAQ

Are there Tutorial Notebooks for Data SDK for Python?

Yes. After installing the SDK, you can find the getting started tutorial notebook at

For Linux/MacOS: $HOME/olp-sdk-for-python-1.12/tutorial-notebooks/GettingStarted.ipynb.

For Windows: %USERPROFILE%\olp-sdk-for-python-1.12\tutorial-notebooks\GettingStarted.ipynb.

Where do I specify maven dependencies?

Specify your Maven dependencies in the following file:

For Linux/MacOS: $HOME/.sparkmagic/config.json

For Windows: %USERPROFILE%\.sparkmagic\config.json

Dependencies

Put the dependencies in the JSON field session_configs -> conf -> spark.jars.packages, using the format "group:artifact:[classifier:]version" and separating entries with commas, for example: "org.apache.spark:spark-core_2.12:2.4.1,org.apache.spark:spark-sql_2.12:jar:2.4.1".

Exclusions

For exclusions, use the JSON field session_configs -> conf -> spark.jars.excludes, using the format "group:artifact" and separating entries with commas, for example: "org.apache.spark:spark-*,com.fasterxml.jackson.core:jackson-databind".
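
Putting the two fields together, the relevant fragment of config.json might look like the following (a sketch built from the examples above; your real file contains other sparkmagic settings that must be preserved):

```json
{
  "session_configs": {
    "conf": {
      "spark.jars.packages": "org.apache.spark:spark-core_2.12:2.4.1,org.apache.spark:spark-sql_2.12:jar:2.4.1",
      "spark.jars.excludes": "org.apache.spark:spark-*,com.fasterxml.jackson.core:jackson-databind"
    }
  }
}
```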

Can I switch the notebook execution between Spark local and EMR Spark cluster?

Follow the same approach used in the EMR tutorial notebooks at $HOME/olp-sdk-for-python-1.12/tutorial-notebooks/emr/emr_ProcessDataRemotely_pySpark.ipynb or $HOME/olp-sdk-for-python-1.12/tutorial-notebooks/emr/emr_ProcessDataRemotely_scala.ipynb.

From a Python 3 notebook, set the configuration under the %%spark config magic to indicate where the ivy.settings.xml file is located.

  • For EMR Spark, the property should be: "spark.jars.ivySettings": "/var/lib/spark/.here/ivy.settings.xml",

  • For local Spark, for example: "spark.jars.ivySettings": "/home/cesar/.here/ivy.settings.xml",
    Replace /home/cesar/ with your home directory; ivy.settings.xml lives under .here/ in your home directory. Provide the explicit file path: it cannot contain $HOME or ~ to indicate your home directory.
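
For example, a complete %%spark config cell for local Spark could look like the following (a sketch; the "conf" body is passed to Livy when the session is created, and /home/cesar is the example home directory from above, so replace it with your own):

```
%%spark config
{
  "conf": {
    "spark.jars.ivySettings": "/home/cesar/.here/ivy.settings.xml"
  }
}
```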

Use the %spark add magic to indicate where to find the Livy server, for example: %spark add -s scala-spark -l scala -u <PUT YOUR LIVY ENDPOINT HERE> -k

  • For Livy running locally: %spark add -s pyspark -l python -u http://localhost:8998 -k
  • For Livy running on EMR, for example: %spark add -s pyspark -l python -u http://ec2-3-16-25-189.us-east-2.compute.amazonaws.com:8998 -k

Restart the notebook kernel after changing the configuration to force a fresh Livy session.
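
Since the ivy.settings.xml path mentioned earlier must be explicit (no $HOME or ~), a quick way to print the exact value for your account is the following sketch, using only the Python standard library and assuming the file sits under .here/ in your home directory as described above:

```python
import os

# Spark does not expand $HOME or ~, so resolve the home directory here
# and paste the printed value into "spark.jars.ivySettings".
ivy_settings = os.path.join(os.path.expanduser("~"), ".here", "ivy.settings.xml")
print(ivy_settings)
```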
