Sparkmagic Extension (Linux/MacOS)

Sparkmagic is a set of tools for interactively working with remote Spark clusters through Livy, a Spark REST server, in Jupyter notebooks. The Sparkmagic project includes a set of magics for interactively running Spark code in multiple languages, as well as some kernels that you can use to turn Jupyter into an integrated Spark environment.
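Once the steps below are complete, code in a notebook cell that uses the PySpark (sparkmagic) kernel is shipped through Livy and executed on the remote cluster. A minimal sketch, assuming a Livy session has been established from the configured endpoint (the spark variable is provided by the session; older Livy versions expose sc and sqlContext instead):

# Notebook cell running under the PySpark (sparkmagic) kernel.
# The code below executes on the remote cluster, not on the local machine.
df = spark.range(100)   # a small DataFrame created cluster-side
print(df.count())       # 100, computed remotely and streamed back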

Installation

Activate the SDK conda environment:

conda activate olp-sdk-for-python-1.12-env

Generate the default configuration for Jupyter (this creates ~/.jupyter/jupyter_notebook_config.py):

jupyter notebook --generate-config

Upgrade pip and the setuptools module:

pip install --upgrade pip
pip install --upgrade --ignore-installed setuptools

Install the sparkmagic extension together with the ipyleaflet and geomet packages:

pip install sparkmagic ipyleaflet geomet

Configure the Sparkmagic extension in Jupyter by enabling the notebook extensions and installing the Spark kernels:

jupyter nbextension enable --py widgetsnbextension
jupyter nbextension enable --py --sys-prefix ipyleaflet
jupyter-kernelspec install $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/kernels/sparkkernel --user
jupyter-kernelspec install $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/kernels/pysparkkernel --user
jupyter-kernelspec install $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/kernels/sparkrkernel --user
jupyter serverextension enable --py sparkmagic
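To confirm that the three kernels were registered, you can list the installed kernelspecs from Python. This is a quick sanity check rather than an official step; it uses the jupyter_client API that ships with Jupyter:

from jupyter_client.kernelspec import KernelSpecManager

# The sparkmagic kernels should appear alongside the default python3 kernel,
# typically as 'sparkkernel', 'pysparkkernel', and 'sparkrkernel'.
print(sorted(KernelSpecManager().find_kernel_specs()))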

Execute the following commands (the second removes the backup file that sed -i -e leaves behind on macOS):

sed -i -e 's/return self._pyspark_command(sql_context_variable_name)/return self._pyspark_command(sql_context_variable_name, False)/g' $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/livyclientlib/sqlquery.py

rm -rf $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/livyclientlib/sqlquery.py-e

Note

The two commands above are a temporary workaround for a known bug that affects sparkmagic integration with PySpark under Python 3. A fix from the community is in progress and is expected in an upcoming release.
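For reference, the sed command above rewrites a single line of sparkmagic/livyclientlib/sqlquery.py. Roughly, the change looks as follows (a sketch; the exact surrounding code depends on the installed sparkmagic release):

# before the patch:
return self._pyspark_command(sql_context_variable_name)

# after the patch: the extra False argument skips the result-encoding
# step that misbehaves when the remote session runs Python 3
return self._pyspark_command(sql_context_variable_name, False)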

Configuration

Create the Sparkmagic configuration directory:

mkdir -p ~/.sparkmagic

Download the Spark configuration files (spark-conf-files.zip) using this link.

Unzip the downloaded file and open a terminal in the unzipped folder:

unzip spark-conf-files.zip
cd spark-conf-files/

The sparkmagic configuration file references the Data SDK jars for version 2.11.7. The latest Data SDK jar version can be identified using this link, in the Include BOMs sub-section. To update the configuration file to the latest Data SDK jars, execute the config_file_updater.py script with the following command:

python config_file_updater.py --version <version_to_upgrade_to>

Copy the sparkmagic configuration file into ~/.sparkmagic:

cp config.json ~/.sparkmagic/config.json

To substitute your home directory path into the sparkmagic configuration file, run these commands (as with the patch above, the second removes the backup file that sed creates on macOS):

sed -i -e "s|\${HOME}|$HOME|g" ~/.sparkmagic/config.json
rm -rf ~/.sparkmagic/config.json-e
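To verify that the placeholder was expanded, you can load the file and check that no ${HOME} occurrences remain. A small sanity check in Python, assuming the file sits at the default location:

import json
from pathlib import Path

cfg_path = Path.home() / ".sparkmagic" / "config.json"
text = cfg_path.read_text()
assert "${HOME}" not in text, "run the sed command above first"
config = json.loads(text)  # raises if the file is not valid JSON
print(sorted(config))      # top-level sections of the sparkmagic config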

The Sparkmagic setup is now complete.
