Sparkmagic Extension (Linux/macOS)
Sparkmagic is a set of tools for interactively working with remote Spark clusters through Livy, a Spark REST server, in Jupyter notebooks. The Sparkmagic project includes a set of magics for interactively running Spark code in multiple languages, as well as some kernels that you can use to turn Jupyter into an integrated Spark environment.
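For example, once the setup below is complete, you can load the magics into a regular IPython notebook and open the interactive session manager (a brief sketch; the actual Livy endpoint details come from the configuration created later in this guide):
%load_ext sparkmagic.magics
%manage_spark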
Installation
Activate the SDK conda environment:
conda activate olp-sdk-for-python-1.12-env
Generate the default configuration for Jupyter:
jupyter notebook --generate-config
Upgrade pip and install the setuptools module:
pip install --upgrade pip
pip install --upgrade --ignore-installed setuptools
Install the sparkmagic extension:
pip install sparkmagic ipyleaflet geomet
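To confirm that the packages landed in the active environment, you can inspect pip's metadata (the Location field reported here is the same path used by the commands below):
pip show sparkmagic ipyleaflet geomet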
Configure the Sparkmagic extension on Jupyter:
jupyter nbextension enable --py widgetsnbextension
jupyter nbextension enable --py --sys-prefix ipyleaflet
jupyter-kernelspec install $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/kernels/sparkkernel --user
jupyter-kernelspec install $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/kernels/pysparkkernel --user
jupyter-kernelspec install $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/kernels/sparkrkernel --user
jupyter serverextension enable --py sparkmagic
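If the kernel installation succeeded, the three Spark kernels should now appear alongside the default python3 kernel:
jupyter kernelspec list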
Execute the following commands:
sed -i -e 's/return self._pyspark_command(sql_context_variable_name)/return self._pyspark_command(sql_context_variable_name, False)/g' $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/livyclientlib/sqlquery.py
rm -rf $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/livyclientlib/sqlquery.py-e
Note
The two commands above are a temporary workaround for a known bug that affects sparkmagic integration with pyspark when using python3. A fix from the community is in progress and is expected in the next release. (The second command removes the sqlquery.py-e backup file that BSD sed on macOS creates when invoked as sed -i -e.)
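To verify that the patch was applied, you can grep for the modified call; a matching line means the substitution succeeded:
grep -n 'self._pyspark_command(sql_context_variable_name, False)' $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/livyclientlib/sqlquery.py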
Configuration
Create the Sparkmagic configuration directory:
mkdir -p ~/.sparkmagic
Download the Spark configuration files using this link.
Unzip the downloaded file and open a terminal in the unzipped folder:
unzip spark-conf-files.zip
cd spark-conf-files/
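The folder should contain at least the config.json configuration file and the config_file_updater.py script used in the following steps; you can confirm with:
ls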
The sparkmagic configuration file includes Data SDK jars for version 2.11.7. The latest version of the Data SDK jars can be identified using this link, in the Include BOMs sub-section. To obtain the latest Data SDK jars, execute the config_file_updater.py script with the following command:
python config_file_updater.py --version <version_to_upgrade_to>
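For example, to update the configuration to a hypothetical version 2.19.5 (substitute the actual latest version identified through the link above):
python config_file_updater.py --version 2.19.5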
Copy the sparkmagic configuration file into ~/.sparkmagic:
cp config.json ~/.sparkmagic/config.json
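As an optional sanity check, confirm that the copied file is well-formed JSON:
python -m json.tool ~/.sparkmagic/config.json > /dev/null && echo "config.json is valid JSON"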
To replace the ${HOME} placeholder in the sparkmagic configuration file with your actual home directory path, run these commands:
sed -i -e "s|\${HOME}|$HOME|g" ~/.sparkmagic/config.json
rm -rf ~/.sparkmagic/config.json-e
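To confirm the substitution worked, check that no ${HOME} placeholders remain (no output means every placeholder was replaced):
grep '\${HOME}' ~/.sparkmagic/config.json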
The Sparkmagic setup is ready.
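To try it out, start Jupyter and create a new notebook with the PySpark kernel; the built-in %%info magic displays the current Livy endpoint and sessions:
jupyter notebook
Then, in a notebook cell:
%%info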