A short tutorial notebook on PySpark
- Install Java 7 or newer on your OS.
- Download a pre-built version of Spark from here and unpack it.
- Set the following environment variables, for example in your ~/.bashrc:

      export SPARK_HOME=/PATH/TO/SPARK
      export PYTHONPATH=/PATH/TO/SPARK/python

- If you want Spark to work with a specific Python version/virtualenv, also set this one:

      export PYSPARK_PYTHON=/PATH/TO/PYTHON/INSIDE/VIRTUALENV

- Install the Py4J dependency:

      pip install py4j
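
After completing the steps above, a quick check can confirm that the environment looks right before starting Spark. The following is only a minimal sketch and not part of Spark or PySpark: `missing_settings` is a hypothetical helper name, and it assumes that `PYTHONPATH` contains exactly the `python` subdirectory of `SPARK_HOME`, as in the example exports above.

```python
import importlib.util
import os


def missing_settings(env=None):
    """Return the names of setup items that still look unconfigured.

    `env` is any mapping of environment variables; it defaults to os.environ.
    This helper is hypothetical and only mirrors the setup steps above.
    """
    env = os.environ if env is None else env
    missing = []

    # SPARK_HOME should point at the directory where Spark was unpacked.
    spark_home = env.get("SPARK_HOME")
    if not spark_home or not os.path.isdir(spark_home):
        missing.append("SPARK_HOME")

    # PYTHONPATH should include $SPARK_HOME/python (assumed layout).
    entries = [
        os.path.normpath(p)
        for p in env.get("PYTHONPATH", "").split(os.pathsep)
        if p
    ]
    expected = os.path.normpath(os.path.join(spark_home, "python")) if spark_home else None
    if expected is None or expected not in entries:
        missing.append("PYTHONPATH")

    # py4j must be importable by the interpreter running this check.
    if importlib.util.find_spec("py4j") is None:
        missing.append("py4j")

    return missing


if __name__ == "__main__":
    problems = missing_settings()
    if problems:
        print("Still missing or misconfigured:", ", ".join(problems))
    else:
        print("Environment looks ready for PySpark.")
```

Run the script from a new shell (so that the changes to ~/.bashrc have been loaded) using the same Python interpreter that `PYSPARK_PYTHON` points to, if you set that variable.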