There are several ways to install PySpark on a Mac and run it from Jupyter Notebook. This article walks through installing PySpark and Jupyter on macOS step by step, using Homebrew for the system packages and pip for PySpark itself.
PySpark & Jupyter Installation Steps on Mac OS
- Step 1 – Install Homebrew
- Step 2 – Install Java
- Step 3 – Install Scala (Optional)
- Step 4 – Install Python
- Step 5 – Install PySpark
- Step 6 – Install Jupyter
- Step 7 – Run an Example in Jupyter
Related: PySpark installation on Windows
Step 1. Install Homebrew
Homebrew, which describes itself as "the missing package manager for macOS (or Linux)", installs third-party packages such as Java and Python on a Mac. To use it, first install it with the command below.
# Install Homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
The installer uses sudo and prompts for your password. On a personal laptop this is normally your login password, provided your account has administrator rights; if it does not, contact your system administrator. When the installation succeeds, the installer prints a confirmation along with any follow-up steps.
After installation, you may need to run the commands below to add brew to your $PATH.
# Add brew to PATH (Homebrew lives in /opt/homebrew on Apple Silicon Macs and in /usr/local on Intel Macs)
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"
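The path in the commands above depends on the CPU in your Mac. As a rough sketch (the helper name `expected_brew_prefix` is made up for this illustration), the following Python snippet picks the default Homebrew prefix for the current machine and reports whether a `brew` binary exists there. It assumes Homebrew was installed to its default location.

```python
import platform
from pathlib import Path


def expected_brew_prefix(machine: str) -> str:
    """Return Homebrew's default macOS install prefix for a CPU architecture."""
    # Apple Silicon reports "arm64"; Intel Macs report "x86_64".
    return "/opt/homebrew" if machine == "arm64" else "/usr/local"


prefix = expected_brew_prefix(platform.machine())
brew_path = Path(prefix) / "bin" / "brew"
print(f"Expected brew location: {brew_path} (exists: {brew_path.exists()})")
```

If the reported location differs from `/opt/homebrew/bin/brew`, substitute it in the two commands above.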
If these commands fail, check the Homebrew website for the current installation instructions.
Step 2. Install Java
PySpark needs Java to run, so install OpenJDK.
# Install OpenJDK 11
brew install openjdk@11
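Before moving on, it can help to confirm that a Java runtime is actually reachable. Homebrew typically does not link versioned OpenJDK formulae onto your PATH and instead prints the extra steps in its install notes, so a successful `brew install` alone does not guarantee that `java` works. The snippet below is a minimal check using only the Python standard library; the function name `find_java` is just an illustration.

```python
import os
import shutil
import subprocess


def find_java():
    """Return the path of the `java` executable found on PATH, or None."""
    return shutil.which("java")


java_path = find_java()
if java_path is None:
    print("java was not found on PATH; PySpark will not be able to start a JVM.")
else:
    # `java -version` writes its report to stderr, not stdout.
    result = subprocess.run([java_path, "-version"], capture_output=True, text=True)
    print(result.stderr.strip() or result.stdout.strip())

# PySpark also honours JAVA_HOME when it is set.
print("JAVA_HOME =", os.environ.get("JAVA_HOME", "<not set>"))
```

On macOS, `/usr/bin/java` can be a placeholder that only reports a missing runtime, so read the printed text rather than relying on the path alone.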
Step 3. Install Scala (Optional)
Spark is written in Scala, but you do not need a separate Scala installation to run PySpark, so this step is optional. Install Scala only if you also plan to write Spark programs in Scala.
# Install Scala (optional)
brew install scala
Step 4. Install Python
PySpark runs Spark jobs from Python, so you also need Python on your Mac. Let's install it using Homebrew. Current PySpark releases require Python 3 (Python 2.7 is no longer supported), so if you already have a recent Python 3, you can skip this step.
# Install Python
brew install python
Step 5. Install PySpark on Mac
Install PySpark from PyPI.
# Install PySpark
pip install pyspark
This installs the latest PySpark release from PyPI, which bundles the Spark runtime it needs.
After a successful installation, run pyspark from the command line to launch the PySpark shell. On startup, the shell prints the Spark and Python versions to the terminal.
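You can also check the installation without starting a shell. The sketch below asks Python's packaging metadata which version of the `pyspark` distribution is installed for the current interpreter; `installed_version` is a helper name chosen for this example, not part of PySpark.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pyspark")
if version:
    print(f"pyspark version: {version}")
else:
    print("pyspark is not installed for this Python interpreter")
```

If this reports that pyspark is missing even though `pip install pyspark` succeeded, `pip` and `python` probably belong to different Python installations.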
Step 6. Install Jupyter
In practice, data analysis and machine learning work is often done in notebooks, so you will likely want to run PySpark from Jupyter. Install Jupyter with Homebrew.
brew install jupyter
This installs JupyterLab on your Mac.
Now start Jupyter Notebook with the command below. It opens Jupyter in your default web browser.
jupyter notebook
Step 7. Run PySpark Example in Jupyter Notebook
Open Jupyter, create a new Python notebook, and run a few PySpark statements to confirm that the installation works.
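As a minimal sketch of such a check, the cell below defines a small smoke test that starts a local Spark session, builds a DataFrame from a few in-memory rows, and prints it. The sample values, the column names, the app name, and the function names `sample_rows` and `run_example` are all made up for illustration.

```python
def sample_rows():
    """Small in-memory dataset used only for this smoke test (made-up values)."""
    return [("Java", 20000), ("Python", 100000), ("Scala", 3000)]


def run_example():
    """Start a local Spark session, show a tiny DataFrame, return its row count."""
    # Imported inside the function so the cell defining it loads even if
    # pyspark is missing; the import error then appears when you call it.
    from pyspark.sql import SparkSession

    # local[1] runs Spark in-process with a single worker thread.
    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("jupyter-smoke-test")
        .getOrCreate()
    )
    try:
        df = spark.createDataFrame(sample_rows(), ["language", "users_count"])
        df.printSchema()
        df.show()
        return df.count()
    finally:
        # Release the JVM resources held by the session.
        spark.stop()
```

Calling `run_example()` in the next cell should print the schema and a three-row table. If it fails with a Java-related error instead, revisit Step 2.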
Conclusion
In this article, you have learned how to install PySpark and Jupyter on macOS step by step. The steps cover installing Java, Scala (optional), Python, and Jupyter with Homebrew, and PySpark with pip.
Happy Learning !!