How to Install PySpark on Mac (in 2023)

There are multiple ways to install PySpark on a Mac. Below, I walk through the step-by-step installation of PySpark on macOS using Homebrew, then run the PySpark shell and create a PySpark DataFrame. Alternatively, you can install PySpark using Anaconda and run programs from a Jupyter Notebook.

Steps to install PySpark on macOS using Homebrew

  • Step 1 – Install Homebrew
  • Step 2 – Install Java
  • Step 3 – Install Scala (Optional)
  • Step 4 – Install Python
  • Step 5 – Install PySpark
  • Step 6 – Start PySpark shell and Validate Installation

Related: PySpark installation on Windows

1. Install Homebrew on Mac

Homebrew is "the missing package manager" for macOS (and Linux), used to install third-party packages such as Java and PySpark on macOS. To use it, first install Homebrew itself with the command below.


# Install Homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

The installer prompts for your password. On a personal laptop, this is the same password you enter when you log in to your Mac; the account needs administrator (sudo) privileges. If you don't have admin access, contact your system administrator. Once the script finishes, Homebrew prints a message confirming a successful installation.


After installation, you may need to run the commands below to add brew to your $PATH.


# Add brew to your PATH
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"

If the above commands give you trouble, you can find the latest instructions in the message Homebrew prints at the end of its installation, or on the Homebrew website.
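
To confirm that brew is now available on your PATH, you can run the commands below; the version reported will differ on your machine. (On Intel Macs, Homebrew installs under /usr/local rather than /opt/homebrew.)


# Verify Homebrew is on the PATH
which brew
brew --version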

2. Install Java

PySpark runs on the JVM, so you need Java on your Mac. Java is a third-party package, so you can install it using the Homebrew command brew. Since Oracle Java is no longer free for commercial use, I am using OpenJDK version 11. Run the command below in the terminal to install it.


# Install OpenJDK 11
brew install openjdk@11
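
Homebrew installs openjdk@11 as a keg-only formula, so it may not be picked up automatically. The exact post-install commands are printed in the caveats brew shows after installation; on an Apple Silicon Mac they typically look like the sketch below (Intel Macs use /usr/local instead of /opt/homebrew).


# Symlink the JDK so macOS tools can find it (follow the caveats brew prints)
sudo ln -sfn /opt/homebrew/opt/openjdk@11/libexec/openjdk.jdk \
  /Library/Java/JavaVirtualMachines/openjdk-11.jdk

# Point JAVA_HOME at the new JDK and verify Java works
echo 'export JAVA_HOME="$(/usr/libexec/java_home -v 11)"' >> ~/.zprofile
source ~/.zprofile
java -version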

3. Install Scala (Optional)

Spark itself is written in Scala, so you need Scala to write and run Spark programs in Scala; to run PySpark, however, this step is optional.


# Install Scala (optional)
brew install scala
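
If you installed Scala, you can confirm it is on the PATH with the command below; the version reported depends on what Homebrew installed.


# Verify the Scala installation (optional)
scala -version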

4. Install Python

PySpark runs Spark jobs from Python, so you also need Python installed on macOS. Let's install it using Homebrew. If you already have a recent version of Python 3, you can skip this step.


# Install Python
brew install python
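
Homebrew installs the interpreter as python3. You can verify the installation with the commands below; the exact version will vary.


# Verify the Python installation
python3 --version
which python3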

5. Install PySpark on Mac

PySpark is a Spark library for Python that runs Python applications using Apache Spark capabilities. Spark was originally written in Scala and, as the industry adopted it, the PySpark API was released for Python on top of Py4J. Py4J is a Java library integrated within PySpark that allows Python to dynamically interface with JVM objects; this is why running PySpark requires Java to be installed along with Python and Apache Spark. With that background, let's install PySpark on Mac.


# Install Apache Spark
brew install apache-spark

This installs the latest version of Apache Spark, which includes PySpark.
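
To confirm that Spark itself installed correctly, you can print its version; the apache-spark formula should place spark-submit (alongside pyspark) on your PATH.


# Verify the Apache Spark installation
spark-submit --version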


After Apache Spark installs successfully, run pyspark from the command line to launch the PySpark shell.


Note that the shell prints the Spark and Python versions to the terminal when it starts.

6. Validate PySpark Installation from Shell

Let's create a PySpark DataFrame with some sample data to validate the installation. Enter the following commands in the PySpark shell in the same order; the shell creates a SparkSession for you, available as the variable spark.


# Create DataFrame in PySpark Shell
data = [("Java", "20000"), ("Python", "100000"), ("Scala", "3000")]
df = spark.createDataFrame(data)
df.show()

This yields the output below. For more PySpark examples, refer to PySpark Tutorial with Examples.
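
Since no column names were supplied, Spark assigns the default names _1 and _2. The output should look roughly like this (the exact formatting may vary slightly between Spark versions):


+------+------+
|    _1|    _2|
+------+------+
|  Java| 20000|
|Python|100000|
| Scala|  3000|
+------+------+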


Now open http://localhost:4040/jobs/ from your favorite web browser to access the Spark Web UI and monitor your jobs. If port 4040 is already in use, Spark falls back to the next available port, such as 4041.
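
If you would rather validate the installation outside the interactive shell, a minimal standalone script like the sketch below can be submitted with spark-submit; the file name hello_spark.py and the column names are just examples.


# hello_spark.py -- a minimal standalone PySpark job (example file name)
# Run it with: spark-submit hello_spark.py
from pyspark.sql import SparkSession

# Outside the pyspark shell you must create the SparkSession yourself
spark = SparkSession.builder.appName("HelloSpark").getOrCreate()

data = [("Java", "20000"), ("Python", "100000"), ("Scala", "3000")]
df = spark.createDataFrame(data, ["language", "users_count"])
df.show()

spark.stop()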

Conclusion

In this PySpark installation article, you learned the step-by-step installation of PySpark on macOS: installing Homebrew, then using it to install Java, Scala (optional), Python, and Apache Spark with PySpark.

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium

This Post Has 4 Comments

  1. stu

    Clear instructions, worked like a charm

  2. krishnasai

    thank you

  3. mhr

    cannot import pyspark from python

  4. NNK

    You need to install pyspark and have your environment variables set right to use pyspark in Python. May I know what error you are getting?
