PySpark isin() or IN operator is used to check/filter if the DataFrame values are exists/contains in the list of values. isin() is a function of Column class which returns a boolean value True if the value of the expression is contained by the evaluated values of the arguments.
Sometimes, when you use isin(list_param) from the Column class, you will get an error java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.$colon$colon
. In this article I will explain how to overcome this error while using isin() function.

1. Quick Examples of Using isin()
Following are quick examples of how to use isin() to filter the rows from DataFrame.
# PySpark isin()
listValues = ["Java","Scala"]
df.filter(df.languages.isin(listValues)).show()
from pyspark.sql.functions import col
df.filter(col("languages").isin(listValues)).show()
# Using IN operator
df.filter("languages in ('Java','Scala')" ).show()
# PySpark SQL IN - check value in a list of values
df.createOrReplaceTempView("TAB")
spark.sql("SELECT * FROM TAB WHERE " +
"languages IN ('Java','Scala')").show()
2. Syntax of isin()
Following is the syntax of isin() function. This function takes *cols as argument.
# Syntax of isin()
Column.isin(*cols)
Let’s create a DataFrame
#Import
from pyspark.sql import SparkSession
# Create SparkSession
spark = SparkSession.builder.master("local[1]") \
.appName("SparkByExamples.com").getOrCreate()
# Create DataFrame
data = [("James","Java"),("Michael","Spark"),
("Robert","Python")]
columns = ["name","languages"]
df = spark.createDataFrame(data,columns)
df.show()
3. PySpark isin() Example
pyspark.sql.Column.isin() function is used to check if a column value of DataFrame exists/contains in a list of string values and this function mostly used with either where() or filter() functions. Let’s see with an example, below example filter the rows languages
column value present in ‘Java
‘ & ‘Scala
‘. Note that the isin() or IN operator is a shorthand for multiple OR conditions.
# PySpark isin()
listValues = ["Java","Scala"]
df.filter(df.languages.isin(listValues)).show()
from pyspark.sql.functions import col
df.filter(col("languages").isin(listValues)).show()
Yields below output.

4. Using PySpark IN Operator
Let’s see how to use IN
operator in PySpark to filter rows.
# Using IN operator
df.filter("languages in ('Java','Scala')" ).show()
5. PySpark SQL IN Operator
In PySpark SQL, isin() function doesn’t work instead you should use IN
operator to check values present in a list of values, it is usually used with the WHERE clause. In order to use SQL, make sure you create a temporary view using createOrReplaceTempView().
# PySpark SQL IN - check value in a list of values
df.createOrReplaceTempView("TAB")
spark.sql("SELECT * FROM TAB WHERE " +
"languages IN ('Java','Scala')").show()
These results same output as above.
6. Conclusion
PySpark isin() function is used to check if the DataFrame column value exists in a list/array of values. isin() function is from Column class that return a boolean value.
Happy Learning !!
Related Articles
- PySpark SQL Left Semi Join Example
- PySpark NOT isin() or IS NOT IN Operator
- PySpark Get Number of Rows and Columns
- PySpark NOT isin() or IS NOT IN Operator
- PySpark SQL Self Join With Example
- PySpark SQL expr() (Expression ) Function
- PySpark SQL – Working with Unix Time | Timestamp
- PySpark SQL Date and Timestamp Functions