PySpark Join Two or Multiple DataFrames

The PySpark DataFrame join() operation combines columns from two DataFrames, and chaining join() calls extends this to more than two. In this article, you will learn how to join two or more PySpark DataFrames by applying conditions on the same or different columns. Also, you will learn…

PySpark Where Filter Function | Multiple Conditions

The PySpark filter() function filters rows from an RDD or DataFrame based on a given condition or SQL expression. If you are coming from an SQL background, you can use where() on a DataFrame instead of filter(); the two behave identically. In this PySpark article, you will…

Spark DataFrame Where() to filter rows

The Spark where() function filters rows from a DataFrame or Dataset based on a given condition or SQL expression. In this tutorial, you will learn how to apply single and multiple conditions on DataFrame columns using the where() function, with Scala examples. Spark DataFrame where() Syntaxes: 1) where(condition: Column):…

Spark DataFrame Where Filter | Multiple Conditions

The Spark filter() or where() function filters rows from a DataFrame or Dataset based on one or more conditions or an SQL expression. If you are coming from an SQL background, you can use where() instead of filter(). Both functions behave identically. If…

Spark Join Multiple DataFrames | Tables

Spark supports joining multiple (two or more) DataFrames. In this article, you will learn how to join multiple DataFrames using a Spark SQL expression (on tables) and the join operator, with a Scala example. Also, you will learn different ways to provide the join condition. In order to explain join with multiple…

Spark SQL Inner Join Explained

Similar to SQL, Spark supports an inner join to join two DataFrame tables. In this article, you will learn how to use an inner join on DataFrames, with a Scala example. Also, you will learn different ways to provide the join condition. Inner join is the default join in Spark and it’s mostly…

Spark SQL Self Join Explained

Similar to SQL, Spark also supports a self join, which joins a DataFrame or table to itself. In this article, you will learn how to use a self join on DataFrame tables, with a Scala example. Also, you will learn different ways to provide the join condition. Before we jump into…

Spark SQL Join on multiple columns

In this article, you will learn how to use a Spark SQL join condition on multiple columns of a DataFrame or Dataset, with a Scala example. Also, you will learn different ways to provide a join condition on two or more columns. Before we jump into how to use multiple columns in a join expression,…
