PySpark Count of Non null, nan Values in DataFrame

Problem: Could you please explain how to get the count of non-null and non-NaN values of all columns, or a list of selected columns, of a DataFrame with Python examples? Solution: To find non-null values of PySpark DataFrame columns, use the isNotNull() function, or equivalently negate isNull(), for example ~df.name.isNull(); similarly…

Continue Reading PySpark Count of Non null, nan Values in DataFrame

PySpark Replace Empty Value With None/null on DataFrame

In a PySpark DataFrame, use the when().otherwise() SQL functions to check whether a column has an empty value, and the withColumn() transformation to replace the value of an existing column. In this article, I will explain how to replace an empty value with None/null on a single column, all columns, selected…

Continue Reading PySpark Replace Empty Value With None/null on DataFrame

Spark Replace Empty Value With NULL on DataFrame

To replace an empty string value with NULL on a Spark DataFrame, use the when().otherwise() SQL functions. In this article, I will explain how to replace an empty value with null on a single column, all columns, or a selected list of columns of a DataFrame with Scala examples. Related: How to get Count…

Continue Reading Spark Replace Empty Value With NULL on DataFrame

Spark Find Count of NULL, Empty String Values

Problem: Could you please explain how to find/calculate the count of NULL or empty string values of all columns, or a list of selected columns, in a Spark DataFrame with a Scala example? Solution: In a Spark DataFrame, you can find the count of NULL or empty/blank string values in a column…

Continue Reading Spark Find Count of NULL, Empty String Values

PySpark – Find Count of null, None, NaN Values

In a PySpark DataFrame, you can calculate the count of Null, None, NaN, and empty/blank values in a column by using isNull() of the Column class and the SQL functions isnan(), count(), and when(). In this article, I will explain how to get the count of Null, None, NaN, empty or blank values…

Continue Reading PySpark – Find Count of null, None, NaN Values

Spark Exception: Python in worker has different version 3.4 than that in driver 2.7, PySpark cannot run with different minor versions

I somehow ended up with both Python 3.4 and 2.7 installed on my Linux cluster, and while running a PySpark application I was getting "Exception: Python in worker has different version 3.4 than that in driver 2.7, PySpark cannot run with different minor versions". I spent some time searching for it on Google…

Continue Reading Spark Exception: Python in worker has different version 3.4 than that in driver 2.7, PySpark cannot run with different minor versions
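One common fix for this mismatch (an assumption on my part, not necessarily the article's full resolution) is to pin both the driver and the workers to the same interpreter via the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables before the session starts:

```python
import os
import sys

# Point PySpark workers and the driver at the same Python interpreter.
# These must be set before the SparkSession/SparkContext is created.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
```

The same variables can equally be exported in the shell environment that launches spark-submit.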

PySpark JSON Functions with Examples

PySpark JSON functions are used to query or extract elements from a JSON string in a DataFrame column by path, convert it to a struct or map type, etc. In this article, I will explain the most used JSON SQL functions with Python examples. 1. PySpark JSON Functions from_json() - Converts a JSON string…

Continue Reading PySpark JSON Functions with Examples

Spark Most Used JSON Functions with Examples

Spark SQL provides a set of JSON functions to parse a JSON string and query it to extract specific values. In this article, I will explain the most used JSON functions with Scala examples. 1. Spark JSON Functions from_json() - Converts a JSON string into a struct type or map type. to_json() -…

Continue Reading Spark Most Used JSON Functions with Examples

Spark from_json() – Convert JSON Column to Struct, Map or Multiple Columns

In Spark/PySpark, the from_json() SQL function is used to convert a JSON string from a DataFrame column into a struct column, a map type, or multiple columns. 1. Spark from_json() Syntax Following are the different syntaxes of the from_json() function. from_json(Column jsonStringcolumn, Column schema) from_json(Column jsonStringcolumn, DataType schema) from_json(Column jsonStringcolumn, StructType schema) from_json(Column jsonStringcolumn, DataType schema,…

Continue Reading Spark from_json() – Convert JSON Column to Struct, Map or Multiple Columns

PySpark Parse JSON from String Column | TEXT File

In this PySpark article, I will explain how to parse or read a JSON string from a TEXT/CSV file and convert it into DataFrame columns, using Python examples. To do this, I will use the PySpark SQL function from_json(). 1. Read a JSON String from a TEXT file…

Continue Reading PySpark Parse JSON from String Column | TEXT File