PySpark Count Distinct from DataFrame

In PySpark, you can use distinct().count() of DataFrame or countDistinct() SQL function to get the count distinct. distinct() eliminates duplicate records(matching all columns of a Row) from DataFrame, count() returns…

0 Comments