PySpark Distinct to Drop Duplicate Rows
The PySpark distinct() function drops duplicate rows from a DataFrame by comparing all columns, while dropDuplicates() drops duplicates based on one or more selected columns. In this…
August 12, 2020