You are currently viewing Spark Drop, Delete, Truncate Differences

Let’s discuss the differences between drop, delete, and truncate using Spark SQL. Even though Drop, Delete, and Truncate sound the same but, there is a huge difference when it comes to implementation. Drop and Truncate are the DDL (Data Definition Language) commands, whereas Delete comes under DML(Data Manipulation Language). These are mainly used to eliminate unwanted records and tables.

Advertisements

In this article, I will explain the drop, delete, and truncate statements in Spark SQL and the difference between each by comparing each other.

1. Using DROP command in Spark?

The DROP in Spark SQL is a DDL command, that drops the table with all its table data (only for the internal Hive table), schema, associated indexes, constraints, triggers, and permission specified to the table.

Syntax:


// Syntax of DROP
DROP TABLE tableName

It only takes the table name as a parameter to be dropped from the database.

Note: In case of internal tables, the DROP command drops both tables from metadata and table data from location. Whereas, in the case of external tables DROP command only drops the table definition from metadata. The underlying data of external tables remains as it is.

2. Using DELETE command in Spark?

The DELETE in spark-SQL is a DML command that is used to remove unwanted records from a table in database. We can specify WHERE clause in DELETE command defines a condition for removing only the selected unwanted rows from the target table. 

Syntax:


// Syntax of DELETE
DELETE FROM tableName WHERE condition;

Example: In Azure Databricks, you can try the below examples to delete the data from the table.

Spark Drop, Delete and Truncate
Spark SQL DELETE Operation

In the case of Spark delta tables, the DELETE operation is recorded in delta logs as a transaction. This helps to retrieve them using the ROLLBACK command.

Find out more information on the rollback feature in the Delta table here Time Travel with Delta Tables in Databricks?

3. Using TRUNCATE command in Spark?

The TRUNCATE in Spark SQL is a DDL command that cleans all the records in the target table at once. This does not require any condition like a WHERE clause in the DROP command.

The truncate operation uses table lock, to lock the table records instead of row lock. As a result, it only logs the removal of total records stored, which is much faster than the DELETE operation.

Syntax:


// Syntax of TRUNCATE
TRUNCATE TABLE tableName;

Example:

Spark Drop, Delete and Truncate
Spark-SQL Truncate Operation

As the TRUNCATE command is a DDL command, the commit operation is completed automatically without human intervention. The identification is reverted back to its seed value when the TRUNCATE command is executed.

Seed value: Seed value in Spark-SQL refers to the internal value that the Server uses to generate the next value.

Note: DELETE operation without any condition also works the same as like TRUNCATE operation.

4. Compare DROP vs DELETE vs TRUNCATE

BasisDROPDELETETRUNCATE
Syntax DROP TABLE tableName;DELETE FROM tableName
WHERE condition;
TRUNCATE TABLE tableName;
Language Data Definition Language CommandData Manipulation Language CommandData Definition Language Command
Permission ALTER permission on the schema and CONTROL permission on the table.DELETE permission is requiredALTER permission on
the table
Use-CaseTo remove the table holding data.To remove one or more records from a table.To remove all the records of the table.
RollbackWe cannot roll back the table once dropped.We can rollback the deleted records of a table using versioning,We can roll back the table data using versioning.
Performance DROP is faster than DELETE. But Truncate is faster than DROP as drop needs to delete both table and data.DELETE operation is slower than the DROP and TRUNCATE as it deletes one or more rows based on a condition.
TRUNCATE operation is faster than the DROP and DELETE commands because it deletes all the records from the table without any condition.

MemoryDROP command removes the space allocated for the table from memory.
DELETE operation does not free the allocated space of the table from memory. TRUNCATE command does not free the space allocated for the table from memory.

Spark SQL Drop vs Delete Vs Truncate

5. Conclusion

In this article, you have learned how to use DROP, DELETE, and TRUNCATE tables in Spark or PySpark.

  • DROP: Drops table details from metadata and data of internal tables.
  • DELETE: Deletes one or more records based on the condition provided.
  • TRUNCATE: Truncates all the records in the target table.