Spark Drop, Delete, Truncate Differences

Let’s discuss the differences between drop, delete, and truncate using Spark SQL. Even though Drop, Delete, and Truncate sound the same but, there is a huge difference when it comes to implementation. Drop and Truncate are the DDL (Data Definition Language) commands, whereas Delete comes under DML(Data Manipulation Language). These are mainly used to eliminate unwanted records and tables.

1. Using DROP command in Spark?

The DROP in Spark SQL is a DDL command, that drops the table with all its table data (only for the internal Hive table), schema, associated indexes, constraints, triggers, and permission specified to the table.

Syntax:


// Syntax of DROP
DROP TABLE tableName

It only takes the table name as a parameter to be dropped from the database.

Note: In case of internal tables, the DROP command drops both tables from metadata and table data from location. Whereas, in the case of external tables DROP command only drops the table definition from metadata. The underlying data of external tables remains as it is.

2. Using DELETE command in Spark?

The DELETE in spark-SQL is a DML command that is used to remove unwanted records from a table in database. We can specify WHERE clause in DELETE command defines a condition for removing only the selected unwanted rows from the target table.

Syntax:


// Syntax of DELETE
DELETE FROM tableName WHERE condition;

Example: In Azure Databricks, you can try the below examples to delete the data from the table.

Spark Drop, Delete and Truncate — Spark SQL DELETE Operation

In the case of Spark delta tables, the DELETE operation is recorded in delta logs as a transaction. This helps to retrieve them using the ROLLBACK command.

Find out more information on the rollback feature in the Delta table here Time Travel with Delta Tables in Databricks?

3. Using TRUNCATE command in Spark?

The TRUNCATE in Spark SQL is a DDL command that cleans all the records in the target table at once. This does not require any condition like a WHERE clause in the DROP command.

The truncate operation uses table lock, to lock the table records instead of row lock. As a result, it only logs the removal of total records stored, which is much faster than the DELETE operation.

Syntax:


// Syntax of TRUNCATE
TRUNCATE TABLE tableName;

Example:

As the TRUNCATE command is a DDL command, the commit operation is completed automatically without human intervention. The identification is reverted back to its seed value when the TRUNCATE command is executed.

Seed value: Seed value in Spark-SQL refers to the internal value that the Server uses to generate the next value.

Note: DELETE operation without any condition also works the same as like TRUNCATE operation.

4. Compare DROP vs DELETE vs TRUNCATE

Basis	DROP	DELETE	TRUNCATE
Syntax	DROP TABLE tableName;	DELETE FROM tableName WHERE condition;	TRUNCATE TABLE tableName;
Language	Data Definition Language Command	Data Manipulation Language Command	Data Definition Language Command
Permission	ALTER permission on the schema and CONTROL permission on the table.	DELETE permission is required	ALTER permission on the table
Use-Case	To remove the table holding data.	To remove one or more records from a table.	To remove all the records of the table.
Rollback	We cannot roll back the table once dropped.	We can rollback the deleted records of a table using versioning,	We can roll back the table data using versioning.
Performance	DROP is faster than DELETE. But Truncate is faster than DROP as drop needs to delete both table and data.	DELETE operation is slower than the DROP and TRUNCATE as it deletes one or more rows based on a condition.	TRUNCATE operation is faster than the DROP and DELETE commands because it deletes all the records from the table without any condition.
Memory	DROP command removes the space allocated for the table from memory.	DELETE operation does not free the allocated space of the table from memory.	TRUNCATE command does not free the space allocated for the table from memory.

Spark SQL Drop vs Delete Vs Truncate

5. Conclusion

In this article, you have learned how to use DROP, DELETE, and TRUNCATE tables in Spark or PySpark.

DROP: Drops table details from metadata and data of internal tables.
DELETE: Deletes one or more records based on the condition provided.
TRUNCATE: Truncates all the records in the target table.

Table of contents

1. Using DROP command in Spark?

2. Using DELETE command in Spark?

3. Using TRUNCATE command in Spark?

4. Compare DROP vs DELETE vs TRUNCATE

5. Conclusion

Related Articles