Let’s discuss the differences between drop, delete, and truncate using Spark SQL. Even though Drop, Delete, and Truncate sound the same but, there is a huge difference when it comes to implementation. Drop and Truncate are the DDL (Data Definition Language) commands, whereas Delete comes under DML(Data Manipulation Language). These are mainly used to eliminate unwanted records and tables.
In this article, I will explain the drop, delete, and truncate statements in Spark SQL and the difference between each by comparing each other.
Table of contents
1. Using DROP command in Spark?
The DROP in Spark SQL is a DDL command, that drops the table with all its table data (only for the internal Hive table), schema, associated indexes, constraints, triggers, and permission specified to the table.
// Syntax of DROP DROP TABLE tableName
It only takes the table name as a parameter to be dropped from the database.
Note: In case of internal tables, the DROP command drops both tables from metadata and table data from location. Whereas, in the case of external tables DROP command only drops the table definition from metadata. The underlying data of external tables remains as it is.
2. Using DELETE command in Spark?
The DELETE in spark-SQL is a DML command that is used to remove unwanted records from a table in database. We can specify WHERE clause in DELETE command defines a condition for removing only the selected unwanted rows from the target table.
// Syntax of DELETE DELETE FROM tableName WHERE condition;
Example: In Azure Databricks, you can try the below examples to delete the data from the table.
In the case of Spark delta tables, the DELETE operation is recorded in delta logs as a transaction. This helps to retrieve them using the ROLLBACK command.
Find out more information on the rollback feature in the Delta table here Time Travel with Delta Tables in Databricks?
3. Using TRUNCATE command in Spark?
The TRUNCATE in Spark SQL is a DDL command that cleans all the records in the target table at once. This does not require any condition like a WHERE clause in the DROP command.
The truncate operation uses table lock, to lock the table records instead of row lock. As a result, it only logs the removal of total records stored, which is much faster than the DELETE operation.
// Syntax of TRUNCATE TRUNCATE TABLE tableName;
As the TRUNCATE command is a DDL command, the commit operation is completed automatically without human intervention. The identification is reverted back to its seed value when the TRUNCATE command is executed.
Seed value: Seed value in Spark-SQL refers to the internal value that the Server uses to generate the next value.
Note: DELETE operation without any condition also works the same as like TRUNCATE operation.
4. Compare DROP vs DELETE vs TRUNCATE
|Syntax||DROP TABLE tableName;||DELETE FROM tableName|
|TRUNCATE TABLE tableName;|
|Language||Data Definition Language Command||Data Manipulation Language Command||Data Definition Language Command|
|Permission||ALTER permission on the schema and CONTROL permission on the table.||DELETE permission is required||ALTER permission on|
|Use-Case||To remove the table holding data.||To remove one or more records from a table.||To remove all the records of the table.|
|Rollback||We cannot roll back the table once dropped.||We can rollback the deleted records of a table using versioning,||We can roll back the table data using versioning.|
|Performance||DROP is faster than DELETE. But Truncate is faster than DROP as drop needs to delete both table and data.||DELETE operation is slower than the DROP and TRUNCATE as it deletes one or more rows based on a condition.||TRUNCATE operation is faster than the DROP and DELETE commands because it deletes all the records from the table without any condition.|
|Memory||DROP command removes the space allocated for the table from memory.||DELETE operation does not free the allocated space of the table from memory.|| TRUNCATE command does not free the space allocated for the table from memory.|
In this article, you have learned how to use DROP, DELETE, and TRUNCATE tables in Spark or PySpark.
- DROP: Drops table details from metadata and data of internal tables.
- DELETE: Deletes one or more records based on the condition provided.
- TRUNCATE: Truncates all the records in the target table.
- Spark Drop DataFrame from Cache
- Spark SQL – Truncate Date Time by unit specified
- Spark – How to Drop a DataFrame/Dataset column
- Spark Read and Write MySQL Database Table
- Spark with SQL Server – Read and Write Table
- Spark SQL Create a Table
- Spark Types of Tables and Views