What does SparkContext do?
The SparkContext is a fundamental component of Apache Spark. It plays a very important role in managing and coordinating the execution of Spark applications. Below is an overview of what the…
The PySpark cache() method is used to cache the intermediate results of a transformation so that subsequent transformations that run on top of the cached data perform faster. Caching the result of the…
Apache Spark 3.5 Installation on Windows - In this article, I will explain step-by-step how to install Apache Spark 3.5 on Windows OS 7, 10, and…
In this article, I will explain what Hive Partitioning and Bucketing are and the difference between Hive Partitioning vs Bucketing by exploring the advantages and disadvantages of each feature with examples.…
The Hive ALTER TABLE command is used to update or drop a partition from the Hive Metastore and HDFS location (managed table). You can also manually update or drop a Hive…
To export a Hive table into a CSV file, you can either use INSERT OVERWRITE DIRECTORY or pipe the output of the SELECT query into a CSV file.…
Let's learn what Internal (Managed) and External tables are and their differences. The main difference between Hive internal and external tables is that internal tables are owned and managed by Hive, whereas external…
Using the CREATE TEMPORARY TABLE statement, we can create a temporary table in Hive, which is used to store data temporarily within an active session, and the temporary tables get…
Hive stores data in the HDFS location /user/hive/warehouse folder if a folder is not specified using the LOCATION clause while creating a table. Hive is a data warehouse database for Hadoop; all…
Using the CREATE DATABASE statement, you can create a new database in Hive. Like in any other RDBMS, the Hive database is a namespace to store tables. In this article,…