PySpark – What is SparkSession?

Since Spark 2.0, SparkSession has been the entry point to PySpark for working with RDDs and DataFrames. Prior to 2.0, SparkContext was the entry point. Here, I will mainly focus on explaining what SparkSession is by defining it and describing how to create a SparkSession and use the default SparkSession spark variable…

Spark – What is SparkSession Explained

Since Spark 2.0, SparkSession has been the entry point to Spark for working with RDDs, DataFrames, and Datasets. Prior to 2.0, SparkContext was the entry point. Here, I will mainly focus on explaining what SparkSession is by defining it and describing how to create a SparkSession and use the default…

SparkSession vs SparkContext

Since the early versions of Spark and PySpark, SparkContext (JavaSparkContext for Java) has been the entry point for programming with RDDs and for connecting to a Spark cluster. Spark 2.0 introduced SparkSession, which became the entry point for programming with DataFrame and Dataset. Here,…

SparkSession vs SQLContext

In Spark, SparkSession is an entry point to the Spark application, and SQLContext is used to process structured data that contains rows and columns. Here, I will mainly focus on explaining the difference between SparkSession and SQLContext by defining them and describing how to create these two instances and use them from…

PySpark – Create DataFrame with Examples

You can manually create a PySpark DataFrame using the toDF() and createDataFrame() methods; both take different signatures in order to create a DataFrame from an existing RDD, list, or DataFrame. You can also create a PySpark DataFrame from data sources such as TXT, CSV, JSON, ORC, Avro, Parquet, and XML formats by reading from…

Spark Create DataFrame with Examples

In Spark, the createDataFrame() and toDF() methods are used to create a DataFrame manually. Using these methods, you can create a Spark DataFrame from an existing RDD, DataFrame, Dataset, List, or Seq data object; here I will explain these with Scala examples. You can also create a DataFrame from different sources like…

Spark – Create a SparkSession and SparkContext

In Spark or PySpark, a SparkSession object is created programmatically using SparkSession.builder() (SparkSession.builder, without parentheses, in PySpark). If you are using the Spark shell, a SparkSession object named "spark" is created for you by default, whereas the SparkContext is retrieved from the SparkSession object using sparkSession.sparkContext. In this article, you will learn how…
