Convert PySpark RDD to DataFrame

In PySpark, the toDF() function of the RDD is used to convert an RDD to a DataFrame. We would need to convert an RDD to a DataFrame because a DataFrame provides more advantages over an RDD. For instance, a DataFrame is a distributed collection of data organized into named columns, similar to database tables, and provides optimization and…
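
As a minimal sketch of the conversion described above (the data and column names here are illustrative, not taken from the article):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("RddToDf").getOrCreate()

# An RDD of tuples; column names are placeholders
rdd = spark.sparkContext.parallelize([("James", 3000), ("Anna", 4100)])

# toDF() accepts optional column names; without them Spark uses _1, _2, ...
df = rdd.toDF(["name", "salary"])
df.printSchema()
df.show()
```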

Continue Reading Convert PySpark RDD to DataFrame

PySpark – Create DataFrame with Examples

You can manually create a PySpark DataFrame using the toDF() and createDataFrame() methods; both of these functions take different signatures in order to create a DataFrame from an existing RDD, list, or DataFrame. You can also create a PySpark DataFrame from data sources such as TXT, CSV, JSON, ORC, Avro, Parquet, and XML formats by reading from…
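
A short sketch of both approaches; the sample data, column names, and file path below are made up for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("CreateDf").getOrCreate()
data = [("Java", 20000), ("Python", 100000)]
columns = ["language", "users_count"]

# 1) From an existing RDD via toDF()
df_from_rdd = spark.sparkContext.parallelize(data).toDF(columns)

# 2) Directly from a local list via createDataFrame()
df_from_list = spark.createDataFrame(data, schema=columns)

# 3) From a data source, e.g. CSV (path is hypothetical)
# df_csv = spark.read.csv("/tmp/resources/sample.csv", header=True, inferSchema=True)

df_from_list.show()
```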

Continue Reading PySpark – Create DataFrame with Examples

PySpark withColumnRenamed to Rename Column on DataFrame

Use PySpark withColumnRenamed() to rename a DataFrame column. We often need to rename one column, multiple columns, or even all columns on a PySpark DataFrame, and you can do this in several ways. When columns are nested it becomes complicated. Since DataFrames are an immutable collection, you can't rename or update a column…
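
A quick sketch of the call (the column names below are placeholders); because DataFrames are immutable, withColumnRenamed() returns a new DataFrame rather than modifying the original:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("RenameCol").getOrCreate()
df = spark.createDataFrame([("James", "Smith", 3000)],
                           ["fname", "lname", "salary"])

# Rename a single column
df2 = df.withColumnRenamed("fname", "first_name")

# Chain calls to rename multiple columns
df3 = (df.withColumnRenamed("fname", "first_name")
         .withColumnRenamed("lname", "last_name"))

df3.printSchema()
```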

Continue Reading PySpark withColumnRenamed to Rename Column on DataFrame

Convert Spark RDD to DataFrame | Dataset

While working in Apache Spark with Scala, we often need to convert a Spark RDD to a DataFrame or Dataset, as these provide more advantages over an RDD. For instance, a DataFrame is a distributed collection of data organized into named columns, similar to database tables, and provides optimization and performance improvements. In this…
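
The article itself works in Scala; as a rough PySpark analogue of the same conversion, an RDD can be turned into a DataFrame with toDF() or spark.createDataFrame() (typed Datasets are specific to the Scala/Java API). Data and column names here are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("RddConversion").getOrCreate()
rdd = spark.sparkContext.parallelize([("Finance", 10), ("IT", 25)])

# Either call produces an equivalent DataFrame
df1 = rdd.toDF(["dept_name", "dept_id"])
df2 = spark.createDataFrame(rdd, ["dept_name", "dept_id"])
df2.show()
```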

Continue Reading Convert Spark RDD to DataFrame | Dataset

Spark withColumnRenamed to Rename Column

In Spark, withColumnRenamed() is used to rename one or multiple DataFrame column names. Depending on the DataFrame schema, renaming columns can range from simple to complex; it gets especially complicated when a column is nested within a struct type. In this article, I will explain how to rename a DataFrame column…
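
The article's examples are in Scala; the PySpark sketch below shows the same idea with a made-up nested schema. A top-level rename is one call, while a nested field has to be rebuilt rather than renamed in place:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("RenameNested").getOrCreate()

schema = StructType([
    StructField("name", StructType([
        StructField("firstname", StringType()),
        StructField("lastname", StringType()),
    ])),
    StructField("dob", StringType()),
])
df = spark.createDataFrame([(("James", "Smith"), "2010-01-01")], schema)

# Top-level rename is a single call
df2 = df.withColumnRenamed("dob", "DateOfBirth")

# A nested field can't be renamed in place; one option is to rebuild the columns
df3 = df.select(
    col("name.firstname").alias("fname"),
    col("name.lastname").alias("lname"),
    col("dob"),
)
df3.printSchema()
```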

Continue Reading Spark withColumnRenamed to Rename Column

Different ways to create Spark RDD

A Spark RDD can be created in several ways using the Scala & PySpark languages; for example, it can be created by using sparkContext.parallelize(), from a text file, or from another RDD, DataFrame, or Dataset. Though we have covered most of the examples in Scala here, the same concept can be used to create…
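
An illustrative PySpark sketch of a few of the creation paths mentioned above; the file path is hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("CreateRdd").getOrCreate()
sc = spark.sparkContext

# 1) From a local collection
rdd1 = sc.parallelize([1, 2, 3, 4, 5])

# 2) From a text file (placeholder path)
# rdd2 = sc.textFile("/tmp/data.txt")

# 3) From another RDD via a transformation
rdd3 = rdd1.map(lambda x: x * 2)

# 4) From a DataFrame
rdd4 = spark.range(5).rdd

print(rdd3.collect())
```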

Continue Reading Different ways to create Spark RDD

Spark Create DataFrame with Examples

In Spark, the createDataFrame() and toDF() methods are used to create a DataFrame manually. Using these methods, you can create a Spark DataFrame from already existing RDD, DataFrame, Dataset, List, or Seq data objects; here I will explain these with Scala examples. You can also create a DataFrame from different sources like…
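
The article's examples are in Scala; a comparable PySpark sketch of the same two entry points, with made-up data and a hypothetical source path:

```python
from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.appName("SparkCreateDf").getOrCreate()

rows = [Row(language="Scala", users=3000), Row(language="Java", users=20000)]

# createDataFrame() infers the schema from the Row objects
df = spark.createDataFrame(rows)

# toDF() on an existing DataFrame renames its columns
df2 = df.toDF("lang", "users_count")

# Reading from an external source (path is hypothetical)
# df3 = spark.read.json("/tmp/resources/people.json")

df2.show()
```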

Continue Reading Spark Create DataFrame with Examples