Spark SQL – Select Columns From DataFrame

In Spark SQL, the select() function selects one or more columns from a DataFrame. It can pick a single column, multiple columns, nested columns, columns by index, all columns, columns from a list, or columns matching a regular expression. select() is a transformation, so it returns a new DataFrame containing only the selected columns. You can also alias column…

Spark Cast String Type to Integer Type (int)

In Spark SQL, to convert (cast) a String Type column to Integer Type (int), use the cast() function of the Column class. You can combine it with withColumn(), select() or selectExpr(), or write the cast as a SQL expression. cast() takes either a string naming the target type or any type that is a…

PySpark Convert String Type to Double Type

In PySpark SQL, you can use the cast() function to convert a DataFrame column from String Type to Double Type or Float Type. cast() takes either a string naming the target type or any type that is a subclass of DataType. Key points: cast() is…

Spark select() vs selectExpr() with Examples

Spark SQL select() and selectExpr() are both used to select columns from a DataFrame or Dataset. In this article, I will explain the differences between select() and selectExpr() with examples. Both are transformations and return a new DataFrame or Dataset, depending on whether untyped or typed columns are used. Spark select()…

PySpark Select Columns From DataFrame

In PySpark, the select() function selects a single column, multiple columns, columns by index, all columns from a list, or nested columns from a DataFrame. PySpark select() is a transformation, so it returns a new DataFrame with the selected columns. Select a Single & Multiple Columns from PySpark; Select All…
