Spark SQL Date and Timestamp Functions
Spark SQL provides built-in standard Date and Timestamp (date plus time) functions defined in the DataFrame API. These come in handy when we need to perform operations on date and…
Spark SQL provides several built-in standard functions in org.apache.spark.sql.functions for working with DataFrame/Dataset and SQL queries. All of these Spark SQL functions return the org.apache.spark.sql.Column type. In order to use these SQL Standard…
Let's see how to add a new column to a Spark DataFrame by assigning it a literal or constant value. Spark SQL provides the lit() and typedLit() functions to add a literal value to a DataFrame. Both functions return a Column.
When you need to write complex XML structures from a Spark DataFrame and the Databricks XML API is not suitable for your use case, you can use the XStream API to convert the data to an XML string and write it out as text. Let's see how to do this using an example.
Apache Spark can also be used to read simple to complex nested XML files into a Spark DataFrame and to write the result back to XML using the Databricks Spark XML API…
This article describes Spark Structured Streaming from Kafka in Avro format and the usage of the from_avro() and to_avro() SQL functions using the Scala programming language. Spark Streaming Kafka messages in…
Spark Streaming with Kafka Example
Using Spark Streaming we can read from a Kafka topic and write to a Kafka topic in TEXT, CSV, AVRO and JSON formats. In this article, we…
This article describes the usage of, and differences between, the complete, append and update output modes in Apache Spark Streaming. outputMode describes what data is written to a data sink (console, Kafka, etc.) when new data is available in the streaming input (Kafka, socket, etc.).
Using Spark Streaming, we will see a working example of how to read data from a TCP socket, process it and write the output to the console. Spark uses readStream() to read and…
This article describes and provides an example of how to continuously stream or read a JSON file source from a folder, process it and write the data to another source