Spark Streaming with Kafka Example

Using Spark Streaming we can read from a Kafka topic and write to a Kafka topic in TEXT, CSV, AVRO, and JSON formats. In this article, we will learn, with a Scala example, how to stream Kafka messages in JSON format using the from_json() and to_json() SQL functions.

What is Spark Streaming?

Apache Spark Streaming is a scalable, high-throughput, fault-tolerant stream processing system that supports both batch and streaming workloads. It is an extension of the core Spark API that processes real-time data from sources like Kafka, Flume, and Amazon Kinesis, to name a few. The processed data can be pushed to other systems like databases, Kafka, live dashboards, etc.

What is Apache Kafka?

Apache Kafka is a publish-subscribe messaging system originally developed at LinkedIn.
A Kafka cluster is highly scalable and fault-tolerant, and it also offers much higher throughput than other message brokers such as ActiveMQ and RabbitMQ.

Prerequisites

If you don’t have a Kafka cluster set up, follow the articles below to set up a single-broker cluster and get familiar with creating and describing topics.


1. Run Kafka Producer Shell

First, let’s produce some JSON data to the Kafka topic "json_topic". The Kafka distribution comes with a console producer shell; run this producer and input the JSON data from person.json. Just copy one line at a time from the person.json file and paste it on the console where the Kafka producer shell is running.
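
For reference, a record in person.json would look something like the following (values are illustrative and match the schema used later in this article; the actual file ships with the GitHub project linked below):

{"id":1,"firstname":"James","middlename":"","lastname":"Smith","dob_year":1980,"dob_month":1,"gender":"M","salary":3000}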

Note: By default, Kafka automatically creates a topic the first time you write a message to it; however, you can also create the topic manually and specify the number of partitions and the replication factor yourself.
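
For example, a minimal sketch using the kafka-topics.sh tool that ships with Kafka (a single-broker cluster supports a replication factor of at most 1):

bin/kafka-topics.sh --create \
--bootstrap-server localhost:9092 \
--replication-factor 1 --partitions 1 \
--topic json_topic

On older Kafka releases, use --zookeeper localhost:2181 in place of --bootstrap-server. With the topic in place, start the console producer: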


bin/kafka-console-producer.sh \
--broker-list localhost:9092 --topic json_topic

2. Streaming With Kafka

2.1. Kafka Maven dependency

In order to stream data from a Kafka topic, we need to use the below Kafka client Maven dependency. Use the version that matches your Kafka and Scala versions.

<dependency>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
     <version>2.4.0</version>
</dependency>
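
If your build uses sbt instead of Maven, the equivalent dependency would be the following (assuming your scalaVersion is set to a matching 2.11.x):

libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.4.0"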

2.2 Spark Streaming Scala example

Spark Streaming uses readStream() on SparkSession to load a streaming Dataset from Kafka. Setting the option startingOffsets to earliest reads all the data already available in Kafka at the start of the query; we may not use this option that often, and the default value latest reads only new data that has not yet been processed.
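
The snippets below assume a SparkSession named spark is already in scope; a minimal sketch of creating one (the app name and master are placeholders):

import org.apache.spark.sql.SparkSession

// Local SparkSession for experimenting; use your cluster's master in production
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("SparkStreamingKafkaExample")
  .getOrCreate()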


val df = spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "192.168.1.100:9092")
        .option("subscribe", "json_topic")
        .option("startingOffsets", "earliest") // From starting
        .load()

df.printSchema()

Since there are multiple sources to stream from, we need to state explicitly that we are streaming from Kafka with format("kafka"), provide the Kafka bootstrap servers, and subscribe to the topic we are streaming from using the corresponding options.
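
The Kafka source can also subscribe to more than one topic; either of the following could replace the subscribe option above (topic names here are hypothetical):

// a comma-separated list of topics
.option("subscribe", "topic1,topic2")

// or a regular expression matching topic names
.option("subscribePattern", "topic.*")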

df.printSchema() returns the schema of streaming data from Kafka. The returned DataFrame contains all the familiar fields of a Kafka record and its associated metadata.
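
For the Kafka source this schema is fixed, so df.printSchema() prints:

root
 |-- key: binary (nullable = true)
 |-- value: binary (nullable = true)
 |-- topic: string (nullable = true)
 |-- partition: integer (nullable = true)
 |-- offset: long (nullable = true)
 |-- timestamp: timestamp (nullable = true)
 |-- timestampType: integer (nullable = true)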


3. Spark Streaming Write to Console

Since the value column is binary, first we need to convert it to a String using selectExpr().


val personStringDF = df.selectExpr("CAST(value AS STRING)")

Now, parse the JSON string in the value column into individual DataFrame columns using a custom schema.


import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val schema = new StructType()
      .add("id", IntegerType)
      .add("firstname", StringType)
      .add("middlename", StringType)
      .add("lastname", StringType)
      .add("dob_year", IntegerType)
      .add("dob_month", IntegerType)
      .add("gender", StringType)
      .add("salary", IntegerType)

val personDF = personStringDF.select(from_json(col("value"), schema).as("data"))
      .select("data.*")

personDF.writeStream
      .format("console")
      .outputMode("append")
      .start()
      .awaitTermination()

The complete streaming Kafka example code can be downloaded from GitHub. After downloading, import the project into your favorite IDE and change the Kafka broker IP address to your server IP in the SparkStreamingConsumerKafkaJson.scala program. When you run this program, you should see Batch: 0 with data. As you input new data (from step 1), the results get updated with Batch: 1, Batch: 2, and so on.


Batch: 0
+---+---------+----------+--------+--------+---------+------+------+
| id|firstname|middlename|lastname|dob_year|dob_month|gender|salary|
+---+---------+----------+--------+--------+---------+------+------+
|  1|   James |          |   Smith|    null|     null|     M|  3000|
|  2| Michael |      Rose|        |    null|     null|     M|  4000|
|  3|  Robert |          |Williams|    null|     null|     M|  4000|
|  4|   Maria |      Anne|   Jones|    null|     null|     F|  4000|
+---+---------+----------+--------+--------+---------+------+------+

Batch: 1
+---+---------+----------+--------+--------+---------+------+------+
| id|firstname|middlename|lastname|dob_year|dob_month|gender|salary|
+---+---------+----------+--------+--------+---------+------+------+
+---+---------+----------+--------+--------+---------+------+------+

Batch: 2
+---+---------+----------+--------+--------+---------+------+------+
| id|firstname|middlename|lastname|dob_year|dob_month|gender|salary|
+---+---------+----------+--------+--------+---------+------+------+
|  1|   James |          |   Smith|    null|     null|     M|  3000|
+---+---------+----------+--------+--------+---------+------+------+

4. Spark Streaming Write to Kafka Topic

Note that in order to write Spark Streaming data to Kafka, the value column is required and all other fields are optional.
The key and value columns are binary in Kafka; hence, they should first be converted to String before processing. If a key column is not specified, then a null-valued key column will be automatically added.

Let’s produce the data to the Kafka topic "json_data_topic". Since we are processing JSON, let’s convert the data to JSON using the to_json() function and store it in a value column.


df.selectExpr("CAST(id AS STRING) AS key", "to_json(struct(*)) AS value")
   .writeStream
   .format("kafka")
   .outputMode("append")
   .option("kafka.bootstrap.servers", "192.168.1.100:9092")
   .option("topic", "json_data_topic")
   .option("checkpointLocation", "/tmp/checkpoint") // required by the Kafka sink; path is illustrative
   .start()
   .awaitTermination()

Use writeStream.format("kafka") to write the streaming DataFrame to a Kafka topic. Since we are just reading from one topic and writing as-is (without any aggregations), we use outputMode("append"). OutputMode specifies what data gets written to the sink whenever new data becomes available in the DataFrame/Dataset.
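
For contrast, a query that aggregates would typically use complete or update mode. A sketch counting records per gender on the personDF parsed in step 3 (console sink, for illustration only):

personDF.groupBy("gender").count()
  .writeStream
  .format("console")
  .outputMode("complete") // rewrite the full aggregated result on every trigger
  .start()
  .awaitTermination()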

5. Run Kafka Consumer Shell

Now run the Kafka consumer shell program that comes with the Kafka distribution.


bin/kafka-console-consumer.sh \
--bootstrap-server localhost:9092 --topic json_data_topic

As you feed more data (from step 1), you should see JSON output on the consumer shell console.
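
Each record arrives as a single JSON document per line. Based on the sample data above, the output would look something like this (illustrative; to_json() omits null fields by default):

{"id":1,"firstname":"James","middlename":"","lastname":"Smith","gender":"M","salary":3000}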

Next Steps

You can also read the articles Streaming JSON files from a folder and Streaming from a TCP socket to learn different ways of streaming.

I would also recommend reading Spark Streaming + Kafka Integration and Structured Streaming with Kafka to learn more about Structured Streaming.

Happy Learning !!

