Spark Structured Streaming Series – Deep dive sessions

We are planning to host a series of three sessions covering Spark Structured Streaming in greater detail. The objective is to familiarize users with the design and development of streaming applications using Kafka and Spark.

Seating for each session is limited to 50 participants.

Agenda (Hands-on Session) – Basics – Part One

• Introduction to Structured Streaming

• Unified batch and streaming API

• Defining a source – file source, socket source

• Defining a sink – memory, console, and file sinks

• Typed vs. untyped APIs (Dataset vs. DataFrame)

• Defining a schema

• Spark SQL and streaming

• Aggregations – max, avg, etc. (see the first sketch after this agenda)

• Reading from a Kafka topic

• Writing to a Kafka topic (see the second sketch after this agenda)
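
To give a feel for the hands-on material, here is a minimal sketch of the basic topics above, assuming Spark 2.3 (the version used in the linked repos) and Scala; the input path, schema, and column names are illustrative placeholders, not the session's actual exercise:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object BasicStreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("structured-streaming-basics")
      .master("local[*]")
      .getOrCreate()

    // Define the schema explicitly; streaming file sources do not infer schemas by default.
    val schema = new StructType()
      .add("userId", StringType)
      .add("amount", DoubleType)

    // File source: read CSV files as they arrive in a directory (path is a placeholder).
    val events = spark.readStream
      .schema(schema)
      .csv("/tmp/streaming-input")

    // Untyped (DataFrame) aggregation using Spark SQL functions.
    val totals = events
      .groupBy("userId")
      .agg(max("amount").as("max_amount"), avg("amount").as("avg_amount"))

    // Console sink: print each micro-batch; complete mode is required for this aggregation.
    val query = totals.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}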
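
And a sketch of the Kafka portion, assuming the spark-sql-kafka-0-10 connector is on the classpath; the broker address, topic names, and checkpoint path are placeholders:

import org.apache.spark.sql.SparkSession

object KafkaStreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("structured-streaming-kafka")
      .master("local[*]")
      .getOrCreate()

    // Read from a Kafka topic; records arrive as binary key/value columns.
    val input = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
      .option("subscribe", "input-topic")                  // placeholder topic
      .load()

    // Cast key and value to strings before further processing.
    val messages = input.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

    // Write back to another Kafka topic; the Kafka sink requires a checkpoint location.
    val query = messages.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "output-topic")                     // placeholder topic
      .option("checkpointLocation", "/tmp/kafka-sink-checkpoint")
      .start()

    query.awaitTermination()
  }
}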

Speaker:

Vishnu Viswanath

Data Engineer, MediaMath

Hands-on session instructions:

https://github.com/soniclavier/bigdata-notebook/tree/master/spark_23/spark-kafka-docker
https://github.com/soniclavier/spark-kafka
