Fast Data architectures answer the enterprise's growing need to process and analyze continuous streams of data, accelerating decision making and enabling faster responses to changing market conditions. Apache Spark is a popular framework for data analytics. Its streaming capabilities are offered through two APIs: the low-level Spark Streaming (DStreams) and the more declarative Structured Streaming, which builds on recent advances in Spark SQL query optimization and code generation.
After a quick introduction to both APIs, we will discuss their strengths, capabilities, and key differences:
- How to get started: ease of development
- How to deal with time: processing time and event time
- How to deal with state: local and distributed state, and its relation to time
- How to migrate: functional coding strategies
- How to integrate: Fast Data and microservices
Using a practical approach supported by live demonstrations, we will provide insight into the sweet spot of each API and offer guidance on how to choose one, or even combine both, to implement functional and resilient streaming pipelines.
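To give a flavor of the declarative style discussed in the talk, here is a minimal Structured Streaming sketch: the classic streaming word count over a socket source. This is an illustrative example, not material from the talk; the object name, host, and port are assumptions, and running it requires a Spark installation and a text server on the chosen port.

```scala
import org.apache.spark.sql.SparkSession

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    // A local session for experimentation; in production this would point at a cluster
    val spark = SparkSession.builder
      .appName("structured-wordcount")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Declare the source: each line arriving on the socket becomes a row
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", "9999")
      .load()

    // Express the query declaratively; Spark SQL plans and runs it incrementally
    val counts = lines.as[String]
      .flatMap(_.split("\\s+"))
      .groupBy($"value")
      .count()

    // Aggregations with the console sink use "complete" output mode
    counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()
      .awaitTermination()
  }
}
```

The same computation in the DStream API would be phrased imperatively as transformations on RDD micro-batches, which is precisely the contrast the talk explores.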
Senior Software Engineer, Lightbend
I'm a hands-on technical leader. While keeping strong bonds with the software stack and technical architecture, I guide and coach individuals to ensure that the team adapts and grows to excel at its goals.
My current professional interest is in the area of distributed and scalable stream processing, in particular using an integrated, open-source-based stack: "Get the data flowing, the value collectors going and the storage to scale."
Co-author of the O'Reilly book "Stream Processing with Apache Spark".