Who is this presentation for?
Data engineers and architects interested in understanding how and when to use Akka Streams and Kafka Streams in streaming applications.
Prior programming experience, preferably with Java or Scala, is required to work with the examples. Prior experience with Kafka, Kafka Streams, and Akka Streams will be useful, but not required.
Materials or downloads needed in advance
BEFORE THE TUTORIAL, please set up your laptop by cloning the following GitHub repo: https://github.com/lightbend/kafka-with-akka-streams-kafka-streams-tutorial. Alternatively, you can download the latest release. Then follow the setup instructions in the README.
What you'll learn
The attendee will understand:
1. How to combine Kafka with Akka Streams and Kafka Streams to implement stream processing microservices.
2. How to leverage the strengths of these tools while avoiding their weaknesses.
3. How these libraries compare to Spark Streaming and Flink for stream processing.
If you’re building streaming data apps, your first inclination might be to reach for Spark Streaming, Flink, Apex, or similar tools, which run as services to which you submit jobs for execution. But sometimes, writing conventional microservices, with embedded stream processing, is a better fit for your needs.
In this hands-on tutorial, we start with the premise that Kafka is the ideal backplane for reliable capture and organization of data streams for downstream consumption. Then, we build several applications using Akka Streams and Kafka Streams on top of Kafka. The goal is to understand the relative strengths and weaknesses of these toolkits for building Kafka-based streaming applications.
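Kafka's suitability as a backplane comes from its storage model: each topic partition is an append-only log, and each consumer group tracks its own committed offset, so several downstream services can read the same data independently and at their own pace. The plain-Java sketch below is a toy in-memory model of that idea, not Kafka's client API; the `TopicLog` class and the consumer group names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LogBackplane {

    // Toy in-memory model of a single Kafka-style topic partition:
    // an append-only log plus one committed offset per consumer group.
    // Hypothetical class for illustration; not part of any Kafka library.
    static class TopicLog {
        private final List<String> records = new ArrayList<>();
        private final Map<String, Integer> committedOffsets = new HashMap<>();

        // Producers only append; existing records are never modified.
        int append(String record) {
            records.add(record);
            return records.size() - 1; // offset of the new record
        }

        // Return up to maxRecords records after the group's committed offset,
        // then commit immediately (a simplification: real consumers can
        // commit explicitly or periodically).
        List<String> poll(String group, int maxRecords) {
            int from = committedOffsets.getOrDefault(group, 0);
            int to = Math.min(records.size(), from + maxRecords);
            List<String> batch = new ArrayList<>(records.subList(from, to));
            committedOffsets.put(group, to);
            return batch;
        }
    }

    public static void main(String[] args) {
        TopicLog log = new TopicLog();
        log.append("reading-1");
        log.append("reading-2");
        log.append("reading-3");

        // Two independent downstream applications consume the same data at their own pace.
        System.out.println(log.poll("akka-service", 2));          // [reading-1, reading-2]
        System.out.println(log.poll("kafka-streams-service", 3)); // [reading-1, reading-2, reading-3]
        System.out.println(log.poll("akka-service", 2));          // [reading-3]
    }
}
```

Because consuming a record does not remove it, a service added later can still read the topic from the beginning, subject to Kafka's retention settings.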
We’ll also compare and contrast them to systems like Spark Streaming and Flink, to understand when those tools are better choices. Briefly, Akka Streams and Kafka Streams are best for data-centric microservices, where maximum flexibility is required for running the applications and interoperating with other systems, while systems like Spark Streaming and Flink are best for richer analytics over large streams where horizontal scalability through “automatic” partitioning of the data is required.
Each engine has particular strengths that we’ll demonstrate:
- Kafka Streams is purpose-built for reading data from Kafka topics, processing it, and writing the results to new topics. With powerful stream and table abstractions and an “exactly-once” capability, it supports a variety of common scenarios involving transformation, filtering, and aggregation.
- Akka Streams emerged as a dataflow-centric abstraction built on the Akka actor model. It is designed for general-purpose microservices, especially when low per-event latency is important, such as in complex event processing, where each event requires individual handling. In contrast, many other systems achieve efficiency at scale by amortizing overhead over sets of records, that is, by processing “in bulk”. Because it is general-purpose, Akka Streams also supports a wider class of application problems and third-party integrations, but it is less focused on Kafka-specific capabilities.
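The stream and table abstractions mentioned for Kafka Streams correspond to its `KStream` and `KTable` types. The JDK-only sketch below (Java 16+ for `record`) does not use the Kafka Streams API; it only illustrates the relationship between the two under simplifying assumptions: a table is the latest value per key obtained by replaying a changelog stream, a `null` value acts as a tombstone that deletes the key, and a count aggregation is a running fold over the stream. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StreamTableDuality {

    // One record in a changelog stream; a null value acts as a tombstone.
    record Event(String key, String value) {}

    // Materialize a table: keep only the latest value per key, dropping tombstoned keys.
    static Map<String, String> toTable(List<Event> stream) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Event e : stream) {
            if (e.value() == null) {
                table.remove(e.key());
            } else {
                table.put(e.key(), e.value());
            }
        }
        return table;
    }

    // Aggregate the stream: count every non-tombstone record per key.
    static Map<String, Long> countByKey(List<Event> stream) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (Event e : stream) {
            if (e.value() != null) {
                counts.merge(e.key(), 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Event> stream = List.of(
                new Event("alice", "login"),
                new Event("bob", "login"),
                new Event("alice", "purchase"),
                new Event("bob", null)); // tombstone: delete "bob" from the table

        System.out.println(toTable(stream));    // {alice=purchase}
        System.out.println(countByKey(stream)); // {alice=2, bob=1}
    }
}
```

The same stream yields two different views: the table answers “what is the current value for this key?”, while the aggregation answers “how many events has this key produced?”.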
Kafka Streams and Akka Streams are both libraries that you integrate into your microservices, which means you must manage their lifecycles yourself, but you also get lots of flexibility to do this as you see fit.
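Because these libraries run inside your own process, starting and stopping them is your code's job; in Kafka Streams, for example, that means calling `KafkaStreams#start()` and `KafkaStreams#close()` yourself. The JDK-only sketch below uses a hypothetical `EmbeddedProcessor` class as a stand-in for such a runtime, to show one common pattern under that assumption: start when the service boots, then stop accepting events, drain queued work, and release threads on shutdown, including from a JVM shutdown hook.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class EmbeddedLifecycle {

    // Hypothetical stand-in for an embedded stream-processing runtime
    // (not a real Kafka Streams or Akka class).
    static class EmbeddedProcessor implements AutoCloseable {
        private final ExecutorService worker = Executors.newSingleThreadExecutor();
        private final Queue<String> handled = new ConcurrentLinkedQueue<>();
        private volatile boolean started = false;

        // The hosting service decides when processing begins.
        void start() {
            started = true;
        }

        // Hand one event to the background worker; rejected before start or after close.
        void submit(String event) {
            if (!started || worker.isShutdown()) {
                throw new IllegalStateException("processor is not running");
            }
            worker.execute(() -> handled.add(event));
        }

        int processedCount() {
            return handled.size();
        }

        // Stop accepting events, let queued ones finish, then release the thread.
        // Calling it again after a successful close has no further effect.
        @Override
        public void close() {
            worker.shutdown();
            try {
                if (!worker.awaitTermination(5, TimeUnit.SECONDS)) {
                    worker.shutdownNow();
                }
            } catch (InterruptedException e) {
                worker.shutdownNow();
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        EmbeddedProcessor processor = new EmbeddedProcessor();
        // Also close cleanly if the JVM is asked to stop (for example, SIGTERM from an orchestrator).
        Runtime.getRuntime().addShutdownHook(new Thread(processor::close));

        processor.start();
        for (int i = 0; i < 3; i++) {
            processor.submit("event-" + i);
        }
        processor.close(); // explicit shutdown; the hook's later call finds nothing left to do
        System.out.println("processed " + processor.processedCount() + " events");
    }
}
```

This is the flexibility and the burden the paragraph describes: nothing outside your service decides when processing starts or stops.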
In contrast, Spark Streaming and Flink run their own services. You write “jobs” or use interactive shells that tell these services what computations to do over data sources and where to send results. Spark and Flink then determine what processes to run in your cluster to implement the dataflows. Hence, there is less of a DevOps burden to bear, but also less flexibility when you might need it. Both systems are also more focused on data analytics problems, with various levels of support for SQL over streams, machine learning model training and scoring, etc.
For the tutorial, you’ll be given an execution environment and the code examples in a GitHub repo. We’ll experiment with the examples together, interspersed with short presentations, to understand their strengths, weaknesses, performance characteristics, and lifecycle management requirements.
Loic R Julien
IBM, Software Architect
Loic Julien is a Senior Software Engineer for IBM's Seattle Lab. At IBM, he has worked in numerous areas of data management, including drivers, cloud-based deployment, and tooling. Loic is currently an architect for Db2 Event Store, a new store capable of high-speed ingest and real-time advanced analytics on open data.
Boris Lublinsky
Lightbend, Principal Architect
Boris Lublinsky is a principal architect at Lightbend, where he specializes in big data, stream processing, and services. Boris has over 30 years’ experience in enterprise architecture. Over his career, he has been responsible for setting architectural direction, conducting architecture assessments, and creating and executing architectural roadmaps in fields such as big data (Hadoop-based) solutions, service-oriented architecture (SOA), business process management (BPM), and enterprise application integration (EAI). Boris is the coauthor of Applied SOA: Service-Oriented Architecture and Design Strategies, Professional Hadoop Solutions, Kubeflow for Machine Learning: From Lab to Production, and Serving Machine Learning Models. He is also cofounder of and frequent speaker at several Chicago user groups.
Dean Wampler
Lightbend, VP of Rocket Surgery
Dean Wampler, Ph.D., is the VP of Fast Data Engineering at Lightbend. He leads the development of Lightbend Fast Data Platform, a distribution of scalable, distributed stream processing tools including Spark, Flink, Kafka, and Akka, with machine learning and management tools. Dean is the author of several books, a frequent conference speaker and organizer, and he helps run several Meetups in Chicago.