September 11-14, 2017 - Los Angeles, CA
Wednesday, September 13 • 4:00pm - 4:40pm
SMACK Stack and Beyond - Building Fast Data Pipelines - Jörg Schad & Matt Jarvis, Mesosphere

Our world seems to move faster and faster, and so do our requirements for data analytics. For many use cases, such as fraud detection or reacting to sensor data, the response times of traditional batch processing are simply too slow. To react to such events in close to real time, we need to go beyond classical batch processing and use stream processing systems such as Apache Spark Streaming, Apache Flink, or Apache Storm.
But these systems are not sufficient by themselves. For an efficient and fault-tolerant setup we also need a message queue and a storage system. One common example of such a fast data pipeline is the SMACK stack, which stands for:
- Spark (Streaming) - the stream processing system
- Mesos - the cluster orchestrator
- Akka - the system for providing custom actors for reacting upon the analyses
- Cassandra - storage system
- Kafka - message queue

Setting up such a pipeline in a scalable, efficient, and fault-tolerant manner is not trivial.
This talk will first discuss several alternatives for the various parts of the stack, e.g., what the tradeoffs are between Spark Streaming and Apache Flink, and when to use ArangoDB or Apache Cassandra.
We will then discuss the challenges and best practices for setting up such pipelines.
The talk will finish with a demo of a fast data pipeline with Apache Flink, ArangoDB, and Apache Kafka deployed on DC/OS.

Speakers
Jörg Schad

CTO, ArangoDB
Jörg Schad is the CTO at ArangoDB. Previously, he worked on or built machine learning pipelines in healthcare, distributed systems (including early Kubernetes code at Mesosphere), and in-memory databases. He received his Ph.D. for research about distributed databases and...



Wednesday September 13, 2017 4:00pm - 4:40pm PDT
Gold 1