Storm & Hadoop

Storm & Hadoop are complementary!
Hadoop => big batch processing
Storm => fast, reactive, real time processing

Storm data model
-Spouts
->sources of data for the topology, e.g. Postgres/MySQL/Kafka/Kestrel
-Bolts
->units of computation on data, e.g. filtering/aggregation/join/transformation

Live stream of Tweets
tweet spout -> parse tweet bolt -> word count bolt
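
A minimal sketch of the tweet example in plain Python. It does not use the Storm API: the function names (tweet_spout, parse_tweet_bolt, word_count_bolt) and the sample tweets are assumptions made for illustration, and generators stand in for streams.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def tweet_spout(tweets: Iterable[str]) -> Iterator[Tuple[str]]:
    """Spout: emit one single-field tuple per raw tweet."""
    for text in tweets:
        yield (text,)


def parse_tweet_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Bolt: split each tweet into lowercase words, one tuple per word."""
    for (text,) in stream:
        for word in text.lower().split():
            yield (word,)


def word_count_bolt(stream: Iterable[Tuple[str]]) -> Counter:
    """Bolt: keep a running count for every word seen on the stream."""
    counts: Counter = Counter()
    for (word,) in stream:
        counts[word] += 1
    return counts


# Hypothetical sample data standing in for a live stream of tweets.
sample_tweets = ["storm is fast", "hadoop is batch", "storm is reactive"]

# Wire the components together: spout -> parse bolt -> count bolt.
counts = word_count_bolt(parse_tweet_bolt(tweet_spout(sample_tweets)))
print(counts.most_common(2))  # [('is', 3), ('storm', 2)]
```

In a real topology each component would run as many parallel tasks, and the stream grouping decides which task receives each tuple.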

Stream grouping
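-shuffle
->tuples distributed randomly and evenly across the bolt's tasks
-fields
->tuples with the same value for the chosen field(s) always go to the same task
-all
->every tuple replicated to all tasks
-global
->entire stream goes to a single task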
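
A rough sketch of how each grouping could pick target tasks for a tuple. This only illustrates the routing idea and is not Storm's implementation: the function names, the CRC32 hash, and the choice of task 0 for global grouping are assumptions.

```python
import random
import zlib
from typing import List, Sequence


def shuffle_grouping(tup: Sequence, num_tasks: int) -> List[int]:
    """Shuffle: pick one task at random so load spreads evenly."""
    return [random.randrange(num_tasks)]


def fields_grouping(tup: Sequence, num_tasks: int, field_index: int = 0) -> List[int]:
    """Fields: hash the chosen field so equal values reach the same task."""
    key = str(tup[field_index]).encode("utf-8")
    return [zlib.crc32(key) % num_tasks]


def all_grouping(tup: Sequence, num_tasks: int) -> List[int]:
    """All: replicate the tuple to every task."""
    return list(range(num_tasks))


def global_grouping(tup: Sequence, num_tasks: int) -> List[int]:
    """Global: send the whole stream to one task (task 0 in this sketch)."""
    return [0]


word = ("storm",)
print(fields_grouping(word, 4) == fields_grouping(word, 4))  # True
print(all_grouping(word, 4))     # [0, 1, 2, 3]
print(global_grouping(word, 4))  # [0]
```

Fields grouping is what makes a word count correct in parallel: every occurrence of the same word lands on the same counting task.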

tuple: immutable ordered list of elements
topology: directed acyclic graph, vertices = computation and edges = streams of data
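
Both definitions can be sketched with the Python standard library (3.9+ for graphlib). The WordTuple type and the component names are assumptions for illustration, not Storm classes.

```python
from collections import namedtuple
from graphlib import TopologicalSorter

# Tuple: an immutable, ordered list of elements (here with named fields).
WordTuple = namedtuple("WordTuple", ["word", "count"])
t = WordTuple("storm", 2)

try:
    t.word = "hadoop"  # namedtuple fields cannot be reassigned
    mutated = True
except AttributeError:
    mutated = False

# Topology: a DAG whose vertices are computations and whose edges are streams.
# Each key maps to the set of components it consumes from.
topology = {
    "parse_tweet_bolt": {"tweet_spout"},
    "word_count_bolt": {"parse_tweet_bolt"},
}

# A valid processing order exists only because the graph has no cycles.
order = list(TopologicalSorter(topology).static_order())
print(order)  # ['tweet_spout', 'parse_tweet_bolt', 'word_count_bolt']
```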