Storm & Hadoop are complementary!
Hadoop => large-scale batch processing of stored data
Storm => fast, reactive, real-time processing of continuous streams
Storm data model
-Spouts
->sources of data for the topology (e.g.) Postgres/MySQL/Kafka/Kestrel
-Bolts
->units of computation on data (e.g.) filtering/aggregation/join/transformations
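A minimal sketch of the spout/bolt split, written as plain Python rather than Storm's API. `ListSpout`, `FilterBolt` and `run` are hypothetical names; in real Storm, spouts implement `nextTuple()` and bolts implement `execute()` in Java. Here an in-memory list stands in for a source such as Kafka, and the bolt does a filter plus a transformation.

```python
from typing import List, Optional


class ListSpout:
    """Hypothetical spout: replays an in-memory list as if it were a queue like Kafka."""

    def __init__(self, records: List[str]) -> None:
        self._records = list(records)
        self._pos = 0

    def next_tuple(self) -> Optional[tuple]:
        # Emit one tuple per call, or None once the source is drained.
        if self._pos >= len(self._records):
            return None
        record = self._records[self._pos]
        self._pos += 1
        return (record,)


class FilterBolt:
    """Hypothetical bolt: drops blank records (filter) and upper-cases the rest (transform)."""

    def execute(self, tup: tuple) -> List[tuple]:
        (text,) = tup
        if not text.strip():
            return []  # filtered out: nothing emitted downstream
        return [(text.upper(),)]


def run(spout: ListSpout, bolt: FilterBolt) -> List[tuple]:
    # Pull tuples from the spout until it is drained, pushing each through the bolt.
    out: List[tuple] = []
    while (tup := spout.next_tuple()) is not None:
        out.extend(bolt.execute(tup))
    return out


print(run(ListSpout(["hello", "", "storm"]), FilterBolt()))
# [('HELLO',), ('STORM',)]
```

The spout only produces data and the bolt only transforms it, so either side can be swapped (a different source, a different computation) without touching the other.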
Example: live stream of tweets
tweet spout (emits tweets) -> parse tweet bolt (splits them into words) -> word count bolt (counts each word)
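A single-process sketch of the tweet word-count pipeline, assuming hypothetical names (`tweet_spout`, `parse_tweet_bolt`, `WordCountBolt`, `run_topology`) and a hard-coded list in place of a live tweet stream. In a real Storm topology each stage would run as parallel tasks spread across workers, with tuples sent between them; here they are just chained function calls.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def tweet_spout(tweets: Iterable[str]) -> Iterator[str]:
    # Stand-in for a live tweet stream: yields one raw tweet at a time.
    for tweet in tweets:
        yield tweet


def parse_tweet_bolt(tweet: str) -> List[str]:
    # Split a tweet into lower-cased words on whitespace.
    return tweet.lower().split()


class WordCountBolt:
    """Stateful bolt: keeps a running count per word."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def execute(self, word: str) -> tuple:
        self.counts[word] += 1
        return (word, self.counts[word])  # (word, running count) to emit downstream


def run_topology(tweets: Iterable[str]) -> Counter:
    counter = WordCountBolt()
    for tweet in tweet_spout(tweets):
        for word in parse_tweet_bolt(tweet):
            counter.execute(word)
    return counter.counts


counts = run_topology(["Storm is fast", "storm is real time"])
print(counts["storm"], counts["is"], counts["fast"])  # 2 2 1
```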
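Stream grouping: decides which task(s) of the consuming bolt receive each tuple
-shuffle
->random, spreads tuples evenly across tasks
-fields
->tuples with the same values in the grouping fields always go to the same task
-all
->every task receives a copy of each tuple
-global
->the whole stream goes to a single task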
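A rough sketch of what each grouping decides: given a tuple, which task indices receive it. The function names are hypothetical, and the hash used for fields grouping (`zlib.crc32`) is an assumption for illustration; Storm uses its own hashing internally. `crc32` is chosen over Python's built-in `hash()` because string hashes are randomized per process, which would break "same value, same task" across runs.

```python
import random
import zlib
from typing import List, Sequence


def shuffle_grouping(num_tasks: int, rng: random.Random) -> List[int]:
    # Pick one task at random, so load is spread evenly on average.
    return [rng.randrange(num_tasks)]


def fields_grouping(tup: dict, fields: Sequence[str], num_tasks: int) -> List[int]:
    # Hash only the grouping fields, so equal values always map to the same task.
    key = "|".join(str(tup[f]) for f in fields).encode("utf-8")
    return [zlib.crc32(key) % num_tasks]


def all_grouping(num_tasks: int) -> List[int]:
    # Replicate the tuple to every task.
    return list(range(num_tasks))


def global_grouping() -> List[int]:
    # The entire stream goes to one task (Storm picks the task with the lowest id).
    return [0]


t1 = {"word": "storm", "count": 1}
t2 = {"word": "storm", "count": 7}
print(fields_grouping(t1, ["word"], 4) == fields_grouping(t2, ["word"], 4))  # True
print(all_grouping(3))  # [0, 1, 2]
```

Fields grouping is what makes the word count correct when the count bolt runs as several tasks: every occurrence of "storm" reaches the same task, so one task holds the full count for that word.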
tuple: immutable, ordered list of elements
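Python's `namedtuple` is a convenient stand-in for the idea: values are ordered (readable by position) and, as with Storm tuples whose components declare output field names, also readable by name. The `WordCount` schema is hypothetical. Because the tuple is immutable, "changing" it means building and emitting a new one.

```python
from collections import namedtuple

# Hypothetical schema for the word count bolt's output stream.
WordCount = namedtuple("WordCount", ["word", "count"])

t = WordCount("storm", 2)
print(t[0], t.count)  # storm 2  (by position and by field name)

try:
    t.count = 3  # tuples are immutable, so this raises
except AttributeError:
    print("cannot modify a tuple; emit a new one instead")

updated = t._replace(count=3)  # builds a new tuple; t is unchanged
print(t, updated)
```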
topology: directed acyclic graph, vertices = computation (spouts/bolts), edges = streams of data
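A sketch of the topology-as-DAG idea using the standard library's `graphlib` (Python 3.9+). The dict and `validate` helper are hypothetical: each component maps to the set of upstream components it subscribes to, and a topological sort either yields a valid dataflow order or raises `CycleError` when the graph is not acyclic.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical word count topology: component -> upstream components it reads from.
topology: Dict[str, Set[str]] = {
    "tweet_spout": set(),
    "parse_tweet_bolt": {"tweet_spout"},
    "word_count_bolt": {"parse_tweet_bolt"},
}


def validate(graph: Dict[str, Set[str]]) -> List[str]:
    # Return components in dataflow order; raises CycleError if the graph has a cycle.
    return list(TopologicalSorter(graph).static_order())


print(validate(topology))
# ['tweet_spout', 'parse_tweet_bolt', 'word_count_bolt']

# Feeding the count bolt's output back into the spout creates a cycle.
cyclic = dict(topology, tweet_spout={"word_count_bolt"})
try:
    validate(cyclic)
except CycleError:
    print("not a DAG: the topology contains a cycle")
```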