Hadoop

The reason to use big data is it’s too big to store in one machine.Challenges with big data is data is created fast and data from different source in various formats.

Hadoop
Store in HDFS
process with MAPREDUCE

Hadoop ecosystem
pig, hive … select * from
mapreduce, impala, hbase
HDFS <- sqoop, flume Hue, oozie, mahout Cloudera is a distribution of Hadoop(CDH) Hadoop picks three node as random.