Data counts as big data when it is too big to store on one machine. The challenges with big data are that it is created fast and that it comes from different sources in various formats.
Store in HDFS
Process with MapReduce
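To make the MapReduce step concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce phases in plain Python, using word count as the example. It does not use Hadoop itself; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real job would run the mappers and reducers in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
```

The point of the split is that each mapper only needs its own slice of the input and each reducer only needs the values for its own keys, which is what lets the work spread over many machines.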
Hadoop ecosystem
Pig, Hive … select * from
MapReduce, Impala, HBase
HDFS <- Sqoop, Flume
Hue, Oozie, Mahout
CDH is Cloudera's distribution of Hadoop.
For each block it stores, Hadoop picks three nodes at random to hold the copies (the default replication factor is 3).
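The note about picking three nodes can be sketched as a toy simulation. This is only an illustration of the "three nodes at random" idea from these notes: place_blocks and the node and block names are hypothetical, and as far as I know the default placement policy in real HDFS also takes racks into account rather than choosing purely at random.

```python
import random

# Default HDFS replication factor (the dfs.replication setting).
REPLICATION_FACTOR = 3


def place_blocks(block_ids, nodes, replicas=REPLICATION_FACTOR, seed=None):
    """Assign each block to `replicas` distinct nodes chosen at random.

    Simplified model: ignores racks, disk usage, and node health.
    """
    if len(nodes) < replicas:
        raise ValueError("need at least as many nodes as replicas")
    rng = random.Random(seed)
    # random.sample never repeats a node, so each copy lands on a different machine.
    return {block: rng.sample(nodes, replicas) for block in block_ids}


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4", "node5"]
    for block, holders in place_blocks(["blk_1", "blk_2"], nodes).items():
        print(block, "->", holders)
```

Because every block lives on three different machines, losing one node still leaves two copies of each of its blocks elsewhere.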