The final solution employed AWS Redshift, the driving reason was the requirement of high speed data ingestion, which Redshift provides via the COPY command.
Hadoop is built to store the data efficiently, however it does not gurantees a sub-second sla for ingestion, neither does it provide an SLA for when the data will be available for MR jobs, this was the main reason we did not go with EMR or Hadoop in general.