Big data platform with Kubernetes or Hadoop

By Min Wang

Oct 17, 2021

Hadoop vs. Kubernetes:

Component	Hadoop	Kubernetes
Computation	MapReduce	Spark on K8s, Flink (streaming)
Storage	HDFS	Amazon S3 (is there a better option?)
Resource manager	YARN/Mesos	K8s itself

During its evolution, Hadoop provided three main capabilities that made it a Big Data-ready solution: a distributed computation mechanism (MapReduce), robust data storage (HDFS), and a resource manager (YARN/Mesos). Modern technologies now provide a better replacement for each of these three components: Kubernetes as an efficient resource manager, Amazon S3 for data storage, and Spark and Flink as distributed computation engines.
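To make the mapping concrete, here is a minimal sketch of how a Spark job might be pointed at Kubernetes (as the resource manager) and S3 (as the storage layer). It only assembles a `spark-submit` command line and does not run it. The helper name `build_spark_submit_args`, the API server address, the image name, the application file, and the bucket are all hypothetical placeholders, not values from any real cluster.

```python
from typing import List


def build_spark_submit_args(
    api_server: str,
    image: str,
    app_file: str,
    input_uri: str,
    executors: int = 2,
    namespace: str = "default",
) -> List[str]:
    """Assemble spark-submit arguments for a job that uses Kubernetes as the
    resource manager and reads its input from S3 via the s3a:// connector."""
    if not input_uri.startswith("s3a://"):
        raise ValueError("input_uri must use the s3a:// scheme")
    return [
        "spark-submit",
        # Kubernetes takes the role YARN/Mesos plays in a Hadoop cluster.
        "--master", f"k8s://{api_server}",
        "--deploy-mode", "cluster",
        "--conf", f"spark.kubernetes.namespace={namespace}",
        "--conf", f"spark.kubernetes.container.image={image}",
        "--conf", f"spark.executor.instances={executors}",
        # The application file, followed by its own argument (the S3 input,
        # which takes the place of an hdfs:// path).
        app_file,
        input_uri,
    ]


# Hypothetical values, for illustration only.
args = build_spark_submit_args(
    api_server="https://k8s.example.com:6443",
    image="example/spark-py:latest",
    app_file="local:///opt/spark/app/wordcount.py",
    input_uri="s3a://example-bucket/input/",
)
print(" ".join(args))
```

In a real deployment the container image would also need the S3A connector (the `hadoop-aws` module) on its classpath, plus credentials for the bucket, so some Hadoop libraries typically remain in use even when HDFS and YARN are not.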

So do we still need Hadoop as a distributed file system alongside containers and Kubernetes? It depends on the application's requirements and value proposition. Technically it is feasible to run Hadoop with Docker and Kubernetes, but the ecosystem as a whole lacks smooth integration. A couple of recent open source projects try to solve this problem; whether Hadoop remains the way forward or a new, different distributed file system platform takes its place, only time will tell. Today, alternatives such as cloud storage platforms, Kafka, and Elasticsearch/Logstash each address the storage scalability problem with their own strengths in specific areas, while Hadoop and its ecosystem continue to be a dominant big data platform.

Documents

https://techgenix.com/kubernetes-hadoop-big-data/
