What is Apache Kafka? (big picture)

I found that the article by Jay Kreps gives a very good big picture of what Kafka is supposed to do: you can use Kafka to build a stream data platform. Here are the pictures from that article.

The big idea is simple: many business processes can be modeled as a series of events (a data/log stream). We can therefore publish those data streams (topics) into Kafka, and interested subscribers can receive the streams and do whatever they want with them (online processing, storing the data, etc.). These applications can then publish the derived streams (results) back to Kafka.
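The publish, subscribe, and publish-back loop described above can be sketched with a toy in-memory model. This is only an illustration of the idea, not Kafka's API: `ToyBroker`, the topic names `orders` and `order-totals`, and the event shape are all hypothetical.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for Kafka: each topic is an append-only list of events."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, event):
        # Append the event to the end of the topic's stream.
        self.topics[topic].append(event)

    def read(self, topic, offset=0):
        # Return a copy of every event from the given position onward.
        return self.topics[topic][offset:]


broker = ToyBroker()

# A business process emits events into a topic (hypothetical "orders" stream).
for amount in (120, 80, 300):
    broker.publish("orders", {"amount": amount})

# A subscriber reads the stream, derives a result, and publishes it back.
running_total = 0
for event in broker.read("orders"):
    running_total += event["amount"]
    broker.publish("order-totals", {"total": running_total})

print(broker.read("order-totals"))
```

Any other subscriber could now read `order-totals` without knowing how it was computed, which is the decoupling the architecture relies on.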


A modern stream-centric data architecture built around Apache Kafka


Kafka internal details


The producer (publisher) chooses which topic and partition to write to. Writes go to the partition's leader broker, which is elected using ZooKeeper (see below).
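One common way for a producer to choose a partition is to hash the message key, so that all messages with the same key land in the same partition. The sketch below assumes that strategy. CRC32 is a stand-in hash chosen for illustration and is not claimed to be the hash Kafka uses; `NUM_PARTITIONS`, `choose_partition`, and `send` are hypothetical names.

```python
import zlib

NUM_PARTITIONS = 3  # hypothetical partition count for one topic


def choose_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition index (CRC32 is a stand-in hash)."""
    return zlib.crc32(key) % num_partitions


# One append-only list per partition, standing in for the leader's log.
partitions = [[] for _ in range(NUM_PARTITIONS)]


def send(key: bytes, value: str) -> int:
    """Append the value to the partition chosen for this key; return its index."""
    index = choose_partition(key)
    partitions[index].append(value)
    return index


first = send(b"user-42", "login")
second = send(b"user-42", "logout")

# Same key, same partition, so the two events keep their relative order.
print(first == second, partitions[first])
```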

The consumer pulls (fetches) data from the leader broker, starting at a specific log position (offset). Within a consumer group, a partition is always consumed by a single consumer.
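The pull model can be illustrated with a toy partition log: the consumer keeps track of its own position and repeatedly asks for the records that follow it. This is a minimal sketch, not the Kafka fetch protocol; `ToyPartitionLog`, `fetch`, and the batch size of 2 are assumptions made for the example.

```python
class ToyPartitionLog:
    """Append-only log for a single partition; the list index is the offset."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset assigned to the new record

    def fetch(self, offset, max_records=2):
        # Return up to max_records records starting at the requested offset.
        return self.records[offset:offset + max_records]


log = ToyPartitionLog()
for word in ("a", "b", "c", "d", "e"):
    log.append(word)

# The consumer owns its position and pulls batches until nothing is left.
position = 0
consumed = []
while True:
    batch = log.fetch(position)
    if not batch:
        break
    consumed.extend(batch)
    position += len(batch)

print(consumed, position)
```

Because the position lives with the consumer, re-reading old data is just a matter of fetching from a smaller offset.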

  • “leader” is the node responsible for all reads and writes for the given partition. Each node will be the leader for a randomly selected portion of the partitions.
  • “replicas” is the list of nodes that replicate the log for this partition regardless of whether they are the leader or even if they are currently alive.
  • “isr” is the set of “in-sync” replicas. This is the subset of the replicas list that is currently alive and caught-up to the leader.
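The relationship between the three terms above can be sketched as follows: the ISR is the part of the replica list that is alive and close enough to the leader's log end. The `Replica` type and the message-count threshold `MAX_LAG` are assumptions for illustration only. The criteria a real broker uses to decide "caught-up" are configurable and may differ.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    broker_id: int
    alive: bool
    log_end_offset: int  # how far this replica has copied the log


MAX_LAG = 4  # hypothetical threshold: max records a replica may trail the leader


def in_sync_replicas(replicas, leader_id, max_lag=MAX_LAG):
    """Return broker ids that are alive and within max_lag of the leader."""
    leader = next(r for r in replicas if r.broker_id == leader_id)
    return [
        r.broker_id
        for r in replicas
        if r.alive and leader.log_end_offset - r.log_end_offset <= max_lag
    ]


replicas = [
    Replica(broker_id=1, alive=True, log_end_offset=100),   # leader
    Replica(broker_id=2, alive=True, log_end_offset=98),    # caught up
    Replica(broker_id=3, alive=True, log_end_offset=60),    # too far behind
    Replica(broker_id=4, alive=False, log_end_offset=100),  # not alive
]

isr = in_sync_replicas(replicas, leader_id=1)
print(isr)  # [1, 2]
```

All four brokers stay in the "replicas" list, while only brokers 1 and 2 are in the "isr".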


A good picture of ZooKeeper's role is here

Hands-on howto

see:, it is no pain at all to start and play with Kafka (on Debian 8)

tar -xzf kafka_2.11-;
cd kafka_2.11-
start ZooKeeper and Kafka
bin/ config/
bin/ config/
create a topic
bin/ --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
bin/ --list --zookeeper localhost:2181
produce a msg
bin/ --broker-list localhost:9092 --topic test
consume a msg
bin/ --zookeeper localhost:2181 --topic test --from-beginning



