Under the Hood of Apache Kafka: A Deep Dive into Kafka's Architecture
Discover the Core Components and Mechanisms that Make Apache Kafka a Powerhouse in Data Streaming
Apache Kafka is an open-source messaging platform designed to deliver performance, scalability, durability, and availability.
Event Log
It is important to understand the concept of an event log, because Apache Kafka is an implementation of one.
A producer's messages are always appended to the end of the log; existing entries are never modified in place.
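To make the append-only idea concrete, here is a minimal in-memory sketch. The `EventLog` class and its method names are made up for illustration; this is not how Kafka stores data on disk, only the behaviour the log exposes.

```python
class EventLog:
    """A toy append-only log (illustrative only, not Kafka's storage format)."""

    def __init__(self):
        self._records = []

    def append(self, message):
        """Add a message at the end of the log and return its position."""
        self._records.append(message)
        return len(self._records) - 1

    def read_from(self, position):
        """Return every message from `position` onward, in the order written."""
        return list(self._records[position:])


log = EventLog()
first = log.append("order-created")
second = log.append("order-paid")
print(first, second)      # 0 1
print(log.read_from(0))   # ['order-created', 'order-paid']
```

There is deliberately no update or insert-in-the-middle method: the only way to change the log is to add to its end.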
Topic
Messages are stored in a topic in the order they are written (ordering is guaranteed within each partition).
Producers write messages into a topic.
Consumers read messages from a topic.
Internals of Topic
Each topic is divided into multiple partitions spread across multiple brokers; that's how Kafka achieves performance and scalability.
Each partition holds an ordered sequence of messages, and each message is assigned a sequence number known as its offset.
Consumers can read a particular message using its offset.
Messages remain unchanged in the topic for a defined retention period.
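The sketch below shows how offsets and retention fit together in a single partition. The `Partition` and `Record` names are hypothetical, timestamps are passed in by hand to keep the example deterministic, and retention is applied per record for simplicity (real Kafka deletes whole log segments rather than individual records).

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    offset: int
    timestamp: float
    value: str


@dataclass
class Partition:
    """Toy partition: ordered records, each with an ever-increasing offset."""
    retention_seconds: float
    records: list = field(default_factory=list)
    next_offset: int = 0

    def append(self, value, now):
        """Store a value at the end of the partition and return its offset."""
        record = Record(self.next_offset, now, value)
        self.records.append(record)
        self.next_offset += 1
        return record.offset

    def read(self, offset):
        """Return the record at `offset`, or None if no such record is stored."""
        for record in self.records:
            if record.offset == offset:
                return record
        return None

    def enforce_retention(self, now):
        """Drop records older than the retention period; surviving offsets do not change."""
        self.records = [
            r for r in self.records if now - r.timestamp <= self.retention_seconds
        ]


partition = Partition(retention_seconds=60)
partition.append("a", now=0)
partition.append("b", now=30)
partition.append("c", now=90)
partition.enforce_retention(now=100)   # "a" and "b" are now older than 60 seconds
```

After retention runs, offset 2 still points at "c" and the next message will get offset 3; offsets are never reused or renumbered.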
Kafka Internal Architecture
Components and Terminologies Involved
Brokers
A server node that stores partition data and serves producer and consumer requests.
Kafka Cluster
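A group of brokers working together.
Messages without a key are distributed among a topic's partitions, traditionally in round-robin fashion (newer clients stick to one partition per batch before moving on).
Producers can also select the partition to send a message to, either explicitly or by giving the message a key.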
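Here is a rough sketch of the three ways a partition can be chosen: an explicit partition, a key, or round robin when neither is given. `SimplePartitioner` is a made-up class, and `zlib.crc32` is only a stand-in hash (Kafka's Java client uses murmur2), so the partition a given key maps to here will not match a real cluster.

```python
import itertools
import zlib


class SimplePartitioner:
    """Toy partition selection; not the Kafka client's actual partitioner."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def choose(self, key=None, partition=None):
        # 1. The producer named a partition explicitly.
        if partition is not None:
            if not 0 <= partition < self.num_partitions:
                raise ValueError("partition out of range")
            return partition
        # 2. The message has a key: hash it so equal keys share a partition.
        if key is not None:
            return zlib.crc32(key.encode("utf-8")) % self.num_partitions
        # 3. No key and no partition: rotate through the partitions.
        return next(self._round_robin)


partitioner = SimplePartitioner(num_partitions=3)
keyless = [partitioner.choose() for _ in range(4)]
print(keyless)   # [0, 1, 2, 0]
```

Because the key is hashed, every message with the same key lands in the same partition, which is what keeps per-key ordering intact.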
Leader Node
The broker that handles all write operations for a partition.
Followers
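Brokers that hold replicated copies of a partition's data.
That's how Kafka maintains fault tolerance.
By default consumers read from the leader; since Kafka 2.4, followers can also serve read operations when follower fetching is configured.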
ZooKeeper
A coordination service the cluster uses to track which brokers are alive, store cluster metadata, and coordinate the nodes with each other.
Replication Factor
It's a value configured at the topic level that defines the total number of replicas of each partition in the topic.
With a replication factor of 1, the data is not replicated: only a single copy of each partition exists.
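To see what the replication factor means for where data lives, here is a simplified placement sketch. `assign_replicas` is a hypothetical helper that places replicas on consecutive brokers; Kafka's real assignment logic is more involved, so treat this only as a picture of the idea. The first broker in each list plays the role of the leader.

```python
def assign_replicas(brokers, num_partitions, replication_factor):
    """Place each partition's replicas on `replication_factor` distinct brokers."""
    if replication_factor > len(brokers):
        # Each replica of a partition must live on a different broker.
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        assignment[partition] = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
    return assignment


layout = assign_replicas(
    ["broker-1", "broker-2", "broker-3"], num_partitions=3, replication_factor=2
)
for partition, replicas in layout.items():
    print(partition, replicas)
# 0 ['broker-1', 'broker-2']
# 1 ['broker-2', 'broker-3']
# 2 ['broker-3', 'broker-1']
```

With `replication_factor=1` each list would contain a single broker, so losing that broker would make the partition unavailable.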
In-Sync Replicas (ISR)
The replicas that are fully caught up with the leader (the leader itself counts as one). If the leader goes down, one of the other nodes in the ISR can be elected as the new leader.
This is how Kafka avoids losing writes that have already been acknowledged.
For example, let's say min.insync.replicas = 2 and acks = all.
This means a write is accepted only while at least two replicas (the leader included) are in sync, and it is considered successful only once every in-sync replica has it.
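The interplay between the ISR, min.insync.replicas and acks = all can be sketched with a small simulation. `ReplicatedPartition` and `NotEnoughReplicas` are names invented for this example, replication is modelled as an instant copy, and leader election simply picks the next in-sync replica, which is a simplification of what a real cluster does.

```python
class NotEnoughReplicas(Exception):
    """Raised when the ISR is smaller than min.insync.replicas."""


class ReplicatedPartition:
    """Toy model of one partition with a leader, followers, and an ISR."""

    def __init__(self, replicas, min_insync_replicas):
        self.leader = replicas[0]
        self.isr = list(replicas)                    # all replicas start in sync
        self.logs = {broker: [] for broker in replicas}
        self.min_insync_replicas = min_insync_replicas

    def write(self, message):
        """Model acks=all: reject if the ISR is too small, else copy to every in-sync replica."""
        if len(self.isr) < self.min_insync_replicas:
            raise NotEnoughReplicas(
                f"ISR size {len(self.isr)} is below {self.min_insync_replicas}"
            )
        for broker in self.isr:
            self.logs[broker].append(message)
        return len(self.logs[self.leader]) - 1       # offset of the new message

    def fail(self, broker):
        """Drop a broker from the ISR; if it was the leader, promote another in-sync replica."""
        self.isr.remove(broker)
        if broker == self.leader:
            self.leader = self.isr[0] if self.isr else None


partition = ReplicatedPartition(["broker-1", "broker-2", "broker-3"], min_insync_replicas=2)
partition.write("order-1")          # stored on all three in-sync replicas
partition.fail("broker-1")          # leader lost, broker-2 takes over
print(partition.leader)             # broker-2
print(partition.logs["broker-2"])   # ['order-1']
partition.fail("broker-3")          # only one in-sync replica left
try:
    partition.write("order-2")
except NotEnoughReplicas as error:
    print("write rejected:", error)
```

The acknowledged message survives the leader failure because the new leader already had it, and once the ISR shrinks below the configured minimum the partition refuses new writes rather than accept data it cannot protect.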
Consumer Group
A consumer group is a set of consumers that share a group ID. Each partition is assigned to exactly one consumer in the group, so a message is delivered to only one of the consumers in the group, not to all of them.
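The "only one consumer per message" rule follows from how partitions are divided among the group's members. The sketch below uses a plain round-robin split; `assign_partitions` and `consumer_for` are hypothetical helpers, and Kafka's built-in assignment strategies differ in detail.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer in the group."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


def consumer_for(partition, assignment):
    """Return the single consumer that receives messages from `partition`."""
    for consumer, owned in assignment.items():
        if partition in owned:
            return consumer
    return None


group_x = assign_partitions([0, 1, 2], ["consumer-a", "consumer-b"])
print(group_x)                    # {'consumer-a': [0, 2], 'consumer-b': [1]}
print(consumer_for(1, group_x))   # consumer-b

# If consumer-b leaves, the remaining member takes over every partition.
print(assign_partitions([0, 1, 2], ["consumer-a"]))   # {'consumer-a': [0, 1, 2]}
```

A second consumer group would get its own independent assignment, so each group sees every message once.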
As a worked example, assume we have three brokers, replication factor = 3, min.insync.replicas = 2, acks = all, and topic = order-events.
Assume the replica of partition-2 on broker-1 is the leader.
Producer-1 writes a message into partition-2 on broker-1.
The same message is then replicated to the follower replicas on broker-2 and broker-3, as long as those replicas are in sync.
So the write is successful only if at least one follower has the message in addition to the leader, because min.insync.replicas is 2.
Now let's say we have a consumer group X that has two active consumers in it.
Only the consumer that partition-2 is assigned to reads this message; by default it fetches from the leader on broker-1, although all three replicas hold the same data.
By now you should have a good understanding of how Kafka works internally.
If you like my content, you can subscribe below.
YouTube channel - https://www.youtube.com/channel/UCpF3Y8AxzgYZnI8Zcf_G_fg
You can follow me on LinkedIn here - https://www.linkedin.com/in/suchait-gaurav-944479109/
GitHub repo - https://github.com/suchait007