In this section you will learn Apache Kafka terminology such as topics, partitions, and offsets.
A Kafka topic is a particular stream of data. It is identified by a name, which the user chooses. Producers publish records to a topic, and consumers read records from the topics they subscribe to.
A topic is similar to a table in a database: just as a database can hold many tables, a Kafka cluster can hold as many topics as you want.
Topics are split into partitions. The partitions of a topic are numbered starting from 0, and each partition is an ordered sequence of records.
So a topic contains partitions, and each partition contains records. When creating a topic, you specify the number of partitions it has.
Each message within a partition gets an incremental id, called an offset.
The order of offsets is guaranteed only within a partition, not across partitions.
Once data is written to a partition, it can never be changed. Data within a partition is immutable.
Data within a partition is kept for a limited period only. This retention period is configurable, and the default is one week.
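The properties above can be sketched with a small toy model. This is not Kafka's API or storage format; the Topic and Partition classes and the topic name "orders" are hypothetical names chosen for illustration. The sketch only shows that offsets are assigned per partition, start at 0, and that records are appended and never modified. Retention is not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Partition:
    """Toy append-only log (hypothetical, not Kafka's real storage)."""
    _records: List[str] = field(default_factory=list)

    def append(self, value: str) -> int:
        # The offset is the record's position in this partition's log.
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset: int) -> str:
        # There is deliberately no update method: records are immutable.
        return self._records[offset]


class Topic:
    """A named stream made of a fixed number of partitions."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]


topic = Topic("orders", num_partitions=3)
first = topic.partitions[0].append("order-1")
second = topic.partitions[0].append("order-2")
other = topic.partitions[1].append("order-3")
print(first, second, other)  # 0 1 0
```

Partition 1 starts again at offset 0, which is why offsets can only be compared within a single partition.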
Now let us see with the diagram below how data is allocated within a partition.
Once the Kafka topic is created with the number of partitions you specified, the first message written to partition 0 gets offset 0, the next message gets offset 1, and so on.
As you can see, every message in partition 0 has an incremental id, its offset. Offsets keep increasing as messages are appended and have no practical upper bound.
Similarly, partition 1 has offsets from 0 to 8 and partition 2 has offsets from 0 to 10.
Partitions do not need to hold the same number of messages.
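Unless a key is given, a message is not tied to any particular partition: the producer spreads keyless messages across the partitions. When a key is given, all messages with the same key go to the same partition.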
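A minimal sketch of how a partition might be chosen for a message is shown here. It is an illustration under stated assumptions, not Kafka's actual partitioner: the function name choose_partition is hypothetical, zlib.crc32 stands in for the hash that a real producer applies to the key, and keyless messages are simply rotated round-robin.

```python
import itertools
import zlib
from typing import Optional

NUM_PARTITIONS = 3
_round_robin = itertools.count()  # shared counter for keyless messages


def choose_partition(key: Optional[bytes],
                     num_partitions: int = NUM_PARTITIONS) -> int:
    """Pick a partition index for a message (toy model)."""
    if key is None:
        # No key: spread messages across partitions in turn.
        return next(_round_robin) % num_partitions
    # Key given: the hash of the key decides, so equal keys
    # always land in the same partition.
    # Assumption: crc32 is a stand-in hash, not the one Kafka uses.
    return zlib.crc32(key) % num_partitions


same_a = choose_partition(b"user-42")
same_b = choose_partition(b"user-42")
keyless = [choose_partition(None) for _ in range(3)]
print(same_a == same_b)  # True
print(sorted(keyless))   # [0, 1, 2]
```

Because all messages with one key share a partition, they also share that partition's ordering guarantee.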
Kafka servers are called brokers; they are where topics are stored.
A Kafka cluster is composed of multiple brokers. After connecting to any one broker (called the bootstrap broker), a client can discover and connect to every other broker in the cluster.
Each Kafka broker is identified by an id, which is an integer.
Each Kafka broker holds certain topic partitions. The partitions of a topic are spread across different brokers.
Because Apache Kafka is distributed, we can define a replication factor, which provides fault tolerance. Fault tolerance means that if a broker goes down, another broker takes over as leader for the affected partitions of the topic and serves the data.
Topics should have a replication factor greater than 1. Usually it is 2 or 3.
Let us look at the diagram below to explain further.
In this example we have Topic-A with 2 partitions and a replication factor of 2.
As you can see, partition 0 of Topic-A is on Broker 1 and partition 1 of Topic-A is on Broker 2. The replica of partition 0 is on Broker 2.
Similarly, the replica of partition 1 is on Broker 3.
Therefore each partition has 2 copies on different Kafka brokers.
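The Topic-A layout can be written down as a small data structure to check the "2 copies on different brokers" property. This is a toy sketch, not Kafka's metadata format. It assumes the layout in which partition 0 has copies on Brokers 1 and 2 and partition 1 has copies on Brokers 2 and 3; the names assignment, is_valid, and partitions_per_broker are hypothetical.

```python
from typing import Dict, List

REPLICATION_FACTOR = 2

# partition number -> ids of the brokers holding a copy of it
# (assumed layout for Topic-A)
assignment: Dict[int, List[int]] = {
    0: [1, 2],
    1: [2, 3],
}


def is_valid(assignment: Dict[int, List[int]], replication_factor: int) -> bool:
    """Every partition has the right number of copies, each on a distinct broker."""
    return all(
        len(brokers) == replication_factor
        and len(set(brokers)) == replication_factor
        for brokers in assignment.values()
    )


def partitions_per_broker(assignment: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Invert the mapping: broker id -> partitions it holds."""
    hosted: Dict[int, List[int]] = {}
    for partition, brokers in assignment.items():
        for broker in brokers:
            hosted.setdefault(broker, []).append(partition)
    return hosted


print(is_valid(assignment, REPLICATION_FACTOR))  # True
print(partitions_per_broker(assignment))         # {1: [0], 2: [0, 1], 3: [1]}
```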
Let's see what happens if we lose Broker 2.
As you can see, Broker 1 and Broker 3 can still serve the data. Replication ensures that the data is not lost.
At any given time, only one Kafka broker can be the leader for a given partition.
By default, only that leader receives and serves messages for that partition.
The other brokers holding the partition only keep copies of the messages by synchronizing with the leader.
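The effect of losing Broker 2 on partition leadership can be sketched as follows. This is a simplified toy model, not Kafka's real election procedure, which only considers replicas that are in sync with the old leader. Here the first live broker in each replica list becomes the leader; the function name elect_leaders and the replica layout (partition 0 on Brokers 1 and 2, partition 1 on Brokers 2 and 3) are assumptions for illustration.

```python
from typing import Dict, List, Optional, Set

# partition number -> broker ids holding a copy, preferred leader first
replicas: Dict[int, List[int]] = {0: [1, 2], 1: [2, 3]}


def elect_leaders(replicas: Dict[int, List[int]],
                  live_brokers: Set[int]) -> Dict[int, Optional[int]]:
    """Pick one leader per partition among the brokers that are still up."""
    leaders: Dict[int, Optional[int]] = {}
    for partition, brokers in replicas.items():
        # First live replica wins; None means no copy is reachable.
        leaders[partition] = next(
            (b for b in brokers if b in live_brokers), None
        )
    return leaders


before = elect_leaders(replicas, {1, 2, 3})
after = elect_leaders(replicas, {1, 3})  # Broker 2 is lost
print(before)  # {0: 1, 1: 2}
print(after)   # {0: 1, 1: 3}
```

With a replication factor of 2, each partition still has a live copy after one broker fails, so every partition keeps exactly one leader.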
You have learnt about Apache Kafka topics, partitions, offsets, and brokers, how replication works across brokers, and what the leader of a partition is.