Author : MD TAREQ HASSAN | Updated : 2021/04/06

Topic

Partition

Each topic is divided into partitions (e.g. 3 partitions), and each partition is then replicated.

Partitions allow you to parallelize a topic by splitting its data across multiple brokers. Each partition can be placed on a separate machine, so multiple consumers can read from a topic in parallel. Consumers can also be parallelized so that multiple consumers read from multiple partitions in a topic, allowing for very high message processing throughput.
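To make the parallelism concrete, here is a minimal sketch of spreading partitions over consumers in round-robin order. It is only an illustration: `assign_partitions` and the consumer names are hypothetical, and Kafka's real assignment strategies are more involved than this.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over consumers in round-robin order (toy model)."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Three partitions shared by two consumers: each partition has one reader.
two_consumers = assign_partitions([0, 1, 2], ["consumer-a", "consumer-b"])

# Four consumers for three partitions: one consumer has nothing to read.
four_consumers = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])
```

In this model, adding consumers beyond the number of partitions does not add throughput, because the extra consumer is left without a partition.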

Offset

Another way to view a partition is as a log. A data source writes messages to the log, and one or more consumers read from the log at the point they choose. The position of a message within the log is its offset.
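The log view can be sketched with a small in-memory class. This is a toy stand-in for a single partition, not a Kafka API; the `PartitionLog` name and its methods are made up for illustration.

```python
class PartitionLog:
    """Toy in-memory stand-in for one partition (append-only list)."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Store a message at the end of the log and return its offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read_from(self, offset, max_messages=10):
        """Return (offset, message) pairs starting at the given offset."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        end = min(offset + max_messages, len(self._messages))
        return [(i, self._messages[i]) for i in range(offset, end)]


log = PartitionLog()
for event in ["a", "b", "c", "d"]:
    log.append(event)

# Two independent consumers read the same log from different positions.
from_start = log.read_from(0)   # [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]
from_middle = log.read_from(2)  # [(2, 'c'), (3, 'd')]
```

Reading does not remove anything from the log, which is why several consumers can each pick their own starting point.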

Hierarchy of topic, partition and offset

Kafka - hierarchy of topic, partition and offset

Data Log

Kafka retains messages for a configurable period of time, and it is up to the consumers to adjust their behaviour accordingly. For instance, if Kafka is configured to keep messages for a day and a consumer is down for longer than a day, the consumer will lose messages. However, if the consumer is down for an hour, it can begin to read messages again starting from its last known offset. From Kafka's point of view, the broker does not track what each consumer is reading from a topic; consumers keep track of their own offsets, and can commit them back to Kafka so they can resume after a restart.
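The one-day retention example can be worked through with a little arithmetic. The sketch below assumes exactly one message per minute and retention that is exact to the minute; real Kafka deletes whole log segments, so actual cutoffs are coarser. The function names are hypothetical.

```python
RETENTION_MINUTES = 24 * 60  # keep one day of messages


def earliest_retained(log_end, retention=RETENTION_MINUTES):
    """Lowest offset still available, assuming one message per minute."""
    return max(0, log_end - retention)


def resume(last_offset, earliest, log_end):
    """Return (start_offset, missed_count) for a consumer coming back online.

    last_offset: the next offset the consumer meant to read before it stopped.
    earliest:    the lowest offset still retained by the broker.
    log_end:     the offset the next produced message will receive.
    """
    start = min(max(last_offset, earliest), log_end)
    missed = max(0, earliest - last_offset)
    return start, missed


# The consumer stopped at offset 1000.
# Down for 1 hour: 60 new messages, nothing has expired yet.
short_outage = resume(1000, earliest_retained(1060), 1060)  # (1000, 0)

# Down for 30 hours: 1800 new messages, offsets below 1360 have expired.
long_outage = resume(1000, earliest_retained(2800), 2800)   # (1360, 360)
```

After the short outage the consumer picks up where it left off; after the long one it has to skip ahead and 360 messages are gone for good.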

Broker

What does a broker do?

Leader Partition

In Sync Replicas

Producer

Acknowledgment

  1. wait for all in sync replicas to acknowledge the message (acks=all)
  2. wait for only the leader to acknowledge the message (acks=1) -> this was the default before Kafka 3.0; since 3.0 the default is acks=all
  3. do not wait for acknowledgement (acks=0)
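The three settings can be compared with a toy decision function. This only models when a producer would treat a send as acknowledged; it is not Kafka client code, and `is_acknowledged` and its parameters are invented for illustration.

```python
def is_acknowledged(acks, leader_wrote, follower_acks):
    """Decide whether the producer considers a send acknowledged (toy model).

    acks:          "all", "1" or "0", mirroring the producer setting.
    leader_wrote:  True if the leader partition stored the message.
    follower_acks: one bool per in-sync follower replica.
    """
    if acks == "0":
        return True  # fire and forget: the producer never waits
    if acks == "1":
        return leader_wrote  # only the leader has to confirm
    if acks == "all":
        return leader_wrote and all(follower_acks)
    raise ValueError(f"unsupported acks value: {acks!r}")


# The leader has the message, but one in-sync follower has not confirmed yet.
state = {"leader_wrote": True, "follower_acks": [True, False]}

outcomes = {acks: is_acknowledged(acks, **state) for acks in ("0", "1", "all")}
# {'0': True, '1': True, 'all': False}
```

The trade-off is latency against durability: the more replicas the producer waits for, the slower a send is, and the less likely an acknowledged message is lost if the leader fails.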

Writing to partition

Message with key

Consumer

Providing consistency as a consumer

  1. receive each message at most once
  2. receive each message at least once
    • usually preferred
    • make sure your consumer client is idempotent, otherwise duplicates will be processed
  3. receive each message exactly once

Each of these scenarios deserves a discussion of its own.
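For the at-least-once case, one common way to make a consumer idempotent is to remember which message IDs it has already handled. The sketch below assumes every message carries a unique `id` field; that field, the `amount` field, and the `IdempotentConsumer` class are hypothetical and not part of Kafka.

```python
class IdempotentConsumer:
    """Toy consumer that ignores messages it has already processed."""

    def __init__(self):
        self.processed_ids = set()
        self.total = 0

    def handle(self, message):
        """Apply the message once; return False if it was a duplicate."""
        if message["id"] in self.processed_ids:
            return False  # redelivery: skip so the effect is not repeated
        self.total += message["amount"]
        self.processed_ids.add(message["id"])
        return True


# With at-least-once delivery the same message may arrive twice.
deliveries = [
    {"id": "m1", "amount": 10},
    {"id": "m2", "amount": 5},
    {"id": "m2", "amount": 5},  # redelivered after a retry
]

consumer = IdempotentConsumer()
results = [consumer.handle(message) for message in deliveries]
# results == [True, True, False], consumer.total == 15
```

In a real service the set of processed IDs would need to survive restarts, for example by storing it together with the processing result.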

Consumer Group

Bootstrap Server

Zookeeper

Kafka in a Nutshell

Picture of 'Kafka in a Nutshell'

Core APIs

Kafka Connect

Kafka Connect is a free, open-source component of Apache Kafka that works as a centralized data hub for simple data integration between databases, key-value stores, search indexes, and file systems.

The benefits of Kafka Connect include:

Connector

Kafka Streams

Kafka Component Architecture

Strimzi deployment of Kafka: https://strimzi.io/docs/operators/latest/overview.html#kafka-concepts-components_str