What is Apache Kafka?
- A high-throughput distributed messaging system (from multiple sources to multiple targets)
- Distributed, fault-tolerant, high-throughput pub-sub messaging system
- Kafka is a distributed event streaming platform
- Kafka is a message broker (Kafka provides a pub-sub model based on topics)
- Kafka is used for real-time streaming as a channel or mediator between sources and targets
- High performance (real-time) and horizontally scalable
- Created by LinkedIn, now an open-source Apache project mainly maintained by Confluent
- Kafka centralizes communication between producers of data and consumers of that data
- Kafka itself is only a transportation mechanism (it moves data; processing happens in the applications around it)
- Kafka is quickly becoming the backbone of many organizations’ data pipelines
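To make the topic-based pub-sub model above concrete, here is a minimal in-memory sketch. It is a toy model, not the Kafka client API: `Topic`, `Record`, `send`, `read`, and `partition_for` are hypothetical names, and there are no brokers, replication, or networking. It illustrates three ideas: a topic is split into partitions that are append-only logs, records with the same key land in the same partition (so their relative order is kept), and readers address records by offset without removing them. The `zlib.crc32` hash is a stand-in; Kafka's default partitioner hashes keys with murmur2.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str
    value: str


class Topic:
    """Toy topic (hypothetical): a fixed number of partitions, each an append-only list."""

    def __init__(self, name: str, num_partitions: int = 3):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Same key -> same partition, so per-key order is preserved.
        # crc32 is a stand-in for the murmur2 hash used by Kafka's default partitioner.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, key: str, value: str) -> tuple[int, int]:
        """Append a record; return (partition, offset), similar in spirit to a producer ack."""
        p = self.partition_for(key)
        self.partitions[p].append(Record(key, value))
        return p, len(self.partitions[p]) - 1

    def read(self, partition: int, offset: int, max_records: int = 10) -> list[Record]:
        """Return records starting at `offset`; reading never deletes anything."""
        return self.partitions[partition][offset:offset + max_records]


if __name__ == "__main__":
    clicks = Topic("page-clicks")
    # Multiple producers can write to one topic...
    for user, page in [("alice", "/home"), ("bob", "/cart"), ("alice", "/checkout")]:
        print(user, "->", clicks.send(user, page))
    # ...and any number of consumers can read the same partition from any offset.
    p = clicks.partition_for("alice")
    print([r.value for r in clicks.read(p, 0) if r.key == "alice"])  # ['/home', '/checkout']
```

Because `read` only slices the list, two consumers reading the same partition do not interfere with each other, which is the property that lets Kafka sit between many sources and many targets.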
Use cases
- Messaging system
- Activity tracking
- Gathering metrics from many different locations
- Application log gathering
- Stream processing (with the Kafka Streams API or Spark for example)
- Decoupling of system dependencies
- Integration with Spark, Flink, Storm, Hadoop, and many other Big Data technologies
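The decoupling use case rests on each consumer group tracking its own position (offset) in the log, so the producer never needs to know who reads its data. Below is a hypothetical in-memory sketch of that idea: `EventLog`, `ConsumerGroup`, and `poll` are illustrative names, not the real consumer API, and the log has a single partition for simplicity. In real Kafka, offsets are tracked per consumer group and per partition.

```python
class EventLog:
    """Single-partition, append-only log standing in for a Kafka topic (hypothetical)."""

    def __init__(self):
        self.records = []

    def append(self, event: str) -> int:
        self.records.append(event)
        return len(self.records) - 1  # offset of the new record


class ConsumerGroup:
    """Each group tracks its own offset; the producer never knows who reads."""

    def __init__(self, name: str, log: EventLog):
        self.name = name
        self.log = log
        self.offset = 0  # next offset this group will read

    def poll(self, max_records: int = 100) -> list:
        batch = self.log.records[self.offset:self.offset + max_records]
        # Advance this group's position; a real consumer would commit after processing.
        self.offset += len(batch)
        return batch


if __name__ == "__main__":
    log = EventLog()
    analytics = ConsumerGroup("analytics", log)
    alerting = ConsumerGroup("alerting", log)

    for event in ["login", "view", "purchase"]:
        log.append(event)

    print(analytics.poll())              # ['login', 'view', 'purchase']
    print(alerting.poll(max_records=1))  # ['login'], alerting moves at its own pace

    # A new downstream system is added without touching the producer,
    # and it can replay the retained history from offset 0.
    audit = ConsumerGroup("audit", log)
    print(audit.poll())                  # ['login', 'view', 'purchase']
```

The design point worth noticing is that reading does not delete: in Kafka, a retention policy, not consumption, decides when records are removed, which is why a late-added consumer can still catch up.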