What is Apache Kafka?
- A high-throughput distributed messaging system (from multiple sources to multiple targets)
- Distributed, fault-tolerant, high-throughput pub-sub messaging system
- Kafka is a distributed streaming platform
- Kafka is a message broker (it provides a pub-sub model based on topics)
- Kafka is used for real-time streaming as a channel or mediator between source and target
- High performance (real-time) and horizontally scalable
- Created at LinkedIn, now an open-source Apache Software Foundation project, with Confluent among its main contributors
- Kafka centralizes communication between producers of data and consumers of that data
- Kafka is only used as a transport mechanism
- Kafka is quickly becoming the backbone of many organizations’ data pipelines
![kafka decoupling of data streams and systems](https://dm2304files.storage.live.com/y4mMeLkZQwwj0NVMBi1juLb0mR45xc_07d9Izi7zuBC0KjPK7mRMnOS2Wk_aNUkUXoLz8OkyRuy8reqqjzr3obvXwOEmDwhAMp6dULFgqxe61VRDHKZfpgvLtvsCC3vAvuIIve7ksj2afUBSeTZIeZ7ZjK0mxDQaxX5_RS6VNFTEAZmQjOddzxiUHGhVBE0yFSG?width=1189&height=595&cropmode=none)
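
The topic-based pub-sub model described above can be illustrated with a small in-memory sketch. This is not Kafka's API and involves no real broker: `ToyBroker`, `publish`, `poll`, the `page-views` topic, and the consumer names are all hypothetical, chosen only to show how producers and consumers stay decoupled by talking to a topic rather than to each other.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a topic-based broker (hypothetical, not Kafka's API)."""

    def __init__(self):
        # Each topic is an append-only list of messages.
        self._topics = defaultdict(list)
        # Read position per (consumer, topic) pair.
        self._offsets = defaultdict(int)

    def publish(self, topic, message):
        """Append a message to a topic and return its offset."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def poll(self, consumer, topic):
        """Return the messages this consumer has not yet read from the topic."""
        log = self._topics[topic]
        start = self._offsets[(consumer, topic)]
        self._offsets[(consumer, topic)] = len(log)
        return log[start:]


broker = ToyBroker()

# Two independent sources publish to the same topic.
broker.publish("page-views", {"source": "web", "page": "/home"})
broker.publish("page-views", {"source": "mobile", "page": "/cart"})

# Two independent targets each read the full stream at their own pace.
analytics_batch = broker.poll("analytics", "page-views")
audit_batch = broker.poll("audit", "page-views")
print(len(analytics_batch), len(audit_batch))
```

Neither source knows which targets exist, and each target tracks its own read position, so adding another source or target requires no change to the others. A real Kafka cluster adds partitioning, replication, and durable storage on top of this basic idea.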
Use cases
- Messaging System
- Activity Tracking
- Gather metrics from many different locations
- Application log gathering
- Stream processing (with the Kafka Streams API or Spark for example)
- Decoupling of system dependencies
- Integration with Spark, Flink, Storm, Hadoop, and many other Big Data technologies