What is Apache Kafka?
Apache Kafka is an open-source distributed streaming platform designed to handle large volumes of real-time data between source and destination systems.
Apache Kafka acts as a broker between senders and receivers, and is often described as a distributed publish-subscribe messaging system.
Apache Kafka was initially developed at LinkedIn and later donated to the Apache Software Foundation. Today it is maintained by the Apache Kafka community, with Confluent as a major contributor.
Following are some benefits of Kafka:
- Reliability − Kafka is distributed, replicated, partitioned, and fault tolerant.
- Scalability − Kafka brokers scale easily without downtime. Kafka is horizontally scalable: a cluster can grow to hundreds of brokers and handle millions of messages per second.
- Durability − Kafka persists messages on disk and replicates them across brokers, so data survives failures.
- Performance − Kafka delivers very high throughput for both publishing and subscribing, with end-to-end latency as low as about 10 milliseconds, which is effectively real-time processing.
What is a messaging system?
A messaging system exchanges messages between two or more applications.
There are two types of messaging systems: point-to-point and publish-subscribe.
Apache Kafka is a publish-subscribe messaging system.
In a publish-subscribe messaging system, senders write messages to a topic, and receivers read the messages from the topics they have subscribed to.
In Apache Kafka, a sender is called a producer, which publishes messages, and a receiver is called a consumer, which consumes messages by subscribing to a topic.
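The producer/consumer relationship above can be illustrated with a minimal in-memory publish-subscribe sketch. This is plain Python with no Kafka dependency; the class and method names are illustrative, not Kafka's actual API:

```python
from collections import defaultdict

class MiniBroker:
    """A toy publish-subscribe broker: topics map to lists of subscriber callbacks."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # A consumer registers interest in a topic.
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # A producer publishes; every subscriber of the topic receives the message.
        for callback in self.subscribers[topic]:
            callback(message)

broker = MiniBroker()
received = []
broker.subscribe("orders", received.append)    # the consumer side
broker.publish("orders", "order-1001 placed")  # the producer side
print(received)  # ['order-1001 placed']
```

Real Kafka adds persistence, partitioning, and replication on top of this basic pattern, but the producer/topic/consumer relationship is the same.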
What is a streaming platform?
Stream processing means processing data continuously, as it flows through connected systems, often in parallel.
A streaming platform provides the infrastructure for this kind of processing and its parallel execution.
As a streaming platform, Kafka has the following capabilities:
- Process streams of records as soon as they occur.
- Store streams of records in a fault-tolerant way.
- Work as a messaging system that publishes and subscribes to streams of records.
Companies Using Apache Kafka
Apache Kafka is used by more than 2,000 companies, including about 40% of the Fortune 500.
Prominent users include LinkedIn, Uber, and Netflix.
Why Use Apache Kafka?
Traditionally, data is exchanged between a single source system and a single target system.
Over time, organizations accumulate many source systems and many target systems that all need to exchange data, which drastically increases the complexity of the overall communication architecture.
The problems with this architecture are:
- For example, if you have 5 source systems and 7 target systems, you need 5 × 7 = 35 point-to-point integrations, which is highly complex.
- Each integration has to deal with a transport protocol, i.e. how the data is moved: HTTP, REST, TCP, etc.
- Each integration has to deal with a data format, i.e. how the data is parsed: CSV, JSON, XML, binary, etc.
- Each integration has to deal with changes to the data schema.
So how do we solve this problem? This is where Apache Kafka comes in: it decouples the source systems from the target systems.
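Without a broker the integration count grows multiplicatively, while with a broker it grows only additively. A quick calculation makes the difference concrete:

```python
sources, targets = 5, 7

# Point-to-point: every source system talks to every target system.
point_to_point = sources * targets   # 5 * 7 = 35 integrations

# With a broker like Kafka in the middle: each system connects
# only to the broker, never directly to the other systems.
with_broker = sources + targets      # 5 + 7 = 12 connections

print(point_to_point, with_broker)   # 35 12
```

As the number of systems grows, the gap widens: 10 sources and 10 targets would mean 100 direct integrations but only 20 broker connections.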
Apache Kafka Use Cases
There are many use cases of Apache Kafka. Some of them are mentioned below:
- As a messaging system
- Decoupling of dependent systems
- Activity tracking
- Gathering application logs
- Stream processing
- Integration with Hadoop, Spark, Flink, Storm, and other Big Data technologies
- Gathering metrics from several locations
- Website activity monitoring
Uber uses Kafka to collect user, cab, and trip information in real time in order to calculate and forecast demand.
LinkedIn uses Kafka to prevent spam and to recommend connections in real time.
Netflix uses Kafka to provide recommendations in real time while you watch videos.
Apache Kafka Core APIs
Apache Kafka has four core APIs:
Producer API: allows an application to publish a stream of records to one or more Kafka topics.
Consumer API: allows an application to subscribe to one or more topics and process the stream of records delivered to it.
Streams API: allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more topics, effectively transforming input streams into output streams.
Connector API: allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
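As an illustration of how the Producer API is typically configured, here is a minimal properties fragment using the Java client's configuration keys (the broker address is a placeholder; your cluster will differ):

```
# Minimal Kafka producer configuration (Java client property names).
# bootstrap.servers is a placeholder broker address for a local cluster.
bootstrap.servers=localhost:9092
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer
# acks=all waits for all in-sync replicas to acknowledge each record,
# trading a little latency for stronger durability guarantees.
acks=all
```

The `acks` setting is a good example of Kafka's durability/performance trade-off: `acks=0` is fastest but can lose records, while `acks=all` is the safest choice.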
In this article, we covered what Apache Kafka is, its benefits, its role as a streaming platform, its use cases, and its core APIs.