Kafka architecture: performance

About the speaker

An engineer at heart. Spent 5 years in the search quality department at Yandex and 2 years at Avast! as a software engineer, and designed several systems for data processing at scale. His experience is mainly in web data and network applications. Built Frontera, a large-scale web crawling framework. Currently works as a technical architect at Zyte.

About the speaker's company

Zyte (formerly Scrapinghub) is a company that specializes in data extraction and web scraping services. It provides tools for companies and individuals to find data for business intelligence such as product, pricing, competitive research, news, and content monitoring.

Abstract

Kafka is a distributed messaging system capable of delivering high performance. In my talk I’ll explain the architecture of the broker and the client, with emphasis on the design concepts that enable high performance. It will be useful for system design and for an overall understanding of Kafka.

There is not much information available online about the architecture of Apache Kafka, and I would like to fill this gap. In this talk, I’m going to give a rough explanation of the design of distributed messaging in Kafka. To achieve state-of-the-art I/O performance, Kafka makes use of the page writeback strategy and zero-copy, along with other optimisations on the client and server sides. To my understanding, these optimisations were the main driver of the design of the whole system, its protocol, and its client APIs. This talk will focus on the concepts responsible for high bandwidth and will leave out topics such as distributed agent coordination and cluster discovery.
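As a rough illustration of the zero-copy idea mentioned above, here is a minimal Python sketch, not Kafka's actual code. It assumes a stream socket stands in for a consumer connection and a temporary file stands in for a log segment; the names `send_log_segment` and `demo` are hypothetical. Python's `socket.sendfile` delegates to `os.sendfile` where the platform supports it, so the file bytes go to the socket inside the kernel without being copied through a user-space buffer.

```python
import os
import socket
import tempfile


def send_log_segment(path: str, sock: socket.socket) -> int:
    """Send a whole file over a connected stream socket; return bytes sent."""
    with open(path, "rb") as segment:
        # socket.sendfile uses os.sendfile when available (the zero-copy path)
        # and silently falls back to plain send() calls otherwise.
        return sock.sendfile(segment)


def demo() -> bytes:
    # Hypothetical "log segment": 4096 bytes, small enough to fit in the
    # socket buffer so the single-threaded demo does not block.
    payload = b"message-" * 512
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
        path = f.name

    broker_side, consumer_side = socket.socketpair()
    try:
        sent = send_log_segment(path, broker_side)
        broker_side.close()  # signal end of stream to the reader
        received = b""
        while len(received) < sent:
            chunk = consumer_side.recv(65536)
            if not chunk:
                break
            received += chunk
        return received
    finally:
        broker_side.close()
        consumer_side.close()
        os.unlink(path)


if __name__ == "__main__":
    data = demo()
    print(f"received {len(data)} bytes")
```

The point of the sketch is only that the sending side never reads the file contents into its own memory; the operating system moves the data.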

This talk will be useful to architects and system programmers who want to sharpen their system design skills and improve their overall understanding of Kafka, as well as to those who are using or considering Apache Kafka.

The main things we are going to discuss are:
- Which architectural decisions make Kafka performant?
- Meaning of efficiency for Kafka
- How to benchmark Kafka
- Theoretical throughput limit
- Large diagram of a message write path
- Overview and design of Kafka Producer client
- Linux writeback
- Read path diagram
- Kafka Consumer client design
- Summary: what makes Kafka efficient

The Program Committee has not yet taken a decision on this talk