Zyte
About speaker
An engineer at heart. He spent 5 years in the search quality department at Yandex and 2 years at Avast! as a software engineer, and has designed several systems for data processing at scale. His experience is mainly in web data and network applications. He built Frontera, a framework for large-scale web crawling. He currently works as a technical architect at Zyte.
About speaker's company
Zyte (formerly Scrapinghub) is a company that specializes in data extraction and web scraping services. It provides tools for companies and individuals to find data for business intelligence such as product, pricing, competitive research, news, and content monitoring.
Kafka is a distributed messaging system capable of delivering high performance. In my talk I’ll explain the architecture of the broker and the clients, with emphasis on the design concepts that enable high performance. It will be useful for system design and for an overall understanding of Kafka.
Little information about the architecture of Apache Kafka is available online, and I would like to fill this gap. In this talk, I’m going to give a rough explanation of the design of distributed messaging in Kafka. To achieve state-of-the-art I/O performance, Kafka relies on the page writeback strategy and zero-copy, along with other optimisations on the client and server sides. To my understanding, these optimisations were the main driver of the design of the whole system, protocol, and client APIs. This talk will focus on the concepts responsible for high throughput and will leave out topics such as coordination of distributed agents, cluster discovery, etc.
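To make the zero-copy idea concrete, here is a minimal sketch in Python, not Kafka's actual code (the broker runs on the JVM). It assumes a log segment stored as a plain file and a connected stream socket; the names `send_segment_zero_copy` and `demo_zero_copy` are hypothetical. `socket.sendfile()` asks the kernel to move file bytes to the socket via `os.sendfile()` where the platform supports it, so the payload does not pass through a user-space buffer.

```python
import os
import socket
import tempfile


def send_segment_zero_copy(path, sock):
    """Send a whole file over a connected stream socket; return bytes sent."""
    with open(path, "rb") as segment:
        # socket.sendfile() uses os.sendfile() where available and
        # falls back to ordinary send() calls on other platforms.
        return sock.sendfile(segment)


def demo_zero_copy():
    """Write a small 'segment' file and ship it through a socket pair."""
    payload = b"message-" * 512  # 4096 bytes, small enough for socket buffers
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
        path = f.name

    consumer_side, broker_side = socket.socketpair()
    try:
        sent = send_segment_zero_copy(path, broker_side)
        broker_side.close()  # signal end of stream to the reader

        received = b""
        while True:
            chunk = consumer_side.recv(65536)
            if not chunk:
                break
            received += chunk
        return sent, received == payload
    finally:
        broker_side.close()
        consumer_side.close()
        os.unlink(path)


if __name__ == "__main__":
    print(demo_zero_copy())
```

The demo keeps the payload small so that the single-threaded sender never blocks on a full socket buffer; a real server would send to a remote peer that reads concurrently.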
This talk will be useful to architects and system programmers for training system design skills and improving overall understanding of Kafka, as well as to those who are using or considering using Apache Kafka.
The main things we are going to discuss are:
- Which architectural decisions make Kafka performant?
- What efficiency means for Kafka
- How to benchmark Kafka
- Theoretical throughput limit
- Large diagram of a message write path
- Overview and design of Kafka Producer client
- Linux writeback
- Read path diagram
- Kafka Consumer client design
- Summary: what makes Kafka efficient
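The writeback item in the outline can be illustrated with a small sketch, written under assumptions rather than taken from Kafka. The hypothetical `AppendOnlyLog` class appends records to a file and returns as soon as the data is in the OS page cache; the kernel's writeback flushes dirty pages to disk later. An explicit `flush()` (a call to `os.fsync`) is left as an opt-in, because forcing it on every write would give up the benefit of writeback.

```python
import os
import tempfile


class AppendOnlyLog:
    """Hypothetical append-only log that relies on OS page-cache writeback."""

    def __init__(self, path):
        # O_APPEND makes every write land at the current end of the file.
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self._next_offset = os.fstat(self._fd).st_size

    def append(self, record):
        """Write one record; return the byte offset where it starts."""
        offset = self._next_offset
        view = memoryview(record)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(self._fd, view)
            view = view[written:]
        self._next_offset += len(record)
        # No fsync here: the data sits in the page cache until the
        # kernel writes it back in the background.
        return offset

    def flush(self):
        """Force dirty pages for this file to stable storage."""
        os.fsync(self._fd)

    def close(self):
        os.close(self._fd)


def demo_log():
    """Append three records and read the file back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "segment.log")
        log = AppendOnlyLog(path)
        offsets = [log.append(r) for r in (b"first\n", b"second\n", b"third\n")]
        log.close()
        with open(path, "rb") as f:
            data = f.read()
    return offsets, data


if __name__ == "__main__":
    print(demo_log())
```

Reads issued right after `append` are served from the same page cache, which is why a consumer that keeps up with the producer rarely needs to touch the disk.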
The Program Committee has not yet taken a decision on this talk
The largest professional conference for developers of high-load systems