Lessons learned from Kafka in production (Tim Berglund, Confluent)


    Many developers have already wrapped their minds around the basic architecture and APIs of Kafka as a message queue and a streaming platform. But can they keep it running in production? This talk contains real-world troubleshooting and optimization scenarios culled from the logs of Confluent technical support.
    We’ll talk about the trade-offs involved in optimizing for the always-desirable outcomes of throughput, latency, durability, and availability. How many partitions should you use for a given topic? How much message batching should you configure in the producer? How many replicas should be required to acknowledge a write? What do you do when you see a partition growing inexplicably? When should you rearchitect your application to use the streaming API? We’ll answer these questions and more in this overview of common Kafka production issues.
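A minimal sketch of where those knobs live, assuming a Python setup where settings are passed as plain dicts. The keys (`acks`, `enable.idempotence`, `linger.ms`, `batch.size`, `compression.type`, `min.insync.replicas`) are standard Kafka configuration names; every value, and the helper functions `tolerable_replica_losses` and `partitions_for_throughput`, are hypothetical illustrations, not the talk's recommendations. With `acks=all`, a write succeeds only while at least `min.insync.replicas` replicas are in sync, so a partition can lose `replication factor - min.insync.replicas` replicas and keep accepting writes. The partition helper encodes a common rule of thumb: enough partitions that neither per-partition producer nor consumer throughput caps the target.

```python
import math

# Hypothetical producer settings; keys are standard Kafka producer config names.
producer_config = {
    "acks": "all",               # leader waits for all in-sync replicas: durability over latency
    "enable.idempotence": True,  # producer retries don't create duplicates within a partition
    "linger.ms": 20,             # wait up to 20 ms to fill a batch: throughput over latency
    "batch.size": 64 * 1024,     # upper bound, in bytes, on one partition's batch
    "compression.type": "lz4",   # batches are compressed as a unit, so bigger batches help
}

# Hypothetical topic layout: 3 replicas per partition (set when the topic is created).
REPLICATION_FACTOR = 3
topic_config = {"min.insync.replicas": 2}  # real topic-level config name


def tolerable_replica_losses(replication_factor: int, min_insync_replicas: int) -> int:
    """Replicas of one partition that can be unavailable while acks=all writes still succeed."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min.insync.replicas <= replication factor")
    return replication_factor - min_insync_replicas


def partitions_for_throughput(target_mb_s: float,
                              producer_mb_s_per_partition: float,
                              consumer_mb_s_per_partition: float) -> int:
    """Rule of thumb: enough partitions that neither producers nor consumers cap throughput."""
    return math.ceil(max(target_mb_s / producer_mb_s_per_partition,
                         target_mb_s / consumer_mb_s_per_partition))


if __name__ == "__main__":
    losses = tolerable_replica_losses(REPLICATION_FACTOR, topic_config["min.insync.replicas"])
    print(f"acks=all writes survive {losses} lost replica(s) per partition")
    # Hypothetical measurements: 100 MB/s target, 10 MB/s produce and 20 MB/s consume per partition.
    print(partitions_for_throughput(100, 10, 20), "partitions")
```

The per-partition throughput figures have to come from measuring your own producers and consumers; the formula only turns those measurements into a starting partition count.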




    1. Now if only the Kafka community would actually put together some reasonable how-tos on setting up "clusters" (even single pods behind Services) in Kubernetes with the correct listener configs. Pretty much all of the documentation is for Docker (Wurstmeister) or for running an operator off a Helm chart (Bitnami). The former is useless now due to the vastly different networking model (and Docker being effectively dead), and the latter is ridiculous tooling overhead for a development environment like Minikube, while also hiding much of the low-level customization that most enterprises need.

      The only one that actually works out of the box (and leaves reasonable breadcrumbs for modifying it) uses Google's out-of-date Kafka-on-Kubernetes image.

    2. So, in summary:
      1. Don't hire administrators who don't know how to operate Kafka and who forgot that one of the brokers had been upgraded to a newer version than the rest of the cluster.
      2. A poorly designed health check: again, a poor administrator, or whoever built it didn't test it thoroughly. (Moral of the story: hire someone who knows how to work with "sharp knives".)
      3. And again, the people running this cluster weren't paying attention to what they were doing.

    3. Q1: Can a broker manage multiple partitions of the same topic? In other words, can the number of brokers be less than the maximum number of partitions in a topic?

      Q2: If the answer to the above is "no", can we say that in order to add a new partition we need to add a new broker first?

      Q3: What is the best way to reduce the number of partitions? Should one just delete the extra partition, and will the broker handle deletion of its replicas as well? Or is there a way to deactivate a partition, i.e. set the extra partition to read-only mode?