Durable event streams with Kafka clustering
Description
https://kafka.apache.org/documentation/#intro_streaming
https://youtu.be/FKgi3n-FyNU
Services:
ZooKeeper (cluster state): https://zookeeper.apache.org/doc/current/zookeeperStarted.html
Kafka
Prerequisites
Configure the hosts file on every node, or add the nodes to DNS, so that all hosts can resolve each other by name.
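A minimal sketch of the hosts entries, assuming three nodes. The names kafka1 to kafka3 and the 192.168.56.x addresses are hypothetical placeholders, not values from these notes; substitute your own. The script only prints the lines so they can be reviewed before they are added to /etc/hosts.

```shell
#!/bin/sh
# Print /etc/hosts entries for a small cluster.
# Assumptions (hypothetical): hosts are named kafka1..kafkaN and use
# consecutive addresses starting at <SUBNET>.11.
NODES="${NODES:-3}"
SUBNET="${SUBNET:-192.168.56}"

hosts_entries() {
    i=1
    while [ "$i" -le "$NODES" ]; do
        # One "address hostname" pair per line.
        printf '%s.%d kafka%d\n' "$SUBNET" "$((10 + i))" "$i"
        i=$((i + 1))
    done
}

hosts_entries
```

Numbering the hostnames is what lets the ZooKeeper and broker ids later match "the host number".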
Install the software (Java, ZooKeeper and Kafka) on every host.
ZooKeeper
Run a minimum of 3 ZooKeeper nodes; 5 are recommended. An odd-sized ensemble keeps a majority quorum: 3 nodes tolerate 1 failure, 5 tolerate 2.
Configure zookeeper
On each host, add a unique ZooKeeper id (the myid file in the data directory) that matches the number in the hostname.
Start each ZooKeeper instance in its own terminal.
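The ZooKeeper steps above could be sketched as follows. This is an illustration under stated assumptions rather than the configuration used in these notes: the hostnames kafka1 to kafka3, the data directory /var/lib/zookeeper and the timing values are placeholders, and the ports are the usual ZooKeeper defaults. The script writes one config and one myid file per node into a scratch directory for review.

```shell
#!/bin/sh
# Generate a ZooKeeper config and a myid file for each node of the ensemble.
# Assumptions (hypothetical): hostnames kafka1..kafkaN, data directory
# /var/lib/zookeeper, default ports 2181 (clients) and 2888/3888 (peers).
NODES="${NODES:-3}"
DATA_DIR="${DATA_DIR:-/var/lib/zookeeper}"
OUT="${OUT:-$(mktemp -d)}"

i=1
while [ "$i" -le "$NODES" ]; do
    mkdir -p "$OUT/kafka$i"
    {
        echo "tickTime=2000"
        echo "initLimit=5"
        echo "syncLimit=2"
        echo "dataDir=$DATA_DIR"
        echo "clientPort=2181"
        # The same server list goes into every node's config.
        j=1
        while [ "$j" -le "$NODES" ]; do
            echo "server.$j=kafka$j:2888:3888"
            j=$((j + 1))
        done
    } > "$OUT/kafka$i/zookeeper.properties"
    # myid holds only this node's id; it must match its server.N line.
    echo "$i" > "$OUT/kafka$i/myid"
    i=$((i + 1))
done

echo "wrote ZooKeeper configs under $OUT"
```

Each node's myid file belongs in its data directory. If the ZooKeeper bundled with the Kafka distribution is used, an instance is typically started with `bin/zookeeper-server-start.sh config/zookeeper.properties`; a standalone ZooKeeper install uses `zkServer.sh start` instead.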
Kafka cluster
Configure each Kafka instance. The configuration is identical on every broker except for broker.id, which must be unique; again, match it to the number in the hostname.
Start the Kafka instances.
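As a rough sketch of the broker configuration, the script below writes a server.properties file per broker in which only broker.id differs. The hostnames kafka1 to kafka3, the log directory /var/lib/kafka and the ZooKeeper port 2181 are assumptions for illustration; only the three settings shown are covered, everything else is left at its default.

```shell
#!/bin/sh
# Generate one server.properties per broker; only broker.id differs.
# Assumptions (hypothetical): hostnames kafka1..kafkaN, ZooKeeper on
# port 2181 on the same hosts, log directory /var/lib/kafka.
NODES="${NODES:-3}"
LOG_DIRS="${LOG_DIRS:-/var/lib/kafka}"
OUT="${OUT:-$(mktemp -d)}"

# Build the shared connection string: kafka1:2181,kafka2:2181,...
ZK_CONNECT=""
i=1
while [ "$i" -le "$NODES" ]; do
    ZK_CONNECT="${ZK_CONNECT:+$ZK_CONNECT,}kafka$i:2181"
    i=$((i + 1))
done

i=1
while [ "$i" -le "$NODES" ]; do
    mkdir -p "$OUT/kafka$i"
    cat > "$OUT/kafka$i/server.properties" <<EOF
# Unique per broker; matches the number in the hostname.
broker.id=$i
# Identical on every broker.
log.dirs=$LOG_DIRS
zookeeper.connect=$ZK_CONNECT
EOF
    i=$((i + 1))
done

echo "wrote broker configs under $OUT"
```

With the file in place on each host, a broker is typically started with `bin/kafka-server-start.sh config/server.properties` from the Kafka installation directory.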
Topics
Create a replicated topic
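One way to create and inspect a replicated topic is sketched below, assuming three running brokers. The topic name events, the install path /opt/kafka and the broker address kafka1:9092 are hypothetical. The `--bootstrap-server` option is accepted by kafka-topics.sh from Kafka 2.2 onward; older releases take `--zookeeper` with a ZooKeeper address instead. The script prints the commands by default and only executes them when RUN=1 is set.

```shell
#!/bin/sh
# Create a topic replicated across three brokers, then describe it.
# Assumptions (hypothetical): Kafka unpacked in /opt/kafka, a broker
# reachable at kafka1:9092, topic name "events".
KAFKA_HOME="${KAFKA_HOME:-/opt/kafka}"
BOOTSTRAP="${BOOTSTRAP:-kafka1:9092}"
TOPIC="${TOPIC:-events}"

# Dry run by default: print each command; set RUN=1 to execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Three partitions, each with a replica on all three brokers.
run "$KAFKA_HOME/bin/kafka-topics.sh" --create --topic "$TOPIC" \
    --partitions 3 --replication-factor 3 --bootstrap-server "$BOOTSTRAP"

# Show the leader, replicas and in-sync replicas of each partition.
run "$KAFKA_HOME/bin/kafka-topics.sh" --describe --topic "$TOPIC" \
    --bootstrap-server "$BOOTSTRAP"
```

The replication factor cannot exceed the number of live brokers, so a factor of 3 needs all three instances running.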
Producer and consumer
Connect to two of the instances and start a producer on one and a consumer on the other. Each line typed into the producer is sent as a message and printed by the consumer.
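The producer and consumer test might look like the sketch below, which wraps the console clients shipped with Kafka. The install path /opt/kafka, the topic name events and the broker addresses are hypothetical. The console producer accepts `--bootstrap-server` from Kafka 2.5 onward; older releases use `--broker-list`.

```shell
#!/bin/sh
# Wrap the console clients that ship with Kafka.
# Assumptions (hypothetical): Kafka unpacked in /opt/kafka, topic "events",
# brokers listening on port 9092.
KAFKA_HOME="${KAFKA_HOME:-/opt/kafka}"
TOPIC="${TOPIC:-events}"

# Each line typed on stdin is sent to the topic as one message.
start_producer() {
    "$KAFKA_HOME/bin/kafka-console-producer.sh" \
        --bootstrap-server "$1" --topic "$TOPIC"
}

# Prints every message on the topic, starting from the oldest one.
start_consumer() {
    "$KAFKA_HOME/bin/kafka-console-consumer.sh" \
        --bootstrap-server "$1" --topic "$TOPIC" --from-beginning
}

case "${1:-}" in
    producer) start_producer "${2:-kafka1:9092}" ;;
    consumer) start_consumer "${2:-kafka2:9092}" ;;
    *) echo "usage: $0 producer|consumer [host:port]" ;;
esac
```

Saved under a name of your choice, the script is run with `producer` in one terminal and `consumer` in the other. Pointing the two clients at different brokers confirms that messages travel through the cluster rather than through a single instance.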
## Production
Tuning
vm.swappiness = 1
vm.dirty_background_ratio = 5 (default 10)
vm.dirty_ratio = 60 to 80 (default 20)
Check the current number of dirty pages with: egrep "dirty|writeback" /proc/vmstat
Filesystem: XFS performs best, but ext4 is an option.
Mount the data volumes with noatime.
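Networking
net.core.wmem_default = 2097152 (128 KB to 2 MB)
net.core.rmem_default = 2097152 (128 KB to 2 MB)
net.ipv4.tcp_wmem = 4096 65536 2048000
net.ipv4.tcp_rmem = 4096 65536 2048000
net.ipv4.tcp_window_scaling = 1
#net.ipv4.tcp_max_syn_backlog = 1024
#net.core.netdev_max_backlog = 1000
The three values for tcp_wmem and tcp_rmem are the minimum, default and maximum buffer sizes in bytes. net.core.wmem_max and net.core.rmem_max each take a single byte value.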
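The kernel settings from the tuning and networking notes could be collected in a sysctl fragment, as sketched below. The file name 90-kafka.conf is hypothetical, and vm.dirty_ratio is set to 60, the lower end of the range given above. The three-value minimum/default/maximum setting is written to net.ipv4.tcp_wmem and net.ipv4.tcp_rmem, the parameters that accept that form. The script only writes the file to a scratch directory and prints it; nothing is applied to the running kernel.

```shell
#!/bin/sh
# Write the kernel settings to a sysctl fragment for review.
# Assumptions (hypothetical): the file name 90-kafka.conf; OUT defaults
# to a scratch directory so nothing on the system is changed.
OUT="${OUT:-$(mktemp -d)}"
CONF="$OUT/90-kafka.conf"

cat > "$CONF" <<'EOF'
# Virtual memory
vm.swappiness = 1
vm.dirty_background_ratio = 5
vm.dirty_ratio = 60

# Socket buffers (bytes)
net.core.wmem_default = 2097152
net.core.rmem_default = 2097152
# min, default and max TCP buffer sizes
net.ipv4.tcp_wmem = 4096 65536 2048000
net.ipv4.tcp_rmem = 4096 65536 2048000
net.ipv4.tcp_window_scaling = 1
EOF

cat "$CONF"
```

After review, the file can be copied to /etc/sysctl.d/ and loaded as root with `sysctl --system`, or loaded directly with `sysctl -p` followed by the file path.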
Java - Garbage First (G1) collector.
G1 adjusts to the workload and keeps GC pause times consistent. Example for a broker with 64 GB of RAM running Kafka in a 5 GB heap:
export JAVA_HOME=/usr/java/jdk1.8.0_51
export KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true"
Stream processing
https://github.com/robinhood/faust