Notes from this slideshare.
for each partition, kafka picks one broker as the 'leader'; partition replicas are spread over brokers
can audit kafka with custom topics and consumers; per-cluster consumers can read all messages out of the cluster and count all messages for a topic
tune the OS, the JVM, but not too much to be done on kafka itself, except concurrent processors
messages are committed when some number of in-sync replicas (a tunable parameter) for that partition have applied the data to their datalog; so you can trade latency for durability
consumers only ever see committed messages