Some Important Configs

Kafka Producer Config

Now that you have a basic understanding of what Kafka and the Kafka producer are, let’s walk through the most important producer configurations in the Kafka ecosystem. Although there are tons of configurations available, the ones you need to know in order to get started are fairly limited.

acks

acks controls how many acknowledgments the producer requires the leader broker to have received before it considers a write successful. This setting governs the durability of the messages that are sent. The following are the common values for the acks producer config:

  • acks=0: The producer does not wait for any acknowledgment from the server at all. The record is added to the socket buffer and immediately considered sent, so there is no guarantee the broker actually received it.

  • acks=1: The producer considers a write successful as soon as the leader broker acknowledges it, without waiting for the followers to replicate the record.

  • acks=all: The producer waits for acknowledgments from all in-sync replicas of the partition before considering a write successful. This gives the strongest available message durability.
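
To make this concrete, below is a minimal sketch of a Java producer configured with acks=all. The broker address (localhost:9092), topic name (my-topic), and the use of String keys and values are placeholder assumptions; bootstrap.servers and the other settings used here are discussed in the sections that follow.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AcksExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Initial connection point; placeholder address
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Wait for all in-sync replicas to acknowledge each write
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // How keys and values are converted to bytes
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // try-with-resources closes (and flushes) the producer on exit
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value"));
        }
    }
}

The snippets in the rest of this article extend the same props object and are shown as fragments for brevity.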

bootstrap.servers

bootstrap.servers is a list of host/port pairs used to establish the initial connection to the Kafka cluster. The list need not contain the full set of brokers, since it is only used for the initial connection from which the client discovers the full cluster membership. The list should be in the format given below:

host1:port1,host2:port2,....
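
In Java client code the same list is passed as a single comma-separated string. The broker addresses below are placeholders; listing two or three brokers is usually enough, since the client uses them only to discover the rest of the cluster. Continuing the props object from the sketch above:

// Two entry points into the cluster; the remaining brokers are discovered automatically
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");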

retries

By default, the producer does not resend a record if the write fails. However, it can be configured to resend a message up to n times by setting retries=n. In other words, retries is the maximum number of times the producer will retry a failed send. The default value is 0.
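
As a sketch, the fragment below (continuing the same props object) allows three retries. The retry.backoff.ms value, which controls the pause between attempts, is illustrative.

// Retry a failed send up to 3 times
props.put(ProducerConfig.RETRIES_CONFIG, "3");
// Pause between retry attempts, in milliseconds (illustrative value)
props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "100");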

enable.idempotence

In simple terms, idempotence is the property of certain operations whereby they can be applied multiple times without changing the result. When enabled, the producer ensures that exactly one copy of each record is published to the stream, even if a send has to be retried. The default value is false, meaning the producer may write duplicate copies of a message to the stream. To turn idempotence on, use the setting below.

enable.idempotence=true
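
Below is a sketch of an idempotent producer configuration, continuing the same props object. Idempotence assumes acks=all, a positive retries value, and max.in.flight.requests.per.connection of at most 5 (discussed next); the values shown are one compatible combination.

props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
// Settings compatible with idempotence
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5");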

max.in.flight.requests.per.connection

The max.in.flight.requests.per.connection producer config represents the maximum number of unacknowledged requests the client will send on a single connection before blocking. The default value is 5.

If retries are enabled and max.in.flight.requests.per.connection is set to a value greater than 1, there is a risk of message re-ordering: an earlier request that fails and is retried can end up being written after a later request that succeeded.
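
One common way to preserve strict ordering when retries are enabled (and idempotence is off) is to allow only a single in-flight request per connection, at some cost in throughput. A sketch continuing the same props object:

// Only one unacknowledged request at a time, so retries cannot reorder messages
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1");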

buffer.memory

buffer.memory is the total amount of memory, in bytes, that the producer can use to buffer records waiting to be sent to the server. The default is 32 MB. If records are produced faster than they can be delivered to the server, the buffer fills up and the producer blocks for up to max.block.ms (discussed next), after which it throws an exception. The buffer.memory setting should roughly correspond to the total memory used by the producer.
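
Continuing the same props object, the fragment below raises the buffer to 64 MB; the value is illustrative.

// 64 MB of buffer space instead of the 32 MB default (illustrative value)
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, Long.toString(64L * 1024 * 1024));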

max.block.ms

max.block.ms defines the maximum duration for which KafkaProducer.send() and KafkaProducer.partitionsFor() will block. These methods block whenever the buffer.memory is exhausted or when metadata is unavailable.
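
Here is a sketch of how the two settings interact, with illustrative values and the placeholder topic from the first example: max.block.ms is set before the producer is built, and if the buffer stays full (or metadata cannot be fetched) for longer than that, send() gives up with a TimeoutException.

// Set before constructing the producer: block for at most 10 seconds
props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "10000");

try {
    producer.send(new ProducerRecord<>("my-topic", "key", "value"));
} catch (org.apache.kafka.common.errors.TimeoutException e) {
    // The buffer was full (or metadata unavailable) for longer than max.block.ms
    System.err.println("Producer blocked too long: " + e.getMessage());
}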

linger.ms

linger.ms introduces a small artificial delay before a batch of records is sent. Any records that arrive between request transmissions are batched together into a single request by the producer, and linger.ms sets the upper bound on this batching delay. The default value is 0, which means there is no delay and batches are sent immediately (even if a batch contains only one message).

In some circumstances, the client may increase linger.ms to reduce the number of requests, and thus improve throughput, even under moderate load. The trade-off is that more records are held in memory and each record is delivered with slightly higher latency.
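
Continuing the same props object, a sketch that trades a few milliseconds of latency for larger batches; the value is illustrative.

// Wait up to 5 ms for more records before sending a batch (illustrative value)
props.put(ProducerConfig.LINGER_MS_CONFIG, "5");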

batch.size

Whenever multiple records are sent to the same partition, the producer attempts to batch the records together. This way, the performance of both the client and the server can be improved. batch.size represents the maximum size (in bytes) of a single batch.

A very small batch size makes batching irrelevant and reduces throughput, while a very large batch size can waste memory, since a buffer of batch.size bytes is usually allocated in anticipation of additional records.
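
Continuing the same props object, the fragment below doubles the batch size; the value, and the commonly cited default of 16 KB, are given for illustration.

// Allow batches of up to 32 KB, twice the commonly cited 16 KB default (illustrative value)
props.put(ProducerConfig.BATCH_SIZE_CONFIG, Integer.toString(32 * 1024));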

compression.type

compression.type specifies the compression codec applied to all data generated by the producer. The default value is none, which means no compression. You can also set compression.type to gzip, snappy, or lz4. Compression is applied to full batches, so more effective batching (see linger.ms and batch.size) also improves the compression ratio.
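
Continuing the same props object, a sketch that enables lz4 compression; the codec choice is illustrative.

// Compress each batch with lz4 before it is sent
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");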

