Collectives™ on Stack Overflow

Find centralized, trusted content and collaborate around the technologies you use most.

Learn more about Collectives

Teams

Q&A for work

Connect and share knowledge within a single location that is structured and easy to search.

Learn more about Teams

I've been looking a way to reduce duplications or totally eliminate them and what I found is an interesting property

replica.high.watermark.checkpoint.interval.ms = 5000(default)
  

The frequency with which the high watermark is saved out to disk

and I was going through the random link which says,

replica.high.watermark.checkpoint.interval.ms property can affect throughput. Also, we can mark the last point where we read information while reading from a partition. In this way, we have a checkpoint from which to move forward without having to reread prior data, if we have to go back and locate the missing data. So, we will never lose a message, if we set the checkpoint watermark for every event.

First, So my question is how to use replica.high.watermark.checkpoint.interval.ms and

Second, is there any way to reduce duplicates using this property?

As far as I know, the high watermark indicates the last record that consumers can see, as it is the last record that has been fully replicated for that partition. This seems to indicate that it is used to prevent a consumer from consuming a record that is not yet fully replicated across all of its brokers, so that you don't consume something that could end up lost, leading to a bad state.

Changing the interval at which this would be updated does not seem like it would reduce duplication of messages. It would potentially have a slight performance impact (smaller interval = more disk writes) however.

For reducing duplication, I'd probably look at the Kafka exactly-once semantics introduced in 0.11.

Thanks for contributing an answer to Stack Overflow!

  • Please be sure to answer the question. Provide details and share your research!

But avoid

  • Asking for help, clarification, or responding to other answers.
  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.