Are there any recommendations on querying remote state stores between application instances that are deployed in Kubernetes? Our application instances are deployed with 2 or more replicas.
Based on the documentation (https://kafka.apache.org/10/documentation/streams/developer-guide/interactive-queries.html#id7), remote instances can be queried like this:
streams.allMetadataForStore("word-count")
.stream()
.map(streamsMetadata -> {
// Construct the (fictitious) full endpoint URL to query the current remote application instance
String url = "http://" + streamsMetadata.host() + ":" + streamsMetadata.port() + "/word-count/alice";
// Read and return the count for 'alice', if any.
return http.getLong(url);
})
.filter(s -> s != null)
.findFirst();
Will streamsMetadata.host() return the pod IP? If it does, will a call from this pod to another pod be allowed? Is this the correct approach?
streamsMetadata.host()
This method returns whatever you configured via the application.server configuration parameter. That is, each application instance (in your case, each pod) must set this config to advertise how it can be reached (e.g., its IP and port). Kafka Streams distributes this information to all application instances for you.
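As a minimal sketch of the configuration described above, each pod could build its application.server value from its own IP at startup. The environment variable names POD_IP, QUERY_PORT, and BOOTSTRAP_SERVERS, the application id, and the default values are assumptions for illustration; in Kubernetes, POD_IP could be injected through the Downward API (status.podIP). The plain string keys are used here so the snippet compiles without the Kafka dependency; the resulting Properties would be passed to the KafkaStreams constructor.

```java
import java.util.Map;
import java.util.Properties;

public class AppServerConfig {

    // Builds the Kafka Streams properties for one pod from its environment.
    // POD_IP, QUERY_PORT and BOOTSTRAP_SERVERS are hypothetical variable names.
    static Properties build(Map<String, String> env) {
        String podIp = env.getOrDefault("POD_IP", "localhost");
        int port = Integer.parseInt(env.getOrDefault("QUERY_PORT", "7070"));

        Properties props = new Properties();
        props.put("application.id", "word-count-app"); // assumed application id
        props.put("bootstrap.servers", env.getOrDefault("BOOTSTRAP_SERVERS", "localhost:9092"));
        // "host:port" under which THIS instance is reachable by the other pods;
        // this is the value that streamsMetadata.host() and port() report remotely.
        props.put("application.server", podIp + ":" + port);
        return props;
    }

    public static void main(String[] args) {
        Properties props = build(System.getenv());
        System.out.println("application.server = " + props.getProperty("application.server"));
    }
}
```

The port advertised here has to be the same port on which the pod's own query endpoint listens, and pod-to-pod traffic on that port must be permitted by whatever network policies apply in the cluster.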
You also need to configure your pods to allow sending and receiving query requests via the specified port. This part is additional code you need to write yourself, i.e., some kind of "query routing layer". Kafka Streams only has built-in support for querying local state and for distributing the metadata about which state is hosted where; there is no built-in remote query support.
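To illustrate what such a routing layer might look like, here is a rough sketch using only the JDK (Java 11+ HttpClient). The class QueryRouter, the /word-count/{key} path, and the plain-text response format are all hypothetical. It assumes you have already looked up which instance owns the key (for example via the metadata Kafka Streams provides) and that the key is URL-safe; the local lookup is passed in as a function, which in a real application would read from the local state store.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.function.Function;

public class QueryRouter {
    private final String localHost;
    private final int localPort;
    private final Function<String, Long> localLookup; // stands in for a local store read
    private final HttpClient http = HttpClient.newHttpClient();

    public QueryRouter(String localHost, int localPort, Function<String, Long> localLookup) {
        this.localHost = localHost;
        this.localPort = localPort;
        this.localLookup = localLookup;
    }

    // True if the owning instance is this one (same host and port as application.server).
    boolean isLocal(String host, int port) {
        return localHost.equals(host) && localPort == port;
    }

    // Hypothetical endpoint layout; the key is assumed to need no URL encoding.
    static URI remoteUri(String host, int port, String key) {
        return URI.create("http://" + host + ":" + port + "/word-count/" + key);
    }

    // Answers from local state when this instance owns the key, otherwise forwards over HTTP.
    public Long count(String ownerHost, int ownerPort, String key) {
        if (isLocal(ownerHost, ownerPort)) {
            return localLookup.apply(key);
        }
        HttpRequest request = HttpRequest.newBuilder(remoteUri(ownerHost, ownerPort, key)).GET().build();
        try {
            HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
            // Assumes the remote instance replies with the count as plain text, or non-200 if absent.
            return response.statusCode() == 200 ? Long.valueOf(response.body().trim()) : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while querying " + ownerHost, e);
        }
    }
}
```

Each pod would also have to expose the matching HTTP endpoint that serves its local state; that server side is not shown here.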
An example implementation of a query routing layer (WordCountInteractiveQueries) can be found on GitHub: https://github.com/confluentinc/kafka-streams-examples
I would also recommend checking out the docs and blog post: