Pub/Sub data affinity with Pod autoscaling

7/25/2018

I am running a Google Cloud Kubernetes project with pod autoscaling enabled. The pods consume messages from a Pub/Sub subscription (streaming data). Multiple users publish timestamped data packets to the Pub/Sub topic behind that subscription, and all published data packets have the same structure. In the actual scenario, every available pod consumes data from all users without restriction; users are not bound to a specific pod.

What I want to achieve is affinity: a particular user's data should always be processed by a specific pod. (Please refer to the "actual scenario" and "what I want to achieve" images for further clarification.)
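To make the desired behaviour concrete, here is a minimal Python sketch of the mapping I have in mind. It is only an illustration, not a working Pub/Sub setup: the `user_id` field, the packet layout, the `pod_for_user` and `group_by_pod` helpers, and the fixed pod count are all assumptions. A stable hash of the user ID picks a pod index, so the same user always lands on the same pod as long as the number of pods does not change. With autoscaling the pod count does change, which is exactly where I am stuck.

```python
import hashlib
from collections import defaultdict


def pod_for_user(user_id: str, pod_count: int) -> int:
    """Map a user ID to a pod index in [0, pod_count) using a stable hash."""
    if pod_count <= 0:
        raise ValueError("pod_count must be positive")
    # hashlib is used instead of the built-in hash(), which is randomized
    # per process and would give different results on different pods.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % pod_count


def group_by_pod(packets, pod_count):
    """Group packets by the pod index their user is assigned to."""
    assignment = defaultdict(list)
    for packet in packets:
        assignment[pod_for_user(packet["user_id"], pod_count)].append(packet)
    return dict(assignment)


# Hypothetical timestamped packets; every packet has the same structure.
packets = [
    {"user_id": "user-a", "timestamp": 1, "value": 10},
    {"user_id": "user-b", "timestamp": 2, "value": 20},
    {"user_id": "user-a", "timestamp": 3, "value": 30},
]

for pod_index, assigned in sorted(group_by_pod(packets, pod_count=3).items()):
    print(pod_index, [p["user_id"] for p in assigned])
```

Both packets from `user-a` end up under the same pod index, which is the affinity I am after. If the pod count changes from 3 to 4, most users would be remapped with this simple modulo scheme.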

Could anyone suggest how to achieve this data affinity?

-- Sachith.Wanni
affinity
autoscaling
google-compute-engine
google-kubernetes-engine
kubernetes

1 Answer

8/1/2018

Basically, what you are trying to do is collect Pub/Sub messages from users over time, and then send each user's messages to a specific pod.

I understand that when you say data affinity, you actually mean session affinity.

You can configure session affinity with a load balancer. You've said that the type of traffic is Cloud Pub/Sub, and Cloud Pub/Sub uses HTTPS. This means that you cannot set up an HTTP load balancer; it has to be an HTTPS one.

As a side note, the GKE ingress rule does not allow the use of session affinity yet.

You could use the GKE HTTPS Internal Load Balancer, or an external GCP HTTPS Load Balancer. This would direct the traffic to the specific pods.

-- Mahmoud Sharif
Source: StackOverflow