I have a server running on Kubernetes to handle hourly processing jobs. I'm thinking of using a Service to expose the pods, and an (external) cron job to hit the load balancer, so that Kubernetes can autoscale to handle the higher load as required. The problem in practice: if the cron job sends, say, 100 requests at the same time while there's only 1 pod, all the traffic goes to that pod, and the pods spun up afterwards still have no traffic to handle.
How can I get around this issue? Is it possible for me to scale up the pods first using a cron job, before making the requests? Or should I make the requests with a time delay, to give the pods time to spin up? Other suggestions are also welcome!
If you're looking for serverless-style instant scale-up, something like Knative (https://github.com/knative/) may be worth running on top of Kubernetes/GKE.
Other than that, the only built-in way to scale pods on Kubernetes today is the Horizontal Pod Autoscaler, which looks at CPU/memory averages (and, if you're on GKE, can use custom Stackdriver metrics that you expose from your app via Prometheus, etc.).
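For illustration, here's a rough sketch of creating such an HPA programmatically with client-go. The Deployment name `worker`, the `default` namespace, and the thresholds are made-up values; in practice you'd probably just apply the equivalent YAML with kubectl instead.

```go
package main

import (
	"context"

	autoscalingv1 "k8s.io/api/autoscaling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Assumes this runs inside the cluster with a service account allowed
	// to manage HPAs; use clientcmd/kubeconfig for out-of-cluster access.
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	minReplicas := int32(1)
	targetCPU := int32(70) // scale out when average CPU exceeds 70%

	// HPA targeting a hypothetical Deployment called "worker".
	hpa := &autoscalingv1.HorizontalPodAutoscaler{
		ObjectMeta: metav1.ObjectMeta{Name: "worker-hpa"},
		Spec: autoscalingv1.HorizontalPodAutoscalerSpec{
			ScaleTargetRef: autoscalingv1.CrossVersionObjectReference{
				APIVersion: "apps/v1",
				Kind:       "Deployment",
				Name:       "worker",
			},
			MinReplicas:                    &minReplicas,
			MaxReplicas:                    10,
			TargetCPUUtilizationPercentage: &targetCPU,
		},
	}

	_, err = clientset.AutoscalingV1().
		HorizontalPodAutoscalers("default").
		Create(context.TODO(), hpa, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
}
```

Note that the HPA only reacts after metrics cross the threshold, so it won't help with an instantaneous burst of 100 requests against a single pod; that's where pre-scaling (next answer) or Knative comes in.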
I wrote a simple client-go based application which you can pair with a CronJob to scale the deployment up or down before the load arrives. You can take inspiration from it and write your own, or just use it as is. I hope this helps.
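A minimal sketch of the core idea, assuming it runs in-cluster with a service account that may update the Deployment's `scale` subresource; the env-var names and the target Deployment are hypothetical:

```go
package main

import (
	"context"
	"fmt"
	"os"
	"strconv"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Hypothetical env vars, set in the CronJob spec: target namespace,
	// Deployment name, and desired replica count.
	namespace := os.Getenv("TARGET_NAMESPACE")
	deployment := os.Getenv("TARGET_DEPLOYMENT")
	replicas, err := strconv.Atoi(os.Getenv("TARGET_REPLICAS"))
	if err != nil {
		panic(err)
	}

	// In-cluster config; the CronJob's service account needs RBAC rights
	// on the deployments/scale subresource (get + update).
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	ctx := context.TODO()

	// Read the current scale, set the desired replica count, write it back.
	scale, err := clientset.AppsV1().Deployments(namespace).
		GetScale(ctx, deployment, metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	scale.Spec.Replicas = int32(replicas)

	_, err = clientset.AppsV1().Deployments(namespace).
		UpdateScale(ctx, deployment, scale, metav1.UpdateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("scaled %s/%s to %d replicas\n", namespace, deployment, replicas)
}
```

Packaged as an image and run from a Kubernetes CronJob scheduled a few minutes before your hourly trigger, this pre-scales the Deployment so the extra pods are already receiving traffic when the burst of requests arrives; a second CronJob (or the HPA's scale-down) can shrink it again afterwards.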