I have the following common scenario and I'm not sure which Google Cloud tool fits it best. I am new to Google Cloud.
I have a process (the collector) which collects data from producers every N minutes and consolidates it in a database. The data is published but is transient, in the sense that if the collector does not collect it within a few periods it is lost. The collector is a background process that runs 24/7 and has a terminal logging interface for diagnostics. At the moment the collector runs on a server/PC as a Python script, but I would like to move it to the cloud. However, I am unsure whether I need to deploy this script as an app on Google App Engine, as a Docker container on Container Engine, or just run it on a Compute Engine node.
EDIT:
I've done my research and I've deployed the script on Google App Engine. My understanding is that App Engine might run several instances of the app to scale with usage, and it certainly has done that. However, I do not end up with duplicate entries in the DB, which is what I would expect to see if I started several instances of the script on my laptop.
GAE-oriented answer.
The lack of duplicate DB entries could theoretically be caused by:
You can prevent multiple GAE instances from executing in parallel by using basic scaling with the max_instances config set to 1. From Scaling types and instance classes:
Basic Scaling
A service with basic scaling will create an instance when the application receives a request. The instance will be turned down when the app becomes idle. Basic scaling is ideal for work that is intermittent or driven by user activity.
and the Scaling row in the table:
Scaling
A service with basic scaling is configured by setting the maximum number of instances in the max_instances parameter of the basic_scaling setting. The number of live instances scales with the processing volume.
See also Scaling elements.
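As a rough sketch, the basic scaling setup described above could look like this in app.yaml. The basic_scaling and max_instances names come from the quoted docs; the runtime line and the idle_timeout value are assumptions for illustration and should be adjusted to match your own app.

```yaml
# app.yaml (sketch; runtime and idle_timeout are assumed values)
runtime: python39

basic_scaling:
  max_instances: 1    # never run more than one instance of the collector
  idle_timeout: 10m   # how long an idle instance is kept before shutdown
```

Note that with basic scaling the instance is still started by a request and turned down when idle, so something has to trigger the collector periodically.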
There are many ways to bell this cat.
Using Kubernetes is obviously easy; however, you do not necessarily need Container Engine (which may be overly pricey) just for this.
If you have a Compute Engine instance running a script that listens to Pub/Sub, you can scale it horizontally by creating an instance template and choosing to scale automatically based on processor usage.
Whichever way you choose to do this, duplication of records is tied more to the publisher than to the subscriber (your Python script).
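To illustrate one way this can play out, here is a minimal sketch which assumes that the publisher attaches a unique ID to every record and that the database enforces uniqueness on it. Under those assumptions, several collector instances writing the same batch do not produce duplicates. The table, the column names, the store_records helper and the use of sqlite3 as a stand-in database are all hypothetical; the question does not say which database is in use.

```python
import sqlite3


def make_db():
    # In-memory SQLite stands in for the real database (an assumption).
    conn = sqlite3.connect(":memory:")
    # The primary key on record_id is what rejects repeated records.
    conn.execute(
        "CREATE TABLE readings (record_id TEXT PRIMARY KEY, payload TEXT)"
    )
    return conn


def store_records(conn, records):
    """Insert records, skipping any whose record_id is already stored."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO readings (record_id, payload) VALUES (?, ?)",
            records,
        )


conn = make_db()
# Hypothetical records; the IDs are assumed to be assigned by the publisher.
batch = [("producer-1:0001", "42.0"), ("producer-1:0002", "43.5")]

store_records(conn, batch)
store_records(conn, batch)  # a second instance delivering the same batch

count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(count)
```

If the publisher does not provide a stable ID, the subscriber has nothing reliable to deduplicate on, which is why the problem sits mostly on the publishing side.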
I would never use App Engine for such a task. Even though you can easily do it in GAE, IMHO one should try to use it only in front-end kinds of roles.