I'm working on a project that uses Kubernetes to manage a collection of Flask servers and stores its data in Redis. I have to run a lot of background tasks that process data, and I also need to check on the progress of that processing. I'd like to know whether there are frameworks or guides for doing this well, because my current setup feels fragile.
Here's basically how I have it set up now:
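
    from threading import Thread

    import redis
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    redis_client = redis.Redis()  # connection details omitted

    def process_data(data):
        # do processing
        processed = data  # placeholder: the real processing goes here
        return processed

    def run_processor(data_key):
        # Skip if the data is already processed or another worker has claimed it.
        if redis_client.exists(f"{data_key}_processed", f"{data_key}_processing") > 0:
            return
        redis_client.set(f"{data_key}_processing", 1)
        data = redis_client.get(data_key)
        processed = process_data(data)
        redis_client.set(f"{data_key}_processed", processed)
        redis_client.delete(f"{data_key}_processing")

    @app.route("/start/data/processing/endpoint")
    def handle_request():
        data_key = request.args["data_key"]  # assumption: key arrives as a query parameter
        Thread(target=run_processor, args=(data_key,)).start()
        return jsonify(successful=True)
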
The idea is that I can call the handle_request endpoint as many times as I want, and processing will only start if the data has not already been processed and no other process is working on it, regardless of which pod receives the request. One flaw I've already noticed is that the process could fail and leave f"{data_key}_processing" in place, which would block every later run for that key. I could fix that by setting a timeout on the key and refreshing it while the work is running, but it feels hacky to me. Additionally, I don't have a good way to "check in" on a process that is currently running.
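To make the timeout idea concrete, here is a rough, untested sketch of what I have in mind. It assumes `client` is a redis-py `redis.Redis` instance. The helper names (`try_acquire`, `report_progress`, `release`), the `_progress` key suffix, and the 30-second TTL are all made up for illustration. Claiming the marker with `SET ... NX EX` would also make the check-and-set a single atomic step, which my current code does not do.

```python
LOCK_TTL_SECONDS = 30  # assumption: longer than the gap between two progress reports


def try_acquire(client, data_key, ttl=LOCK_TTL_SECONDS):
    """Claim the processing marker; return True only if this caller got it."""
    # SET with nx=True only creates the key if it does not exist, and ex=ttl
    # makes it expire on its own if the worker dies without cleaning up.
    return bool(client.set(f"{data_key}_processing", 1, nx=True, ex=ttl))


def report_progress(client, data_key, done, total, ttl=LOCK_TTL_SECONDS):
    """Refresh the marker's timeout and record how far the job has got."""
    client.expire(f"{data_key}_processing", ttl)
    # Hypothetical progress record that any pod could read back with HGETALL.
    client.hset(f"{data_key}_progress", mapping={"done": done, "total": total})


def release(client, data_key):
    """Remove the marker once processing has finished or failed."""
    client.delete(f"{data_key}_processing")
```

A worker would call `try_acquire` first, call `report_progress` after each chunk of work, and call `release` in a `finally` block. If the TTL lapses while a slow worker is still running, a second worker could claim the same key, so this is only a sketch of the idea and not something I trust yet.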
If there are any useful resources, or even just terms I could search for, I'd be much obliged.