We have Celery Beat set up using the following deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-beat
  labels:
    deployment: celery-beat
spec:
  replicas: 1
  minReadySeconds: 120
  selector:
    matchLabels:
      app: celery-beat
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0 # Would rather have downtime than an additional instance in service?
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: celery-beat
    spec:
      containers:
        - name: celery-beat
          image: image_url
          command: ["/var/app/scripts/kube_worker_beat_start.sh"]
          imagePullPolicy: Always
          ports:
            - containerPort: 8000
              name: http-server
          livenessProbe: # From https://github.com/celery/celery/issues/4079#issuecomment-437415370
            exec:
              command:
                - /bin/sh
                - -c
                - celery -A app_name status | grep ".*OK"
            initialDelaySeconds: 3600
            periodSeconds: 3600
          readinessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - celery -A app_name status | grep ".*OK"
            initialDelaySeconds: 60
            periodSeconds: 30
          resources:
            limits:
              cpu: "0.5" # 500 millicores - only really required on install
            requests:
              cpu: "30m"
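For context, /var/app/scripts/kube_worker_beat_start.sh is essentially a thin wrapper that launches the Beat scheduler - roughly along these lines (a simplified sketch, not the verbatim script):

#!/bin/sh
# Simplified sketch of /var/app/scripts/kube_worker_beat_start.sh -
# assume it boils down to running the Beat scheduler in the foreground
# so the container lives and dies with the beat process.
exec celery -A app_name beat --loglevel=info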
I have found the RollingUpdate settings tricky, because with Celery Beat you really don't want two instances running at once, otherwise tasks can get scheduled twice. This is super important for us to avoid, since we're using it to send out push notifications.
With the current settings, each deployment rollout causes 3-5 minutes of downtime, because the existing instance is terminated immediately and we have to wait for the new one to set itself up.
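As far as I can tell, with maxSurge: 0 and a single replica this rolls out much like a Recreate strategy would: the old pod is killed before the replacement is created, and the rollout isn't considered complete until the new pod has passed the readinessProbe (initialDelaySeconds: 60) and then stayed ready for minReadySeconds: 120, which I suspect accounts for most of the 3-5 minutes we see. In other words, the current configuration seems equivalent to:

strategy:
  type: Recreate # states the "never two at once" guarantee explicitly, with the same downtime window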
Is there a better way of configuring this to reduce the downtime, whilst ensuring a maximum of one instance is ever in service?