I have a CircleCI workflow in which any merge to the master branch builds the code, creates a Docker image, and runs helm upgrade to deploy the latest build to the k8s cluster.
I am facing issues with Helm when two merges land on the master branch close together: CircleCI tries to run two helm upgrades simultaneously, and Helm starts behaving weirdly. Many times, releases get stuck in the pending-install state and I have to roll back manually. Even after the rollback, many orphaned k8s objects are left behind and I need to delete them manually.
I read the Helm code and found there is a mutex lock that should prevent parallel releases. I suspect that, since the Helm mutex is not an explicit remote lock and CircleCI runs the two helm upgrades in two different sessions (two different shells), Helm is not aware of the release already in progress, which causes this issue.
I am not sure how to handle this use case. Has anyone faced this issue in the past, where Helm left orphaned objects behind (mainly cronjobs and ingress)?
A workaround I can think of is to prevent the parallel run by checking helm status before running helm upgrade, which is not ideal.
K8S version - 1.21
Helm version - 3.7.2
The problem you describe is mainly related to Helm. You rightly note that a workaround exists:
A workaround I can think of is to prevent the parallel run by checking helm status before running helm upgrade, which is not ideal.
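As a rough illustration of that workaround, here is a minimal Python sketch of a pre-upgrade check. It assumes that `helm status <release> -o json` reports the release state under `info.status`, and that the states to wait on are `pending-install`, `pending-upgrade` and `pending-rollback`. The function names, timeout and polling interval are made up for this example.

```python
import json
import subprocess
import time

# Release states that indicate another Helm operation is still in flight.
PENDING_STATES = {"pending-install", "pending-upgrade", "pending-rollback"}


def is_pending(status_json: str) -> bool:
    """Return True if the JSON printed by `helm status -o json` shows a pending state."""
    info = json.loads(status_json).get("info", {})
    return info.get("status") in PENDING_STATES


def wait_until_idle(release: str, namespace: str,
                    timeout_s: float = 300, poll_s: float = 10) -> bool:
    """Poll `helm status` until the release is no longer pending.

    Returns True once it is safe to proceed, False if the timeout expires.
    (Hypothetical helper, not part of Helm.)
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        proc = subprocess.run(
            ["helm", "status", release, "-n", namespace, "-o", "json"],
            capture_output=True, text=True,
        )
        # A non-zero exit is treated here as "release not found" (first install),
        # so there is nothing to wait for.
        if proc.returncode != 0 or not is_pending(proc.stdout):
            return True
        time.sleep(poll_s)
    return False
```

A CI step could call `wait_until_idle("my-release", "default")` (placeholder names) and run helm upgrade only when it returns True. Note that this is still check-then-act: two jobs can both see an idle release and start at the same moment, so it narrows the window rather than closing it.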
Overall, Helm's handling of concurrent operations has had a lot of bugs in the past, and it is quite possible that it is misbehaving again.
This error can also often be caused by duplicate env variable keys; the solution is described in this question and this GitHub issue.
Also look at this topic about parallel helm installs and try the --concurrency=N flag.
When it comes to this:
I have a 70:30 success:fail ratio. This is weird and hard to understand.
Try checking your logs to see what may have gone wrong. Maybe you're running out of resources (for example, memory)? I tried to recreate this problem, but for me the solution was to remove the duplicate keys. You can also report a bug on GitHub.
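To look for duplicate env variable keys before deploying, something along these lines may help. It is only a sketch: it assumes the pod spec has already been loaded into a Python dict (for example from rendered chart output converted to JSON), and the function name and sample data are invented for illustration.

```python
from collections import Counter


def duplicate_env_names(pod_spec: dict) -> dict:
    """Map each container name to the env var names it defines more than once."""
    duplicates = {}
    containers = (pod_spec.get("containers") or []) + (pod_spec.get("initContainers") or [])
    for container in containers:
        counts = Counter(entry["name"] for entry in container.get("env") or [])
        repeated = sorted(name for name, n in counts.items() if n > 1)
        if repeated:
            duplicates[container["name"]] = repeated
    return duplicates


if __name__ == "__main__":
    # Made-up pod spec with LOG_LEVEL defined twice in the same container.
    sample = {
        "containers": [
            {
                "name": "api",
                "env": [
                    {"name": "LOG_LEVEL", "value": "info"},
                    {"name": "PORT", "value": "8080"},
                    {"name": "LOG_LEVEL", "value": "debug"},
                ],
            }
        ]
    }
    print(duplicate_env_names(sample))  # {'api': ['LOG_LEVEL']}
```

An empty result means no container repeats an env name; anything else points at the keys to clean up in the chart values or templates.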