We use helm to manage all our resources in a k8s cluster. Recently we had an incident where some k8s resources were modified outside of helm (see below for details on the root cause).
The end result, however, is that we have some k8s resources in our cluster that do not match what is specified in the helm chart of the release.
Example:
We have a helm chart that contains a HorizontalPodAutoscaler. If I do something like:
helm get myservice-release
I will see something like this:
---
# Source: myservice/charts/default-deployment/templates/default_deployment.yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: myservice-autoscaler
  labels:
    app: myservice
spec:
  minReplicas: 2
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myservice-deployment
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 85
However, if I do:
kubectl get hpa myservice-autoscaler -o yaml
the spec.{max,min}Replicas values do not match the chart:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    autoscaling.alpha.kubernetes.io/conditions: '{REDACTED}'
    autoscaling.alpha.kubernetes.io/current-metrics: '{REDACTED}'
  creationTimestamp: "{REDACTED}"
  labels:
    app: myservice
  name: myservice-autoscaler
  namespace: default
  resourceVersion: "174526833"
  selfLink: /apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers/myservice-autoscaler
  uid: {REDACTED}
spec:
  maxReplicas: 1
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myservice-deployment
  targetCPUUtilizationPercentage: 85
status:
  currentCPUUtilizationPercentage: 9
  currentReplicas: 1
  desiredReplicas: 1
  lastScaleTime: "{REDACTED}"
I suspect this is not the only occurrence of drift in our k8s resources.
EDIT:
For those of you interested, this was caused by two helm charts managing the same resource (the autoscaler), each setting different values.
This occurred because two helm releases that were meant for different namespaces ended up in the same namespace and were updated with --force.
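If you want to check whether you are in the same situation, listing all releases together with their target namespace should reveal two releases pointing at the same one. This is only a sketch assuming the Helm 2 CLI; the resource names below are the ones from this example:
helm list --all   # the NAMESPACE column shows where each release's resources live
# then inspect the live object the two releases are fighting over
kubectl get hpa myservice-autoscaler -n default -o yaml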
You can check how many revisions of the current release are available and then fetch the values from each revision.
Execute helm get values --revision int32 RELEASE_NAME (where int32 is the revision number) to compare the differences between revisions.
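For example, to compare the values of the two most recent revisions (the revision numbers 7 and 8 are placeholders; take the real ones from helm history):
helm history myservice-release                                   # list available revisions
helm get values --revision 7 myservice-release > values-7.yaml
helm get values --revision 8 myservice-release > values-8.yaml
diff values-7.yaml values-8.yaml                                 # spot the values that changed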
Please let me know if that helped.
We figured out a way to do this at scale. Note that this solution requires Kubernetes 1.13 for kubectl diff support.
The overall idea is to fetch the helm state and apply it with kubectl to sync the two again. This might be unsafe on your cluster, so please verify the changes with kubectl diff first.
1. Fetch the state from helm: helm get manifest {service}-release > {service}-release.yaml
2. Check for differences against the live k8s objects: kubectl diff -f {service}-release.yaml
3. Overwrite the k8s state with the helm state: kubectl apply -f {service}-release.yaml
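If you suspect drift in more than one release, a small shell loop can run the same three steps across all of them. This is only a sketch: the release-name pattern "<service>-release" and the service list are assumptions from this question, and you should review each kubectl diff before applying anything.
# Sketch: sync several releases back to their helm state.
# Assumes releases are named "<service>-release"; adjust to your naming scheme.
for service in myservice otherservice; do
  helm get manifest "${service}-release" > "${service}-release.yaml"
  # kubectl diff exits non-zero when drift is found; inspect its output first
  kubectl diff -f "${service}-release.yaml"
  kubectl apply -f "${service}-release.yaml"
done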