Kubernetes persistent volume data corrupted after multiple pod deletions

8/7/2018

I am struggling with a simple one-replica deployment of the official Event Store image on a Kubernetes cluster. I am using a persistent volume for the data storage.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-eventstore
spec:
  strategy:
    type: Recreate
  replicas: 1
  template:
    metadata:
      labels:
        app: my-eventstore
    spec:
      imagePullSecrets:
        - name: runner-gitlab-account
      containers:
        - name: eventstore
          image: eventstore/eventstore
          env:
            - name: EVENTSTORE_DB
              value: "/usr/data/eventstore/data"
            - name: EVENTSTORE_LOG
              value: "/usr/data/eventstore/log"
          ports:
            - containerPort: 2113
            - containerPort: 2114
            - containerPort: 1111
            - containerPort: 1112
          volumeMounts:
            - name: eventstore-storage
              mountPath: /usr/data/eventstore
      volumes:
        - name: eventstore-storage
          persistentVolumeClaim:
            claimName: eventstore-pv-claim

And this is the YAML for my persistent volume claim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: eventstore-pv-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

The deployments work fine. It was when I tested for durability that I ran into a problem: I deleted a pod to force the actual state to diverge from the desired state, and to see how Kubernetes reacts.
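For reference, the delete was nothing special, just a plain kubectl delete of the running pod, along these lines (the pod name is a placeholder for whatever the ReplicaSet generated):

# Look up the running pod, then delete it to simulate a failure
kubectl get pods -l app=my-eventstore
kubectl delete pod <pod-name>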

Kubernetes immediately launched a new pod to replace the deleted one, and the admin UI was still showing the same data. But after deleting a pod for the second time, the new pod did not come up. I got an error message saying "record too large", which according to this discussion indicates corrupted data: https://groups.google.com/forum/#!topic/event-store/gUKLaxZj4gw
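For anyone hitting the same thing: the message shows up in the logs of the replacement pod, which something like this should surface (assuming the app label from the deployment above):

# Tail the logs of the crashing replacement pod to find the startup error
kubectl logs -l app=my-eventstore --tail=50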

I tried again a couple of times, with the same result every time: after deleting the pod for the second time, the data is corrupted. This has me worried that an actual failure would cause a similar result.

However, when deploying new versions of the image, or scaling the pods in the deployment to zero and back to one, no data corruption occurs. After several tries everything is fine. That is odd, since those operations also completely replace the pods (I checked the pod IDs, and they changed).
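The scaling was done with the usual scale subcommand, roughly:

# Scale the deployment to zero and back to one; this also replaces the pod
kubectl scale deployment my-eventstore --replicas=0
kubectl scale deployment my-eventstore --replicas=1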

This has me wondering whether deleting a pod using kubectl delete is somehow more forceful in the way the pod is terminated. Do any of you have similar experience? Or insights on if/how delete is different? Thanks in advance for your input.
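From what I have read so far, kubectl delete should send SIGTERM first and only SIGKILL the container once the pod's grace period expires (30 seconds by default), and the grace period can even be extended per delete:

# Give the process extra time to shut down cleanly before SIGKILL is sent
kubectl delete pod <pod-name> --grace-period=60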

Regards,

Oskar

-- Oskar
get-event-store
kubectl
kubernetes

1 Answer

8/7/2018

I was referred to this pull request on GitHub, which stated that the process was not being killed properly: https://github.com/EventStore/eventstore-docker/pull/52

After building a new image with the Dockerfile from the pull request, I put this image in the deployment. I have been killing pods left and right, and there are no data corruption issues anymore.
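If you want to reproduce the stress test yourself, a loop along these lines kills the pod and waits for the deployment to recover each round (assuming the app label from the deployment in the question):

# Repeatedly delete the pod and wait until the replacement is available
for i in 1 2 3 4 5; do
  kubectl delete pod -l app=my-eventstore
  kubectl rollout status deployment/my-eventstore
done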

Hope this helps someone facing the same issue.

-- Oskar
Source: StackOverflow