Cannot get my stateful service to run. Pods can't get scheduled onto nodes

5/2/2019

I have been trying for a while to get a stateful service to start on my Kubernetes cluster. The cluster has one master and one worker and runs on AWS EC2 instances with Ubuntu 18.04.

I've tried everything I can think of, but when I create the stateful service, the pods won't get scheduled onto the nodes. I believe it has something to do with the PVs, but I can't figure out what. I'm also having a hard time getting any diagnostics: running kubectl logs against the pod and container returns nothing.
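
Side note on the missing logs: a Pending pod has no running containers yet, so kubectl logs has nothing to return; the scheduler's events are the more useful signal. For example, with the lab4a namespace and pod name that show up further down:

kubectl describe pod web-0 -n lab4a
kubectl get events -n lab4a --sort-by=.lastTimestamp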

I first tried using local storage, i.e. a local mount, but that didn't fix the problem. I've now created an AWS EBS volume and a PV that references it.
The PV is created correctly, but I still can't get Kubernetes to schedule the pods onto the worker node.

Here are the .yaml config files that I'm using.

The first one creates the StorageClass called 'fast':

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
reclaimPolicy: Retain
mountOptions:
  - debug
volumeBindingMode: Immediate

The second one creates the PV that references the EBS volume.

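A minimal sketch of it, reconstructed to be consistent with the kubectl get pv output further down (name ebs-pv, 10Gi, RWX, Retain); the EBS volume ID below is a placeholder, not the real one:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: ebs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany                      # shows up as RWX in kubectl get pv
  persistentVolumeReclaimPolicy: Retain
  awsElasticBlockStore:
    fsType: ext4
    volumeID: vol-0123456789abcdef0      # placeholder EBS volume ID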

Finally, here's the StatefulSet yaml file:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: lab4a
  name: apache-http
spec:
  selector:
    matchLabels:
      app: httpd
  serviceName: "httpd-service"
  replicas: 3
  template:
    metadata:
      namespace: lab4a
      labels:
        app: httpd
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: httpd
        image: httpd:latest
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/local/apache2/htdocs
  volumeClaimTemplates:
  - metadata:
      name: www                    # must match the volumeMounts name above
      namespace: lab4a
    spec:
      accessModes: [ "ReadWriteMany" ]
      storageClassName: "fast"
      resources:
        requests:
          storage: 10Gi
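
For completeness: serviceName refers to a headless Service that has to exist in the same namespace so the pods get stable network identities. A minimal sketch of what httpd-service would look like, assuming plain port 80:

apiVersion: v1
kind: Service
metadata:
  name: httpd-service
  namespace: lab4a
spec:
  clusterIP: None          # headless: per-pod DNS records instead of a virtual IP
  selector:
    app: httpd
  ports:
  - name: web
    port: 80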

kubectl get pv gives me:

NAME     CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
ebs-pv   10Gi       RWX            Retain           Available                                   31m

So as far as I can tell, the PV is ready to go. From my understanding, I don't need to create a PVC manually, as the volumeClaimTemplates section in the StatefulSet yaml will do this dynamically.
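
The StatefulSet controller does create one PVC per replica, named <volumeClaimTemplate name>-<pod name>; the www-web-0 claim referenced in the describe output below would therefore be roughly equivalent to this hand-written sketch:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: www-web-0                    # <template name>-<pod name>
  namespace: lab4a
spec:
  accessModes: [ "ReadWriteMany" ]
  storageClassName: fast
  resources:
    requests:
      storage: 10Gi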

kubectl get all -n lab4a gives me:

NAME        READY   STATUS    RESTARTS   AGE
pod/web-0   0/1     Pending   0          16m

NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/nginx   ClusterIP   None         <none>        80/TCP    16m

NAME                   READY   AGE
statefulset.apps/web   0/2     16m

When I run kubectl describe pod web-0 -n lab4a, I get the following:

Name:               web-0
Namespace:          lab4a
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             app=nginx
                    controller-revision-hash=web-b46f789c4
                    statefulset.kubernetes.io/pod-name=web-0
Annotations:        <none>
Status:             Pending
IP:                 
Controlled By:      StatefulSet/web
Containers:
  nginx:
    Image:        k8s.gcr.io/nginx-slim:0.8
    Port:         80/TCP
    Host Port:    0/TCP
    Environment:  <none>
    Mounts:
      /usr/share/nginx/html from www (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-mjclk (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  www:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  www-web-0
    ReadOnly:   false
  default-token-mjclk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-mjclk
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  35s (x14 over 16m)  default-scheduler  0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.

I have no idea what's failing, and I don't know what else to try to debug this problem. Is Kubernetes failing to bind the persistent volume to the node, or is it some other issue?

Any help appreciated. Thanks

-- redmage123
kubernetes

1 Answer

5/3/2019

(1) Your Storage

AWS EBS volumes do not support ReadWriteMany (see the access modes table in the docs: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes).

You can:

  • Use ReadWriteOnce instead (recommended; see the snippet below).
  • Set up an in-cluster NFS server hosting PVs that allow ReadWriteMany, if you actually need writes from multiple nodes.
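
A minimal sketch of the first option, keeping the rest of the StatefulSet as posted: switch the access mode in the volumeClaimTemplates (and in any statically created PV) to ReadWriteOnce.

  volumeClaimTemplates:
  - metadata:
      name: www
      namespace: lab4a
    spec:
      accessModes: [ "ReadWriteOnce" ]   # an EBS volume can only attach to one node
      storageClassName: "fast"
      resources:
        requests:
          storage: 10Gi

With the aws-ebs provisioner and volumeBindingMode: Immediate, each replica then gets its own dynamically provisioned gp2 volume, provided the cluster's AWS cloud provider integration is configured.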

(2) Your Taints and Tolerations

Your Pod's tolerations look okay; can you provide insight on your nodes' taints? Did you fiddle around with kubectl taint ... before on this cluster? Is this a managed cluster or did you set it up on your own on AWS machines?
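
As a quick check, the following prints each node's name and taints (the jsonpath is only there for formatting):

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'

or, more simply:

kubectl describe nodes | grep -A 3 -i taints

On a kubeadm-style setup the master normally carries node-role.kubernetes.io/master:NoSchedule, which would account for one of the two tainted nodes in the scheduler event; whatever taint the worker carries (for example node.kubernetes.io/not-ready if the kubelet or CNI isn't healthy) would be the other.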

-- jbndlr
Source: StackOverflow