I have been trying for a while to get a stateful service to start on my kubernetes cluster. The cluster has one master and one worker. It's running on top of AWS EC2 instances running with Ubuntu 18.04.
I've tried everything that I can think of but when I create the stateful service, the pods won't get scheduled onto the nodes. I believe that it has something to do with the PV's, but I can't figure out what. Also, I'm having a hard time getting any diagnostics. Trying to run kubectl logs on the pod and container returns nothing.
I first tried using local hardware, i.e. a local mount, but that didn't fix the problem. I've now created an AWS EBS volume and have created a PV that references this.
The PV binds to it correctly, but I still can't get kubernetes to schedule the pods on the worker node.
Here are the .yaml config files that I'm using.
The first one creates the storageclass called 'fast'
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: fast
provisioner: kubernetes.io/aws-ebs
parameters:
type: gp2
reclaimPolicy: Retain
mountOptions:
- debug
volumeBindingMode: Immediate
Here is the yaml file that creates the PV.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: fast
provisioner: kubernetes.io/aws-ebs
parameters:
type: gp2
reclaimPolicy: Retain
mountOptions:
- debug
volumeBindingMode: Immediate
Finally, here's the statefulset yaml file
apiVersion: apps/v1
kind: StatefulSet
metadata:
namespace: lab4a
name: apache-http
spec:
selector:
matchLabels:
app: httpd
serviceName: "httpd-service"
replicas: 3
template:
metadata:
namespace: lab4a
labels:
app: httpd
spec:
terminationGracePeriodSeconds: 10
containers:
- name: httpd
image: httpd:latest
ports:
- containerPort: 80
name: web
volumeMounts:
- name: www
mountPath: /usr/local/apache2/htdocs
volumeClaimTemplates:
- metadata:
name: web-pvc
namespace: lab4a
spec:
accessModes: [ "ReadWriteMany" ]
storageClassName: "fast"
resources:
requests:
storage: 10Gi
kubectl get pv gives me:
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
ebs-pv 10Gi RWX Retain Available 31m
So it stands to reason, at least as far as I can tell, that the PV is ready to go. From my understanding, I don't need to supply a PV Claim manually as the volumeClaimTemplates section in the statefulset yaml file will do this dynamically.
kubectl get all -n lab4a gives me:
NAME READY STATUS RESTARTS AGE
pod/web-0 0/1 Pending 0 16m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/nginx ClusterIP None <none> 80/TCP 16m
NAME READY AGE
statefulset.apps/web 0/2 16m
when I run kubectl describe pod web-0 -n lab4a I get the following:
Name: web-0
Namespace: lab4a
Priority: 0
PriorityClassName: <none>
Node: <none>
Labels: app=nginx
controller-revision-hash=web-b46f789c4
statefulset.kubernetes.io/pod-name=web-0
Annotations: <none>
Status: Pending
IP:
Controlled By: StatefulSet/web
Containers:
nginx:
Image: k8s.gcr.io/nginx-slim:0.8
Port: 80/TCP
Host Port: 0/TCP
Environment: <none>
Mounts:
/usr/share/nginx/html from www (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-mjclk (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
www:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: www-web-0
ReadOnly: false
default-token-mjclk:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-mjclk
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 35s (x14 over 16m) default-scheduler 0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.
I have no idea what's failing, and I don't know what else to try to debug this problem. Is kubernetes failing to bind the persistent volume to the node? Or is it some other issue?
Any help appreciated. Thanks
AWS EBS does not provide ReadWriteMany
(see table in the docs https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes).
You can
ReadWriteOnce
instead (proposed).ReadWriteMany
if you do have an actual need for this.Your Pod's tolerations look okay; can you provide insight on your nodes' taints? Did you fiddle around with kubectl taint ...
before on this cluster? Is this a managed cluster or did you set it up on your own on AWS machines?