I need help with an issue. I have installed a StorageOS cluster on top of Kubernetes. I am able to create volumes, but when I create a PVC and mount it into my Kubernetes pod, the pod gets stuck in the ContainerCreating state. Below is what I could gather from the logs.
I have been stuck on this for days, so any help is appreciated. Thanks. Additional log screenshots: https://imgur.com/a/VcggCJn
StorageOS Logs -
> kubectl logs storageos-dpj6q
time="2018-07-11T18:38:30Z" level=info msg="starting server" address=172.31.14.99 cluster= hostname=ip-172-31-14-99 id=4dd922fe-d998-c67a-122e-1ea755ec61e9 join=d732967d-93c4-4018-8e0e-a9170065e8e2 labels="map[]" module=command version="StorageOS 1.0.0 (f8915fa), built: 2018-05-25T190132Z"
time="2018-07-11T18:38:30Z" level=info msg="starting api server" action=create category=server endpoint="0.0.0.0:5705" module=cp
time="2018-07-11T18:38:30Z" level=info msg="starting embedded etcd server" action=create category=etcd cluster_id= initialised=true members= module=cp
time="2018-07-11T18:38:30Z" level=info msg="by using this product, you are agreeing to the terms of the StorageOS Ltd. End User Subscription Agreement (EUSA) found at: https://eusa.storageos.com" module=command
time="2018-07-11T18:38:30Z" level=info msg="started temporary docker volume plugin api while control plane starts"
time="2018-07-11T18:38:38Z" level=info msg="embedded etcd server started successfuly" action=create category=etcd module=cp
time="2018-07-11T18:38:38Z" level=info msg="new messaging routes found, [re]starting nats" action=update category=nats module=cp
time="2018-07-11T18:38:39Z" level=info msg="connected to store" action=wait address="http://127.0.0.1:5706" backend=embedded category=etcd module=cp
time="2018-07-11T18:38:39Z" level=info msg="temporary docker volume plugin api shutdown"
time="2018-07-11T18:38:39Z" level=info msg="leader election process started" action=election category=leader module=ha
time="2018-07-11T18:38:39Z" level=info msg="docker plugin api started, exporting filesystems at /var/lib/storageos/filesystems"
time="2018-07-11T18:38:39Z" level=error msg="startup force unmount failed" device_dir=/var/lib/storageos/volumes error="exit status 32" module=supervisor output="umount: /var/lib/storageos/volumes: not mounted"
time="2018-07-11T18:38:39Z" level=info msg="reaper started"
time="2018-07-11T18:38:39Z" level=info msg="dataplane notifications server starting..." path="unix:///var/run/storageos/dataplane-notifications.sock"
time="2018-07-11T18:38:39Z" level=info msg="syncer started"
time="2018-07-11T18:38:39Z" level=info msg="module=\"storageos-stats\", release=\"1.0.0\", buildId=\"release/1.0.0-rc1_5\", scmId=\"15e59658ac5bccb957b74cb96b2cf44dc175f3b2\" level=info" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="StorageOS Volume Presentation level=info" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="module=\"storageos-fs-director\", release=\"1.0.0\", buildId=\"release/1.0.0-rc1_5\", scmId=\"15e59658ac5bccb957b74cb96b2cf44dc175f3b2\" level=info" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="fuse: mountpoint is not empty" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="fuse: if you are sure this is safe, use the 'nonempty' mount option" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="Error creating FUSE channel! level=fatal" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="module=\"server\", release=\"1.0.0\", buildId=\"release/1.0.0-rc1_5\", scmId=\"15e59658ac5bccb957b74cb96b2cf44dc175f3b2\" level=info" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="module=\"client\", release=\"1.0.0\", buildId=\"release/1.0.0-rc1_5\", scmId=\"15e59658ac5bccb957b74cb96b2cf44dc175f3b2\" level=info" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="StorageOS DirectFS v1 server (server v0.1 protocol v1.3) start level=info" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="StorageOS DIRECTOR category=director level=info" category=director module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="module=\"storageos-director\", release=\"1.0.0\", buildId=\"release/1.0.0-rc1_5\", scmId=\"15e59658ac5bccb957b74cb96b2cf44dc175f3b2\" level=info" category=director module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="CACHE: configure ram of 256MB exceeds maximum recommended of 66 MiB category=corecache level=warn" category=corecache module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="CACHE: configured cache ram value is 256MiB, using 66MiB category=corecache level=warn" category=corecache module=supervisor
time="2018-07-11T18:38:41Z" level=info msg="StorageOS DirectFS v1 client (server v0.1 protocol v1.3) start category=clinit level=info" category=clinit module=supervisor
time="2018-07-11T18:38:41Z" level=info msg="StorageOS RDB plugin category=rdbplginit level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:41Z" level=info msg="=> dir: /var/lib/storageos/data category=rdbplginit level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:41Z" level=info msg="=> databases: 1 category=rdbplginit level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:41Z" level=info msg="=> nodes: 1 category=rdbplginit level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:41Z" level=info msg="module=\"storageos-rdbplugin\", release=\"1.0.0\", buildId=\"release/1.0.0-rc1_5\", scmId=\"15e59658ac5bccb957b74cb96b2cf44dc175f3b2\" level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:41Z" level=info msg="Initialising rdb databases category=rdbplginit level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:42Z" level=info msg="rdb peristing db at /var/lib/storageos/data/db1 category=rdbplginit level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:42Z" level=info msg="Ready category=rdbplginit level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:44Z" level=info msg="started watching volumes" module=watcher
time="2018-07-11T18:38:44Z" level=info msg="started watching nodes" module=watcher
time="2018-07-11T18:38:44Z" level=info msg="starting dataplane state controller worker '3'" module=statesync
time="2018-07-11T18:38:44Z" level=info msg="starting dataplane state controller worker '0'" module=statesync
time="2018-07-11T18:38:44Z" level=info msg="starting dataplane state controller worker '1'" module=statesync
time="2018-07-11T18:38:44Z" level=info msg="starting dataplane state controller worker '2'" module=statesync
time="2018-07-11T18:38:44Z" level=info msg=started action=watch category=client key=storageos/nodes module=store
time="2018-07-11T18:38:44Z" level=info msg=started action=watch category=client key=storageos/volumes module=store
time="2018-07-11T18:38:44Z" level=info msg="starting api servers" action=start category=http module=cp
time="2018-07-11T18:38:44Z" level=info msg="server running" module=command
time="2018-07-11T18:38:44Z" level=error msg="watch cancelled" action=watch category=client error="etcdserver: mvcc: required revision has been compacted" key=storageos/volumes module=store
time="2018-07-11T18:38:44Z" level=warning msg="volume watcher: got received error 'watch stopped'" module=watcher
time="2018-07-11T18:38:46Z" level=info msg="config change: will add server map entry [72389@172.31.29.31:5703] level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="configuration changes applied level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="[72389@172.31.29.31:5703]:: attempt to establish channel category=clconmon level=info" category=clconmon module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="[72389@172.31.29.31:5703]: connection established (fd=13) category=clconn level=info" category=clconn module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="void FsConfig::presentation_lun_notify_receiver(FsConfig::FsConfigStore *, ConfigStore::Event<uint32_t, FsConfigPresentation>): Failed to stat backing store file '/var/lib/storageos/volumes/bst-197928' category=config level=warn" category=config module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="void FsConfig::presentation_lun_notify_receiver(FsConfig::FsConfigStore *, ConfigStore::Event<uint32_t, FsConfigPresentation>): Failed to stat backing store file '/var/lib/storageos/volumes/bst-212792' category=config level=warn" category=config module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="virtual bool FsConfig::PresentationEventSemantics::Validate(event_type): Not adding pr_filename '2560c04f-995b-2ad1-bc03-edd9df29b57c' for volume 56980 - already exists forvolume 56980 category=fscfg level=warn" category=fscfg module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="validator 'device_validator' rejected Event{type CREATE} category=libcfg level=warn" category=libcfg module=supervisor
time="2018-07-11T18:38:46Z" level=error msg="filesystem client: presentation create failed" action=create error="<nil>" module=statesync reason="Create refused by validator" volume_uuid=2560c04f-995b-2ad1-bc03-edd9df29b57c
time="2018-07-11T18:38:46Z" level=info msg="virtual bool FsConfig::PresentationEventSemantics::Validate(event_type): Not adding pr_filename 'f536c52c-daf3-4d3f-ccba-2f4c1db4aea1' for volume 72389 - already exists forvolume 72389 category=fscfg level=warn" category=fscfg module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="validator 'device_validator' rejected Event{type CREATE} category=libcfg level=warn" category=libcfg module=supervisor
time="2018-07-11T18:38:46Z" level=error msg="filesystem client: presentation create failed" action=create error="<nil>" module=statesync reason="Create refused by validator" volume_uuid=f536c52c-daf3-4d3f-ccba-2f4c1db4aea1
time="2018-07-11T18:38:47Z" level=info msg="Accepted connection from 172.31.29.31:65228 (fd=8) level=info" category=libcfg module=supervisor
time="2018-07-11T18:38:47Z" level=info msg="0x0000DE94@172.31.29.31:65228 connected level=info" category=libcfg module=supervisor
time="2018-07-11T19:04:36Z" level=error msg="got err while waiting for volume creation" action=create category=volume error="no available nodes for replica found, error: no node matched filtering constraints, filters: anti_affinity_filter" module=cp namespace=default volume=vol1
time="2018-07-11T19:04:58Z" level=error msg="got err while waiting for volume creation" action=create category=volume error="no node matched filtering constraints, filters: node_selector_filter" module=cp namespace=default volume=vol2
time="2018-07-11T19:05:33Z" level=error msg="got err while waiting for volume creation" action=create category=volume error="no node matched filtering constraints, filters: " module=cp namespace=default volume=vol3
PostgreSQL pod description:
kubectl describe pod postgres-6569dcb44d-fj8k8
Name: postgres-6569dcb44d-fj8k8
Namespace: default
Node: ip-172-31-28-148/172.31.28.148
Start Time: Wed, 11 Jul 2018 18:39:52 +0000
Labels: pod-template-hash=2125876008
workload.user.cattle.io/workloadselector=deployment-default-postgres
Annotations: field.cattle.io/ports=[[{"containerPort":5432,"dnsName":"postgres-nodeport","kind":"NodePort","name":"5432tcp300321","protocol":"TCP","sourcePort":30040}]]
field.cattle.io/publicEndpoints=[{"addresses":["172.31.13.77"],"port":30032,"protocol":"TCP","serviceName":"default:postgres-nodeport","allNodes":true}]
Status: Pending
IP:
Controlled By: ReplicaSet/postgres-6569dcb44d
Containers:
postgres:
Container ID:
Image: postgres
Image ID:
Port: 5432/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/lib/postgresql/data from pgsql-data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-r6kl9 (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
pgsql-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: pgsql-data
ReadOnly: false
default-token-r6kl9:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-r6kl9
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 7m (x330 over 11h) kubelet, ip-172-31-28-148 Unable to mount volumes for pod "postgres-6569dcb44d-fj8k8_default(cc9e5947-8539-11e8-abcc-0ad368ad109e)": timeout expired waiting for volumes to attach or mount for pod "default"/"postgres-6569dcb44d-fj8k8". list of unmounted volumes=[pgsql-data]. list of unattached volumes=[pgsql-data default-token-r6kl9]
Warning FailedMount 2m (x344 over 11h) kubelet, ip-172-31-28-148 MountVolume.SetUp failed for volume "pvc-7a4ee536-851c-11e8-90d5-0ad368ad109e" : stat /var/lib/storageos/volumes/f536c52c-daf3-4d3f-ccba-2f4c1db4aea1: no such file or directory
> kubectl get sc
NAME PROVISIONER
fast kubernetes.io/storageos
> kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-4ac4fc7c-8471-11e8-90d5-0ad368ad109e 10Gi RWO Delete Released default/volx nfs-provisioner 1d
pvc-7a4ee536-851c-11e8-90d5-0ad368ad109e 20Gi RWO Delete Bound default/pgsql-data fast 15h
pvc-bc643fc6-8519-11e8-90d5-0ad368ad109e 10Gi RWO,RWX Delete Bound default/sos3 fast 15h
> kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
pgsql-data Bound pvc-7a4ee536-851c-11e8-90d5-0ad368ad109e 20Gi RWO fast 15h
sos3 Bound pvc-bc643fc6-8519-11e8-90d5-0ad368ad109e 10Gi RWO,RWX fast 15h
UPDATE 1: After setting the Kubernetes flags as pointed out by the answers below, I am still getting the same error during pod creation, and the PVC directory is not being created on the node/host VM (RHEL). The logs are the same as the 'kubectl describe pod' output above.
Node where the StorageOS pod is running:
[root@ip-172-31-24-214 storageos]# pwd
/var/lib/storageos
[root@ip-172-31-24-214 storageos]# ls -l
total 0
drwxr-xr-x. 3 root root 17 Jul 13 10:07 data
drwxr-xr-x. 2 root root 6 Jul 13 10:07 filesystems
drwxr-xr-x. 3 root root 22 Jul 13 10:07 kv
drwxr--r--. 2 root root 27 Jul 13 10:06 logs
drwxr-x---. 2 root root 16 Jul 13 10:06 state
drwxr-xr-x. 2 root root 6 Jul 13 10:07 volumes
[root@ip-172-31-24-214 storageos]# cd volumes/
[root@ip-172-31-24-214 volumes]# ls -l
total 0
Rancher, especially when RKE is used, deploys its config via the cluster.yml file (see the RKE docs for more details).
To add the feature gates, adapt cluster.yml as follows and rerun rke up --config cluster.yml:
# Necessary for password protected SSH Keys:
ssh_agent_auth: true
# Nodes Definitions
nodes:
  - address: 10.70.60.3              # hostname or IP to access nodes
    user: docker                     # root user (usually 'root')
    role: [controlplane,etcd,worker] # K8s roles for node
    ssh_key_path: ~/.ssh/id_rsa      # path to PEM file
...
services:
  kube-apiserver:
    extra_args:
      feature-gates: "PersistentLocalVolumes=true,VolumeScheduling=true"
  kubelet:
    extra_args:
      feature-gates: "PersistentLocalVolumes=true,VolumeScheduling=true"
# Default versions
system_images:
  kubernetes: rancher/hyperkube:v1.10.5-rancher1
...
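After rke up finishes you can confirm the flags actually reached the components. A quick sketch, assuming the container names RKE normally creates (kube-apiserver and kubelet); adjust if yours differ:
# run on each node: the args of the RKE-managed containers should contain the feature gates
docker inspect kube-apiserver | grep feature-gates
docker inspect kubelet | grep feature-gates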
Hope this helps.
Cheers, Damian
It looks like MountPropagation is not enabled in your installation. Both Docker and Kubernetes (k8s 1.10 enables it by default) need to be set up accordingly.
Can you post whether there are any files or devices in /var/lib/storageos/volumes on all your nodes?
Check out the following doc link for the Docker bit (point 2) and the k8s feature gate (point 3): https://docs.storageos.com/docs/platforms/kubernetes/install/
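As a quick check on the Docker side (a hedged sketch; /var/lib/storageos is the default path seen in your logs), verify that mount propagation on each node is actually shared:
# show propagation for the StorageOS dir (falls back to the root mount if it is not a separate mount)
findmnt -o TARGET,PROPAGATION /var/lib/storageos || findmnt -o TARGET,PROPAGATION /
# make sure the docker unit is not forcing private mounts
systemctl show docker --property MountFlags
# if propagation is reported as private, make the mount shared
sudo mount --make-shared /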
You have to change the api-controller manifest to add the feature gate flag.
Kubernetes 1.10 has that feature gate enabled by default. However, I don't know how Rancher deploys the config. If the kubelet runs directly on the host, you have to add the feature gate to the systemd kubelet.service file. If it runs in a container, you have to add a volume with shared mount flags to its config: --volume=/var/lib/storageos:/var/lib/storageos:rshared. The Helm chart has the variable cluster.sharedDir that you can set for a containerised kubelet, in which case it will use the kubelet volume dir to host StorageOS' device files. A sketch of both options is below.
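A minimal sketch of both options. The drop-in path and the KUBELET_EXTRA_ARGS variable are assumptions (use whatever environment variable your kubelet.service actually reads), and the Helm repo/chart name and shared directory path are examples; cluster.sharedDir is the chart value mentioned above.
# host kubelet (only needed on k8s 1.9; 1.10 already has MountPropagation on by default)
sudo mkdir -p /etc/systemd/system/kubelet.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/kubelet.service.d/20-storageos.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--feature-gates=MountPropagation=true"
EOF
sudo systemctl daemon-reload && sudo systemctl restart kubelet
# containerised kubelet: point the chart at the host dir the kubelet already shares (example path)
helm install storageos/storageos --set cluster.sharedDir=/var/lib/kubelet/volumeplugins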