Rancher, Kubernetes and StorageOS: persistent storage volume mount issue?

7/11/2018

I need help with an issue. I have installed a StorageOS cluster on top of Kubernetes. I am able to create volumes, but when I create a PVC and mount it into my Kubernetes pod, the pod gets stuck in the ContainerCreating state. What I could gather from the logs:

  • The volume directory that is mapped from the volume to the host, /var/lib/storageos/volumes/mypvcid, does not exist
  • The StorageOS cluster containers are not able to create these directories on the host VM (RHEL on AWS).

I have been stuck on this for days, so any help is appreciated. Here are the logs: https://imgur.com/a/VcggCJn
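
For context, the claim and mount described above correspond to manifests roughly like the following (a minimal sketch; the names and size are taken from the outputs further down, everything else is illustrative):

    # PVC against the StorageOS-backed "fast" class
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: pgsql-data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast
      resources:
        requests:
          storage: 20Gi
    ---
    # Pod mounting the claim; this is the step that hangs in ContainerCreating
    apiVersion: v1
    kind: Pod
    metadata:
      name: postgres
    spec:
      containers:
        - name: postgres
          image: postgres
          volumeMounts:
            - name: pgsql-data
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: pgsql-data
          persistentVolumeClaim:
            claimName: pgsql-data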


StorageOS logs:

    > kubectl logs storageos-dpj6q
time="2018-07-11T18:38:30Z" level=info msg="starting server" address=172.31.14.99 cluster= hostname=ip-172-31-14-99 id=4dd922fe-d998-c67a-122e-1ea755ec61e9 join=d732967d-93c4-4018-8e0e-a9170065e8e2 labels="map[]" module=command version="StorageOS 1.0.0 (f8915fa), built: 2018-05-25T190132Z"
time="2018-07-11T18:38:30Z" level=info msg="starting api server" action=create category=server endpoint="0.0.0.0:5705" module=cp
time="2018-07-11T18:38:30Z" level=info msg="starting embedded etcd server" action=create category=etcd cluster_id= initialised=true members= module=cp
time="2018-07-11T18:38:30Z" level=info msg="by using this product, you are agreeing to the terms of the StorageOS Ltd. End User Subscription Agreement (EUSA) found at: https://eusa.storageos.com" module=command
time="2018-07-11T18:38:30Z" level=info msg="started temporary docker volume plugin api while control plane starts"
time="2018-07-11T18:38:38Z" level=info msg="embedded etcd server started successfuly" action=create category=etcd module=cp
time="2018-07-11T18:38:38Z" level=info msg="new messaging routes found, [re]starting nats" action=update category=nats module=cp
time="2018-07-11T18:38:39Z" level=info msg="connected to store" action=wait address="http://127.0.0.1:5706" backend=embedded category=etcd module=cp
time="2018-07-11T18:38:39Z" level=info msg="temporary docker volume plugin api shutdown"
time="2018-07-11T18:38:39Z" level=info msg="leader election process started" action=election category=leader module=ha
time="2018-07-11T18:38:39Z" level=info msg="docker plugin api started, exporting filesystems at /var/lib/storageos/filesystems"
time="2018-07-11T18:38:39Z" level=error msg="startup force unmount failed" device_dir=/var/lib/storageos/volumes error="exit status 32" module=supervisor output="umount: /var/lib/storageos/volumes: not mounted"
time="2018-07-11T18:38:39Z" level=info msg="reaper started"
time="2018-07-11T18:38:39Z" level=info msg="dataplane notifications server starting..." path="unix:///var/run/storageos/dataplane-notifications.sock"
time="2018-07-11T18:38:39Z" level=info msg="syncer started"
time="2018-07-11T18:38:39Z" level=info msg="module=\"storageos-stats\", release=\"1.0.0\", buildId=\"release/1.0.0-rc1_5\", scmId=\"15e59658ac5bccb957b74cb96b2cf44dc175f3b2\" level=info" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="StorageOS Volume Presentation level=info" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="module=\"storageos-fs-director\", release=\"1.0.0\", buildId=\"release/1.0.0-rc1_5\", scmId=\"15e59658ac5bccb957b74cb96b2cf44dc175f3b2\" level=info" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="fuse: mountpoint is not empty" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="fuse: if you are sure this is safe, use the 'nonempty' mount option" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="Error creating FUSE channel! level=fatal" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="module=\"server\", release=\"1.0.0\", buildId=\"release/1.0.0-rc1_5\", scmId=\"15e59658ac5bccb957b74cb96b2cf44dc175f3b2\" level=info" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="module=\"client\", release=\"1.0.0\", buildId=\"release/1.0.0-rc1_5\", scmId=\"15e59658ac5bccb957b74cb96b2cf44dc175f3b2\" level=info" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="StorageOS DirectFS v1 server (server v0.1 protocol v1.3) start level=info" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="StorageOS DIRECTOR category=director level=info" category=director module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="module=\"storageos-director\", release=\"1.0.0\", buildId=\"release/1.0.0-rc1_5\", scmId=\"15e59658ac5bccb957b74cb96b2cf44dc175f3b2\" level=info" category=director module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="CACHE: configure ram of 256MB exceeds maximum recommended of 66 MiB category=corecache level=warn" category=corecache module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="CACHE: configured cache ram value is 256MiB, using 66MiB category=corecache level=warn" category=corecache module=supervisor
time="2018-07-11T18:38:41Z" level=info msg="StorageOS DirectFS v1 client (server v0.1 protocol v1.3) start category=clinit level=info" category=clinit module=supervisor
time="2018-07-11T18:38:41Z" level=info msg="StorageOS RDB plugin category=rdbplginit level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:41Z" level=info msg="=> dir: /var/lib/storageos/data category=rdbplginit level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:41Z" level=info msg="=> databases: 1 category=rdbplginit level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:41Z" level=info msg="=> nodes: 1 category=rdbplginit level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:41Z" level=info msg="module=\"storageos-rdbplugin\", release=\"1.0.0\", buildId=\"release/1.0.0-rc1_5\", scmId=\"15e59658ac5bccb957b74cb96b2cf44dc175f3b2\" level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:41Z" level=info msg="Initialising rdb databases category=rdbplginit level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:42Z" level=info msg="rdb peristing db at /var/lib/storageos/data/db1 category=rdbplginit level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:42Z" level=info msg="Ready category=rdbplginit level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:44Z" level=info msg="started watching volumes" module=watcher
time="2018-07-11T18:38:44Z" level=info msg="started watching nodes" module=watcher
time="2018-07-11T18:38:44Z" level=info msg="starting dataplane state controller worker '3'" module=statesync
time="2018-07-11T18:38:44Z" level=info msg="starting dataplane state controller worker '0'" module=statesync
time="2018-07-11T18:38:44Z" level=info msg="starting dataplane state controller worker '1'" module=statesync
time="2018-07-11T18:38:44Z" level=info msg="starting dataplane state controller worker '2'" module=statesync
time="2018-07-11T18:38:44Z" level=info msg=started action=watch category=client key=storageos/nodes module=store
time="2018-07-11T18:38:44Z" level=info msg=started action=watch category=client key=storageos/volumes module=store
time="2018-07-11T18:38:44Z" level=info msg="starting api servers" action=start category=http module=cp
time="2018-07-11T18:38:44Z" level=info msg="server running" module=command
time="2018-07-11T18:38:44Z" level=error msg="watch cancelled" action=watch category=client error="etcdserver: mvcc: required revision has been compacted" key=storageos/volumes module=store
time="2018-07-11T18:38:44Z" level=warning msg="volume watcher: got received error 'watch stopped'" module=watcher
time="2018-07-11T18:38:46Z" level=info msg="config change: will add server map entry [72389@172.31.29.31:5703] level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="configuration changes applied level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="[72389@172.31.29.31:5703]:: attempt to establish channel category=clconmon level=info" category=clconmon module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="[72389@172.31.29.31:5703]: connection established (fd=13) category=clconn level=info" category=clconn module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="void FsConfig::presentation_lun_notify_receiver(FsConfig::FsConfigStore *, ConfigStore::Event<uint32_t, FsConfigPresentation>): Failed to stat backing store file '/var/lib/storageos/volumes/bst-197928' category=config level=warn" category=config module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="void FsConfig::presentation_lun_notify_receiver(FsConfig::FsConfigStore *, ConfigStore::Event<uint32_t, FsConfigPresentation>): Failed to stat backing store file '/var/lib/storageos/volumes/bst-212792' category=config level=warn" category=config module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="virtual bool FsConfig::PresentationEventSemantics::Validate(event_type): Not adding pr_filename '2560c04f-995b-2ad1-bc03-edd9df29b57c' for volume 56980 - already exists forvolume 56980 category=fscfg level=warn" category=fscfg module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="validator 'device_validator' rejected Event{type CREATE} category=libcfg level=warn" category=libcfg module=supervisor
time="2018-07-11T18:38:46Z" level=error msg="filesystem client: presentation create failed" action=create error="<nil>" module=statesync reason="Create refused by validator" volume_uuid=2560c04f-995b-2ad1-bc03-edd9df29b57c
time="2018-07-11T18:38:46Z" level=info msg="virtual bool FsConfig::PresentationEventSemantics::Validate(event_type): Not adding pr_filename 'f536c52c-daf3-4d3f-ccba-2f4c1db4aea1' for volume 72389 - already exists forvolume 72389 category=fscfg level=warn" category=fscfg module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="validator 'device_validator' rejected Event{type CREATE} category=libcfg level=warn" category=libcfg module=supervisor
time="2018-07-11T18:38:46Z" level=error msg="filesystem client: presentation create failed" action=create error="<nil>" module=statesync reason="Create refused by validator" volume_uuid=f536c52c-daf3-4d3f-ccba-2f4c1db4aea1
time="2018-07-11T18:38:47Z" level=info msg="Accepted connection from 172.31.29.31:65228 (fd=8) level=info" category=libcfg module=supervisor
time="2018-07-11T18:38:47Z" level=info msg="0x0000DE94@172.31.29.31:65228 connected level=info" category=libcfg module=supervisor
time="2018-07-11T19:04:36Z" level=error msg="got err while waiting for volume creation" action=create category=volume error="no available nodes for replica found, error: no node matched filtering constraints, filters: anti_affinity_filter" module=cp namespace=default volume=vol1
time="2018-07-11T19:04:58Z" level=error msg="got err while waiting for volume creation" action=create category=volume error="no node matched filtering constraints, filters: node_selector_filter" module=cp namespace=default volume=vol2
time="2018-07-11T19:05:33Z" level=error msg="got err while waiting for volume creation" action=create category=volume error="no node matched filtering constraints, filters: " module=cp namespace=default volume=vol3

PostgreSQL pod status:

> kubectl describe pod postgres-6569dcb44d-fj8k8
Name:           postgres-6569dcb44d-fj8k8
Namespace:      default
Node:           ip-172-31-28-148/172.31.28.148
Start Time:     Wed, 11 Jul 2018 18:39:52 +0000
Labels:         pod-template-hash=2125876008
                workload.user.cattle.io/workloadselector=deployment-default-postgres
Annotations:    field.cattle.io/ports=[[{"containerPort":5432,"dnsName":"postgres-nodeport","kind":"NodePort","name":"5432tcp300321","protocol":"TCP","sourcePort":30040}]]
                field.cattle.io/publicEndpoints=[{"addresses":["172.31.13.77"],"port":30032,"protocol":"TCP","serviceName":"default:postgres-nodeport","allNodes":true}]
Status:         Pending
IP:
Controlled By:  ReplicaSet/postgres-6569dcb44d
Containers:
  postgres:
    Container ID:
    Image:          postgres
    Image ID:
    Port:           5432/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/postgresql/data from pgsql-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-r6kl9 (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  pgsql-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pgsql-data
    ReadOnly:   false
  default-token-r6kl9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-r6kl9
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age                 From                       Message
  ----     ------       ----                ----                       -------
  Warning  FailedMount  7m (x330 over 11h)  kubelet, ip-172-31-28-148  Unable to mount volumes for pod "postgres-6569dcb44d-fj8k8_default(cc9e5947-8539-11e8-abcc-0ad368ad109e)": timeout expired waiting for volumes to attach or mount for pod "default"/"postgres-6569dcb44d-fj8k8". list of unmounted volumes=[pgsql-data]. list of unattached volumes=[pgsql-data default-token-r6kl9]
  Warning  FailedMount  2m (x344 over 11h)  kubelet, ip-172-31-28-148  MountVolume.SetUp failed for volume "pvc-7a4ee536-851c-11e8-90d5-0ad368ad109e" : stat /var/lib/storageos/volumes/f536c52c-daf3-4d3f-ccba-2f4c1db4aea1: no such file or directory

> kubectl get sc
NAME      PROVISIONER
fast      kubernetes.io/storageos
> kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                STORAGECLASS      REASON    AGE
pvc-4ac4fc7c-8471-11e8-90d5-0ad368ad109e   10Gi       RWO            Delete           Released   default/volx         nfs-provisioner             1d
pvc-7a4ee536-851c-11e8-90d5-0ad368ad109e   20Gi       RWO            Delete           Bound      default/pgsql-data   fast                        15h
pvc-bc643fc6-8519-11e8-90d5-0ad368ad109e   10Gi       RWO,RWX        Delete           Bound      default/sos3         fast                        15h
> kubectl get pvc
NAME         STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pgsql-data   Bound     pvc-7a4ee536-851c-11e8-90d5-0ad368ad109e   20Gi       RWO            fast           15h
sos3         Bound     pvc-bc643fc6-8519-11e8-90d5-0ad368ad109e   10Gi       RWO,RWX        fast           15h
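
For completeness, the fast class above, using the in-tree StorageOS provisioner, would have been defined roughly like this (a sketch; the parameter names are the documented kubernetes.io/storageos ones, the values are assumptions):

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: fast
    provisioner: kubernetes.io/storageos
    parameters:
      pool: default   # StorageOS pool to provision from (assumed)
      fsType: ext4    # filesystem created on the volume (assumed)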

UPDATE 1: After setting the Kubernetes flags as pointed out in the answers below, I am still getting the same error during pod creation, and the PVC directory is not being created on the node/host VM (RHEL). The output is the same as the 'kubectl describe pod' above.

Node where the StorageOS pod is running:

[root@ip-172-31-24-214 storageos]# pwd
/var/lib/storageos

[root@ip-172-31-24-214 storageos]# ls -l
total 0
drwxr-xr-x. 3 root root 17 Jul 13 10:07 data
drwxr-xr-x. 2 root root  6 Jul 13 10:07 filesystems
drwxr-xr-x. 3 root root 22 Jul 13 10:07 kv
drwxr--r--. 2 root root 27 Jul 13 10:06 logs
drwxr-x---. 2 root root 16 Jul 13 10:06 state
drwxr-xr-x. 2 root root  6 Jul 13 10:07 volumes

[root@ip-172-31-24-214 storageos]# cd volumes/
[root@ip-172-31-24-214 volumes]# ls -l
total 0
-- damitj07
Tags: docker, kubernetes, rancheros

2 Answers

7/12/2018

Rancher, especially when RKE is used, deploys its configuration via the cluster.yml file (see the RKE docs for more details).

To add those feature gates, adapt cluster.yml as follows and rerun rke up --config cluster.yml:

# Necessary for password protected SSH Keys:
ssh_agent_auth: true

# Nodes Definitions
nodes:
  - address: 10.70.60.3  # hostname or IP to access nodes
    user: docker # SSH user with docker access
    role: [controlplane,etcd,worker] # K8s roles for node
    ssh_key_path: ~/.ssh/id_rsa # path to PEM file
...

services:
  kube-apiserver:
    extra_args:
      feature-gates: "PersistentLocalVolumes=true,VolumeScheduling=true"
  kubelet:
    extra_args:
      feature-gates: "PersistentLocalVolumes=true,VolumeScheduling=true"

# Default versions
system_images:
    kubernetes: rancher/hyperkube:v1.10.5-rancher1
...
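
After rerunning rke up, you can check that the gates actually reached the running components; assuming RKE's default container names (kube-apiserver, kubelet), something like this on a node should show the flag:

    # run on a cluster node; container names assume RKE defaults
    docker inspect kube-apiserver | grep feature-gates
    docker inspect kubelet | grep feature-gates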

Hope this helps.

Cheers, Damian

-- D. Lub
Source: StackOverflow

7/12/2018

It looks like MountPropagation is not enabled in your installation. Both Docker and Kubernetes need to be set up accordingly (k8s 1.10 enables it by default).

Can you post whether there are any files or devices in /var/lib/storageos/volumes on all your nodes?

Check out the following doc link: point 2 for the Docker bit, and point 3 for the k8s feature gate. https://docs.storageos.com/docs/platforms/kubernetes/install/
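
For the Docker side on a systemd host such as RHEL, the usual way to get shared mount propagation is a systemd drop-in for the docker service (a sketch; the drop-in file name is illustrative):

    # /etc/systemd/system/docker.service.d/mount-propagation.conf (illustrative name)
    [Service]
    MountFlags=shared

    # then reload systemd and restart docker:
    #   systemctl daemon-reload && systemctl restart docker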

You have to change the API server and controller manifests to add the feature gate flag.

Kubernetes 1.10 has that feature gate enabled by default. However, I don't know how Rancher deploys the config. If the kubelet is running directly on the system, you have to add the feature gate in the systemd kubelet.service file. If it runs in a container, you have to add the volume with shared mount flags in its config: --volume=/var/lib/storageos:/var/lib/storageos:rshared. The Helm chart has the variable cluster.sharedDir that you can define for a containerised kubelet, in which case it will use the kubelet volume dir to host StorageOS' device files.
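
For the systemd case on a pre-1.10 cluster, a drop-in along these lines would add the gate (the path and variable name are illustrative; adjust to how your kubelet.service actually reads its flags):

    # /etc/systemd/system/kubelet.service.d/20-feature-gates.conf (illustrative path)
    [Service]
    Environment="KUBELET_EXTRA_ARGS=--feature-gates=MountPropagation=true"

    # then reload systemd and restart the kubelet:
    #   systemctl daemon-reload && systemctl restart kubelet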

-- Ferran Arau Castell
Source: StackOverflow