CephFS mount. Can't read superblock

1/4/2019

Any pointers on this issue? I've tried a ton of things already, to no avail.

This command fails with the error Can't read superblock:

sudo mount -t ceph worker2:6789:/ /mnt/mycephfs -o name=admin,secret=AQAYjCpcAAAAABAAxs1mrh6nnx+0+1VUqW2p9A==
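
For completeness, mounting through the FUSE client is another way to exercise the same filesystem without the kernel client. A sketch, assuming ceph-fuse is installed and the admin keyring is under /etc/ceph on this host:

sudo ceph-fuse -m worker2:6789 /mnt/mycephfs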


Some more info that may be helpful:

uname -a
Linux cephfs-test-admin-1 4.14.84-coreos #1 SMP Sat Dec 15 22:39:45 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

ceph status and ceph osd status both show no issues at all.

dmesg | tail
[228343.304863] libceph: resolve 'worker2' (ret=0): 10.1.96.4:0
[228343.322279] libceph: mon0 10.1.96.4:6789 session established
[228343.323622] libceph: client107238 fsid 762e6263-a95c-40da-9813-9df4fef12f53


ceph -s
  cluster:
    id:     762e6263-a95c-40da-9813-9df4fef12f53
    health: HEALTH_WARN
            too few PGs per OSD (16 < min 30)
  services:
    mon: 3 daemons, quorum worker2,worker0,worker1
    mgr: worker1(active)
    mds: cephfs-1/1/1 up  {0=mds-ceph-mds-85b4fbb478-c6jzv=up:active}
    osd: 3 osds: 3 up, 3 in
  data:
    pools:   2 pools, 16 pgs
    objects: 21 objects, 2246 bytes
    usage:   342 MB used, 76417 MB / 76759 MB avail
    pgs:     16 active+clean

ceph osd status
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
| id |   host  |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | worker2 |  114M | 24.8G |    0   |     0   |    0   |     0   | exists,up |
| 1  | worker0 |  114M | 24.8G |    0   |     0   |    0   |     0   | exists,up |
| 2  | worker1 |  114M | 24.8G |    0   |     0   |    0   |     0   | exists,up |
+----+---------+-------+-------+--------+---------+--------+---------+-----------+

ceph -v
ceph version 12.2.3 (2dab17a455c09584f2a85e6b10888337d1ec8949) luminous (stable)

Some of the syslog output:

Jan 04 21:24:04 worker2 kernel: libceph: resolve 'worker2' (ret=0): 10.1.96.4:0
Jan 04 21:24:04 worker2 kernel: libceph: mon0 10.1.96.4:6789 session established
Jan 04 21:24:04 worker2 kernel: libceph: client159594 fsid 762e6263-a95c-40da-9813-9df4fef12f53
Jan 04 21:24:10 worker2 systemd[1]: Started OpenSSH per-connection server daemon (58.242.83.28:36729).
Jan 04 21:24:11 worker2 sshd[12315]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=58.242.83.28  us>
Jan 04 21:24:14 worker2 sshd[12315]: Failed password for root from 58.242.83.28 port 36729 ssh2
Jan 04 21:24:15 worker2 sshd[12315]: Failed password for root from 58.242.83.28 port 36729 ssh2
Jan 04 21:24:18 worker2 sshd[12315]: Failed password for root from 58.242.83.28 port 36729 ssh2
Jan 04 21:24:18 worker2 sshd[12315]: Received disconnect from 58.242.83.28 port 36729:11:  [preauth]
Jan 04 21:24:18 worker2 sshd[12315]: Disconnected from authenticating user root 58.242.83.28 port 36729 [preauth]
Jan 04 21:24:18 worker2 sshd[12315]: PAM 2 more authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=58.242.83.28  user=root
Jan 04 21:24:56 worker2 systemd[1]: Started OpenSSH per-connection server daemon (24.114.79.151:58123).
Jan 04 21:24:56 worker2 sshd[12501]: Accepted publickey for core from 24.114.79.151 port 58123 ssh2: RSA SHA256:t4t9yXeR2yC7s9c37mdS/F7koUs2x>
Jan 04 21:24:56 worker2 sshd[12501]: pam_unix(sshd:session): session opened for user core by (uid=0)
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
-- Serg
ceph
kubernetes

1 Answer

1/12/2019

So after some digging, the problem turned out to be due to XFS partitioning issues ...

I don't know how I missed it at first.

In short: creating an XFS filesystem on the partition was failing, i.e. running mkfs.xfs /dev/vdb1 would simply hang. The OS would still create and label the partitions, but they were corrupt - a fact you'd only discover when trying to mount and hitting that Can't read superblock error.
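
A quick sanity check before handing the device to Ceph would have caught this. A sketch, reusing /dev/vdb1 from above (the 120-second timeout and the /mnt/test mount point are arbitrary choices, adjust to your setup):

# Format with a timeout so a hanging mkfs.xfs fails loudly instead of stalling forever
sudo timeout 120 mkfs.xfs -f /dev/vdb1 || echo "mkfs.xfs hung or failed"

# Confirm the superblock is actually readable before Ceph ever touches the device
sudo xfs_repair -n /dev/vdb1     # read-only check, makes no changes
sudo blkid /dev/vdb1             # should report TYPE="xfs"
sudo mkdir -p /mnt/test && sudo mount /dev/vdb1 /mnt/test && sudo umount /mnt/test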

So the sequence was:

1. Run the deploy.
2. The deploy creates the XFS partitions with mkfs.xfs ...
3. The OS creates those faulty partitions.
4. Since you can still read the status of the OSDs just fine, all status reports and logs show no problems (mkfs.xfs did not report errors, it just hung).
5. When you try to mount CephFS or use block storage, the whole thing bombs due to the corrupt partitions (see the check sketched below).
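
Nothing in ceph -s or ceph osd status surfaces this, so the only way to catch it is to look at the backing filesystems on each OSD host directly. A rough sketch (the /var/lib/ceph/osd mount point and the /dev/vdb device are assumptions based on a typical filestore-on-XFS layout, adjust to your setup):

# On each OSD host: the OSD data dirs should show up as mounted xfs filesystems
df -hT | grep /var/lib/ceph/osd

# lsblk should report an xfs filesystem (with a UUID) on the OSD data partition
lsblk -f /dev/vdb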

The root cause is still unknown, but I suspect something was not done right at the SSD disk level when the disks were provisioned/attached by my cloud provider. It now works fine.

-- Serg
Source: StackOverflow