Galera mysql cluster fails to start in Kubernetes

10/24/2018

Okay so I'm using a modified version of this repo: https://github.com/CaptTofu/mysql_replication_kubernetes/tree/master/galera_sync_replication

The modified files are:

service:

apiVersion: v1
kind: Service
metadata:
  name: ro-db
  labels:
    unit: pxc-cluster
spec:
  ports:
    - port: 3306
      name: mysql
  selector:
    unit: pxc-cluster

pxc1 (node 1). Nodes 2 and 3 use the same replication controller, discovery service, and persistent volume claim, with only the numbers changed:

apiVersion: v1
kind: Service
metadata:
  name: pxc-node1
  labels:
    node: pxc-node1
spec:
  ports:
    - port: 3306
      name: mysql
    - port: 4444
      name: state-snapshot-transfer
    - port: 4567
      name: replication-traffic
    - port: 4568
      name: incremental-state-transfer
  selector:
    node: pxc-node1
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: pxc-node1
spec:
  replicas: 1
  template:
    metadata:
      labels:
        node: pxc-node1
        unit: pxc-cluster
    spec:
      nodeSelector:
        number: '1'
      containers:
        - image: capttofu/percona_xtradb_cluster_5_6:beta
          name: pxc-node1
          ports:
            - containerPort: 3306
            - containerPort: 4444
            - containerPort: 4567
            - containerPort: 4568
          env:
            - name: GALERA_CLUSTER
              value: "true"
            - name: WRSEP_ON
              value: "true"
            - name: WSREP_CLUSTER_ADDRESS
              value: gcomm://
            - name: WSREP_SST_USER
              value: sst
            - name: WSREP_SST_PASSWORD
              value: sst
            - name: MYSQL_USER
              value: mysql
            - name: MYSQL_PASSWORD
              value: mysql
            - name: MYSQL_ROOT_PASSWORD
              value: c-krit
          volumeMounts:
            - name: mysql-persistent-storage-1
              mountPath: /var/lib
          securityContext:
            capabilities: {}
            privileged: true #privileged required for mount
      volumes:
      - name: mysql-persistent-storage-1
        persistentVolumeClaim:
          claimName: claim-galera-1
---

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: claim-galera-1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
  selector:
    matchLabels:
      name: pxc1
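
To make "just changing the numbers" concrete, here is a small Python sketch listing the identifiers in the manifests above that carry the node number. It is an illustration only, not something from the repo: the function name `names_for_node` and the dictionary keys are hypothetical, and it assumes nodes 2 and 3 follow exactly the same naming pattern as node 1.

```python
def names_for_node(n):
    """Return the per-node identifiers used in the manifests for node n.

    Assumption: every node follows the naming pattern shown for node 1.
    """
    return {
        "service_and_rc_name": f"pxc-node{n}",         # Service / ReplicationController name
        "node_selector_number": str(n),                # nodeSelector "number" label value
        "volume_name": f"mysql-persistent-storage-{n}",
        "claim_name": f"claim-galera-{n}",             # PersistentVolumeClaim name
        "pv_label_name": f"pxc{n}",                    # matchLabels "name" on the claim
    }


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(names_for_node(n))
```

A mismatch in any one of these between the Service, the ReplicationController, and the claim would be an easy thing to miss when copying the files by hand.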

The thing is, this was working a few days ago. I did a lot of testing at the time, bringing down pods and nodes and watching how replication and voting behaved. Now that I'm integrating it with the app, the cluster just won't start, and I can't understand why, since it's the same configuration that was working before. I've searched extensively (Stack Overflow, GitHub, and elsewhere) and tried the suggested fixes, but none of them work. The log shows:

2018-10-23 20:36:46 1 [Note] WSREP: (4be59ce1, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.244.2.61:4567 
2018-10-23 20:36:47 1 [Note] WSREP: forgetting 49c4d2cf (tcp://10.244.2.61:4567)
2018-10-23 20:36:47 1 [Note] WSREP: (4be59ce1, 'tcp://0.0.0.0:4567') turning message relay requesting off
2018-10-23 20:36:47 1 [Warning] WSREP: no nodes coming from prim view, prim not possible
2018-10-23 20:36:47 1 [Note] WSREP: view(view_id(NON_PRIM,4be59ce1,5) memb {
    4be59ce1,0
} joined {
} left {
} partitioned {
    47f2860c,0
    49c4d2cf,0
})
2018-10-23 20:36:50 1 [Note] WSREP: view((empty))
2018-10-23 20:36:50 1 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
     at gcomm/src/pc.cpp:connect():162
2018-10-23 20:36:50 1 [ERROR] WSREP: gcs/src/gcs_core.cpp:long int gcs_core_open(gcs_core_t*, const char*, const char*, bool)():206: Failed to open backend connection: -110 (Connection timed out)
2018-10-23 20:36:50 1 [ERROR] WSREP: gcs/src/gcs.cpp:long int gcs_open(gcs_conn_t*, const char*, const char*, bool)():1379: Failed to open channel 'galera_kubernetes' at 'gcomm://pxc-node2,pxc-node3': -110 (Connection timed out)
2018-10-23 20:36:50 1 [ERROR] WSREP: gcs connect failed: Connection timed out
2018-10-23 20:36:50 1 [ERROR] WSREP: wsrep::connect(gcomm://pxc-node2,pxc-node3) failed: 7
2018-10-23 20:36:50 1 [ERROR] Aborting

2018-10-23 20:36:50 1 [Note] WSREP: Service disconnected.
2018-10-23 20:36:51 1 [Note] WSREP: Some threads may fail to exit.
2018-10-23 20:36:51 1 [Note] Binlog end
2018-10-23 20:36:51 1 [Note] mysqld: Shutdown complete
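
Since the failure is a timeout on `gcomm://pxc-node2,pxc-node3`, one thing worth ruling out is plain connectivity between the pods. Below is a minimal Python sketch for that, not part of the original setup. It assumes Python is available somewhere inside the cluster network (for example in a debug pod), and the helper name `check_peer` is made up for illustration. It only tests whether each peer service name resolves and accepts a TCP connection on the replication port 4567; it says nothing about Galera state itself.

```python
import socket


def check_peer(host, port=4567, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections and timeouts.
        return False


if __name__ == "__main__":
    # Peer names taken from the cluster address in the log above.
    for peer in ("pxc-node2", "pxc-node3"):
        status = "reachable" if check_peer(peer) else "NOT reachable"
        print(f"{peer}:4567 {status}")
```

If the peers turn out to be reachable, the "no nodes coming from prim view" warning suggests looking at whether any node currently forms a primary component, rather than at the network.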

Any suggestions? It's been a few hours now and I just can't make it work.

-- paltaa
docker
kubernetes
mysql
percona-xtradb-cluster

1 Answer

7/8/2019

Percona XtraDB Cluster now has native support for Kubernetes. The PXC Operator reached GA (version 1.0) several weeks ago: https://percona.com/doc/kubernetes-operator-for-pxc/index.html

-- utdrmac
Source: StackOverflow