I'm using a modified version of this repo: https://github.com/CaptTofu/mysql_replication_kubernetes/tree/master/galera_sync_replication
The modified files are:
service:
apiVersion: v1
kind: Service
metadata:
  name: ro-db
  labels:
    unit: pxc-cluster
spec:
  ports:
  - port: 3306
    name: mysql
  selector:
    unit: pxc-cluster
pxc1 (the discovery Service, the ReplicationController, and the PersistentVolumeClaim). Nodes 2 and 3 use the same manifests with just the numbers changed; see the node-2 sketch after this manifest:
apiVersion: v1
kind: Service
metadata:
  name: pxc-node1
  labels:
    node: pxc-node1
spec:
  ports:
  - port: 3306
    name: mysql
  - port: 4444
    name: state-snapshot-transfer
  - port: 4567
    name: replication-traffic
  - port: 4568
    name: incremental-state-transfer
  selector:
    node: pxc-node1
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: pxc-node1
spec:
  replicas: 1
  template:
    metadata:
      labels:
        node: pxc-node1
        unit: pxc-cluster
    spec:
      nodeSelector:
        number: '1'
      containers:
      - image: capttofu/percona_xtradb_cluster_5_6:beta
        name: pxc-node1
        ports:
        - containerPort: 3306
        - containerPort: 4444
        - containerPort: 4567
        - containerPort: 4568
        env:
        - name: GALERA_CLUSTER
          value: "true"
        - name: WSREP_ON
          value: "true"
        - name: WSREP_CLUSTER_ADDRESS
          value: gcomm://
        - name: WSREP_SST_USER
          value: sst
        - name: WSREP_SST_PASSWORD
          value: sst
        - name: MYSQL_USER
          value: mysql
        - name: MYSQL_PASSWORD
          value: mysql
        - name: MYSQL_ROOT_PASSWORD
          value: c-krit
        volumeMounts:
        - name: mysql-persistent-storage-1
          mountPath: /var/lib
        securityContext:
          capabilities: {}
          privileged: true # privileged required for the mount
      volumes:
      - name: mysql-persistent-storage-1
        persistentVolumeClaim:
          claimName: claim-galera-1
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: claim-galera-1
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
  selector:
    matchLabels:
      name: pxc1
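For completeness, this is roughly what the node-2 variant looks like; a sketch assuming the only deltas really are the numbered names, labels, nodeSelector value, and claim name:

apiVersion: v1
kind: Service
metadata:
  name: pxc-node2
  labels:
    node: pxc-node2
spec:
  ports:
  - port: 3306
    name: mysql
  - port: 4444
    name: state-snapshot-transfer
  - port: 4567
    name: replication-traffic
  - port: 4568
    name: incremental-state-transfer
  selector:
    node: pxc-node2

In the matching ReplicationController and PersistentVolumeClaim: node: pxc-node2 in the labels and selector, nodeSelector number: '2', volume name mysql-persistent-storage-2, and claimName: claim-galera-2. Node 3 follows the same pattern.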
The thing is, this was working a few days ago: I did a lot of testing, bringing pods and nodes down and watching the replication and quorum voting recover. Now that I'm integrating it with the app it just won't start, and I can't understand why, since it's the same configuration that was working. I've searched a lot (the internet, SO, GitHub) and tried the suggested fixes, but nothing works. This is the log from the failing node:
2018-10-23 20:36:46 1 [Note] WSREP: (4be59ce1, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.244.2.61:4567
2018-10-23 20:36:47 1 [Note] WSREP: forgetting 49c4d2cf (tcp://10.244.2.61:4567)
2018-10-23 20:36:47 1 [Note] WSREP: (4be59ce1, 'tcp://0.0.0.0:4567') turning message relay requesting off
2018-10-23 20:36:47 1 [Warning] WSREP: no nodes coming from prim view, prim not possible
2018-10-23 20:36:47 1 [Note] WSREP: view(view_id(NON_PRIM,4be59ce1,5) memb {
4be59ce1,0
} joined {
} left {
} partitioned {
47f2860c,0
49c4d2cf,0
})
2018-10-23 20:36:50 1 [Note] WSREP: view((empty))
2018-10-23 20:36:50 1 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():162
2018-10-23 20:36:50 1 [ERROR] WSREP: gcs/src/gcs_core.cpp:long int gcs_core_open(gcs_core_t*, const char*, const char*, bool)():206: Failed to open backend connection: -110 (Connection timed out)
2018-10-23 20:36:50 1 [ERROR] WSREP: gcs/src/gcs.cpp:long int gcs_open(gcs_conn_t*, const char*, const char*, bool)():1379: Failed to open channel 'galera_kubernetes' at 'gcomm://pxc-node2,pxc-node3': -110 (Connection timed out)
2018-10-23 20:36:50 1 [ERROR] WSREP: gcs connect failed: Connection timed out
2018-10-23 20:36:50 1 [ERROR] WSREP: wsrep::connect(gcomm://pxc-node2,pxc-node3) failed: 7
2018-10-23 20:36:50 1 [ERROR] Aborting
2018-10-23 20:36:50 1 [Note] WSREP: Service disconnected.
2018-10-23 20:36:51 1 [Note] WSREP: Some threads may fail to exit.
2018-10-23 20:36:51 1 [Note] Binlog end
2018-10-23 20:36:51 1 [Note] mysqld: Shutdown complete
Any suggestions? It's been a few hours now and I just can't make it work. From the log, the node times out trying to reach pxc-node2 and pxc-node3 on port 4567, so it never reaches a primary view.
Percona XtraDB Cluster now has native support for Kubernetes: the PXC Operator went GA with release 1.0 a few weeks ago. https://percona.com/doc/kubernetes-operator-for-pxc/index.html
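If you want to try it, the entry point is a PerconaXtraDBCluster custom resource. A minimal sketch, assuming the field names from the operator's deploy/cr.yaml; the exact apiVersion and image tags depend on the release you install, so check the cr.yaml that ships with it:

# Minimal PerconaXtraDBCluster resource (sketch; verify fields against
# the deploy/cr.yaml for your operator release).
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster
metadata:
  name: cluster1
spec:
  secretsName: my-cluster-secrets   # Secret holding root/sst/etc. passwords
  pxc:
    size: 3                         # three Galera nodes, as in your setup
    image: percona/percona-xtradb-cluster-operator:1.0.0-pxc
    volumeSpec:
      persistentVolumeClaim:
        resources:
          requests:
            storage: 6Gi
  proxysql:
    enabled: true
    size: 1
    image: percona/percona-xtradb-cluster-operator:1.0.0-proxysql

The operator then takes care of bootstrap ordering and recovery for you, replacing the hand-rolled per-node Services and ReplicationControllers.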