Hello. I have set up Kubernetes on Rancher, which runs as a single Docker container, and I want to install Kubeflow on top of it.
I imported the YAML file and installed it with kfctl. The installation finishes, but it is incomplete and the main functions do not work.
taeil-kubeflow# kubectl get all -n kubeflow
NAME READY STATUS RESTARTS AGE
pod/admission-webhook-bootstrap-stateful-set-0 1/1 Running 2 32m
pod/admission-webhook-deployment-5cd7dc96f5-j8ptn 1/1 Running 0 31m
pod/application-controller-stateful-set-0 1/1 Running 0 32m
pod/argo-ui-65df8c7c84-2bfhd 1/1 Running 0 31m
pod/cache-deployer-deployment-5f4979f45-kfhfg 1/2 CrashLoopBackOff 4 31m
pod/cache-server-7859fd67f5-9lmrt 0/2 Init:0/1 0 31m
pod/centraldashboard-67767584dc-flb2n 1/1 Running 0 31m
pod/jupyter-web-app-deployment-67fb955745-49vzj 1/1 Running 0 31m
pod/katib-controller-7fcc95676b-s9lwd 1/1 Running 1 31m
pod/katib-db-manager-85db457c64-s4q5p 0/1 Error 4 31m
pod/katib-mysql-6c7f7fb869-4228x 0/1 Pending 0 31m
pod/katib-ui-65dc4cf6f5-pxs8g 1/1 Running 0 31m
pod/kfserving-controller-manager-0 2/2 Running 0 31m
pod/kubeflow-pipelines-profile-controller-797fb44db9-vstfv 1/1 Running 0 31m
pod/metacontroller-0 1/1 Running 0 32m
pod/metadata-db-6dd978c5b-qwbnz 0/1 Pending 0 31m
pod/metadata-envoy-deployment-67bd5954c-dw6rx 1/1 Running 0 31m
pod/metadata-grpc-deployment-577c67c96f-872lf 0/1 CrashLoopBackOff 1 31m
pod/metadata-writer-756dbdd478-dwwpc 2/2 Running 2 31m
pod/minio-54d995c97b-md886 0/1 Pending 0 31m
pod/ml-pipeline-7c56db5db9-856mr 1/2 CrashLoopBackOff 14 31m
pod/ml-pipeline-persistenceagent-d984c9585-248b4 2/2 Running 0 31m
pod/ml-pipeline-scheduledworkflow-5ccf4c9fcc-mv4vs 2/2 Running 0 31m
pod/ml-pipeline-ui-7ddcd74489-qv2gp 2/2 Running 0 31m
pod/ml-pipeline-viewer-crd-56c68f6c85-7rf6f 2/2 Running 3 31m
pod/ml-pipeline-visualizationserver-5b9bd8f6bf-jj5st 2/2 Running 0 31m
pod/mpi-operator-d5bfb8489-nzl2k 1/1 Running 0 31m
pod/mxnet-operator-7576d697d6-z2dc8 1/1 Running 0 31m
pod/mysql-74f8f99bc8-rpxcc 0/2 Pending 0 31m
pod/notebook-controller-deployment-5bb6bdbd6d-dq5nl 1/1 Running 0 31m
pod/profiles-deployment-56bc5d7dcb-k5cph 2/2 Running 0 31m
pod/pytorch-operator-847c8d55d8-8f79m 1/1 Running 0 31m
pod/seldon-controller-manager-6bf8b45656-jd682 1/1 Running 0 31m
pod/spark-operatorsparkoperator-fdfbfd99-6mhst 1/1 Running 0 32m
pod/spartakus-volunteer-558f8bfd47-l67zg 1/1 Running 0 31m
pod/tf-job-operator-58477797f8-qg2tl 1/1 Running 0 31m
pod/workflow-controller-64fd7cffc5-md54d 1/1 Running 0 31m
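The failing pods can be picked out of a listing like the one above with a small filter. This is only a sketch: `not_ready_pods` is a helper name made up for illustration, and it assumes the default column layout of `kubectl get pods` (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print
# every pod whose READY count is incomplete or whose STATUS is not Running.
not_ready_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")    # READY looks like "1/2"
    if (ready[1] != ready[2] || $3 != "Running") {
      print $1, $2, $3
    }
  }'
}

# Usage (needs kubectl access to the cluster):
#   kubectl get pods -n kubeflow | not_ready_pods
```

Pods of finished jobs (STATUS `Completed`) would also be printed by this filter, so the result is a list of candidates to look at rather than a list of confirmed failures.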
As you can see, several pods are failing, including cache-server, katib-mysql, metadata-db, minio, mysql, metadata-grpc, and ml-pipeline.
I suspect a persistent volume problem, but I don't know how to fix it.
Please help.
Below I have added the describe output and logs for each failing pod.
pod : cache-server-7859fd67f5-9lmrt
status : Init:0/1
describe:
Warning FailedMount 10m (x72 over 18h) kubelet Unable to attach or mount volumes: unmounted volumes=[webhook-tls-certs], unattached volumes=[istio-token kubeflow-pipelines-cachethe condition
Warning FailedMount 3m47s (x550 over 18h) kubelet MountVolume.SetUp failed for volume "webhook-tls-certs" : secret "webhook-server-tls" not found
log :
error: a container name must be specified for pod cache-server-7859fd67f5-9lmrt, choose one of: [server istio-proxy] or one of the init containers: [istio-init]
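The second event says that the secret `webhook-server-tls` does not exist in the namespace, which is why the pod never gets past `Init:0/1`. Whether the secret is really missing can be confirmed directly. The sketch below assumes `kubectl` is pointed at this cluster; `secret_exists` is an illustrative helper name, not an existing command.

```shell
KUBECTL="${KUBECTL:-kubectl}"   # override to use a different kubectl binary

# Hypothetical helper: succeeds only if the named secret exists in the
# kubeflow namespace.
secret_exists() {
  "$KUBECTL" get secret "$1" -n kubeflow >/dev/null 2>&1
}

# Usage:
#   if secret_exists webhook-server-tls; then echo "present"; else echo "missing"; fi
```

As far as I understand, this secret may be generated by the cache-deployer pod, which is itself crash-looping here, so the two failures could be related.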
pod : katib-mysql-6c7f7fb869-4228x
status : Pending 0/1
describe :
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 18h default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
pod : metadata-db-6dd978c5b-qwbnz
status : Pending 0/1
describe :
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 18h default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
pod : minio-54d995c97b-md886
status : Pending 0/1
describe :
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 18h default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
pod : mysql-74f8f99bc8-rpxcc
status : Pending 0/2
describe :
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 18h default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
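All four Pending pods report the same scheduling error, so the claims themselves are worth listing. A sketch, assuming `kubectl` access to the cluster; `unbound_pvcs` is an illustrative helper name.

```shell
KUBECTL="${KUBECTL:-kubectl}"   # override to use a different kubectl binary

# Hypothetical helper: print the name and status of every PVC in the
# kubeflow namespace that is not Bound.
unbound_pvcs() {
  "$KUBECTL" get pvc -n kubeflow --no-headers | awk '$2 != "Bound" { print $1, $2 }'
}

# Usage:
#   unbound_pvcs
#   kubectl describe pvc <name> -n kubeflow   # Events section shows why it is unbound
```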
pod : cache-deployer-deployment-5f4979f45-kfhfg
status : CrashLoopBackOff 1/2
describe :
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulling 36m (x213 over 18h) kubelet Pulling image "gcr.io/ml-pipeline/cache-deployer:1.0.4"
Warning BackOff 68s (x4916 over 18h) kubelet Back-off restarting failed container
log :
error: a container name must be specified for pod cache-deployer-deployment-5f4979f45-kfhfg, choose one of: [main istio-proxy] or one of the init containers: [istio-init]
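The `kubectl logs` call above fails only because the pod has more than one container (the Istio sidecar is injected), so no application log was actually captured. A sketch of the same call with the container named explicitly, using the container names listed in the error message; `container_logs` is an illustrative helper name.

```shell
KUBECTL="${KUBECTL:-kubectl}"   # override to use a different kubectl binary

# Hypothetical helper: print the last 50 log lines of one container.
#   $1 = pod name, $2 = container name
container_logs() {
  "$KUBECTL" logs "$1" -n kubeflow -c "$2" --tail=50
}

# Usage:
#   container_logs cache-deployer-deployment-5f4979f45-kfhfg main
# For a crash-looping container, the previous run is often more useful:
#   kubectl logs cache-deployer-deployment-5f4979f45-kfhfg -n kubeflow -c main --previous
```

The same applies to the other pods that return this error: the name to pass with `-c` is the first one in the bracketed list (`server`, `main`, `ml-pipeline-api-server`).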
pod : katib-db-manager-85db457c64-s4q5p
status : CrashLoopBackOff 0/1
describe :
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 7m16s (x4260 over 18h) kubelet Back-off restarting failed container
Warning Unhealthy 2m13s (x1190 over 18h) kubelet Readiness probe failed:
log :
E0622 02:00:41.686168 1 mysql.go:62] Ping to Katib db failed: dial tcp 10.43.25.138:3306: connect: connection refused
E0622 02:00:46.674159 1 mysql.go:62] Ping to Katib db failed: dial tcp 10.43.25.138:3306: connect: connection refused
E0622 02:00:51.666117 1 mysql.go:62] Ping to Katib db failed: dial tcp 10.43.25.138:3306: connect: connection refused
E0622 02:00:56.690171 1 mysql.go:62] Ping to Katib db failed: dial tcp 10.43.25.138:3306: connect: connection refused
E0622 02:01:01.682194 1 mysql.go:62] Ping to Katib db failed: dial tcp 10.43.25.138:3306: connect: connection refused
E0622 02:01:06.674132 1 mysql.go:62] Ping to Katib db failed: dial tcp 10.43.25.138:3306: connect: connection refused
E0622 02:01:11.666146 1 mysql.go:62] Ping to Katib db failed: dial tcp 10.43.25.138:3306: connect: connection refused
E0622 02:01:16.690230 1 mysql.go:62] Ping to Katib db failed: dial tcp 10.43.25.138:3306: connect: connection refused
E0622 02:01:21.686129 1 mysql.go:62] Ping to Katib db failed: dial tcp 10.43.25.138:3306: connect: connection refused
E0622 02:01:26.674431 1 mysql.go:62] Ping to Katib db failed: dial tcp 10.43.25.138:3306: connect: connection refused
E0622 02:01:31.670133 1 mysql.go:62] Ping to Katib db failed: dial tcp 10.43.25.138:3306: connect: connection refused
E0622 02:01:36.690492 1 mysql.go:62] Ping to Katib db failed: dial tcp 10.43.25.138:3306: connect: connection refused
F0622 02:01:36.690581 1 main.go:83] Failed to open db connection: DB open failed: Timeout waiting for DB conn successfully opened.
goroutine 1 [running]:
github.com/kubeflow/katib/vendor/k8s.io/klog.stacks(0xc000216200, 0xc000230000, 0x89, 0xd0)
/go/src/github.com/kubeflow/katib/vendor/k8s.io/klog/klog.go:830 +0xb9
github.com/kubeflow/katib/vendor/k8s.io/klog.(*loggingT).output(0xc72b20, 0xc000000003, 0xc00022a000, 0xc14079, 0x7, 0x53, 0x0)
/go/src/github.com/kubeflow/katib/vendor/k8s.io/klog/klog.go:781 +0x2da
github.com/kubeflow/katib/vendor/k8s.io/klog.(*loggingT).printf(0xc72b20, 0x3, 0x92bfcd, 0x20, 0xc0001edf48, 0x1, 0x1)
/go/src/github.com/kubeflow/katib/vendor/k8s.io/klog/klog.go:678 +0x153
github.com/kubeflow/katib/vendor/k8s.io/klog.Fatalf(...)
/go/src/github.com/kubeflow/katib/vendor/k8s.io/klog/klog.go:1209
main.main()
/go/src/github.com/kubeflow/katib/cmd/db-manager/v1beta1/main.go:83 +0x166
pod : metadata-grpc-deployment-577c67c96f-872lf
status : CrashLoopBackOff 0/1 or Error
describe :
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 3m46s (x5136 over 18h) kubelet Back-off restarting failed container
log :
2021-06-22 02:03:14.130165: F ml_metadata/metadata_store/metadata_store_server_main.cc:219] Non-OK-status: status status: Internal: mysql_real_connect failed: errno: 2002, error: Can't connect to MySQL server on 'metadata-db' (115)MetadataStore cannot be created with the given connection config.
pod : metadata-writer-756dbdd478-dwwpc
status : CrashLoopBackOff 1/2
describe :
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 6s (x3807 over 18h) kubelet Back-off restarting failed container
log :
error: a container name must be specified for pod metadata-writer-756dbdd478-dwwpc, choose one of: [main istio-proxy] or one of the init containers: [istio-init]
pod : ml-pipeline-7c56db5db9-856mr
status : CrashLoopBackOff 1/2
describe :
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 56m (x324 over 18h) kubelet Container image "gcr.io/ml-pipeline/api-server:1.0.4" already present on machine
Warning BackOff 6m1s (x4072 over 18h) kubelet Back-off restarting failed container
Warning Unhealthy 67s (x2939 over 18h) kubelet Readiness probe failed:
log :
error: a container name must be specified for pod ml-pipeline-7c56db5db9-856mr, choose one of: [ml-pipeline-api-server istio-proxy] or one of the init containers: [istio-init]
I found out that the problem is with dynamic volume provisioning, but I have not been able to solve it. I tried to configure an NFS server and client, but that did not work.
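Dynamic provisioning depends on a StorageClass with a working provisioner, and a PVC that names no class is only provisioned dynamically when a default StorageClass exists. A quick check, sketched under the assumption that `kubectl get storageclass` marks the default class with `(default)` after its name; `has_default_storageclass` is an illustrative helper name.

```shell
KUBECTL="${KUBECTL:-kubectl}"   # override to use a different kubectl binary

# Hypothetical helper: succeeds only if some StorageClass is marked default.
has_default_storageclass() {
  "$KUBECTL" get storageclass --no-headers 2>/dev/null | grep -q '(default)'
}

# Usage:
#   if has_default_storageclass; then
#     echo "default StorageClass found"
#   else
#     echo "no default StorageClass"
#   fi
```

If the check fails, the PVCs created by the Kubeflow manifests have nothing to provision from, which would match the "unbound immediate PersistentVolumeClaims" events shown earlier.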