Kubernetes 1.4.3 spin-up script loops forever on AWS

10/19/2016

When I run cluster/kube-up.sh, it loops forever on "Waiting for cluster initialization". I have tried to spin up a cluster in us-west-1, us-west-2 and eu-west-1 several times, with no success.

Here is the output from the startup script:

$ export KUBE_AWS_ZONE=eu-west-1a
$ export NUM_NODES=3
$ export KUBE_AWS_INSTANCE_PREFIX=test
$ export MASTER_SIZE=m3.medium
$ export NODE_SIZE=t2.medium
$ export KUBERNETES_PROVIDER=aws
$ ./cluster/kube-up.sh
... Starting cluster in eu-west-1a using provider aws
... calling verify-prereqs
... calling kube-up
Starting cluster using os distro: jessie
Uploading to Amazon S3
Creating kubernetes-staging-17d502113db4ff6c4fb9c4b42955c586
make_bucket: s3://kubernetes-staging-17d502113db4ff6c4fb9c4b42955c586/
Confirming bucket was created...
+++ Staging server tars to S3 Storage: kubernetes-staging-17d502113db4ff6c4fb9c4b42955c586/devel
upload: ../../tmp/kubernetes.Bj5OaA/s3/bootstrap-script to s3://kubernetes-staging-17d502113db4ff6c4fb9c4b42955c586/devel/bootstrap-script
upload: ../../tmp/kubernetes.Bj5OaA/s3/kubernetes-salt.tar.gz to s3://kubernetes-staging-17d502113db4ff6c4fb9c4b42955c586/devel/kubernetes-salt.tar.gz
upload: ../../tmp/kubernetes.Bj5OaA/s3/kubernetes-server-linux-amd64.tar.gz to s3://kubernetes-staging-17d502113db4ff6c4fb9c4b42955c586/devel/kubernetes-server-linux-amd64.tar.gz

Uploaded server tars:
  SERVER_BINARY_TAR_URL: https://s3.amazonaws.com/kubernetes-staging-17d502113db4ff6c4fb9c4b42955c586/devel/kubernetes-server-linux-amd64.tar.gz
  SALT_TAR_URL: https://s3.amazonaws.com/kubernetes-staging-17d502113db4ff6c4fb9c4b42955c586/devel/kubernetes-salt.tar.gz
  BOOTSTRAP_SCRIPT_URL: https://s3.amazonaws.com/kubernetes-staging-17d502113db4ff6c4fb9c4b42955c586/devel/bootstrap-script
INSTANCEPROFILE arn:aws:iam::333659885792:instance-profile/kubernetes-master    2016-07-28T10:52:30Z    AIPAJ2ESPVLTF7USVISQI   kubernetes-master   /
ROLES   arn:aws:iam::333659885792:role/kubernetes-master    2016-07-28T10:52:29Z    /   AROAJGF5WAAV5OFWYKBHW   kubernetes-master
ASSUMEROLEPOLICYDOCUMENT    2012-10-17
STATEMENT   sts:AssumeRole  Allow
PRINCIPAL   ec2.amazonaws.com
INSTANCEPROFILE arn:aws:iam::333659885792:instance-profile/kubernetes-minion    2016-08-04T08:41:10Z    AIPAJYJLZTINGNI4RFBLY   kubernetes-minion   /
ROLES   arn:aws:iam::333659885792:role/kubernetes-minion    2016-08-04T08:41:10Z    /   AROAIWUBQVYHYHTSSEH6C   kubernetes-minion
ASSUMEROLEPOLICYDOCUMENT    2012-10-17
STATEMENT   sts:AssumeRole  Allow
PRINCIPAL   ec2.amazonaws.com
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/kube_aws_rsa.
Your public key has been saved in /root/.ssh/kube_aws_rsa.pub.
The key fingerprint is:
25:a6:8e:4f:79:2f:75:bf:55:6e:68:c6:7a:35:28:a9 root@ip-172-31-12-179
The key's randomart image is:
+---[RSA 2048]----+
|                 |
|                 |
|        o .      |
|       o o       |
|      . S   . . .|
|     o .  .o.o +o|
|    . + ......=.=|
|     o ..E   +oo |
|      .  .. .... |
+-----------------+
Using SSH key with (AWS) fingerprint: 25:a6:8e:4f:79:2f:75:bf:55:6e:68:c6:7a:35:28:a9
Using VPC vpc-55137031
Adding tag to dopt-c7f311a3: Name=kubernetes-dhcp-option-set
Adding tag to dopt-c7f311a3: KubernetesCluster=test
Using DHCP option set dopt-c7f311a3
Using existing subnet with CIDR 172.20.0.0/24
Using subnet subnet-0a43046e
Creating Internet Gateway.
Using Internet Gateway igw-9dc42bf9
Associating route table.
Creating route table
Adding tag to rtb-da9cb4be: KubernetesCluster=test
Associating route table rtb-da9cb4be to subnet subnet-0a43046e
Adding route to route table rtb-da9cb4be
Using Route Table rtb-da9cb4be
Creating master security group.
Creating security group kubernetes-master-test.
Adding tag to sg-80b072e6: KubernetesCluster=test
Creating minion security group.
Creating security group kubernetes-minion-test.
Adding tag to sg-8cb072ea: KubernetesCluster=test
Using master security group: kubernetes-master-test sg-80b072e6
Using minion security group: kubernetes-minion-test sg-8cb072ea
Creating master disk: size 20GB, type gp2
Adding tag to vol-1f19bb9d: Name=test-master-pd
Adding tag to vol-1f19bb9d: KubernetesCluster=test
Allocated Elastic IP for master: 52.49.10.199
Adding tag to vol-1f19bb9d: kubernetes.io/master-ip=52.49.10.199
Generating certs for alternate-names: IP:52.49.10.199,IP:172.20.0.9,IP:10.0.0.1,DNS:kubernetes,DNS:kubernetes.default,DNS:kubernetes.default.svc,DNS:kubernetes.default.svc.cluster.local,DNS:test-master
Starting Master
Adding tag to i-ac1ca727: Name=test-master
Adding tag to i-ac1ca727: Role=test-master
Adding tag to i-ac1ca727: KubernetesCluster=test
Waiting for master to be ready
Attempt 1 to check for master nodeWaiting for instance i-ac1ca727 to be running (currently pending)
Sleeping for 3 seconds...
Waiting for instance i-ac1ca727 to be running (currently pending)
Sleeping for 3 seconds...
Waiting for instance i-ac1ca727 to be running (currently pending)
Sleeping for 3 seconds...
 [master running]
Attaching IP 52.49.10.199 to instance i-ac1ca727
Attaching persistent data volume (vol-1f19bb9d) to master
2016-10-19T09:41:18.422Z    /dev/sdb    i-ac1ca727  attaching   vol-1f19bb9d
cluster "aws_test" set.
user "aws_test" set.
context "aws_test" set.
switched to context "aws_test".
user "aws_test-basic-auth" set.
Wrote config for aws_test to /root/.kube/config
Creating minion configuration
Creating autoscaling group
 0 minions started; waiting
 0 minions started; waiting
 0 minions started; waiting
 0 minions started; waiting
 3 minions started; ready
Waiting for cluster initialization.

  This will continually check to see if the API for kubernetes is reachable.
  This might loop forever if there was some uncaught error during start
  up.

..............................................................................................................................................................................................^C
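The message above says kube-up.sh keeps checking whether the Kubernetes API is reachable, but it only prints dots. To probe the master by hand while the loop runs, here is a minimal bash sketch. It is not the exact check kube-up.sh performs; it assumes that the API server listens for HTTPS on the master's Elastic IP (52.49.10.199 in the output above) and serves the /healthz path. The function name wait_for_http is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kube-up.sh): poll a URL until the server
# answers with any HTTP status code. Requires curl.
# -k skips TLS verification because the cluster's certificates are
# self-signed; even a 401 means the API server process is up and listening.

# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_http() {
  local url="$1" attempts="${2:-60}" delay="${3:-5}" code i=1
  command -v curl >/dev/null 2>&1 || { echo "curl not found" >&2; return 2; }
  while [ "$i" -le "$attempts" ]; do
    # With -w '%{http_code}', curl prints 000 when no HTTP response arrived
    # (connection refused, timeout, unresolvable host, ...).
    code=$(curl -k -s -o /dev/null --max-time 5 -w '%{http_code}' "$url")
    if [ -n "$code" ] && [ "$code" != "000" ]; then
      echo "HTTP $code from $url after $i attempt(s)"
      return 0
    fi
    printf '.'
    sleep "$delay"
    i=$((i + 1))
  done
  echo
  echo "no HTTP response from $url after $attempts attempt(s)" >&2
  return 1
}
```

For example, run wait_for_http https://52.49.10.199/healthz 60 5 from the workstation. Treating any status code (including 401) as success separates "nothing is listening" from "the server is up but rejects anonymous requests". If it never gets a response, nothing is serving on port 443, which would be consistent with the empty ps -ef | grep kube output on the master.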


$ ./cluster/kubectl.sh version
Client Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.3", GitCommit:"4957b090e9a4f6a68b4a40375408fdc74a212260", GitTreeState:"clean", BuildDate:"2016-10-16T06:36:33Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

Some info from the master node:

$ ps -ef | grep kube
root       620   618  0 10:00 pts/0    00:00:00 grep kube
$ cat /var/log/kube-apiserver.log
cat: /var/log/kube-apiserver.log: No such file or directory
$ cat /var/log/cloud-init.log | grep -i error
$ 
-- Andrew Smirnykh
amazon-ec2
amazon-web-services
kubectl
kubernetes

1 Answer

10/24/2016

I had a similar problem. It was caused by an improperly allocated DNS server IP: in my case SaltStack (the framework used to provision the master's services) broke while resolving local hostnames during master spin-up.
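Since the failure in my case came from local hostname resolution, one quick check on the master is whether its own hostname resolves. Here is a minimal bash sketch, assuming a glibc-based image where getent is available (the output above shows the jessie distro, i.e. Debian). The function name check_hostname_resolves is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: does a hostname resolve through the normal NSS path
# (/etc/hosts, then DNS, as configured in /etc/nsswitch.conf)?
# Uses getent, which ships with glibc.

# Usage: check_hostname_resolves [NAME]   (defaults to this machine's hostname)
check_hostname_resolves() {
  local name="${1:-$(hostname)}"
  if getent hosts "$name" >/dev/null; then
    # Print the first address returned for the name.
    echo "OK: $name resolves to $(getent hosts "$name" | awk '{print $1; exit}')"
    return 0
  fi
  echo "FAIL: $name does not resolve; check /etc/hosts and /etc/resolv.conf" >&2
  return 1
}
```

Run it without arguments on the master. A FAIL there would point in the same direction as my issue, since Salt needs the local hostname to resolve.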

Your startup parameters seem rather innocuous, so I don't have any particular idea about your case. But checking the same logs for similar error messages might help, so that at least you could file an issue that the k8s team can fix faster.

I suggest double-checking /var/log/syslog for a SaltStack summary like the one below. A non-zero Failed count (6 of 95 states here) is a sign of provisioning issues; if you find one, go backward in time from that point and look for anything that seems abnormal.

Sep 27 13:03:36 ip-172-40-0-9 rc.local[374]: -------------
Sep 27 13:03:36 ip-172-40-0-9 rc.local[374]: Succeeded: 89
Sep 27 13:03:36 ip-172-40-0-9 rc.local[374]: Failed:     6
Sep 27 13:03:36 ip-172-40-0-9 rc.local[374]: -------------
Sep 27 13:03:36 ip-172-40-0-9 rc.local[374]: Total:     95
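The Failed line is the one to look for. Here is a minimal bash sketch that pulls the last Salt summary out of syslog, shows the lines leading up to it, and lists failed states. It assumes the summary lines carry the rc.local[PID]: prefix shown above, and that failed states appear with Salt's usual "Result: False" line in the highstate output. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading the SaltStack run that rc.local logs to
# syslog on the master. Assumes the summary format shown above.

# Print the count from the last "Failed:" summary line (empty if none).
salt_failed_count() {
  local log="${1:-/var/log/syslog}"
  grep -E 'rc\.local\[[0-9]+]: Failed:' "$log" | tail -n 1 | awk '{print $NF}'
}

# Print the last "Succeeded:" summary line plus up to N lines (default 200)
# before it, so you can read backward in time from the summary.
lines_before_salt_summary() {
  local log="${1:-/var/log/syslog}" n="${2:-200}" end start
  end=$(grep -nE 'rc\.local\[[0-9]+]: Succeeded:' "$log" | tail -n 1 | cut -d: -f1)
  [ -n "$end" ] || return 1
  start=$(( end > n ? end - n : 1 ))
  sed -n "${start},${end}p" "$log"
}

# List failed Salt states with nearby context (ID, Function, Comment lines).
salt_failed_states() {
  local log="${1:-/var/log/syslog}"
  grep -B 2 -A 1 'Result: False' "$log"
}
```

On a log containing the summary above, salt_failed_count /var/log/syslog would print 6. salt_failed_states then shows which states failed, and lines_before_salt_summary gives the earlier context to search backward through.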

Here is the fully detailed issue I filed: https://github.com/kubernetes/kubernetes/issues/33559. It may contain some further helpful details (though probably not).

-- Anton K
Source: StackOverflow