Hello, and thanks in advance for your help!
I am trying to deploy a Kubernetes cluster using Kubespray (an Ansible playbook) on 17 KVM hosts (all nodes run CentOS 7 and are hosted on a bare-metal server).
However, when I run the playbook, the task [download : file_download | Download item] fails with the following error, which stops the installation:
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: SSLError: ('The read operation timed out',)
fatal: [node1]: FAILED! => {"attempts": 4, "changed": false, "msg": "failed to create temporary content file: ('The read operation timed out',)"}
The error is the same for most of the nodes, but some nodes do manage to download the file.
Here is the error in verbose mode:
The full traceback is:
Traceback (most recent call last):
  File "/tmp/ansible_get_url_payload_72qREk/__main__.py", line 360, in url_get
    shutil.copyfileobj(rsp, f)
  File "/usr/lib64/python2.7/shutil.py", line 49, in copyfileobj
    buf = fsrc.read(length)
  File "/usr/lib64/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
  File "/usr/lib64/python2.7/httplib.py", line 602, in read
    s = self.fp.read(amt)
  File "/usr/lib64/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
  File "/usr/lib64/python2.7/ssl.py", line 757, in recv
    return self.read(buflen)
  File "/usr/lib64/python2.7/ssl.py", line 651, in read
    v = self._sslobj.read(len or 1024)
SSLError: ('The read operation timed out',)
fatal: [node14]: FAILED! => {
    "attempts": 4,
    "changed": false,
    "invocation": {
        "module_args": {
            "attributes": null,
            "backup": null,
            "checksum": "",
            "client_cert": null,
            "client_key": null,
            "content": null,
            "delimiter": null,
            "dest": "/tmp/releases/kubeadm",
            "directory_mode": null,
            "follow": false,
            "force": false,
            "force_basic_auth": false,
            "group": null,
            "headers": null,
            "http_agent": "ansible-httpget",
            "mode": "0755",
            "owner": "root",
            "regexp": null,
            "remote_src": null,
            "selevel": null,
            "serole": null,
            "setype": null,
            "seuser": null,
            "sha256sum": "c4fc478572b5623857f5d820e1c107ae02049ca02cf2993e512a091a0196957b",
            "src": null,
            "timeout": 10,
            "tmp_dest": null,
            "unsafe_writes": null,
            "url": "https://storage.googleapis.com/kubernetes-release/release/v1.14.1/bin/linux/amd64/kubeadm",
            "url_password": null,
            "url_username": null,
            "use_proxy": true,
            "validate_certs": true
        }
    },
    "msg": "failed to create temporary content file: ('The read operation timed out',)"
}
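For reference, the get_url call above boils down to: download the URL with a 10-second timeout into a temporary file, retry (the task reports "attempts": 4), check the SHA-256, and install the result with mode 0755. The Python 3 sketch below reproduces that flow with the standard library so a single download can be tested on a node outside Ansible. fetch_with_checksum, download_once and their parameters are hypothetical names; this is an illustration, not Ansible's actual implementation.

```python
import hashlib
import os
import shutil
import tempfile
import time
import urllib.request


def sha256_of(path, chunk_size=65536):
    """Hex SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_once(url, path, timeout):
    """Stream url into path; `timeout` bounds each socket operation, not the total."""
    with urllib.request.urlopen(url, timeout=timeout) as rsp, open(path, "wb") as f:
        shutil.copyfileobj(rsp, f)  # the same call that raised in the traceback above


def fetch_with_checksum(url, dest, sha256sum, timeout=10, attempts=4, delay=5):
    """Hypothetical helper: download with retries, verify SHA-256, install as 0755."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)  # the "temporary content file"
    os.close(fd)
    try:
        for attempt in range(1, attempts + 1):
            try:
                download_once(url, tmp_path, timeout)
                break
            except OSError:
                # socket.timeout, ssl.SSLError and urllib.error.URLError are all
                # OSError subclasses; give up after the last attempt.
                if attempt == attempts:
                    raise
                time.sleep(delay)
        actual = sha256_of(tmp_path)
        if actual != sha256sum:
            raise ValueError(f"checksum mismatch: expected {sha256sum}, got {actual}")
        os.chmod(tmp_path, 0o755)
        os.replace(tmp_path, dest)  # move into place (same directory, so atomic on POSIX)
    finally:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # clean up the temporary file on failure
    return dest
```

Running fetch_with_checksum("https://storage.googleapis.com/kubernetes-release/release/v1.14.1/bin/linux/amd64/kubeadm", "/tmp/releases/kubeadm", "c4fc478572b5623857f5d820e1c107ae02049ca02cf2993e512a091a0196957b", timeout=60) on a failing node (with /tmp/releases already created) would show whether a longer timeout alone is enough.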
I connected to the nodes and downloaded a file manually (the Kubespray zip), and it worked: all nodes can reach the internet and install packages.
From the verbose output, I gather that the error comes from Python, but I really don't know how to solve it.
Let me know if I can provide any other information, and again, thanks in advance!
So, I "solved" the problem.
In fact, Ansible has a default timeout of 10 seconds for SSH connections. For unknown reasons, the write task on my nodes was a bit slow, so I got this error. Fortunately, you can change how long Ansible waits before timing out.
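The message "The read operation timed out" is Python's socket timeout firing during a read. Assuming get_url passes its timeout down to the socket the way Python's urllib does, that timeout applies to each blocking read separately: a download that is slow overall can still succeed, while one that stalls for longer than the timeout on a single read fails. The sketch below shows this with a local socket pair; slow_sender is a hypothetical stand-in for a slow mirror, not anything from Ansible or Kubespray.

```python
import socket
import threading
import time


def slow_sender(sock, chunks, pause):
    """Send each chunk after `pause` seconds, then close (hypothetical slow mirror)."""
    for chunk in chunks:
        time.sleep(pause)
        sock.sendall(chunk)
    sock.close()


def receive_all(sock, timeout):
    """Read until EOF; the timeout applies to each recv() call, not the whole transfer."""
    sock.settimeout(timeout)
    data = b""
    try:
        while True:
            chunk = sock.recv(1024)
            if not chunk:  # peer closed the connection: transfer complete
                return data, None
            data += chunk
    except socket.timeout as exc:  # TimeoutError on Python 3.10+
        return data, str(exc)


def simulate(pause, timeout, n_chunks=3):
    """Return (bytes received, error message or None) for one simulated transfer."""
    reader, writer = socket.socketpair()
    sender = threading.Thread(target=slow_sender, args=(writer, [b"x"] * n_chunks, pause))
    sender.start()
    result = receive_all(reader, timeout)
    sender.join()
    reader.close()
    return result
```

For example, simulate(pause=0.05, timeout=0.5, n_chunks=12) finishes even though the whole transfer takes longer than 0.5 s, whereas simulate(pause=1.0, timeout=0.1, n_chunks=1) times out before the first byte arrives.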
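You can either pass -T (or --timeout) followed by the number of seconds you want, or change the timeout value in ansible.cfg. The default config file is located at /etc/ansible/ansible.cfg, but be careful: some custom Ansible playbooks ship their own ansible.cfg (Kubespray has one at the root of its repository), and it takes precedence when you run the playbook from that directory, so you will need to find it and change it to your needs.
Note that both -T and the timeout setting in ansible.cfg control the SSH connection timeout. The "timeout": 10 shown in the get_url arguments above is the module's own HTTP timeout, so if the download itself is what stalls, that is the value that would need to be raised.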
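To check which ansible.cfg is actually in effect, ansible --version prints the config file in use. The Python sketch below does the same lookup by hand, following Ansible's documented search order (the ANSIBLE_CONFIG environment variable, ./ansible.cfg, ~/.ansible.cfg, /etc/ansible/ansible.cfg), and reads the [defaults] timeout value. find_ansible_cfg and connection_timeout are hypothetical helper names, and the parsing is simplified compared to Ansible's own.

```python
import configparser
import os
import stat
from pathlib import Path


def find_ansible_cfg(cwd, env, home):
    """Return the first ansible.cfg Ansible would load, or None."""
    candidates = []
    if env.get("ANSIBLE_CONFIG"):
        candidates.append(Path(env["ANSIBLE_CONFIG"]).expanduser())
    # Ansible ignores ./ansible.cfg when the current directory is world-writable.
    if not (os.stat(cwd).st_mode & stat.S_IWOTH):
        candidates.append(Path(cwd) / "ansible.cfg")
    candidates.append(Path(home) / ".ansible.cfg")
    candidates.append(Path("/etc/ansible/ansible.cfg"))
    for path in candidates:
        if path.is_file():
            return path
    return None


def connection_timeout(cfg_path, default=10):
    """Read [defaults] timeout (the SSH connection timeout), falling back to 10 s."""
    if cfg_path is None:
        return default
    parser = configparser.ConfigParser(interpolation=None, inline_comment_prefixes=("#", ";"))
    parser.read(cfg_path)
    return parser.getint("defaults", "timeout", fallback=default)
```

Calling connection_timeout(find_ansible_cfg(os.getcwd(), os.environ, Path.home())) from the Kubespray directory shows the value ansible-playbook would pick up there, unless -T overrides it on the command line.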
I was lucky: the fourth time I ran the playbook, it installed K8s smoothly.
For testing reasons, I am running my nodes on KVM, so the slow writes may come from there.
I hope someone will find my answer useful!