I got gRPC service in GKE with 3 pods. Each pod got attached ESP container with rollout_strategy=managed and backend pointing to gRPC.
Sometimes when I hit http endpoint I got
{
"code": 13,
"message": "Failed to fetch service account token",
"details": [
{
"@type": "type.googleapis.com/google.rpc.DebugInfo",
"stackEntries": [],
"detail": "internal"
}
]
}
Every third request failed. When I deleted every pod, it started to works. I assume it will fail after one hour when the token will be expired.
Deployment ESP container
containers:
- args:
- --http_port=80
- --backend=grpc://127.0.0.1:6000
- --service=service.endpoints.cluster.cloud.goog
- --rollout_strategy=managed
image: gcr.io/endpoints-release/endpoints-runtime:1
imagePullPolicy: IfNotPresent
name: esp
ports:
- containerPort: 80
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
In container logs there are errors like this:
2019/07/17 07:51:05[error]10#10: Failed to call https://servicecontrol.googleapis.com/v1/services/service.endpoints.cluster.cloud.goog:report, Error: UNAUTHORIZED: server response status code: 401, Response body: �Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:182] Failed in Report call: Service control request failed with HTTP response code 401
I found the same issue here https://groups.google.com/forum/#!topic/google-cloud-endpoints/0EEY88X8oIY But it's just workaround by providing explicitly service account.
Edit: I tried service account, but there is still problem when pod is trying to refresh token
2019-07-19T11:45:48.031715293Z INFO:Refreshing access_token
E
2019-07-19T11:45:48.071914451Z Traceback (most recent call last):
File ".bootstrap/_pex/pex.py", line 365, in execute
File ".bootstrap/_pex/pex.py", line 293, in _wrap_coverage
File ".bootstrap/_pex/pex.py", line 325, in _wrap_profiling
File ".bootstrap/_pex/pex.py", line 408, in _execute
File ".bootstrap/_pex/pex.py", line 466, in execute_entry
File ".bootstrap/_pex/pex.py", line 471, in execute_module
File "/usr/lib/python2.7/runpy.py", line 192, in run_module
fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "start_esp/start_esp.py", line 1061, in <module>
File "start_esp/start_esp.py", line 343, in fetch_service_config
File "start_esp/fetch_service_config.py", line 151, in make_access_token
File "/root/.pex/install/oauth2client-4.1.3-py2.py3-none-any.whl.289a6efc12c558f2f0ae1841479ec3ee7a783c1b/oauth2client-4.1.3-py2.py3-none-any.whl/oauth2client/client.py", line 663, in get_access_token
E
2019-07-19T11:45:48.074420362Z self.refresh(http)
E
2019-07-19T11:45:48.074424955Z File "/root/.pex/install/oauth2client-4.1.3-py2.py3-none-any.whl.289a6efc12c558f2f0ae1841479ec3ee7a783c1b/oauth2client-4.1.3-py2.py3-none-any.whl/oauth2client/client.py", line 545, in refresh
E
2019-07-19T11:45:48.074430139Z self._refresh(http)
E
2019-07-19T11:45:48.074434982Z File "/root/.pex/install/oauth2client-4.1.3-py2.py3-none-any.whl.289a6efc12c558f2f0ae1841479ec3ee7a783c1b/oauth2client-4.1.3-py2.py3-none-any.whl/oauth2client/client.py", line 749, in _refresh
E
2019-07-19T11:45:48.074440214Z self._do_refresh_request(http)
E
2019-07-19T11:45:48.074444687Z File "/root/.pex/install/oauth2client-4.1.3-py2.py3-none-any.whl.289a6efc12c558f2f0ae1841479ec3ee7a783c1b/oauth2client-4.1.3-py2.py3-none-any.whl/oauth2client/client.py", line 780, in _do_refresh_request
E
2019-07-19T11:45:48.074450391Z body=body, headers=headers)
E
2019-07-19T11:45:48.074455280Z File "/root/.pex/install/oauth2client-4.1.3-py2.py3-none-any.whl.289a6efc12c558f2f0ae1841479ec3ee7a783c1b/oauth2client-4.1.3-py2.py3-none-any.whl/oauth2client/transport.py", line 282, in request
E
2019-07-19T11:45:48.074460859Z connection_type=connection_type)
E
2019-07-19T11:45:48.074465988Z File "/root/.pex/install/httplib2-0.13.0-py2-none-any.whl.f7b2ff9b447b5741fa19aab52032ab1c7db17f35/httplib2-0.13.0-py2-none-any.whl/httplib2/__init__.py", line 2135, in request
E
2019-07-19T11:45:48.074482470Z cachekey,
E
2019-07-19T11:45:48.074487590Z File "/root/.pex/install/httplib2-0.13.0-py2-none-any.whl.f7b2ff9b447b5741fa19aab52032ab1c7db17f35/httplib2-0.13.0-py2-none-any.whl/httplib2/__init__.py", line 1796, in _request
E
2019-07-19T11:45:48.074492698Z conn, request_uri, method, body, headers
E
2019-07-19T11:45:48.074497648Z File "/root/.pex/install/httplib2-0.13.0-py2-none-any.whl.f7b2ff9b447b5741fa19aab52032ab1c7db17f35/httplib2-0.13.0-py2-none-any.whl/httplib2/__init__.py", line 1737, in _conn_request
E
2019-07-19T11:45:48.074502516Z response = conn.getresponse()
E
2019-07-19T11:45:48.074506952Z File "/usr/lib/python2.7/httplib.py", line 1108, in getresponse
E
2019-07-19T11:45:48.074511855Z raise ResponseNotReady()
E
2019-07-19T11:45:48.074516567Z httplib.ResponseNotReady
E
Afterwards pod is restarted and everything works for 1 hour.
GCP has a Metadata server for ESP to get the access token. IF it is overloaded, ESP may fail since it needs the access token to talk to Google services. If this problem happens often, your should file a bug for GKE team.
Other users have run into similar situations with heavy traffic. I have suggested them the same workaround as https://groups.google.com/forum/#!topic/google-cloud-endpoints/0EEY88X8oIY