I am integrating Prometheus into my Kubernetes cluster with the Helm chart I downloaded from https://github.com/helm/helm. I am using Azure to deploy my AKS cluster, if you must know. In each of my pods, the container runs a Docker image that includes the master_server.py script, which controls the workflow in my master pod.
I am trying to get some custom metrics out of my master pod via master_server.py, using the official Prometheus Python client - https://github.com/prometheus/client_python. My master_server.py looks something like this (truncated):
import logging

import tornado.ioloop
import tornado.options
import tornado.web
import tornado.websocket
import tornado.gen
import tornado.concurrent
import prometheus_client as prom

# Custom metrics: a counter of requests received and a gauge of available workers
num_req = prom.Counter('number_of_request_receive_by_master',
                       'number of request receive by master')
num_worker = prom.Gauge('number_of_worker_available',
                        'number of worker available')

def main():
    logging.debug('Starting up server')
    .
    .
    .

if __name__ == "__main__":
    main()
    prom.start_http_server(8081)
I googled a little and found out that I need to add annotations so that Prometheus will scrape the data off my master pod. So in my deployment.yaml file, I added the following snippet:
template:
  metadata:
    annotations:
      prometheus.io/scrape: 'true'
      prometheus.io/port: '8081'
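From what I read, these annotations only take effect if Prometheus is running a scrape job that honours them. I'm assuming the stock kubernetes-pods job that ships with the community Prometheus Helm chart, which looks roughly like this (a sketch, not my exact config):

- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # Only keep pods annotated with prometheus.io/scrape: 'true'
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    # Scrape the port named in prometheus.io/port instead of the default
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__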
Still, it didn't work: I cannot see my custom metrics in Prometheus queries.
The following is the kubectl describe output of my master pod's deployment.
Name:                   kaldi-feature-test-master
Namespace:              kaldi-test
CreationTimestamp:      Fri, 10 Jan 2020 01:53:09 +0800
Labels:                 app.kubernetes.io/instance=kaldi-feature-test
                        app.kubernetes.io/managed-by=Tiller
                        app.kubernetes.io/name=kaldi-feature-test-master
                        helm.sh/chart=kaldi-feature-test-0.1.0
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app.kubernetes.io/instance=kaldi-feature-test,app.kubernetes.io/name=kaldi-feature-test-master
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:       app.kubernetes.io/instance=kaldi-feature-test
                app.kubernetes.io/name=kaldi-feature-test-master
  Annotations:  prometheus.io/port: 8081
                prometheus.io/scrape: true
  Containers:
   kaldi-feature-test-master:
    Image:      kalditest.azurecr.io/kalditestscaled:latest
    Port:       8080/TCP
    Host Port:  0/TCP
    Command:
      /home/appuser/opt/tini
      --
      /home/appuser/opt/start_master.sh
    Limits:
      cpu:     2
      memory:  2Gi
    Requests:
      cpu:     2
      memory:  2Gi
    Liveness:   http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      environment-variables-master-secret  Secret  Optional: false
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   kaldi-feature-test-master-79886c5d76 (2/2 replicas created)
Events:
  Type    Reason             Age  From                   Message
  ----    ------             ---  ----                   -------
  Normal  ScalingReplicaSet  15m  deployment-controller  Scaled up replica set kaldi-feature-test-master-79886c5d76 to 2
I checked the Prometheus targets and realised that the connection to my master pods is refused.
What should I do to let Prometheus scrape the custom metrics from my master pods?
As can be seen from the Python code and the deployment YAML that you provided, the Prometheus HTTP server listens on port 8081, but you only exposed port 8080, not port 8081.
So the solution is to expose port 8081 both in the kaldi-feature-test-master container of the deployment and in the Service that routes requests to your application.
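For illustration, the Service side might look like the sketch below. The Service name and port names here are assumptions (adjust them to your chart); the selector labels are copied from your deployment:

apiVersion: v1
kind: Service
metadata:
  name: kaldi-feature-test-master   # assumed name
  namespace: kaldi-test
spec:
  selector:
    app.kubernetes.io/instance: kaldi-feature-test
    app.kubernetes.io/name: kaldi-feature-test-master
  ports:
    - name: http          # your Tornado web server
      port: 8080
      targetPort: 8080
    - name: prometheus    # the port start_http_server(8081) listens on
      port: 8081
      targetPort: 8081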
Yes, I got it working thanks to Charles' comments!
I was running a Tornado web server for my application in the master pod on port 8080, which might have interfered with the Prometheus HTTP server serving metrics out of the master pod.
In the end, I opened another port, 8081, in my master pod's deployment.yaml like this:
.
.
.
containers:
  - name: master-pod-name
    image: master-pod-image
    ports:
      - name: http
        containerPort: 8080   # this is for my Tornado web server
        protocol: TCP
      - name: prometheus
        containerPort: 8081
.
.
.
Then, in my Python script running in the master pod, I started the Prometheus metrics server on port 8081 with prom.start_http_server(8081). Finally, it worked!
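For anyone hitting the same issue, here is a minimal sketch of one way the two servers can coexist; the MainHandler class and logging setup are illustrative, not from the original script. The key point is that prom.start_http_server() serves metrics from a background daemon thread and returns immediately, so it should be called before the blocking IOLoop.start():

import logging

import prometheus_client as prom
import tornado.ioloop
import tornado.web

num_req = prom.Counter('number_of_request_receive_by_master',
                       'number of request receive by master')

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        num_req.inc()          # count every request the master receives
        self.write("ok")

def main():
    logging.basicConfig(level=logging.DEBUG)
    logging.debug('Starting up server')
    # Metrics endpoint on 8081: start_http_server() returns immediately
    # because it serves /metrics from a daemon thread.
    prom.start_http_server(8081)
    # Tornado application on 8080: IOLoop.start() blocks forever, so
    # anything placed after it would never run.
    app = tornado.web.Application([(r"/", MainHandler)])
    app.listen(8080)
    tornado.ioloop.IOLoop.current().start()

if __name__ == "__main__":
    main()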