I have deployed a build of MLflow to a pod in my Kubernetes cluster. I can port-forward to the MLflow UI, and now I'm trying to test the tracking server. To do this, I run the following test from a Jupyter notebook running in another pod in the same cluster:
import mlflow
print("Setting Tracking Server")
tracking_uri = "http://mlflow-tracking-server.default.svc.cluster.local:5000"
mlflow.set_tracking_uri(tracking_uri)
print("Logging Artifact")
mlflow.log_artifact('/home/test/mlflow-example-artifact.png')
print("DONE")
When I run this, however, I get:
ConnectionError: HTTPConnectionPool(host='mlflow-tracking-server.default.svc.cluster.local', port=5000): Max retries exceeded with url: /api/2.0/mlflow/runs/get? (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object>: Failed to establish a new connection: [Errno 111] Connection refused'))
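Errno 111 is ECONNREFUSED on Linux, so this error says the Service name resolved but nothing accepted a TCP connection on port 5000. A minimal sketch for checking that from the notebook pod with only the Python standard library; `probe_tcp` and its return labels are hypothetical names chosen for illustration, and only the first resolved address is tried:

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', 'dns-failure' or 'timeout'."""
    try:
        # getaddrinfo raises socket.gaierror if the name does not resolve at all.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"
    # Only the first resolved address is tried in this sketch.
    family, socktype, proto, _, addr = infos[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(addr)
        except ConnectionRefusedError:
            # errno 111 (ECONNREFUSED) on Linux: the address exists but nothing accepted.
            return "refused"
        except socket.timeout:
            return "timeout"
        # Any other OSError (e.g. network unreachable) propagates to the caller.
    return "open"
```

In the notebook pod, `probe_tcp("mlflow-tracking-server.default.svc.cluster.local", 5000)` should return `"refused"` in the situation above, and `"dns-failure"` if the Service name itself did not resolve.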
Here is how I deployed the MLflow pod (Kubernetes YAML and Docker setup):
YAML:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow-tracking-server
  namespace: default
spec:
  selector:
    matchLabels:
      app: mlflow-tracking-server
  replicas: 1
  template:
    metadata:
      labels:
        app: mlflow-tracking-server
    spec:
      containers:
      - name: mlflow-tracking-server
        image: <ECR_IMAGE>
        ports:
        - containerPort: 5000
        env:
        - name: AWS_MLFLOW_BUCKET
          value: <S3_BUCKET>
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: aws-secret
              key: AWS_ACCESS_KEY_ID
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: aws-secret
              key: AWS_SECRET_ACCESS_KEY
---
apiVersion: v1
kind: Service
metadata:
  name: mlflow-tracking-server
  namespace: default
  labels:
    app: mlflow-tracking-server
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  externalTrafficPolicy: Local
  type: LoadBalancer
  selector:
    app: mlflow-tracking-server
  ports:
  - name: http
    port: 5000
    targetPort: http
The Dockerfile calls a script that starts the server with: mlflow server --default-artifact-root ${AWS_MLFLOW_BUCKET} --host 0.0.0.0 --port 5000
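The script itself is not shown, so here is a sketch of what such an entrypoint might look like in Python. It assumes `AWS_MLFLOW_BUCKET` holds a full S3 URI (for example `s3://my-bucket/mlflow`), since MLflow takes the artifact root as a URI; the helper names `build_server_command` and `run` are hypothetical:

```python
import os


def build_server_command(env: dict) -> list:
    """Return the argv for `mlflow server` (hypothetical entrypoint helper)."""
    artifact_root = env.get("AWS_MLFLOW_BUCKET", "")
    # Assumption: the variable holds a full S3 URI such as s3://my-bucket/mlflow.
    if not artifact_root.startswith("s3://"):
        raise ValueError(f"AWS_MLFLOW_BUCKET must be an s3:// URI, got {artifact_root!r}")
    return [
        "mlflow", "server",
        "--default-artifact-root", artifact_root,
        # Bind all interfaces so other pods can reach the server via the pod IP.
        "--host", "0.0.0.0",
        "--port", "5000",
    ]


def run() -> None:
    """Replace the current process with the MLflow server."""
    cmd = build_server_command(dict(os.environ))
    # execvp searches PATH for "mlflow" and never returns on success.
    os.execvp(cmd[0], cmd)
```

The container's entrypoint would call `run()`. Because `os.execvp` replaces the Python process, `mlflow server` becomes the container's main process and receives its stop signals directly.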
The problem is that I cannot connect to the Service I created for that MLflow pod. I have tried the tracking URI http://mlflow-tracking-server.default.svc.cluster.local:5000 and the Service's EXTERNAL-IP:5000, but nothing I try can connect and log to the server. Is there anything I have missed in deploying my MLflow server pod to my Kubernetes cluster?
Your mlflow-tracking-server Service should be of type ClusterIP, not LoadBalancer.
Both pods are inside the same Kubernetes cluster, so there is no reason to use the LoadBalancer Service type.
From the Kubernetes documentation on Service types: For some parts of your application (for example, frontends) you may want to expose a Service onto an external IP address, that’s outside of your cluster. Kubernetes ServiceTypes allow you to specify what kind of Service you want. The default is ClusterIP.
Type values and their behaviors are:
- ClusterIP: Exposes the Service on a cluster-internal IP. Choosing this value makes the Service only reachable from within the cluster. This is the default ServiceType.
- NodePort: Exposes the Service on each Node’s IP at a static port (the NodePort). A ClusterIP Service, to which the NodePort Service routes, is automatically created. You’ll be able to contact the NodePort Service, from outside the cluster, by requesting <NodeIP>:<NodePort>.
- LoadBalancer: Exposes the Service externally using a cloud provider’s load balancer. NodePort and ClusterIP Services, to which the external load balancer routes, are automatically created.
- ExternalName: Maps the Service to the contents of the externalName field (e.g. foo.bar.example.com), by returning a CNAME record with its value. No proxying of any kind is set up.
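As a sketch, the Service could be rewritten as ClusterIP like this. It is built as a Python dict and printed as JSON, which `kubectl apply -f -` accepts on stdin; the function name `clusterip_service` is hypothetical. Two details differ from the original manifest: `externalTrafficPolicy` and the NLB annotation are dropped because they only apply to NodePort/LoadBalancer Services, and `targetPort` is the number 5000 because the Deployment's container port has no `name`, so it cannot be referenced as `http`.

```python
import json


def clusterip_service(name: str, namespace: str, port: int, target_port: int) -> dict:
    """Build a ClusterIP Service manifest for the tracking server (sketch)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "labels": {"app": name},
        },
        "spec": {
            # ClusterIP: reachable only from inside the cluster (the default type).
            "type": "ClusterIP",
            # Must match the pod template labels in the Deployment.
            "selector": {"app": name},
            "ports": [
                {
                    "name": "http",
                    "port": port,
                    # Numeric target port: the containerPort in the Deployment is unnamed.
                    "targetPort": target_port,
                    "protocol": "TCP",
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = clusterip_service("mlflow-tracking-server", "default", 5000, 5000)
    # e.g. python make_svc.py | kubectl apply -f -
    print(json.dumps(manifest, indent=2))
```

The in-cluster DNS name stays the same, so the notebook can keep using http://mlflow-tracking-server.default.svc.cluster.local:5000 as its tracking URI.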