Kubernetes MLflow Service Pod Connection

4/21/2020

I have deployed a build of mlflow to a pod in my kubernetes cluster. I'm able to port forward to the mlflow ui, and now I'm attempting to test it. To do this, I am running the following test on a jupyter notebook that is running on another pod in the same cluster.

import mlflow

print("Setting Tracking Server")
tracking_uri = "http://mlflow-tracking-server.default.svc.cluster.local:5000"

mlflow.set_tracking_uri(tracking_uri)

print("Logging Artifact")
mlflow.log_artifact('/home/test/mlflow-example-artifact.png')

print("DONE")

When I run this though, I get

ConnectionError: HTTPConnectionPool(host='mlflow-tracking-server.default.svc.cluster.local', port=5000): Max retries exceeded with url: /api/2.0/mlflow/runs/get? (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object>: Failed to establish a new connection: [Errno 111] Connection refused'))

The way I have deployed the mlflow pod is shown below in the yaml and docker:

Yaml:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow-tracking-server
  namespace: default
spec:
  selector:
    matchLabels:
      app: mlflow-tracking-server
  replicas: 1
  template:
    metadata:
      labels:
        app: mlflow-tracking-server
    spec:
      containers:
      - name: mlflow-tracking-server
        image: <ECR_IMAGE>
        ports:
        - containerPort: 5000
        env:
        - name: AWS_MLFLOW_BUCKET
          value: <S3_BUCKET>
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: aws-secret
              key: AWS_ACCESS_KEY_ID
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: aws-secret
              key: AWS_SECRET_ACCESS_KEY

---
apiVersion: v1
kind: Service
metadata:
  name: mlflow-tracking-server
  namespace: default
  labels:
    app: mlflow-tracking-server
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  externalTrafficPolicy: Local
  type: LoadBalancer
  selector:
    app: mlflow-tracking-server
  ports:
    - name: http
      port: 5000
      targetPort: http

While the dockerfile calls a script that executes the mlflow server command: mlflow server --default-artifact-root ${AWS_MLFLOW_BUCKET} --host 0.0.0.0 --port 5000

The issue is though that I cannot connect to the service I have created using that mlflow pod. I have tried using the tracking uri http://mlflow-tracking-server.default.svc.cluster.local:5000, I've tried using the service EXTERNAL-IP:5000, but everything I tried cannot connect and log using the service. Is there anything that I have missed in deploying my mlflow server pod to my kubernetes cluster?

-- JMV12
kubernetes
kubernetes-service
mlflow

1 Answer

4/21/2020

Your mlflow-tracking-server service should have ClusterIP type, not LoadBalancer.

Both pods are inside the same Kubernetes cluster, therefore, there is no reason to use LoadBalancer Service type.

For some parts of your application (for example, frontends) you may want to expose a Service onto an external IP address, that’s outside of your cluster. Kubernetes ServiceTypes allow you to specify what kind of Service you want. The default is ClusterIP.

Type values and their behaviors are:

  • ClusterIP: Exposes the Service on a cluster-internal IP. Choosing this value makes the Service only reachable from within the cluster. This is the default ServiceType.

  • NodePort: Exposes the Service on each Node’s IP at a static port (the NodePort). A > ClusterIP Service, to which the NodePort Service routes, is automatically created. You’ll > be able to contact the NodePort Service, from outside the cluster, by requesting :.

  • LoadBalancer: Exposes the Service externally using a cloud provider’s load balancer. NodePort and ClusterIP Services, to which the external load balancer routes, are automatically created.

  • ExternalName: Maps the Service to the contents of the externalName field (e.g. foo.bar.example.com), by returning a CNAME record with its value. No proxying of any kind is set up.

kubernetes.io

-- Paul Brit
Source: StackOverflow