Flask + Apache Spark deployment on Kubernetes

4/20/2021

I’m trying to deploy a Flask application using Apache Spark 3.1.1 on Kubernetes.

app.py

from flask import Flask
from pyspark.sql import SparkSession
app = Flask(__name__)
app.debug = True

@app.route('/')
def main():
    print("Start of Code")
    spark = SparkSession.builder.appName("Test").getOrCreate()
    sc = spark.sparkContext  # currently unused
    spark.stop()
    print("End of Code")
    return 'hi'

if __name__ == '__main__':
    app.run()
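
One more doubt about app.py itself: Flask's app.run() binds to 127.0.0.1 by default, which is unreachable from outside a container. A variant I considered (just a sketch; creating the SparkSession once at startup, instead of per request, is my assumption about how to keep the driver always active):

from flask import Flask
from pyspark.sql import SparkSession

app = Flask(__name__)

# Assumption: build the SparkSession once so the driver stays alive
# between requests instead of being started and stopped per call.
spark = SparkSession.builder.appName("Test").getOrCreate()

@app.route('/')
def main():
    # Run a trivial job, so executors only do work when the route is hit.
    count = spark.range(1000).count()
    return f'hi, counted {count} rows'

if __name__ == '__main__':
    # Bind to 0.0.0.0 so the server is reachable through the pod's port.
    app.run(host='0.0.0.0', port=5000)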

requirements.txt

flask
pyspark==3.1.1  # pinned to match the Spark version in the image

Dockerfile

  • NOTE: "spark-py" is the vanilla Spark image, obtainable by running "./bin/docker-image-tool.sh -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile build" in "$SPARK_HOME" directory.

  • NOTE: I saved the image built from this Dockerfile in the local registry as "localhost:5000/k8tsspark" (see the build/push commands after the Dockerfile).

     FROM spark-py
     USER root
     # Copy the Flask app into the image and install its Python dependencies
     COPY . /app
     RUN pip install -r /app/requirements.txt
     EXPOSE 5000

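For reference, I built and pushed the image with the standard docker commands (assuming a local registry is already running on localhost:5000):

docker build -t localhost:5000/k8tsspark:latest .
docker push localhost:5000/k8tsspark:latest
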
hello-flask.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: hello-flask
  name: hello-flask
spec:
  selector:
    matchLabels:
      app: hello-flask
  replicas: 1
  template:
    metadata:
      labels:
        app: hello-flask
    spec:
      containers:
      - name: hello-flask
        image: localhost:5000/k8tsspark:latest
        command: [
          "/bin/sh",
          "-c",
          "/opt/spark/bin/spark-submit \
          --master k8s://https://192.168.49.2:8443 \
          --deploy-mode cluster \
          --name spark-on-kubernetes \
          --conf spark.executor.instances=2 \
          --conf spark.executor.memory=1G \
          --conf spark.executor.cores=1 \
          --conf spark.kubernetes.container.image=localhost:5000/k8tsspark:latest \
          --conf spark.kubernetes.container.image.pullPolicy=Never \
          --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
          --conf spark.kubernetes.pyspark.pythonVersion=3 \
          --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
          --conf spark.dynamicAllocation.enabled=false \
          local:///app/app.py"
        ]
        imagePullPolicy: Never
        ports:
        - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: hello-flask
  labels:
    app: hello-flask
spec:
  type: LoadBalancer
  ports:
  - name: http
    port: 5000
    protocol: TCP
    targetPort: 5000
  selector:
    app: hello-flask
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-role
subjects:
  - kind: ServiceAccount
    name: spark
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
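
As a sanity check of the RBAC part, the "spark" service account's permission to create executor pods can be verified with standard kubectl (assuming everything lives in the default namespace, as above):

kubectl auth can-i create pods --as=system:serviceaccount:default:spark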

terminal - kubectl apply

kubectl apply -f ./hello-flask.yaml

PROBLEM: using the dashboard I can see executor pods being created while the deployment boots
(whereas the idea is to keep the Spark driver always active and trigger Spark executors via an API call)
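
Concretely, I'd expect something like this inside the Flask app instead of the cluster-mode spark-submit above (only a sketch, untested; running the driver in client mode from the Flask pod is my assumption, and it additionally needs spark.driver.host set to an address the executors can reach):

from pyspark.sql import SparkSession

# Sketch: the driver lives in the Flask pod; executor pods are requested
# from Kubernetes only when a SparkSession is actually created.
spark = (SparkSession.builder
    .appName("Test")
    .master("k8s://https://192.168.49.2:8443")
    .config("spark.submit.deployMode", "client")
    .config("spark.executor.instances", "2")
    .config("spark.kubernetes.container.image", "localhost:5000/k8tsspark:latest")
    .config("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
    # assumption: pod IP injected e.g. via the Downward API, so that
    # executors can connect back to the driver:
    # .config("spark.driver.host", os.environ["POD_IP"])
    .getOrCreate())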

kubectl get pods
    NAME                                          READY   STATUS    RESTARTS   AGE
    hello-flask-86689bdf84-ckkj4                  1/1     Running   0          5m33s
    spark-on-kubernetes-811fd878ef3d3c16-driver   1/1     Running   0          5m31s

kubectl get svc
    NAME                                              TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
    hello-flask                                       LoadBalancer   10.103.254.34   <pending>     5000:32124/TCP               6m1s
    kubernetes                                        ClusterIP      10.96.0.1       <none>        443/TCP                      6m13s
    spark-on-kubernetes-811fd878ef3d3c16-driver-svc   ClusterIP      None            <none>        7078/TCP,7079/TCP,4040/TCP   5m59s

terminal - minikube service

minikube service hello-flask
    |-----------|-------------|-------------|---------------------------|
    | NAMESPACE |    NAME     | TARGET PORT |            URL            |
    |-----------|-------------|-------------|---------------------------|
    | default   | hello-flask | http/5000   | http://192.168.49.2:32124 |
    |-----------|-------------|-------------|---------------------------|
    🎉  Opening service default/hello-flask in default browser...

sudo -E kubefwd svc


ERROR while opening hello-flask:5000 via browser:
    "The connection was reset"
Resulting ERROR in kubefwd:
    "ERRO[14:34:43] Runtime: an error occurred forwarding 5000 -> 5000: error forwarding port 5000 to pod bfa5f111e9f32f04a554975046539962734e4cf3fb05690d71697cedc49715a9, uid : exit status 1: 2021/04/20 12:34:43 socat[80737] E connect(5, AF=2 127.0.0.1:5000, 16): Connection refused"

I'm new to Kubernetes, so I'm not sure this architecture is correct. Thanks!

-- SkuPak
apache-spark
docker
flask
kubernetes
python

0 Answers