I’m trying to deploy a Flask application that uses Apache Spark 3.1.1 on Kubernetes.
app.py
from flask import Flask
from pyspark.sql import SparkSession

app = Flask(__name__)
app.debug = True

@app.route('/')
def main():
    print("Start of Code")
    spark = SparkSession.builder.appName("Test").getOrCreate()
    sc = spark.sparkContext
    spark.stop()
    print("End of Code")
    return 'hi'

if __name__ == '__main__':
    app.run()
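To clarify the end state I'm aiming for (see the PROBLEM note further down): the SparkSession would be created once at startup and reused across requests, instead of being created and stopped inside the route. A rough, untested sketch of that variant; the run_job name, the dummy job, and the host/port values are just my assumptions:

from flask import Flask
from pyspark.sql import SparkSession

app = Flask(__name__)

# Create the session once so the Spark driver stays alive between requests.
spark = SparkSession.builder.appName("Test").getOrCreate()

@app.route('/')
def run_job():
    # Trivial job, just to make the executors do some work on each call.
    count = spark.range(1000).count()
    return f'count={count}'

if __name__ == '__main__':
    # Bind to 0.0.0.0 so containerPort 5000 is reachable from outside the pod.
    app.run(host='0.0.0.0', port=5000)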
requirements.txt
flask
pyspark
Dockerfile
NOTE: "spark-py" is the vanilla Spark image, built by running "./bin/docker-image-tool.sh -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile build" in the "$SPARK_HOME" directory.
NOTE: I pushed the image built from this Dockerfile to my local registry as "localhost:5000/k8tsspark".
FROM spark-py
USER root
COPY . /app
RUN pip install -r /app/requirements.txt
EXPOSE 5000
hello-flask.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: hello-flask
  name: hello-flask
spec:
  selector:
    matchLabels:
      app: hello-flask
  replicas: 1
  template:
    metadata:
      labels:
        app: hello-flask
    spec:
      containers:
      - name: hello-flask
        image: localhost:5000/k8tsspark:latest
        command: [
          "/bin/sh",
          "-c",
          "/opt/spark/bin/spark-submit \
          --master k8s://https://192.168.49.2:8443 \
          --deploy-mode cluster \
          --name spark-on-kubernetes \
          --conf spark.executor.instances=2 \
          --conf spark.executor.memory=1G \
          --conf spark.executor.cores=1 \
          --conf spark.kubernetes.container.image=localhost:5000/k8tsspark:latest \
          --conf spark.kubernetes.container.image.pullPolicy=Never \
          --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
          --conf spark.kubernetes.pyspark.pythonVersion=3 \
          --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
          --conf spark.dynamicAllocation.enabled=false \
          local:///app/app.py"
        ]
        imagePullPolicy: Never
        ports:
        - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: hello-flask
  labels:
    app: hello-flask
spec:
  type: LoadBalancer
  ports:
  - name: http
    port: 5000
    protocol: TCP
    targetPort: 5000
  selector:
    app: hello-flask
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-role
subjects:
- kind: ServiceAccount
  name: spark
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
terminal - kubectl apply
kubectl apply -f ./hello-flask.yaml
PROBLEM: in the dashboard I can see executor pods being created while the deployment is still booting
(the idea is to keep the spark-driver always active and trigger spark-executors only via API calls)
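If it helps clarify what I mean by triggering executors via an API call, this is roughly the kind of session setup I had in mind, with the same conf values as the spark-submit command above set programmatically (untested, and I'm not sure whether having the Flask pod act as the Spark driver like this is even the right approach):

from pyspark.sql import SparkSession

# Same settings as the spark-submit flags above, set from inside the app.
# Untested assumption: the Flask pod itself would act as the Spark driver.
spark = (
    SparkSession.builder
    .appName("spark-on-kubernetes")
    .master("k8s://https://192.168.49.2:8443")
    .config("spark.executor.instances", "2")
    .config("spark.executor.memory", "1G")
    .config("spark.executor.cores", "1")
    .config("spark.kubernetes.container.image", "localhost:5000/k8tsspark:latest")
    .config("spark.kubernetes.container.image.pullPolicy", "Never")
    .config("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
    .config("spark.dynamicAllocation.enabled", "false")
    .getOrCreate()
)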
kubectl get pods
NAME                                          READY   STATUS    RESTARTS   AGE
hello-flask-86689bdf84-ckkj4                  1/1     Running   0          5m33s
spark-on-kubernetes-811fd878ef3d3c16-driver   1/1     Running   0          5m31s
kubectl get svc
NAME                                              TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
hello-flask                                       LoadBalancer   10.103.254.34   <pending>     5000:32124/TCP               6m1s
kubernetes                                        ClusterIP      10.96.0.1       <none>        443/TCP                      6m13s
spark-on-kubernetes-811fd878ef3d3c16-driver-svc   ClusterIP      None            <none>        7078/TCP,7079/TCP,4040/TCP   5m59s
terminal - minikube service
minikube service hello-flask
|-----------|-------------|-------------|---------------------------|
| NAMESPACE | NAME | TARGET PORT | URL |
|-----------|-------------|-------------|---------------------------|
| default | hello-flask | http/5000 | http://192.168.49.2:32124 |
|-----------|-------------|-------------|---------------------------|
🎉 Opening service default/hello-flask in default browser...
sudo -E kubefwd svc
ERROR when opening hello-flask:5000 in the browser:
"The connection was reset"
Corresponding ERROR in kubefwd:
"ERRO[14:34:43] Runtime: an error occurred forwarding 5000 -> 5000: error forwarding port 5000 to pod bfa5f111e9f32f04a554975046539962734e4cf3fb05690d71697cedc49715a9, uid : exit status 1: 2021/04/20 12:34:43 socat[80737] E connect(5, AF=2 127.0.0.1:5000, 16): Connection refused"
I'm new to Kubernetes, so I'm not sure this architecture is correct. Thanks!