IllegalArgumentException at SparkContext running on K8s

5/12/2020

I want to access the Spark master running on a k8s cluster from an external machine via the Python code snippet below:

from pyspark import SparkConf, SparkContext
conf = SparkConf().setAppName('hello').setMaster("spark://sparkk-p-wxqpn-master-0.sparkk-p-wxqpn-headless.sparkk-p-wxqpn.svc.cluster.local:7077")
sc = SparkContext(conf=conf)

but I consistently get an IllegalArgumentException error.

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-1-bdb382eb8026> in <module>
      1 from pyspark import SparkConf, SparkContext
      2 conf = SparkConf().setAppName('hello').setMaster("spark://spark2-p-27b57-master-0.spark2-p-27b57-headless.spark2-p-27b57.svc.cluster.local:7077")

----> 3 sc = SparkContext(conf=conf)

~/anaconda3/envs/py36/lib/python3.6/site-packages/pyspark/context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    134         try:
    135             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
--> 136                           conf, jsc, profiler_cls)
    137         except:
    138             # If an error occurs, clean up in order to allow future SparkContext creation:

~/anaconda3/envs/py36/lib/python3.6/site-packages/pyspark/context.py in _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, jsc, profiler_cls)
    196 
    197         # Create the Java SparkContext through Py4J
--> 198         self._jsc = jsc or self._initialize_context(self._conf._jconf)
    199         # Reset the SparkConf to the one actually used by the SparkContext in JVM.
    200         self._conf = SparkConf(_jconf=self._jsc.sc().conf())

~/anaconda3/envs/py36/lib/python3.6/site-packages/pyspark/context.py in _initialize_context(self, jconf)
    304         Initialize SparkContext in function to allow subclass specific initialization
    305         """
--> 306         return self._jvm.JavaSparkContext(jconf)
    307 
    308     @classmethod

~/anaconda3/envs/py36/lib/python3.6/site-packages/py4j/java_gateway.py in __call__(self, *args)
   1523         answer = self._gateway_client.send_command(command)
   1524         return_value = get_return_value(
-> 1525             answer, self._gateway_client, None, self._fqn)
   1526 
   1527         for temp_arg in temp_args:

~/anaconda3/envs/py36/lib/python3.6/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(

Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
    at scala.Predef$.require(Predef.scala:224)
    at org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:91)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:516)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

Additional information
kubectl get services output:

NAME                                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)           AGE
service/ingress-5ae9f1971f6d110fa5aa8a1705854b32   ClusterIP   10.43.4.52      <none>        7077/TCP          17m
service/ingress-95b34b44fc89ed464cb0421dec8be6c4   ClusterIP   10.43.66.178    <none>        8080/TCP          16m
service/sparkk-p-wxqpn-headless                    ClusterIP   None            <none>        <none>            22m
service/sparkk-p-wxqpn-master-svc                  ClusterIP   10.43.192.111   <none>        7077/TCP,80/TCP   22m

I launched Apache Spark on Rancher, installed on 3 bare-metal servers, and then exposed port 8080 (Spark web UI) and port 7077 (Spark master) for k8s workloads using an Ingress.
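One thing worth noting: a standard Kubernetes Ingress resource proxies HTTP(S) traffic, while Spark's port 7077 speaks a raw TCP protocol, so whether the Ingress route works depends on the ingress controller's TCP support. As a hedged diagnostic, one could temporarily expose the master with a NodePort service instead (the service name below is taken from the `kubectl get services` output above; verify it against your cluster before running):

```shell
# Sketch only: expose the existing master service's TCP port 7077 on a
# node port, bypassing the Ingress entirely. The --name is hypothetical.
kubectl expose service sparkk-p-wxqpn-master-svc \
  --name spark-master-external \
  --type NodePort \
  --port 7077 \
  --target-port 7077

# Show the allocated node port to use from the external machine.
kubectl get service spark-master-external
```

If PySpark can connect via `<node-ip>:<node-port>` but not via the Ingress, that would point at the Ingress layer rather than Spark itself.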

The Spark web UI works perfectly, but I cannot reach the Spark master on port 7077 via PySpark from the external machine.
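Since the master URL uses the cluster-internal `*.svc.cluster.local` DNS name, it may help to first confirm from the external machine that the hostname resolves at all and that port 7077 accepts plain TCP connections, before involving Spark. A minimal stdlib-only reachability check (a diagnostic sketch, not part of the failing code):

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        # create_connection handles DNS resolution and the TCP handshake;
        # any failure (NXDOMAIN, refused, timeout) surfaces as OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `tcp_reachable("sparkk-p-wxqpn-master-0.sparkk-p-wxqpn-headless.sparkk-p-wxqpn.svc.cluster.local", 7077)` returning False from the external machine would mean the failure happens before any Spark protocol exchange, since that DNS name typically only resolves inside the cluster.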

Any help will be appreciated. Thanks.

-- BobyCloud
apache-spark
kubernetes
pyspark
python
rancher

0 Answers