JanusGraph in Kubernetes not able to connect to Cassandra running as another service

3/29/2019

I am trying to run JanusGraph with Cassandra as the storage backend and Elasticsearch for indexing. Cassandra and Elasticsearch each run as a separate service in the same cluster.

The required ports are open in both services, but the JanusGraph pod logs show a connection timeout while connecting to Cassandra.

23343 [main] WARN  org.apache.tinkerpop.gremlin.server.GremlinServer  - Graph [graph] configured at [conf/gremlin-server/janusgraph.properties] could not be instantiated and will not be available in Gremlin Server.  GraphFactory message: GraphFactory could not instantiate this Graph implementation [class org.janusgraph.core.JanusGraphFactory]
java.lang.RuntimeException: GraphFactory could not instantiate this Graph implementation [class org.janusgraph.core.JanusGraphFactory]
    at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:82)
    at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:70)
    at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:104)
    at org.apache.tinkerpop.gremlin.server.util.DefaultGraphManager.lambda$new$0(DefaultGraphManager.java:57)
    at java.util.LinkedHashMap$LinkedEntrySet.forEach(LinkedHashMap.java:671)
    at org.apache.tinkerpop.gremlin.server.util.DefaultGraphManager.<init>(DefaultGraphManager.java:55)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor.<init>(ServerGremlinExecutor.java:110)
    at org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor.<init>(ServerGremlinExecutor.java:89)
    at org.apache.tinkerpop.gremlin.server.GremlinServer.<init>(GremlinServer.java:110)
    at org.apache.tinkerpop.gremlin.server.GremlinServer.main(GremlinServer.java:354)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:78)
    ... 13 more
Caused by: java.lang.IllegalArgumentException: Could not instantiate implementation: org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxStoreManager
    at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:69)
    at org.janusgraph.diskstorage.Backend.getImplementationClass(Backend.java:477)
    at org.janusgraph.diskstorage.Backend.getStorageManager(Backend.java:409)
    at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.<init>(GraphDatabaseConfiguration.java:1376)
    at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:164)
    at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:133)
    at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:113)
    ... 18 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:58)
    ... 24 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
    at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxStoreManager.ensureKeyspaceExists(AstyanaxStoreManager.java:619)
    at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxStoreManager.<init>(AstyanaxStoreManager.java:314)
    ... 29 more
Caused by: com.netflix.astyanax.connectionpool.exceptions.PoolTimeoutException: PoolTimeoutException: [host=cassandra(SERVICE_IP):9160, latency=10001(10001), attempts=1]Timed out waiting for connection
    at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.waitForConnection(SimpleHostConnectionPool.java:231)
    at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.borrowConnection(SimpleHostConnectionPool.java:198)
    at com.netflix.astyanax.connectionpool.impl.RoundRobinExecuteWithFailover.borrowConnection(RoundRobinExecuteWithFailover.java:84)
    at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:117)
    at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:352)
    at com.netflix.astyanax.thrift.ThriftClusterImpl.executeSchemaChangeOperation(ThriftClusterImpl.java:146)
    at com.netflix.astyanax.thrift.ThriftClusterImpl.internalCreateKeyspace(ThriftClusterImpl.java:321)
    at com.netflix.astyanax.thrift.ThriftClusterImpl.addKeyspace(ThriftClusterImpl.java:294)
    at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxStoreManager.ensureKeyspaceExists(AstyanaxStoreManager.java:614)

I am running the janusgraph v2 image, and the gcr.io/google-samples/cassandra:v13 image for Cassandra.

I also tried connecting to Cassandra on port 9160 from a busybox pod, but that does not work either. Interestingly, ping works for the service name (cassandra here), but telnet to port 9160 or 9042 fails with a connection refused error.

Here are the Cassandra Service and StatefulSet:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: cassandra
  name: cassandra
spec:
  clusterIP: None
  ports:
  - port: 9042
    name: cql
  - port: 9160
    name: thrift
  selector:
    app: cassandra
---    
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
  labels:
    app: cassandra
spec:
  serviceName: cassandra
  replicas: 3
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      terminationGracePeriodSeconds: 1800
      #schedulerName: stork       #Check benefits of using STORK as scheduler.
      containers:
      - name: cassandra
        image: gcr.io/google-samples/cassandra:v13
        ports:
          - containerPort: 7000
            name: intra-node
          - containerPort: 7001
            name: tls-intra-node
          - containerPort: 7199
            name: jmx
          - containerPort: 9042
            name: cql
          - containerPort: 9160
            name: thrift
          - containerPort: 9142
            name: transportssl
        resources:
          limits:
            cpu: "1"
            memory: 2Gi
          requests:
            cpu: "500m"
            memory: 1Gi
        securityContext:
          capabilities:
            add:
              - IPC_LOCK
        lifecycle:
          preStop:
            exec:
              command: 
              - /bin/sh
              - -c
              - nodetool drain
        env:
          - name: CASSANDRA_SEEDS
            value: cassandra-0.cassandra.default.svc.cluster.local
          - name: MAX_HEAP_SIZE 
            value: 512M
          - name: HEAP_NEWSIZE
            value: 512M
          - name: CASSANDRA_CLUSTER_NAME
            value: "Cassandra"
          - name: CASSANDRA_DC
            value: "DC1"
          - name: CASSANDRA_RACK
            value: "Rack1"
          - name: CASSANDRA_AUTO_BOOTSTRAP
            value: "false"            
          - name: CASSANDRA_ENDPOINT_SNITCH
            value: GossipingPropertyFileSnitch
          - name: CASSANDRA_RPC_ADDRESS
            value: 0.0.0.0
          - name: CASSANDRA_NUM_TOKENS
            value: "32"
          - name: POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
        readinessProbe:
          exec:
            command:
            - /bin/bash
            - -c
            - /ready-probe.sh
          initialDelaySeconds: 15
          timeoutSeconds: 5
        volumeMounts:
        - name: nfs-pvc-cassandra
          mountPath: /srv/nfs/kubedata/janus
      restartPolicy: Always
      volumes:
        - name: nfs-pvc-cassandra
          persistentVolumeClaim:
            claimName: nfs-pvc-cassandra

How can I debug this further?

-- Avik Aggarwal
connection-timeout
janusgraph
kubernetes
kubernetes-helm

3 Answers

4/18/2019

If JanusGraph is running on your host machine, you may have to port-forward the Kubernetes service's ports to reach it locally. Maybe you have already done that.

-- Amojow
Source: StackOverflow

2/27/2020

Set storage.hostname: cassandra-0.cassandra.default in the janusgraph values.yaml so that the JanusGraph pod can communicate with Cassandra.

Check the Thrift status on the Cassandra node with the nodetool command: nodetool statusthrift

If Thrift is not enabled, enable it with: nodetool enablethrift
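
The two nodetool checks above can also be scripted from a machine that has kubectl access. The following is only a sketch under assumptions: the helper names are hypothetical, the pod name cassandra-0 and the default namespace are taken from the StatefulSet in the question, and the parser assumes that nodetool statusthrift prints either "running" or "not running".

```python
import subprocess
from typing import List


def nodetool_argv(pod: str, subcommand: str, namespace: str = "default") -> List[str]:
    """Build the kubectl command line that runs a nodetool subcommand in a pod."""
    return ["kubectl", "exec", "-n", namespace, pod, "--", "nodetool", subcommand]


def thrift_is_running(status_output: str) -> bool:
    """Interpret `nodetool statusthrift` output.

    Assumption: the command prints "running" or "not running".
    """
    return status_output.strip().lower() == "running"


def ensure_thrift(pod: str, namespace: str = "default") -> bool:
    """Return True if Thrift was already running; otherwise try to enable it."""
    result = subprocess.run(
        nodetool_argv(pod, "statusthrift", namespace),
        capture_output=True, text=True, check=True,
    )
    if thrift_is_running(result.stdout):
        return True
    # Runtime toggle only; it is not expected to survive a pod restart.
    subprocess.run(nodetool_argv(pod, "enablethrift", namespace), check=True)
    return False
```

Calling ensure_thrift("cassandra-0") once per replica would cover the three pods. Because the nodetool toggle is a runtime setting, a permanent fix most likely still needs the Thrift server enabled in the Cassandra configuration itself.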

-- Mudassar Rana
Source: StackOverflow

5/20/2019

I can confirm that your StatefulSet YAML works fine, and the headless service creates a DNS name that resolves to the pod endpoints. I created a simple nginx pod and used telnet against the service to check. Here is the output:

Check that the service and endpoints exist for cassandra:

$ kubectl get svc,ep cassandra
NAME                TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
service/cassandra   ClusterIP   None         <none>        9042/TCP,9160/TCP   2h

NAME                  ENDPOINTS                                                   AGE
endpoints/cassandra   10.56.0.10:9160,10.56.3.3:9160,10.56.4.2:9160 + 3 more...   2h
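
Because the service is headless (clusterIP: None), its DNS name resolves directly to the pod IPs rather than to a virtual service IP. A minimal sketch for confirming that from inside any pod that has Python 3 (an assumption, as is the helper name resolve_ipv4):

```python
import socket
from typing import List


def resolve_ipv4(host: str, port: int) -> List[str]:
    """Return the sorted, de-duplicated IPv4 addresses that a name resolves to."""
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Run inside the cluster, resolve_ipv4("cassandra", 9042) should list one address per ready Cassandra pod, matching the endpoints shown above. An empty or failing lookup would point at readiness or DNS rather than at the port.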

Exec into a neighboring pod in the same namespace and telnet to the service and to a pod:

$ kubectl  exec -it nginx-79dbd67896-9dwp8 bash

root@nginx-79dbd67896-9dwp8:/# telnet cassandra 9042
Trying 10.56.3.3...
Connected to cassandra.default.svc.cluster.local.
Escape character is '^]'.

telnet> quit
Connection closed.

root@nginx-79dbd67896-9dwp8:/# telnet 10.56.0.10 9042
Trying 10.56.0.10...
Connected to 10.56.0.10.
Escape character is '^]'.

As the output shows, the service and pods are listening on port 9042, not 9160. Port 9160 is for Cassandra's Thrift API server, which is disabled by default. For more on this issue, see https://github.com/docker-library/cassandra/issues/127. You will need to find out how to enable the Thrift API port.

You can check the listening ports on the Cassandra pods by exec-ing into one of them and running the commands below:

root@cassandra-0:/# apt update && apt install telnet net-tools
<output omitted>

root@cassandra-0:/# netstat -tulpen
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name    
tcp        0      0 0.0.0.0:9042            0.0.0.0:*               LISTEN      1000       10163244   -                   
tcp        0      0 127.0.0.1:43669         0.0.0.0:*               LISTEN      1000       10162974   -                   
tcp        0      0 10.56.0.10:7000         0.0.0.0:*               LISTEN      1000       10163145   -                   
tcp        0      0 127.0.0.1:7199          0.0.0.0:*               LISTEN      1000       10162973   -                   
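
If installing telnet in a pod is not convenient, the same port check can be sketched in a few lines of Python. This assumes Python 3 is available in the pod you run it from, and probe_port is a hypothetical helper name. It separates "refused" (the host is reachable but nothing listens on the port, as with 9160 here) from "timeout" (packets are dropped or the host is unreachable).

```python
import socket


def probe_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connection and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host answered, but no process is listening on this port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout: filtered traffic or unreachable host.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. a DNS failure for the service name.
        return f"error: {exc}"
```

For example, calling probe_port("cassandra", 9042) and probe_port("cassandra", 9160) from a neighboring pod mirrors the telnet checks above. A successful ping only shows that the pod IP is reachable, so it says nothing about whether a given TCP port is listening.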

Hope it helps!

-- coolinuxoid
Source: StackOverflow