I am trying to run Janusgraph with storage as Cassandra, running as another service in same cluster and Elasticsearch for indexing, again running as another service in same cluster.
While the required ports are open in both services, janusgraph pods' logs say its facing connection timeout while connecting to Cassandra.
23343 [main] WARN org.apache.tinkerpop.gremlin.server.GremlinServer - Graph [graph] configured at [conf/gremlin-server/janusgraph.properties] could not be instantiated and will not be available in Gremlin Server. GraphFactory message: GraphFactory could not instantiate this Graph implementation [class org.janusgraph.core.JanusGraphFactory]
java.lang.RuntimeException: GraphFactory could not instantiate this Graph implementation [class org.janusgraph.core.JanusGraphFactory]
at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:82)
at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:70)
at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:104)
at org.apache.tinkerpop.gremlin.server.util.DefaultGraphManager.lambda$new$0(DefaultGraphManager.java:57)
at java.util.LinkedHashMap$LinkedEntrySet.forEach(LinkedHashMap.java:671)
at org.apache.tinkerpop.gremlin.server.util.DefaultGraphManager.<init>(DefaultGraphManager.java:55)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor.<init>(ServerGremlinExecutor.java:110)
at org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor.<init>(ServerGremlinExecutor.java:89)
at org.apache.tinkerpop.gremlin.server.GremlinServer.<init>(GremlinServer.java:110)
at org.apache.tinkerpop.gremlin.server.GremlinServer.main(GremlinServer.java:354)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:78)
... 13 more
Caused by: java.lang.IllegalArgumentException: Could not instantiate implementation: org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxStoreManager
at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:69)
at org.janusgraph.diskstorage.Backend.getImplementationClass(Backend.java:477)
at org.janusgraph.diskstorage.Backend.getStorageManager(Backend.java:409)
at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.<init>(GraphDatabaseConfiguration.java:1376)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:164)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:133)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:113)
... 18 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:58)
... 24 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxStoreManager.ensureKeyspaceExists(AstyanaxStoreManager.java:619)
at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxStoreManager.<init>(AstyanaxStoreManager.java:314)
... 29 more
Caused by: com.netflix.astyanax.connectionpool.exceptions.PoolTimeoutException: PoolTimeoutException: [host=cassandra(SERVICE_IP):9160, latency=10001(10001), attempts=1]Timed out waiting for connection
at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.waitForConnection(SimpleHostConnectionPool.java:231)
at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.borrowConnection(SimpleHostConnectionPool.java:198)
at com.netflix.astyanax.connectionpool.impl.RoundRobinExecuteWithFailover.borrowConnection(RoundRobinExecuteWithFailover.java:84)
at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:117)
at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:352)
at com.netflix.astyanax.thrift.ThriftClusterImpl.executeSchemaChangeOperation(ThriftClusterImpl.java:146)
at com.netflix.astyanax.thrift.ThriftClusterImpl.internalCreateKeyspace(ThriftClusterImpl.java:321)
at com.netflix.astyanax.thrift.ThriftClusterImpl.addKeyspace(ThriftClusterImpl.java:294)
at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxStoreManager.ensureKeyspaceExists(AstyanaxStoreManager.java:614)
I am running janusgrah v2 image and gcr.io/google-samples/cassandra:v13 image for cassandra.
I tried connecting to cassandra port 9160 from busybox pod too. But doesn't seem to work. But the interesting thing is: ping
seems to work for the service name (cassandra here). But only when it gets to telnet
on port 9160 or 9042 i get connection refused error.
Here is cassandra STS:
apiVersion: v1
kind: Service
metadata:
labels:
app: cassandra
name: cassandra
spec:
clusterIP: None
ports:
- port: 9042
name: cql
- port: 9160
name: thrift
selector:
app: cassandra
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: cassandra
labels:
app: cassandra
spec:
serviceName: cassandra
replicas: 3
selector:
matchLabels:
app: cassandra
template:
metadata:
labels:
app: cassandra
spec:
terminationGracePeriodSeconds: 1800
#schedulerName: stork #Check benefits of using STORK as scheduler.
containers:
- name: cassandra
image: gcr.io/google-samples/cassandra:v13
ports:
- containerPort: 7000
name: intra-node
- containerPort: 7001
name: tls-intra-node
- containerPort: 7199
name: jmx
- containerPort: 9042
name: cql
- containerPort: 9160
name: thrift
- containerPort: 9142
name: transportssl
resources:
limits:
cpu: "1Gi"
memory: 2Gi
requests:
cpu: "500m"
memory: 1Gi
securityContext:
capabilities:
add:
- IPC_LOCK
lifecycle:
preStop:
exec:
command:
- /bin/sh
- -c
- nodetool drain
env:
- name: CASSANDRA_SEEDS
value: cassandra-0.cassandra.default.svc.cluster.local
- name: MAX_HEAP_SIZE
value: 512M
- name: HEAP_NEWSIZE
value: 512M
- name: CASSANDRA_CLUSTER_NAME
value: "Cassandra"
- name: CASSANDRA_DC
value: "DC1"
- name: CASSANDRA_RACK
value: "Rack1"
- name: CASSANDRA_AUTO_BOOTSTRAP
value: "false"
- name: CASSANDRA_ENDPOINT_SNITCH
value: GossipingPropertyFileSnitch
- name: CASSANDRA_RPC_ADDRESS
value: 0.0.0.0
- name: CASSANDRA_NUM_TOKENS
value: "32"
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
readinessProbe:
exec:
command:
- /bin/bash
- -c
- /ready-probe.sh
initialDelaySeconds: 15
timeoutSeconds: 5
volumeMounts:
- name: nfs-pvc-cassandra
mountPath: /srv/nfs/kubedata/janus
restartPolicy: Always
volumes:
- name: nfs-pvc-cassandra
persistentVolumeClaim:
claimName: nfs-pvc-cassandra
What could be the way i can debug this further?
If your janusgraph is running on your host machine, you maybe have to do a port-forward of your kubernetes services ports to access it in local. Maybe you already did it
Just update Storage.hostname: cassandra-0.cassandra.default
in janusgraph values.yaml
for janusgraph pod to communicate with cassandra.
Check thrift status on cassandra node using nodetool command nodetool statusThrift
if not enable , then enable it again using nodetool command (nodetool enablethrift)
As I can confirm that your StatefulSet yaml works just fine, and headless service creates dns name directing to endpoints of pods. I created simple nginx pod to telnet to service for checking it. Here is the output:
Check if endpoints and service exist for cassandra
$ kubectl get svc,ep cassandra
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/cassandra ClusterIP None <none> 9042/TCP,9160/TCP 2h
NAME ENDPOINTS AGE
endpoints/cassandra 10.56.0.10:9160,10.56.3.3:9160,10.56.4.2:9160 + 3 more... 2h
Exec to the neighbor pod in the same namespace and telnet to service and pods
$ kubectl exec -it nginx-79dbd67896-9dwp8 bash
root@nginx-79dbd67896-9dwp8:/# telnet cassandra 9042
Trying 10.56.3.3...
Connected to cassandra.default.svc.cluster.local.
Escape character is '^]'.
telnet> quit
Connection closed.
root@nginx-79dbd67896-9dwp8:/# telnet 10.56.0.10 9042
Trying 10.56.0.10...
Connected to 10.56.0.10.
Escape character is '^]'.
As it seems from output service and pods are listening on port 9042, NOT 9160, because Port 9160 is for Cassandra's Thrift API server, which is disabled by default. For more about this issue check https://github.com/docker-library/cassandra/issues/127 . You have to check how to enable Thrift API port.
You can check listening ports on cassandra pods, by exec-ing into one of this pod and running below commands:
root@cassandra-0:/# apt update && apt install telnet net-tools
<output omitted>
root@cassandra-0:/# netstat -tulpen
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
tcp 0 0 0.0.0.0:9042 0.0.0.0:* LISTEN 1000 10163244 -
tcp 0 0 127.0.0.1:43669 0.0.0.0:* LISTEN 1000 10162974 -
tcp 0 0 10.56.0.10:7000 0.0.0.0:* LISTEN 1000 10163145 -
tcp 0 0 127.0.0.1:7199 0.0.0.0:* LISTEN 1000 10162973 -
Hope it helps!