I'm trying to deploy an EFK stack on a production Kubernetes cluster (installed using Kubespray). We have 3 nodes, 1 master + 2 workers. I need to run Elasticsearch as a StatefulSet and use a local folder on the master node to store logs (local storage for persistence). My configuration is:
kind: Namespace
apiVersion: v1
metadata:
  name: kube-logging
---
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch
  namespace: kube-logging
  labels:
    app: elasticsearch
spec:
  selector:
    app: elasticsearch
  clusterIP: None
  ports:
    - port: 9200
      name: rest
    - port: 9300
      name: inter-node
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
  namespace: kube-logging
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
  namespace: kube-logging
spec:
  storageClassName: local-storage
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/elastic
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster
  namespace: kube-logging
spec:
  serviceName: elasticsearch
  replicas: 2
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
          resources:
            limits:
              cpu: 1000m
              memory: 2Gi
          ports:
            - containerPort: 9200
              name: rest
              protocol: TCP
            - containerPort: 9300
              name: inter-node
              protocol: TCP
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
          env:
            - name: cluster.name
              value: k8s-logs
            - name: node.name
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: discovery.seed_hosts
              value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
            - name: cluster.initial_master_nodes
              value: "es-cluster-0,es-cluster-1,es-cluster-2"
            - name: ES_JAVA_OPTS
              value: "-Xms512m -Xmx512m"
      initContainers:
        - name: fix-permissions
          image: busybox
          command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
        - name: increase-vm-max-map
          image: busybox
          command: ["sysctl", "-w", "vm.max_map_count=262144"]
          securityContext:
            privileged: true
        - name: increase-fd-ulimit
          image: busybox
          command: ["sh", "-c", "ulimit -n 65536"]
          securityContext:
            privileged: true
  volumeClaimTemplates:
    - metadata:
        name: data
        labels:
          app: elasticsearch
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: local-storage
        resources:
          requests:
            storage: 5Gi
---
So this was my configuration, but when it's applied, one of the two Elasticsearch Pods stays in Pending status. When I run kubectl describe on this Pod, this is the error I get: "1 node(s) didn't find available persistent volumes to bind"
Is my configuration correct? Must I use a PV + StorageClass + volumeClaimTemplates? Thank you in advance.
These are my outputs:
[root@node1 nex]# kubectl get pv
NAME    CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                            STORAGECLASS    REASON   AGE
my-pv   5Gi        RWO            Retain           Bound    kube-logging/data-es-cluster-0   local-storage            24m
[root@node1 nex]# kubectl get pvc
NAME                STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS    AGE
data-es-cluster-0   Bound     my-pv    5Gi        RWO            local-storage   24m
data-es-cluster-1   Pending                                      local-storage   23m
[root@node1 nex]# kubectl describe pvc data-es-cluster-0
Name: data-es-cluster-0
Namespace: kube-logging
StorageClass: local-storage
Status: Bound
Volume: my-pv
Labels: app=elasticsearch
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 5Gi
Access Modes: RWO
VolumeMode: Filesystem
Mounted By: es-cluster-0
Events:
Type    Reason                Age   From                          Message
----    ------                ----  ----                          -------
Normal  WaitForFirstConsumer  24m   persistentvolume-controller   waiting for first consumer to be created before binding
[root@node1 nex]# kubectl describe pvc data-es-cluster-1
Name: data-es-cluster-1
Namespace: kube-logging
StorageClass: local-storage
Status: Pending
Volume:
Labels: app=elasticsearch
Annotations: <none>
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Mounted By: es-cluster-1
Events:
Type    Reason                Age                    From                          Message
----    ------                ----                   ----                          -------
Normal  WaitForFirstConsumer  4m12s (x82 over 24m)   persistentvolume-controller   waiting for first consumer to be created before binding
[root@node1 nex]#
Apart from what @Arghya Sadhu already suggested in his answer, I'd like to highlight one more thing in your current setup.
If you're OK with the fact that your Elasticsearch Pods will be scheduled only on one particular node (in your case the master node), you can still use the local volume type. Don't confuse it with hostPath, however. I noticed that your PV definition uses the hostPath key, so chances are you're not completely aware of the differences between these two concepts. Although they are quite similar, the local type has bigger capabilities and some undeniable advantages over hostPath.
As you can read in the documentation:
A local volume represents a mounted local storage device such as a disk, partition or directory.
So apart from a specific directory, you're also able to mount a local disk or partition (/dev/sdb, /dev/sdb5 etc.). It can be e.g. an LVM partition with a strictly defined capacity. Keep in mind that when mounting a local directory you are not able to enforce the capacity that can actually be used, so even if you define, let's say, 5Gi, logs can still be written to your local directory after this value is exceeded. This is not the case with a logical volume, as you're able to define its capacity and make sure it won't use more disk space than you gave it.
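Just as an illustration (a sketch only; the PV name, the 10Gi capacity and the mount point /mnt/disks/es-data are assumptions of mine), a local PV backed by a dedicated, already formatted and mounted partition or LVM logical volume looks almost the same as a directory-backed one, only its path points at the mount point of that partition:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: es-data-partition-pv          # hypothetical name
spec:
  capacity:
    storage: 10Gi                     # matches the size of the dedicated partition / logical volume
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/es-data          # mount point of the partition / LVM volume (format and mount it on the node beforehand)
  nodeAffinity:                       # required for local volumes, see below
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - your-master-node-name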
The second difference is that:
Compared to hostPath volumes, local volumes can be used in a durable and portable manner without manually scheduling Pods to nodes, as the system is aware of the volume's node constraints by looking at the node affinity on the PersistentVolume.
In this case it is the PersistentVolume where you define your node affinity, so any Pod (it can be a Pod managed by your StatefulSet) which subsequently uses the local-storage storage class and the corresponding PersistentVolume will be automatically scheduled on the right node.
As you can read further, nodeAffinity is actually a required field in such a PV:
PersistentVolume nodeAffinity is required when using local volumes. It enables the Kubernetes scheduler to correctly schedule Pods using local volumes to the correct node.
As far as I understand, your Kubernetes cluster is set up locally/on-premises. In this case NFS could be the right choice. If you used some cloud environment, you could instead use persistent storage offered by your particular cloud provider, e.g. GCEPersistentDisk or AWSElasticBlockStore. You can find the full list of persistent volume types currently supported by Kubernetes here.
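Just to give you an idea (a rough sketch, not a ready-to-use manifest: the server address, export path, capacity and storage class name are all made up), an NFS-backed PV that can be mounted by Pods running on different nodes might look like this:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: es-nfs-pv                 # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany               # unlike local storage, NFS can be mounted by Pods on different nodes
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-storage   # hypothetical class name, referenced by the matching PVC
  nfs:
    server: 10.0.0.10             # hypothetical address of your NFS server
    path: /exports/elasticsearch  # hypothetical export path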
So again: if you're concerned about node-level redundancy in your StatefulSet and you would like your 2 Elasticsearch Pods to always be scheduled on different nodes, use NFS or some other non-local storage, as @Arghya Sadhu already suggested. However, if you're not concerned about node-level redundancy and you're totally OK with the fact that both your Elasticsearch Pods run on the same node (the master node in your case), please follow me :)
As @Arghya Sadhu rightly pointed out:
Even if a PV which is already bound to a PVC have spare capacity it can not be again bound to another PVC because it's one to one mapping between PV and PVC.
Although it's always a one-to-one mapping between a PV and a PVC, that doesn't mean you cannot use a single PVC in many Pods.
Note that in your StatefulSet example you used volumeClaimTemplates, which basically means that each time a new Pod managed by your StatefulSet is created, a new corresponding PersistentVolumeClaim is also created based on this template. So if you have e.g. a 10Gi PersistentVolume defined, then no matter whether you request all 10Gi in your claim or only half of it, only the first PVC will be successfully bound to your PV.
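In other words, the Pending data-es-cluster-1 claim in your output is simply waiting for a second matching PV that doesn't exist. If you preferred to keep volumeClaimTemplates, you would need one PV per replica; purely as an illustration (the name and path are made up, and it mirrors your hostPath-based definition rather than the local volume recommended above), a second PV could look roughly like this:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv-2                 # hypothetical second PV for data-es-cluster-1
spec:
  storageClassName: local-storage
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/elastic-2        # hypothetical second directory on the node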
But instead of using volumeClaimTemplates and creating a separate PVC for every stateful Pod, you can make them all use a single, manually defined PVC. Please take a look at the following example.
The first thing we need is a storage class. It looks quite similar to the one in your example:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
The first difference between this setup and yours is in the PV definition. Instead of hostPath, we're using a local volume here:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /var/tmp/test ### path on your master node
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - your-master-node-name
Note that apart from defining the local path, we also defined a nodeAffinity rule that makes sure all Pods which get this particular PV are automatically scheduled on our master node.
Then we have our manually applied PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
  storageClassName: local-storage
This PVC can now be used by all (in your example 2) Pods managed by the StatefulSet:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 2 # by default is 1
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: nginx
          image: k8s.gcr.io/nginx-slim:0.8
          ports:
            - containerPort: 80
              name: web
          volumeMounts:
            - name: mypd
              mountPath: /usr/share/nginx/html
      volumes:
        - name: mypd
          persistentVolumeClaim:
            claimName: myclaim
Note that in the above example we no longer use volumeClaimTemplates but a single PersistentVolumeClaim which can be used by all our Pods. The Pods are still unique, as they are managed by the StatefulSet, but instead of using unique PVCs they use a common one. Thanks to this approach both Pods can write logs to a single volume at the same time.
In my example I used the nginx server to make it as easy as possible to reproduce for everyone who wants to try it out quickly, but I believe you can easily adjust it to your needs.
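If it helps, here is a rough sketch (my assumption of how it could look, not a tested manifest) of the relevant part of your es-cluster Pod template after the adjustment: volumeClaimTemplates is dropped and the shared myclaim PVC is referenced directly:

      # fragment of the es-cluster StatefulSet Pod template (spec.template.spec):
      containers:
        - name: elasticsearch
          # ... image, resources, ports and env exactly as in your original manifest ...
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: myclaim    # the manually created PVC replaces volumeClaimTemplates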