Prometheus Server Pod with high load is frequently evicted (how to fix)


Recently I deployed Prometheus, Grafana, Alertmanager, and PushGateway using the official Helm chart. In this case the k8s cluster is a production one, so TLS is required. I also run many clusters with Helm 2.x in production, and some modifications were needed to use a different private key and certificate per cluster. In the end I rewrote my .bashrc and other files to include something like this (only the relevant sections are shown):

```
# ~/.bashrc
....
# https://github.com/ahmetb/kubectl-aliases
[ -f ~/.kubectl_aliases ] && source ~/.kubectl_aliases
....
# kubernetes specific
export KUBE_EDITOR="nano"
source <(kubectl completion bash) # setup autocomplete in bash into the current shell, bash-completion package should be installed first.
....
```

Other file:
```
# ~/.bash_aliases

# note for cluster AKS
# Error: incompatible versions client[v2.16.1] server[v2.13.1]
# https://github.com/helm/helm/releases/tag/v2.13.1
# https://get.helm.sh/helm-v2.13.1-linux-amd64.tar.gz

# https://medium.com/nuvo-group-tech/configure-helm-tls-communication-with-multiple-kubernetes-clusters-5e58674352e2
alias tls='cluster=$(kubectl config view -o jsonpath="{.clusters[].name}" --minify); echo -n "--tls --tls-cert $(helm home)/tls/$cluster/cert.pem --tls-key $(helm home)/tls/$cluster/key.pem"'
function helmet() {
    helm "$@" $(tls)
}

```
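The `tls` alias resolves the current cluster name and points Helm at that cluster's certificate pair under `~/.helm/tls/`. A sketch of the expansion with the cluster name stubbed in (the real alias gets it from `kubectl config view --minify`):

```
# Stubbed inputs; the real alias derives these from kubectl and `helm home`
cluster="MyFirstCluster"
helm_home="$HOME/.helm"

# The flag string the alias emits for this cluster
tls_flags="--tls --tls-cert $helm_home/tls/$cluster/cert.pem --tls-key $helm_home/tls/$cluster/key.pem"
echo "$tls_flags"
```

So `helmet ls` effectively runs `helm ls --tls --tls-cert ~/.helm/tls/MyFirstCluster/cert.pem --tls-key ~/.helm/tls/MyFirstCluster/key.pem`, switching certificates automatically with the current kubectl context.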

My ~/.helm folder:

```
.helm/
├── cache
│   └── archive
│       ├── apache-6.0.3.tgz
│       ├── cert-manager-v0.11.0.tgz
│       ├── cert-manager-v0.8.0.tgz
│       ├── external-dns-2.9.0.tgz
│       ├── grafana-3.8.3.tgz
│       ├── grafana-4.0.0.tgz
│       ├── jenkins-1.5.4.tgz
│       ├── kubernetes-dashboard-1.5.2.tgz
│       ├── loki-0.17.0.tgz
│       ├── loki-0.22.0.tgz
│       ├── loki-stack-0.16.0.tgz
│       ├── loki-stack-0.17.0.tgz
│       ├── mariadb-6.2.2.tgz
│       ├── mysql-0.19.0.tgz
│       ├── mysql-1.3.0.tgz
│       ├── mysql-1.4.0.tgz
│       ├── nginx-ingress-1.24.3.tgz
│       ├── nginx-ingress-1.24.4.tgz
│       ├── nginx-ingress-1.24.5.tgz
│       ├── nginx-ingress-1.24.6.tgz
│       ├── prometheus-operator-6.0.0.tgz
│       ├── prometheus-operator-6.21.0.tgz
│       ├── prometheus-operator-6.21.1.tgz
│       ├── prometheus-operator-6.7.2.tgz
│       ├── promtail-0.13.0.tgz
│       ├── promtail-0.16.0.tgz
        ......
│       └── wordpress-5.9.8.tgz
├── plugins
├── repository
│   ├── cache
│   │   ├── bitnami-index.yaml
│   │   ├── jetstack-index.yaml
│   │   ├── local-index.yaml
│   │   ├── loki-index.yaml
│   │   └── stable-index.yaml
│   ├── local
│   │   └── index.yaml
│   └── repositories.yaml
├── starters
└── tls
    ├── MyFirstCluster
    │   ├── ca.pem
    │   ├── cert.pem
    │   └── key.pem
    └── MySecondCluster
        ├── ca.pem
        ├── cert.pem
        └── key.pem
    ......

```

Overview


The initial Prometheus deployment was made using the Helm chart.

```
$ helm repo update
$ helmet install --name prometheus stable/prometheus \
  --namespace monitoring \
  --set rbac.create=true \
  --set server.persistentVolume.enabled=true \
  --set server.persistentVolume.size=20Gi \
  --set server.persistentVolume.storageClass=managed-premium \
  --set alertmanager.persistentVolume.enabled=true \
  --set alertmanager.persistentVolume.size=20Gi \
  --set alertmanager.persistentVolume.storageClass=managed-premium \
  --set pushgateway.persistentVolume.enabled=true \
  --set pushgateway.persistentVolume.size=20Gi \
  --set pushgateway.persistentVolume.storageClass=managed-premium \
  --set server.terminationGracePeriodSeconds=360
```
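The same settings could equally be kept in a values file and passed with `-f values.yaml` instead of the long `--set` list. A sketch (the key names follow the stable/prometheus chart used above):

```
# values.yaml -- equivalent to the --set flags above
rbac:
  create: true
server:
  terminationGracePeriodSeconds: 360
  persistentVolume:
    enabled: true
    size: 20Gi
    storageClass: managed-premium
alertmanager:
  persistentVolume:
    enabled: true
    size: 20Gi
    storageClass: managed-premium
pushgateway:
  persistentVolume:
    enabled: true
    size: 20Gi
    storageClass: managed-premium
```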

Several PersistentVolumeClaims were created, each of 20 GiB. Everything worked fine, but after one month the metric count had grown so much that, whenever the prometheus-server pod was overloaded, Kubernetes killed it because the original liveness and readiness probes exceeded their thresholds.

We need to adjust the Kubernetes probes.

Deployment review


The original deployment for prometheus-server is:

```
$ k -n monitoring edit deployment prometheus-server
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2019-10-30T00:50:30Z"
  generation: 3
  labels:
    app: prometheus
    chart: prometheus-9.2.0
    component: server
    heritage: Tiller
    release: prometheus
  name: prometheus-server
  namespace: monitoring
  resourceVersion: "13955013"
  selfLink: /apis/extensions/v1beta1/namespaces/monitoring/deployments/prometheus-server
  uid: 4561936a-faaf-11e9-b365-4aa5ceef3b39
spec:
  progressDeadlineSeconds: 2147483647
  replicas: 1
  revisionHistoryLimit: 2147483647
  selector:
    matchLabels:
      app: prometheus
      component: server
      release: prometheus
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: prometheus
        chart: prometheus-9.2.0
        component: server
        heritage: Tiller
        release: prometheus
    spec:
      containers:
      - args:
        - --volume-dir=/etc/config
        - --webhook-url=http://127.0.0.1:9090/-/reload
        image: jimmidyson/configmap-reload:v0.2.2
        imagePullPolicy: IfNotPresent
        name: prometheus-server-configmap-reload
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
          readOnly: true
      - args:
        - --storage.tsdb.retention.time=15d
        - --config.file=/etc/config/prometheus.yml
        - --storage.tsdb.path=/data
        - --web.console.libraries=/etc/prometheus/console_libraries
        - --web.console.templates=/etc/prometheus/consoles
        - --web.enable-lifecycle
        image: prom/prometheus:v2.13.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /-/healthy
            port: 9090
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        name: prometheus-server
        ports:
        - containerPort: 9090
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /-/ready
            port: 9090
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
        - mountPath: /data
          name: storage-volume
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534
      serviceAccount: prometheus-server
      serviceAccountName: prometheus-server
      terminationGracePeriodSeconds: 360
      volumes:
      - configMap:
          defaultMode: 420
          name: prometheus-server
        name: config-volume
      - name: storage-volume
        persistentVolumeClaim:
          claimName: prometheus-server
status:
  conditions:
  - lastTransitionTime: "2019-10-30T00:50:30Z"
    lastUpdateTime: "2019-10-30T00:50:30Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 3
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1
```

The main sections here are:

```
....
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /-/healthy
    port: 9090
    scheme: HTTP
  initialDelaySeconds: 30
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 30
name: prometheus-server
ports:
- containerPort: 9090
  protocol: TCP
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /-/ready
    port: 9090
    scheme: HTTP
  initialDelaySeconds: 30
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 30
....
```
Now, after sixty days, the TSDB has more than 5 million rows and takes more than six minutes to become fully up and responsive.
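These probe numbers define a time budget: the kubelet acts roughly after initialDelaySeconds plus failureThreshold consecutive failed periods. A quick back-of-the-envelope check of the original defaults against the tuned values used later:

```
# Worst-case seconds before the kubelet acts on a failing liveness probe:
# initialDelaySeconds + failureThreshold * periodSeconds
original=$(( 30 + 3 * 10 ))    # original chart defaults
tuned=$(( 600 + 3 * 30 ))      # tuned values
echo "original=${original}s tuned=${tuned}s"
```

A startup replay of six minutes (360 s) blows through the original 60 s budget, so the container gets killed before it ever becomes healthy.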

Fixes


To inspect the prometheus-server pod (`k` is an alias for `kubectl`):

```
$ k -n monitoring get pods
NAME                                                     READY   STATUS    RESTARTS   AGE
grafana-676f46565c-tqpzl                                 1/1     Running   0          39d
grafana-nginx-ingress-controller-5778fc5dcb-7vchz        1/1     Running   0          60d
grafana-nginx-ingress-controller-5778fc5dcb-kkmml        1/1     Running   0          60d
grafana-nginx-ingress-default-backend-7f879557f8-zvkm8   1/1     Running   0          60d
prometheus-alertmanager-788958f7c7-7rgdx                 2/2     Running   0          61d
prometheus-kube-state-metrics-55fb55b9db-8gmqt           1/1     Running   0          59d
prometheus-node-exporter-cqlql                           1/1     Running   0          61d
prometheus-node-exporter-k4xqf                           1/1     Running   0          61d
prometheus-node-exporter-p8cpj                           1/1     Running   0          61d
prometheus-pushgateway-699f55c47-8v7jq                   1/1     Running   0          61d
prometheus-server-745f77d49b-v77ll                       2/2     Running   3          154m

$ k -n monitoring describe pod prometheus-server-745f77d49b-v77ll
Name:           prometheus-server-745f77d49b-v77ll
Namespace:      monitoring
Priority:       0
Node:           aks-nodepool1-20238707-0/10.244.0.4
Start Time:     Sun, 29 Dec 2019 21:35:57 -0400
Labels:         app=prometheus
                chart=prometheus-9.2.0
                component=server
                heritage=Tiller
                pod-template-hash=745f77d49b
                release=prometheus
Annotations:    <none>
Status:         Running
IP:             10.244.0.7
IPs:            <none>
Controlled By:  ReplicaSet/prometheus-server-745f77d49b
Containers:
  prometheus-server-configmap-reload:
    Container ID:  docker://3237d4bf6957a76ba67021dc25da55475646041bd9b5829e3113402c9457f1f5
    Image:         jimmidyson/configmap-reload:v0.2.2
    Image ID:      docker-pullable://jimmidyson/configmap-reload@sha256:befec9f23d2a9da86a298d448cc9140f56a457362a7d9eecddba192db1ab489e
    Port:          <none>
    Host Port:     <none>
    Args:
      --volume-dir=/etc/config
      --webhook-url=http://127.0.0.1:9090/-/reload
    State:          Running
      Started:      Sun, 29 Dec 2019 21:36:04 -0400
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/config from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-server-token-v4jg2 (ro)
  prometheus-server:
    Container ID:  docker://87cec70c5bc65ad2db5d2e0b69b90c256a8a3c7cd56383bb08e15c486f91ffeb
    Image:         prom/prometheus:v2.13.1
    Image ID:      docker-pullable://prom/prometheus@sha256:0a8caa2e9f19907608915db6e62a67383fe44b9876a467b297ee6f64e51dd58a
    Port:          9090/TCP
    Host Port:     0/TCP
    Args:
      --storage.tsdb.retention.time=15d
      --config.file=/etc/config/prometheus.yml
      --storage.tsdb.path=/data
      --web.console.libraries=/etc/prometheus/console_libraries
      --web.console.templates=/etc/prometheus/consoles
      --web.enable-lifecycle
    State:          Running
      Started:      Sun, 29 Dec 2019 21:36:55 -0400
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Sun, 29 Dec 2019 21:36:24 -0400
      Finished:     Sun, 29 Dec 2019 21:36:25 -0400
    Ready:          True
    Restart Count:  3
    Liveness:       http-get http://:9090/-/healthy delay=600s timeout=1s period=30s #success=1 #failure=3
    Readiness:      http-get http://:9090/-/ready delay=600s timeout=1s period=30s #success=1 #failure=20
    Environment:    <none>
    Mounts:
      /data from storage-volume (rw)
      /etc/config from config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-server-token-v4jg2 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-server
    Optional:  false
  storage-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  prometheus-server
    ReadOnly:   false
  prometheus-server-token-v4jg2:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-server-token-v4jg2
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
```

To check the logs of the prometheus-server container:

```
$ kubectl -n monitoring logs -f prometheus-server-745f77d49b-v77ll -c prometheus-server
level=info ts=2019-12-30T01:53:14.999Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1577037600000 maxt=1577059200000 ulid=01DXA83MACD5PP4XHX7TC2K0BS sources="[01DXA7GB0XH5E69931MBAXWCPH 01DXA7GZP72WW93TYKKZDX35ZD 01DXA7HKN5BZF5AK1GTVZWK5M9]" duration=3.499215571s
level=info ts=2019-12-30T01:53:20.402Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1577059200000 maxt=1577080800000 ulid=01DXA83SCFC0HQDWSNYXK4TXVV sources="[01DXA7J7H1ENJS14EFGBNYE8P2 01DXA7JT9J72SX1PCVWD23HQFT 01DXA7KDEPGQ6MRT8AXQSVYSJ0]" duration=3.71530483s
level=info ts=2019-12-30T01:53:25.600Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1577080800000 maxt=1577102400000 ulid=01DXA83YJMBA7759C5QD1CEDWJ sources="[01DXA7M0EKGW2V92QZQSFFCAYJ 01DXA7MK45405TASTFHVJT02Q6 01DXA7MZR8EAB12T2C87EXWK7B]" duration=3.595686298s
level=info ts=2019-12-30T01:53:30.706Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1577102400000 maxt=1577124000000 ulid=01DXA843MY2J47EZ8WMADK7KWT sources="[01DXA7NA5P8G0ZKPYWKBY5MJ8G 01DXA7NGB1GNMA89QYMESVSGX3 01DXA7NTDZ65T5F794175R38M5]" duration=3.507732636s
level=info ts=2019-12-30T01:53:34.014Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1577124000000 maxt=1577145600000 ulid=01DXA848KKMPQQR3KR71AH5XDS sources="[01DXA7P04K1JXAQTNKKV4NY0D1 01DXA7P8N72MA1E70W3AREZP4N 01DXA7PCD44Q2FRC57MM438AF9]" duration=1.738807856s
level=info ts=2019-12-30T01:55:57.170Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1576972800000 maxt=1577037600000 ulid=01DXA84CAGGD83G33ZHWVXE09P sources="[01DWPDQBAK8Y3VNP2V4G2K4A8W 01DXA80KFNEJM5VRX6ECXZAZ39 01DXA8333RXBRJY4NBJRADJC99]" duration=2m21.08970652s
level=info ts=2019-12-30T01:56:04.508Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1577037600000 maxt=1577102400000 ulid=01DXA88R0Q37C0643WKG7VM38Z sources="[01DXA83MACD5PP4XHX7TC2K0BS 01DXA83SCFC0HQDWSNYXK4TXVV 01DXA83YJMBA7759C5QD1CEDWJ]" duration=5.38087382s
level=info ts=2019-12-30T03:00:03.264Z caller=compact.go:496 component=tsdb msg="write block" mint=1577664000000 maxt=1577671200000 ulid=01DXABXYZCSJF71KZ4DWHP2SR5 duration=3.155977955s
level=info ts=2019-12-30T03:00:04.360Z caller=head.go:598 component=tsdb msg="head GC completed" duration=209.807162ms
level=info ts=2019-12-30T03:00:08.474Z caller=head.go:668 component=tsdb msg="WAL checkpoint complete" first=2703 last=2704 duration=4.113514824s
level=info ts=2019-12-30T03:00:12.754Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1577102400000 maxt=1577152800000 ulid=01DXABY7XS038KHWDTZ83DPPZ9 sources="[01DXA843MY2J47EZ8WMADK7KWT 01DXA848KKMPQQR3KR71AH5XDS 01DXA7PK4TV7BBRA81W59QH8WS]" duration=3.481380438s
.....
```
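These lines are logfmt (space-separated key=value pairs), so fields like the compaction duration can be pulled out directly when estimating how long startup replay takes. A small sketch against one sample line:

```
# Extract the duration field from a sample Prometheus logfmt line
line='level=info ts=2019-12-30T01:55:57.170Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 duration=2m21.08970652s'
duration=$(printf '%s\n' "$line" | grep -o 'duration=[^ ]*' | cut -d= -f2)
echo "$duration"    # 2m21.08970652s
```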

To kill an unresponsive pod:
```
$ kubectl -n monitoring delete pod prometheus-server-745f77d49b-gcfzt --force --grace-period 0
```

The solution, found by trial and error


Edit the prometheus-server deployment and adjust the probes:

```
$ kubectl -n monitoring edit deployment prometheus-server

```

```
....
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /-/healthy
    port: 9090
    scheme: HTTP
  initialDelaySeconds: 600
  periodSeconds: 30
  successThreshold: 1
  timeoutSeconds: 30
name: prometheus-server
ports:
- containerPort: 9090
  protocol: TCP
readinessProbe:
  failureThreshold: 20
  httpGet:
    path: /-/ready
    port: 9090
    scheme: HTTP
  initialDelaySeconds: 600
  periodSeconds: 30
  successThreshold: 1
  timeoutSeconds: 30
....
```
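Instead of an interactive edit, the same probe change could be applied non-interactively with a strategic-merge patch; a sketch (the container name matches the deployment above, and the values mirror the edit exactly):

```
# Strategic-merge patch carrying only the probe changes
patch='{"spec":{"template":{"spec":{"containers":[{
  "name":"prometheus-server",
  "livenessProbe":{"initialDelaySeconds":600,"periodSeconds":30,"timeoutSeconds":30,"failureThreshold":3},
  "readinessProbe":{"initialDelaySeconds":600,"periodSeconds":30,"timeoutSeconds":30,"failureThreshold":20}
}]}}}}'

echo "$patch"
# Then apply it with:
#   kubectl -n monitoring patch deployment prometheus-server -p "$patch"
```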

After these modifications the prometheus-server pod works fine and there are no more evicted pods.

Note:

If you need to delete evicted pods in your current k8s cluster:

```
$ cat delete-evicted-pods-all-namespaces.sh
#!/bin/sh
# based on https://gist.github.com/ipedrazas/9c622404fb41f2343a0db85b3821275d

# delete all evicted pods from all namespaces
kubectl get pods --all-namespaces | grep Evicted | awk '{print $2 " --namespace=" $1}' | xargs kubectl delete pod

# delete all containers in ImagePullBackOff state from all namespaces
kubectl get pods --all-namespaces | grep 'ImagePullBackOff' | awk '{print $2 " --namespace=" $1}' | xargs kubectl delete pod

# delete all containers in ImagePullBackOff or ErrImagePull or Evicted state from all namespaces
kubectl get pods --all-namespaces | grep -E 'ImagePullBackOff|ErrImagePull|Evicted' | awk '{print $2 " --namespace=" $1}' | xargs kubectl delete pod

```
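The grep/awk pipeline can be checked offline against a captured sample of `kubectl get pods --all-namespaces` output before pointing it at a live cluster:

```
# Sample `kubectl get pods --all-namespaces` output
# (namespace is column 1, pod name is column 2)
sample='NAMESPACE    NAME                                 READY   STATUS             RESTARTS   AGE
monitoring   prometheus-server-745f77d49b-gcfzt   0/2     Evicted            0          3h
monitoring   grafana-676f46565c-tqpzl             1/1     Running            0          39d
default      broken-7d9f55c47-xyz12               0/1     ImagePullBackOff   0          5m'

# Same selection as the script above; prints the args that would be
# handed to `kubectl delete pod` by xargs
args=$(printf '%s\n' "$sample" | grep -E 'ImagePullBackOff|ErrImagePull|Evicted' | awk '{print $2 " --namespace=" $1}')
printf '%s\n' "$args"
```

Only the Evicted and ImagePullBackOff rows survive the filter; the Running grafana pod is left alone.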

Thanks for reading :)
