How to fix a full disk for /data in a Prometheus server deployed with a Helm chart
The prometheus-server pod has two containers: prometheus-server-configmap-reload and prometheus-server.
Currently the prometheus-server has one disk of 20GiB, and it was full after sixty days. We need to resize or replace this PVC with one of at least 60GiB.
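Side note: on clusters whose StorageClass supports online volume expansion (allowVolumeExpansion: true), a resize can be as simple as patching the claim; here is a minimal sketch, hedged because it only works when the StorageClass allows expansion, which was not available for this setup, so the rest of this post does a copy-based migration instead.
```
# only works if the StorageClass has allowVolumeExpansion: true
$ kubectl -n monitoring patch pvc prometheus-server \
    -p '{"spec":{"resources":{"requests":{"storage":"60Gi"}}}}'
```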
In order to do this, we need to:
- stop the prometheus-server deployment
- back up the current volume data to another PVC
- delete the current prometheus-server PVC
- recreate the prometheus-server PVC with a bigger size
- restore the backup onto this new prometheus-server PVC
- start the previously stopped prometheus-server deployment
Here are the details (the "k" is an alias for "kubectl"):
1. Stop the prometheus-server deployment
---
First we need to get information about the current prometheus-server deployment:
```
$ k -n monitoring get deployment
NAME                                    READY   UP-TO-DATE   AVAILABLE   AGE
grafana                                 1/1     1            1           60d
grafana-nginx-ingress-controller        2/2     2            2           60d
grafana-nginx-ingress-default-backend   1/1     1            1           60d
prometheus-alertmanager                 1/1     1            1           60d
prometheus-kube-state-metrics           1/1     1            1           60d
prometheus-pushgateway                  1/1     1            1           60d
prometheus-server                       0/1     1            0           60d
```
Check the details of the prometheus-server deployment:
```
$ k -n monitoring describe deployment prometheus-server
Name:                   prometheus-server
Namespace:              monitoring
CreationTimestamp:      Tue, 29 Oct 2019 20:50:30 -0400
Labels:                 app=prometheus
                        chart=prometheus-9.2.0
                        component=server
                        heritage=Tiller
                        release=prometheus
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=prometheus,component=server,release=prometheus
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  1 max unavailable, 1 max surge
Pod Template:
  Labels:           app=prometheus
                    chart=prometheus-9.2.0
                    component=server
                    heritage=Tiller
                    release=prometheus
  Service Account:  prometheus-server
  Containers:
   prometheus-server-configmap-reload:
    Image:      jimmidyson/configmap-reload:v0.2.2
    Port:       <none>
    Host Port:  <none>
    Args:
      --volume-dir=/etc/config
      --webhook-url=http://127.0.0.1:9090/-/reload
    Environment:  <none>
    Mounts:
      /etc/config from config-volume (ro)
   prometheus-server:
    Image:      prom/prometheus:v2.13.1
    Port:       9090/TCP
    Host Port:  0/TCP
    Args:
      --storage.tsdb.retention.time=15d
      --config.file=/etc/config/prometheus.yml
      --storage.tsdb.path=/data
      --web.console.libraries=/etc/prometheus/console_libraries
      --web.console.templates=/etc/prometheus/consoles
      --web.enable-lifecycle
    Liveness:     http-get http://:9090/-/healthy delay=30s timeout=30s period=10s #success=1 #failure=3
    Readiness:    http-get http://:9090/-/ready delay=30s timeout=30s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /data from storage-volume (rw)
      /etc/config from config-volume (rw)
  Volumes:
   config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-server
    Optional:  false
   storage-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  prometheus-server
    ReadOnly:   false
Conditions:
  Type       Status  Reason
  ----       ------  ------
  Available  True    MinimumReplicasAvailable
OldReplicaSets:  prometheus-server-65d76f67cf (1/1 replicas created)
NewReplicaSet:   <none>
Events:          <none>
```
We don't need to delete the prometheus-server deployment; instead we scale it to zero replicas, which deletes the current pod and releases its volume.
```
$ kubectl -n monitoring scale deployment prometheus-server --replicas=0
deployment.extensions/prometheus-server scaled
```
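Before moving on, make sure the old pod is really gone, so the disk is detached and the PVC released; a small check, using the label selector from the describe output above:
```
# blocks until the old prometheus-server pod has been deleted
$ kubectl -n monitoring wait --for=delete pod \
    -l app=prometheus,component=server,release=prometheus --timeout=120s
```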
2. Back up the current volume data to another PVC
---
Check the current PVC used by prometheus-server:
```
$ k -n monitoring get pvc
NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
alpinebox-recovery-pvc    Bound    pvc-c86f431e-2a3b-11ea-9bfe-524e272515fe   30Gi       RWO            managed-premium   19m
grafana                   Bound    pvc-822d1613-fab6-11e9-b365-4aa5ceef3b39   20Gi       RWO            managed-premium   60d
prometheus-alertmanager   Bound    pvc-44e4eedd-faaf-11e9-b365-4aa5ceef3b39   10Gi       RWO            managed-premium   60d
prometheus-pushgateway    Bound    pvc-44e6c0d8-faaf-11e9-b365-4aa5ceef3b39   10Gi       RWO            managed-premium   60d
prometheus-server         Bound    pvc-44eae0e4-faaf-11e9-b365-4aa5ceef3b39   20Gi       RWO            managed-premium   60d
$ k -n monitoring describe pvc prometheus-server
Name:          prometheus-server
Namespace:     monitoring
StorageClass:  managed-premium
Status:        Bound
Volume:        pvc-44eae0e4-faaf-11e9-b365-4aa5ceef3b39
Labels:        app=prometheus
               chart=prometheus-9.2.0
               component=server
               heritage=Tiller
               release=prometheus
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/azure-disk
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      20Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    alpinebox
Events:        <none>
$ k -n monitoring get pvc prometheus-server -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/azure-disk
  creationTimestamp: "2019-10-30T00:50:29Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app: prometheus
    chart: prometheus-9.2.0
    component: server
    heritage: Tiller
    release: prometheus
  name: prometheus-server
  namespace: monitoring
  resourceVersion: "1457021"
  selfLink: /api/v1/namespaces/monitoring/persistentvolumeclaims/prometheus-server
  uid: 44eae0e4-faaf-11e9-b365-4aa5ceef3b39
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: managed-premium
  volumeMode: Filesystem
  volumeName: pvc-44eae0e4-faaf-11e9-b365-4aa5ceef3b39
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 20Gi
  phase: Bound
```
As you can see, the current disk size is 20GiB. Now I'll create a new PVC of more than 20GiB to hold the backup.
```
# alpinebox-recovery-pvc.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: alpinebox-recovery-pvc
  namespace: monitoring
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi
  storageClassName: managed-premium

$ kubectl apply -f alpinebox-recovery-pvc.yml
```
Now we need a small helper pod based on Alpine, where rsync is available.
```
# alpinebox.yml
apiVersion: v1
kind: Pod
metadata:
  name: alpinebox
  namespace: monitoring
spec:
  containers:
  - name: alpinebox
    image: alpine:3.5
    command:
    - sleep
    - "3600"
    volumeMounts:
    - mountPath: /data-old
      name: storage-volume-old
    - mountPath: /data-new
      name: storage-volume-new
  restartPolicy: Never
  volumes:
  - name: storage-volume-old
    persistentVolumeClaim:
      claimName: prometheus-server
  - name: storage-volume-new
    persistentVolumeClaim:
      claimName: alpinebox-recovery-pvc

$ kubectl apply -f alpinebox.yml
```
You can see that the previous prometheus-server PVC will be available at */data-old* and the new, empty recovery PVC at */data-new*. Note that the pod only runs "sleep 3600", so it exits after one hour; if your copy may take longer, increase the sleep before applying.
Now we need to sync */data-old* to the empty */data-new* mountpoint.
```
$ k -n monitoring exec -it alpinebox -- sh
apk update
apk add rsync
rsync -avzHS --progress /data-old/ /data-new/
# this took about 20 minutes to complete
exit
```
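Before tearing anything down, it's worth a quick sanity check that the copy is complete; for example, still inside the alpinebox shell, both totals should be nearly identical:
```
# compare the used space of both mount points
du -sh /data-old /data-new
```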
The actual data from the prometheus-server database has now been saved on the new recovery PVC, so we can delete the old PVC.
3. Delete the current prometheus-server PVC
---
The data has been saved, so we can remove and re-create the prometheus-server PVC with the new size. Delete the alpinebox pod first: while it still mounts the claim, the kubernetes.io/pvc-protection finalizer (visible in the describe output above) will keep the PVC stuck in Terminating.
```
$ kubectl -n monitoring delete -f alpinebox.yml
$ kubectl -n monitoring delete pvc prometheus-server
```
4. Recreate the prometheus-server PVC
---
```
# prometheus-server-pvc.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app: prometheus
    chart: prometheus-9.2.0
    component: server
    heritage: Tiller
    release: prometheus
  name: prometheus-server
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 60Gi
  storageClassName: managed-premium

$ kubectl -n monitoring apply -f prometheus-server-pvc.yml
```
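With the azure-disk provisioner the new claim should bind right away; verify it before restoring:
```
$ kubectl -n monitoring get pvc prometheus-server
# expect STATUS "Bound" and CAPACITY "60Gi" before continuing
```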
5. Restore the backup onto the new prometheus-server PVC
---
Now we need to restore the backed-up data onto the new prometheus-server PVC:
```
$ k apply -f alpinebox.yml
$ k -n monitoring exec -it alpinebox -- sh
apk update
apk add rsync
rsync -avzHS --progress /data-new/ /data-old/
# this took about 20 minutes to complete
exit
$ k delete -f alpinebox.yml
```
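One hedged caveat before restarting Prometheus: if your chart values run the server container as a non-root user (many versions of the stable/prometheus chart default to UID/GID 65534), the restored files must be readable by that user. Running rsync -a as root preserves the original ownership, so this normally just works; but if Prometheus later logs permission errors, re-create the alpinebox pod and fix the ownership:
```
# assumption: 65534 (nobody) matches your chart's securityContext; adjust to yours
chown -R 65534:65534 /data-old
```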
6. Start the previously stopped prometheus-server deployment
---
```
$ kubectl -n monitoring scale deployment prometheus-server --replicas=1
$ kubectl -n monitoring logs -f prometheus-server-65d76f67cf-7htdv -c prometheus-server
level=info ts=2019-12-29T22:09:57.517Z caller=main.go:332 msg="Starting Prometheus" version="(version=2.13.1, branch=HEAD, revision=6f92ce56053866194ae5937012c1bec40f1dd1d9)"
level=info ts=2019-12-29T22:09:57.517Z caller=main.go:333 build_context="(go=go1.13.1, user=root@88e419aa1676, date=20191017-13:15:01)"
level=info ts=2019-12-29T22:09:57.517Z caller=main.go:334 host_details="(Linux 4.15.0-1059-azure #64-Ubuntu SMP Fri Sep 13 17:02:44 UTC 2019 x86_64 prometheus-server-65d76f67cf-7htdv (none))"
level=info ts=2019-12-29T22:09:57.518Z caller=main.go:335 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2019-12-29T22:09:57.518Z caller=main.go:336 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2019-12-29T22:09:57.530Z caller=web.go:450 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2019-12-29T22:09:57.530Z caller=main.go:657 msg="Starting TSDB ..."
level=info ts=2019-12-29T22:09:57.566Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1575676800000 maxt=1575741600000 ulid=01DVH2NWPSVX5Y5VQ7HA5SXESY
level=info ts=2019-12-29T22:09:57.569Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1575741600000 maxt=1575806400000 ulid=01DVK0FFTNN6BX0W6W3TQECMZY
level=info ts=2019-12-29T22:09:57.573Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1575806400000 maxt=1575871200000 ulid=01DVMY8TZJ1HZZYD371EDZJ5CZ
level=info ts=2019-12-29T22:09:57.576Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1575871200000 maxt=1575936000000 ulid=01DVPW2KCC86JEHD9YPHCA1DW2
level=info ts=2019-12-29T22:09:57.579Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1575936000000 maxt=1576000800000 ulid=01DVRSW4GZRKT9BF2VH7TD1SNF
.......
```
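Optionally, confirm the server answers its readiness endpoint and that old metrics are still queryable before cleaning up; a quick check via port-forward:
```
$ kubectl -n monitoring port-forward deploy/prometheus-server 9090:9090 &
$ curl -s http://127.0.0.1:9090/-/ready
# should report that Prometheus is ready
$ kill %1
```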
If everything is OK, you can delete the temporary recovery PVC:
```
$ kubectl -n monitoring get pvc
NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
alpinebox-recovery-pvc    Bound    pvc-c86f431e-....-11ea-9bfe-524e272515fe   30Gi       RWO            managed-premium   9h
grafana                   Bound    pvc-822d1613-....-11e9-b365-4aa5ceef3b39   20Gi       RWO            managed-premium   60d
prometheus-alertmanager   Bound    pvc-44e4eedd-....-11e9-b365-4aa5ceef3b39   10Gi       RWO            managed-premium   60d
prometheus-pushgateway    Bound    pvc-44e6c0d8-....-11e9-b365-4aa5ceef3b39   10Gi       RWO            managed-premium   60d
prometheus-server         Bound    pvc-75536a49-....-11ea-9bfe-524e272515fe   60Gi       RWO            managed-premium   8h
$ kubectl -n monitoring get pods
NAME                                                     READY   STATUS    RESTARTS   AGE
grafana-676f46565c-tqpzl                                 1/1     Running   0          39d
grafana-nginx-ingress-controller-5778fc5dcb-7vchz        1/1     Running   0          60d
grafana-nginx-ingress-controller-5778fc5dcb-kkmml        1/1     Running   0          60d
grafana-nginx-ingress-default-backend-7f879557f8-zvkm8   1/1     Running   0          60d
prometheus-alertmanager-788958f7c7-7rgdx                 2/2     Running   0          60d
prometheus-kube-state-metrics-55fb55b9db-8gmqt           1/1     Running   0          59d
prometheus-node-exporter-cqlql                           1/1     Running   0          60d
prometheus-node-exporter-k4xqf                           1/1     Running   0          60d
prometheus-node-exporter-p8cpj                           1/1     Running   0          60d
prometheus-pushgateway-699f55c47-8v7jq                   1/1     Running   0          60d
prometheus-server-65d76f67cf-jxl4k                       2/2     Running   0          9m24s
$ kubectl delete -f alpinebox-recovery-pvc.yml
```
Thanks for reading :)