Kubernetes cluster Applications tab > Installing Prometheus
Pre-installation environment setup
NFS server setup
Create the shared directory
$ mkdir /home/data
Change the directory permissions
$ chmod 777 /home/data
Install the required packages
$ yum install nfs-utils (CentOS)
$ apt-get install nfs-common nfs-kernel-server (Ubuntu)
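On CentOS the NFS service also has to be enabled and started by hand; the restart command in the "Apply the configuration" step below targets the Ubuntu service name. A minimal sketch, assuming the usual rpcbind and nfs-server units:

# CentOS only: enable and start the NFS services (Ubuntu starts nfs-kernel-server on package install)
$ systemctl enable --now rpcbind nfs-server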
Edit the NFS configuration
$ vi /etc/exports

/home/data *(rw,sync,no_subtree_check)  ## This should be restricted to an IP, hostname, or domain, but Kubernetes failed to recognize the share in that case, so access is allowed from all hosts
Apply the configuration
$ exportfs -a
$ systemctl restart nfs-kernel-server
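Before moving on to the Kubernetes side, it can be worth confirming that the export is reachable from a worker node. A quick sketch, assuming the NFS server address 14.36.48.220 used later in this guide and that nfs-common/nfs-utils is installed on the client:

# Run from a worker node: list the exports published by the NFS server
$ showmount -e 14.36.48.220

# Optional manual mount test; unmount again afterwards
$ mount -t nfs 14.36.48.220:/home/data /mnt
$ umount /mnt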
Deploy the service account and role bindings
Create the rbac.yaml file
$ vi rbac.yaml

kind: ServiceAccount
apiVersion: v1
metadata:
  name: nfs-client-provisioner
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    namespace: default
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: default
roleRef:
  kind: Role
  name: leader-locking-nfs-client-provisioner
  apiGroup: rbac.authorization.k8s.io
Deploy the YAML
$ kubectl create -f rbac.yaml
Verify that the ClusterRole and bindings were created
$ kubectl get clusterrole,clusterrolebinding,role,rolebinding | grep nfs
clusterrole.rbac.authorization.k8s.io/nfs-client-provisioner-runner           20m
clusterrolebinding.rbac.authorization.k8s.io/run-nfs-client-provisioner       20m
role.rbac.authorization.k8s.io/leader-locking-nfs-client-provisioner          20m
rolebinding.rbac.authorization.k8s.io/leader-locking-nfs-client-provisioner   20m
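The ServiceAccount created by rbac.yaml can be checked the same way (it lives in the default namespace, matching the subjects in the bindings above):

$ kubectl get serviceaccount nfs-client-provisioner -n default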
Deploy the StorageClass and NFS provisioner
Create the StorageClass
$ vi class.yaml

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
provisioner: nfs-gitlab
reclaimPolicy: Retain
allowVolumeExpansion: true
parameters:
  archiveOnDelete: "false"
Deploy the YAML
$ kubectl create -f class.yaml
Verify that the StorageClass was created
$ kubectl get storageclass
NAME                  PROVISIONER   RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
managed-nfs-storage   nfs-gitlab    Retain          Immediate           true                   15m
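Optionally, the class can also be marked as the cluster default so that charts which do not set storageClassName still provision onto NFS; this guide sets storageClass explicitly in values.yaml, so this is only a convenience:

# Optional: make managed-nfs-storage the default StorageClass
$ kubectl patch storageclass managed-nfs-storage \
    -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'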
Create the deployment.yaml file
$ vi deployment.yaml

kind: Deployment
apiVersion: apps/v1
metadata:
  name: nfs-client-provisioner
spec:
  selector:
    matchLabels:
      app: nfs-client-provisioner
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          image: quay.io/external_storage/nfs-client-provisioner:latest
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: nfs-gitlab
            - name: NFS_SERVER
              value: 14.36.48.220
            - name: NFS_PATH
              value: /home/data
      volumes:
        - name: nfs-client-root
          nfs:
            server: 14.36.48.220
            path: /home/data
Deploy the YAML
$ kubectl create -f deployment.yaml
Verify that the nfs-client-provisioner pod was created
$ kubectl get all
NAME                                          READY   STATUS    RESTARTS   AGE
pod/nfs-client-provisioner-6d5d96fffb-5v6n7   1/1     Running   0          16m

NAME                           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/default-http-backend   ClusterIP   10.107.231.71   <none>        80/TCP    3h45m
service/kubernetes             ClusterIP   10.96.0.1       <none>        443/TCP   30h

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nfs-client-provisioner   1/1     1            1           16m

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/nfs-client-provisioner-6d5d96fffb   1         1         1       16m
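With the provisioner running, dynamic provisioning can be verified end to end before touching values.yaml. A minimal sketch using a throwaway claim (test-claim is a hypothetical name, deleted after checking):

$ vi test-claim.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
spec:
  storageClassName: managed-nfs-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Mi

$ kubectl create -f test-claim.yaml
$ kubectl get pvc test-claim     # STATUS should reach Bound and a matching directory should appear under /home/data
$ kubectl delete -f test-claim.yaml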
Edit values.yaml
$ vi /opt/gitlab/embedded/service/gitlab-rails/vendor/prometheus/values.yaml

securityContext:
  fsGroup: 999
  runAsUser: 999
alertmanager:
  enabled: true
  persistentVolume:
    accessModes:
      - ReadWriteOnce
    annotations: {}
    existingClaim: ""
    mountPath: /home/data
    size: 20Gi
    storageClass: managed-nfs-storage
    subPath: ""
kubeStateMetrics:
  enabled: true
nodeExporter:
  enabled: false
pushgateway:
  enabled: false
server:
  fullnameOverride: "prometheus-prometheus-server"
  persistentVolume:
    accessModes:
      - ReadWriteOnce
    annotations: {}
    existingClaim: ""
    mountPath: /home/data
    size: 20Gi
    storageClass: managed-nfs-storage
    subPath: ""
...
Grant permissions for volume creation
# values.yaml content
securityContext:
  fsGroup: 999
  runAsUser: 999
alertmanager settings
# values.yaml content
alertmanager:
  enabled: true
  persistentVolume:
    accessModes:
      - ReadWriteOnce
    annotations: {}
    existingClaim: ""
    mountPath: /home/data
    size: 20Gi
    storageClass: managed-nfs-storage
    subPath: ""
server settings
# values.yaml content
server:
  fullnameOverride: "prometheus-prometheus-server"
  persistentVolume:
    accessModes:
      - ReadWriteOnce
    annotations: {}
    existingClaim: ""
    mountPath: /home/data
    size: 20Gi
    storageClass: managed-nfs-storage
    subPath: ""
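Once values.yaml is saved and Prometheus is installed from the Applications tab, the claims created by the chart should bind against managed-nfs-storage. A quick check, assuming the gitlab-managed-apps namespace that GitLab installs applications into (also used in the log commands below):

$ kubectl get pvc -n gitlab-managed-apps
$ kubectl get pv | grep managed-nfs-storage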
Verify the Prometheus installation
Check the Applications tab of the Kubernetes cluster
Check the Health tab of the Kubernetes cluster
Check the Pod logs from the master node terminal
$ kubectl logs prometheus-prometheus-server-84b688d6b4-qc7b4 prometheus-server -n gitlab-managed-apps
level=info ts=2020-10-13T07:55:23.639Z caller=main.go:330 msg="Starting Prometheus" version="(version=2.15.2, branch=HEAD, revision=d9613e5c466c6e9de548c4dae1b9aabf9aaf7c57)"
level=info ts=2020-10-13T07:55:23.639Z caller=main.go:331 build_context="(go=go1.13.5, user=root@688433cf4ff7, date=20200106-14:50:51)"
level=info ts=2020-10-13T07:55:23.639Z caller=main.go:332 host_details="(Linux 5.4.0-48-generic #52~18.04.1-Ubuntu SMP Thu Sep 10 12:50:22 UTC 2020 x86_64 prometheus-prometheus-server-84b688d6b4-qc7b4 (none))"
level=info ts=2020-10-13T07:55:23.639Z caller=main.go:333 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-10-13T07:55:23.639Z caller=main.go:334 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-10-13T07:55:23.694Z caller=web.go:506 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-10-13T07:55:23.693Z caller=main.go:648 msg="Starting TSDB ..."
level=info ts=2020-10-13T07:55:23.789Z caller=head.go:584 component=tsdb msg="replaying WAL, this may take awhile"
level=info ts=2020-10-13T07:55:23.791Z caller=head.go:632 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
level=info ts=2020-10-13T07:55:23.796Z caller=main.go:663 fs_type=NFS_SUPER_MAGIC
level=info ts=2020-10-13T07:55:23.796Z caller=main.go:664 msg="TSDB started"
level=info ts=2020-10-13T07:55:23.796Z caller=main.go:734 msg="Loading configuration file" filename=/etc/config/prometheus.yml
level=info ts=2020-10-13T07:55:23.805Z caller=kubernetes.go:190 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2020-10-13T07:55:23.808Z caller=kubernetes.go:190 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2020-10-13T07:55:23.810Z caller=kubernetes.go:190 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2020-10-13T07:55:23.813Z caller=kubernetes.go:190 component="discovery manager notify" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2020-10-13T07:55:23.816Z caller=main.go:762 msg="Completed loading of configuration file" filename=/etc/config/prometheus.yml
level=info ts=2020-10-13T07:55:23.816Z caller=main.go:617 msg="Server is ready to receive web requests."

$ kubectl logs prometheus-alertmanager-58b88c665b-b7sqt prometheus-alertmanager -n gitlab-managed-apps
level=info ts=2020-10-13T07:55:24.377Z caller=main.go:231 msg="Starting Alertmanager" version="(version=0.20.0, branch=HEAD, revision=f74be0400a6243d10bb53812d6fa408ad71ff32d)"
level=info ts=2020-10-13T07:55:24.377Z caller=main.go:232 build_context="(go=go1.13.5, user=root@00c3106655f8, date=20191211-14:13:14)"
level=info ts=2020-10-13T07:55:24.516Z caller=cluster.go:623 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=info ts=2020-10-13T07:55:24.545Z caller=coordinator.go:119 component=configuration msg="Loading configuration file" file=/etc/config/alertmanager.yml
level=info ts=2020-10-13T07:55:24.545Z caller=coordinator.go:131 component=configuration msg="Completed loading of configuration file" file=/etc/config/alertmanager.yml
level=info ts=2020-10-13T07:55:24.548Z caller=main.go:497 msg=Listening address=:9093
level=info ts=2020-10-13T07:55:26.516Z caller=cluster.go:648 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.00012541s
level=info ts=2020-10-13T07:55:34.517Z caller=cluster.go:640 component=cluster msg="gossip settled; proceeding" elapsed=10.000979807s
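If the logs look healthy, the Prometheus UI itself can be reached from the master node with a port-forward; a sketch reusing the pod name from the log example above (the hash suffix will differ on your cluster):

# Forward local port 9090 to the Prometheus server pod
$ kubectl port-forward -n gitlab-managed-apps pod/prometheus-prometheus-server-84b688d6b4-qc7b4 9090:9090

# In another terminal: readiness endpoint and scrape target list
$ curl -s http://localhost:9090/-/ready
$ curl -s http://localhost:9090/api/v1/targets | head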