00 Prometheus Operater 监控体系原理梳理文档
k8s集群:1.18.6,rke部署
监控:prometheus-operator-9.3.1
# kubectl get crd |grep coreos
alertmanagers.monitoring.coreos.com 2020-08-31T08:53:59Z
podmonitors.monitoring.coreos.com 2020-09-08T08:46:07Z
prometheuses.monitoring.coreos.com 2020-08-31T08:53:58Z
prometheusrules.monitoring.coreos.com 2020-08-31T08:53:58Z
servicemonitors.monitoring.coreos.com 2020-08-31T08:53:59Z
thanosrulers.monitoring.coreos.com 2020-09-08T08:46:10Z
- prometheuses:根据此对象,可以自动部署一个Prometheus应用的statefulset
- alertmanagers:根据此对象,可以自动部署一个Prometheus应用的statefulset
- servicemonitors:根据此对象,operater会自动生成抓取目标service的配置
- podmonitors:根据此对象,operater会自动生成抓取目标pod的配置
- prometheusrules:根据此对象,operater会自动生成rule配置

#kubectl get Prometheus -o yaml
apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
annotations:
meta.helm.sh/release-name: prometheus-operator
meta.helm.sh/release-namespace: monitoring
project.cattle.io/namespaces: '["cattle-system","kube-node-lease","kube-public","kube-system","monitoring"]'
labels:
app: prometheus-operator-prometheus
app.kubernetes.io/managed-by: Helm
chart: prometheus-operator-9.3.1
heritage: Helm
release: prometheus-operator
name: prometheus-operator-prometheus
namespace: monitoring
spec:
alerting:
# AlertmanagerEndpoints配置
# 指明Prometheus向哪个alertmanager发出告警信息
alertmanagers:
- apiVersion: v2
name: prometheus-operator-alertmanager #Endpoints对象的名字
namespace: monitoring
pathPrefix: /
port: web
arbitraryFSAccessThroughSMs: {}
baseImage: quay.io/prometheus/prometheus
externalUrl: http://prometheus.rencaiyoujia.cn/
logFormat: logfmt
logLevel: info
# podMonitor选择器
podMonitorNamespaceSelector: {} # 如果未nil,只检查Prometheus这个crd所在的ns,例如本例中的monitoring
podMonitorSelector:
matchLabels:
release: prometheus-operator
portName: web
replicas: 1
resources:
limits:
cpu: "1"
memory: 1000Mi
requests:
cpu: 750m
memory: 750Mi
retention: 10d
# prometheus路由前缀
routePrefix: /
# rule选择器:从PrometheusRules资源中加载告警规则等
ruleNamespaceSelector: {}
ruleSelector:
matchLabels:
app: prometheus-operator
release: prometheus-operator
rules:
alert: {} #指定/--rules.alert.*/命令行参数
securityContext:
fsGroup: 2000
runAsGroup: 2000
runAsNonRoot: true
runAsUser: 1000
##运行Prometheus这个Pod的ServiceAccount,用于RBAC权限分配。该sa通过ClusterRoleBinding(prometheus-cluster-monitoring-cattle-prometheus)绑定了ClusterRole(prometheus-cluster-monitoring-cattle-prometheus)集群角色
serviceAccountName: prometheus-operator-prometheus
# serviceMonitor选择器:monitoring名称空间下具有release: prometheus-operator标签的service可以被check
serviceMonitorNamespaceSelector: {}
serviceMonitorSelector:
matchLabels:
release: prometheus-operator
storage:
volumeClaimTemplate:
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 50Gi
storageClassName: alicloud-nas-subpath
#Prometheus的版本信息
version: v2.18.2
kind: List
metadata:
resourceVersion: ""
selfLink: ""
从上面可以看出,Prometheus资源把alertmanagers、servicemonitors、podmonitors、prometheusrules等资源给联系在一起了。
以上Prometheus对象会被Operater生成Pormetheus的配置文件,如下:
global:
scrape_interval: 30s
scrape_timeout: 10s
evaluation_interval: 30s
external_labels:
prometheus: monitoring/prometheus-operator-prometheus
prometheus_replica: prometheus-prometheus-operator-prometheus-0
alerting:
alert_relabel_configs:
- separator: ;
regex: prometheus_replica
replacement: $1
action: labeldrop
alertmanagers:
- kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- monitoring
scheme: http
path_prefix: /
timeout: 10s
api_version: v2
relabel_configs:
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: prometheus-operator-alertmanager
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: web
replacement: $1
action: keep
rule_files:
- /etc/prometheus/rules/prometheus-prometheus-operator-prometheus-rulefiles-0/*.yaml
scrape_configs:
- job_name: monitoring/prometheus-operator-alertmanager/0
honor_timestamps: true
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- monitoring
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_app]
separator: ;
regex: prometheus-operator-alertmanager
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_release]
separator: ;
regex: prometheus-operator
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_self_monitor]
separator: ;
regex: "true"
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: web
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: web
action: replace
- job_name: monitoring/prometheus-operator-apiserver/0
honor_timestamps: true
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics
scheme: https
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- default
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
server_name: kubernetes
insecure_skip_verify: false
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_component]
separator: ;
regex: apiserver
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_provider]
separator: ;
regex: kubernetes
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: https
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_component]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: https
action: replace
- job_name: monitoring/prometheus-operator-coredns/0
honor_timestamps: true
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- kube-system
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_app]
separator: ;
regex: prometheus-operator-coredns
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_release]
separator: ;
regex: prometheus-operator
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: http-metrics
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_jobLabel]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: http-metrics
action: replace
- job_name: monitoring/prometheus-operator-grafana/0
honor_timestamps: true
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- monitoring
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_instance]
separator: ;
regex: prometheus-operator
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
separator: ;
regex: grafana
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: service
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: service
action: replace
- job_name: monitoring/prometheus-operator-kube-controller-manager/0
honor_timestamps: true
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- kube-system
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_app]
separator: ;
regex: prometheus-operator-kube-controller-manager
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_release]
separator: ;
regex: prometheus-operator
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: http-metrics
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_jobLabel]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: http-metrics
action: replace
- job_name: monitoring/prometheus-operator-kube-etcd/0
honor_timestamps: true
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- kube-system
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_app]
separator: ;
regex: prometheus-operator-kube-etcd
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_release]
separator: ;
regex: prometheus-operator
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: http-metrics
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_jobLabel]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: http-metrics
action: replace
- job_name: monitoring/prometheus-operator-kube-proxy/0
honor_timestamps: true
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- kube-system
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_app]
separator: ;
regex: prometheus-operator-kube-proxy
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_release]
separator: ;
regex: prometheus-operator
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: http-metrics
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_jobLabel]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: http-metrics
action: replace
- job_name: monitoring/prometheus-operator-kube-scheduler/0
honor_timestamps: true
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- kube-system
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_app]
separator: ;
regex: prometheus-operator-kube-scheduler
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_release]
separator: ;
regex: prometheus-operator
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: http-metrics
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_jobLabel]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: http-metrics
action: replace
- job_name: monitoring/prometheus-operator-kube-state-metrics/0
honor_labels: true
honor_timestamps: true
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- monitoring
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_instance]
separator: ;
regex: prometheus-operator
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
separator: ;
regex: kube-state-metrics
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: http
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: http
action: replace
- job_name: monitoring/prometheus-operator-kubelet/0
honor_labels: true
honor_timestamps: true
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics
scheme: https
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- kube-system
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_k8s_app]
separator: ;
regex: kubelet
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: https-metrics
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_k8s_app]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: https-metrics
action: replace
- source_labels: [__metrics_path__]
separator: ;
regex: (.*)
target_label: metrics_path
replacement: $1
action: replace
- job_name: monitoring/prometheus-operator-kubelet/1
honor_labels: true
honor_timestamps: true
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics/cadvisor
scheme: https
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- kube-system
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_k8s_app]
separator: ;
regex: kubelet
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: https-metrics
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_k8s_app]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: https-metrics
action: replace
- source_labels: [__metrics_path__]
separator: ;
regex: (.*)
target_label: metrics_path
replacement: $1
action: replace
- job_name: monitoring/prometheus-operator-kubelet/2
honor_labels: true
honor_timestamps: true
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics/probes
scheme: https
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- kube-system
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_k8s_app]
separator: ;
regex: kubelet
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: https-metrics
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_k8s_app]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: https-metrics
action: replace
- source_labels: [__metrics_path__]
separator: ;
regex: (.*)
target_label: metrics_path
replacement: $1
action: replace
- job_name: monitoring/prometheus-operator-kubelet/3
honor_labels: true
honor_timestamps: true
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics/resource/v1alpha1
scheme: https
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- kube-system
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_k8s_app]
separator: ;
regex: kubelet
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: https-metrics
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_k8s_app]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: https-metrics
action: replace
- source_labels: [__metrics_path__]
separator: ;
regex: (.*)
target_label: metrics_path
replacement: $1
action: replace
- job_name: monitoring/prometheus-operator-node-exporter/0
honor_timestamps: true
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- monitoring
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_app]
separator: ;
regex: prometheus-node-exporter
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_release]
separator: ;
regex: prometheus-operator
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: metrics
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_jobLabel]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: metrics
action: replace
- job_name: monitoring/prometheus-operator-operator/0
honor_labels: true
honor_timestamps: true
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- monitoring
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_app]
separator: ;
regex: prometheus-operator-operator
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_release]
separator: ;
regex: prometheus-operator
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: http
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: http
action: replace
- job_name: monitoring/prometheus-operator-prometheus/0
honor_timestamps: true
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- monitoring
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_app]
separator: ;
regex: prometheus-operator-prometheus
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_release]
separator: ;
regex: prometheus-operator
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_self_monitor]
separator: ;
regex: "true"
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: web
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: web
action: replace
rancher中Prometheus对象的yaml文件,仅供参考:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
annotations:
labels:
app: prometheus
chart: prometheus-0.0.1
heritage: Tiller
io.cattle.field/appId: cluster-monitoring
release: cluster-monitoring
name: cluster-monitoring
namespace: cattle-prometheus
spec:
# 指定了Prometheus的中AlertManager的额外配置。指定配置将附加到Prometheus Operator生成的配置中。
additionalAlertManagerConfigs:
key: additional-alertmanager-configs.yaml #secret中的key
name: prometheus-cluster-monitoring-additional-alertmanager-configs #值为secret-name
# 指定了Prometheus的额外的抓取配置的Secret。指定的抓取配置将附加到Prometheus Operator生成的配置中。
additionalScrapeConfigs:
key: additional-scrape-configs.yaml
name: prometheus-cluster-monitoring-additional-scrape-configs #secret,其中定义了一系列的job_name,实现自定义scrape配置文件
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- podAffinityTerm:
labelSelector:
matchLabels:
app: prometheus
prometheus: cluster-monitoring
topologyKey: kubernetes.io/hostname
weight: 100
arbitraryFSAccessThroughSMs: {}
baseImage: rancher/prom-prometheus
configMaps: # 在Prometheus Pod中生成/etc/prometheus/configmaps/<configmap-name>
- prometheus-cluster-monitoring-nginx
containers:
- command:
- /bin/sh
- -c
- cp /nginx/run-sh.tmpl /var/cache/nginx/nginx-start.sh; chmod +x /var/cache/nginx/nginx-start.sh;
/var/cache/nginx/nginx-start.sh
env:
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
image: rancher/nginx:1.17.4-alpine
name: prometheus-proxy
ports:
- containerPort: 8080
name: nginx-http
protocol: TCP
resources:
limits:
cpu: 100m
memory: 100Mi
requests:
cpu: 50m
memory: 50Mi
securityContext:
runAsGroup: 101
runAsUser: 101
volumeMounts:
- mountPath: /nginx
name: configmap-prometheus-cluster-monitoring-nginx
- mountPath: /var/cache/nginx
name: nginx-home
- args:
- --proxy-url=http://127.0.0.1:9090
- --listen-address=$(POD_IP):9090
- --filter-reader-labels=prometheus
- --filter-reader-labels=prometheus_replica
command:
- prometheus-auth
env:
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
image: rancher/prometheus-auth:v0.2.0
livenessProbe:
failureThreshold: 6
httpGet:
path: /-/healthy
port: web
scheme: HTTP
initialDelaySeconds: 300
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
name: prometheus-agent
ports:
- containerPort: 9090
name: web
protocol: TCP
readinessProbe:
failureThreshold: 10
httpGet:
path: /-/ready
port: web
scheme: HTTP
initialDelaySeconds: 60
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
resources:
limits:
cpu: 500m
memory: 200Mi
requests:
cpu: 100m
memory: 100Mi
evaluationInterval: 60s
externalLabels:
prometheus_from: rke
listenLocal: true
logFormat: logfmt
logLevel: info
nodeSelector:
kubernetes.io/os: linux
podMetadata:
labels:
app: prometheus
chart: prometheus-0.0.1
release: cluster-monitoring
replicas: 1
resources:
limits:
cpu: "1"
memory: 1000Mi
requests:
cpu: 750m
memory: 750Mi
retention: 12h
# rule选择器
ruleNamespaceSelector:
matchExpressions:
- key: field.cattle.io/projectId #“field.cattle.io/projectId”是ns的label
operator: In
values:
- p-vrc2z
- key: field.cattle.io/projectId
operator: In
values:
- p-vrc2z
ruleSelector:
matchExpressions:
- key: source
operator: In
values:
- rancher-alert
- rancher-monitoring
rules:
alert: {}
scrapeInterval: 60s
securityContext:
fsGroup: 2000
runAsNonRoot: true
runAsUser: 1000
serviceAccountName: cluster-monitoring
# serviceMonitor选择器
serviceMonitorNamespaceSelector:
matchExpressions:
- key: field.cattle.io/projectId
operator: In
values:
- p-vrc2z
- key: field.cattle.io/projectId
operator: In
values:
- p-vrc2z
serviceMonitorSelector: {}
tolerations:
- effect: NoSchedule
key: cattle.io/os
operator: Equal
value: linux
version: v2.17.2
volumes:
- emptyDir: {}
name: nginx-home
Prometheus的配置文件主要有4部分:
global:n. #第1部分:全局配置
scrape_interval: 1m
scrape_timeout: 10s
evaluation_interval: 1m
external_labels:
prometheus: cattle-prometheus/cluster-monitoring
prometheus_from: k8s-prodenv
prometheus_replica: prometheus-cluster-monitoring-0
alerting: #第2部分:alertmanager配置
alert_relabel_configs:
- separator: ;
regex: prometheus_replica
replacement: $1
action: labeldrop
alertmanagers:
- static_configs:
- targets:
- alertmanager-operated.cattle-prometheus:9093
labels:
cluster_id: c-qtpxd
cluster_name: k8s-prodenv
level: cluster
scheme: http
timeout: 10s
api_version: v1
rule_files: #第3部分:规则配置
- /etc/prometheus/rules/prometheus-cluster-monitoring-rulefiles-0/*.yaml
scrape_configs: #第4部分:抓取配置
- job_name: cattle-prometheus/alertmanager-cluster-alerting/0
...省略...
从Prometheus对象的yaml配置文件中,我们发现可以通过如下方式自定义Prometheus的配置:
- additionalScrapeConfigs中指定secret,其内容直接追加到Prometheus的配置文件中的scrape_configs部分
- 定义serviceMonitor对象,由Prometheus-Operater转化为配置文件的scrape_configs部分
- additionalAlertManagerConfigs中指定secret,其内容直接追加到Prometheus的配置文件中的alerting部分
- 定义PrometheusRule对象,由Prometheus-Operater转化为配置文件的rule_files部分
# kubectl get secret prometheus-cluster-monitoring-additional-scrape-configs -o yaml
apiVersion: v1
data:
additional-scrape-configs.yaml: ***base64格式的密文***
kind: Secret
metadata:
creationTimestamp: "2020-09-08T08:44:51Z"
labels:
app: prometheus
chart: prometheus-0.0.1
heritage: Tiller
io.cattle.field/appId: cluster-monitoring
release: cluster-monitoring
name: prometheus-cluster-monitoring-additional-scrape-configs
namespace: cattle-prometheus
type: Opaque
#解析密文得到:
- job_name: 'istio/envoy-stats'
scrape_interval: 15s
metrics_path: /stats/prometheus
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_container_port_name]
action: keep
regex: '.*-envoy-prom'
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:15090
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: pod_name
- job_name: 'istio/istio-mesh'
...省略其他...

1、修改additional-scrape-configs.yaml文件内容:
- job_name: 'istio/envoy-stats'
scrape_interval: 15s
metrics_path: /stats/prometheus
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_container_port_name]
action: keep
regex: '.*-envoy-prom'
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:15090
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: pod_name
- job_name: 'istio/istio-mesh'
scrape_interval: 15s
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-telemetry;prometheus
metric_relabel_configs:
- source_labels: [source_workload_namespace]
action: replace
target_label: namespace
- job_name: 'istio/istio-policy'
scrape_interval: 15s
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-policy;http-monitoring
- job_name: 'istio/istio-telemetry'
scrape_interval: 15s
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-telemetry;http-monitoring
- job_name: 'istio/pilot'
scrape_interval: 15s
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-pilot;http-monitoring
- job_name: 'istio/galley'
scrape_interval: 15s
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-galley;http-monitoring
- job_name: 'istio/citadel'
scrape_interval: 15s
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-citadel;http-monitoring
- job_name: 'prometheus-io-scrape'
kubernetes_sd_configs:
- role: pod
namespaces:
names:
- ingress-nginx
- ingress-controller
- kube-system
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- source_labels: [__meta_kubernetes_pod_node_name]
action: replace
target_label: node
regex: (.+)
replacement: $1
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: namespace
regex: (.+)
replacement: $1
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: pod
regex: (.+)
replacement: $1
- source_labels: [__meta_kubernetes_pod_ip]
action: replace
target_label: pod_ip
regex: (.+)
replacement: $1
- source_labels: [__meta_kubernetes_pod_host_ip]
action: replace
target_label: host_ip
regex: (.+)
replacement: $1
- source_labels: [__meta_kubernetes_pod_controller_kind]
action: replace
target_label: created_by_kind
regex: (.+)
replacement: $1
- source_labels: [__meta_kubernetes_pod_controller_name]
action: replace
target_label: created_by_kind
regex: (.+)
replacement: $1
- job_name: 'kubernetes-http-services'
metrics_path: /probe
params:
module: [http_2xx] # 使用定义的http模块
kubernetes_sd_configs:
- role: service # service 类型的服务发现
relabel_configs:
# 只有service的annotation中配置了 prometheus.io/http_probe=true 的才进行发现
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_http_probe]
action: keep
regex: true
- source_labels: [__address__]
target_label: __param_target
- target_label: __address__
replacement: blackbox:9115 #blackbox-exporter访问地址
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
target_label: kubernetes_name
- job_name: 'kubernetes-ingresses'
metrics_path: /probe
params:
module: [http_2xx] # 使用定义的http模块
kubernetes_sd_configs:
- role: ingress # ingress 类型的服务发现
relabel_configs:
# 只有ingress的annotation中配置了 prometheus.io/http_probe=true的才进行发现
- source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_http_probe]
action: keep
regex: true
- source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
regex: (.+);(.+);(.+)
replacement: ${1}://${2}${3}
target_label: __param_target
- target_label: __address__
replacement: blackbox:9115
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_ingress_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_ingress_name]
target_label: kubernetes_name
- job_name: 'blackbox_http_2xx'
scrape_interval: 45s
metrics_path: /probe
params:
module: [http_2xx] # Look for a HTTP 200 response.
static_configs:
- targets:
- https://oppc.rencaiyoujia.com/ping
- https://api.rencaiyoujia.com/ping
- https://www.rencaiyoujia.com
- https://m.rencaiyoujia.com
- https://www.m.rencaiyoujia.com
- https://fa.rencaiyoujia.com
- https://rencaiyoujia.com
relabel_configs:
- source_labels: [__address__]
target_label: __param_target
- source_labels: [__param_target]
target_label: instance
- target_label: __address__
replacement: blackbox:9115
2、重新创建secret
#删除原来的secret
kubectl delete secret prometheus-cluster-monitoring-additional-scrape-configs
#重新创建secret
kubectl create secret generic prometheus-cluster-monitoring-additional-scrape-configs --from-file=additional-scrape-configs.yaml
文档:https://cloud.tencent.com/document/product/248/53288
Prometheus对象可以关联Alertmanager、PrometheusRule、ServiceMonitor。而ServiceMonitor可以被operater生成为Prometheus的配置文件,从而关联到相应的Service(Endpoints)。

action: keep的作用

后面详解!!!
后面详解!!!
如果在我们的 Kubernetes 集群中有了很多的 Service/Pod,那么我们都需要一个一个的去建立一个对应的 ServiceMonitor 对象来进行监控吗?这样岂不是又变得麻烦起来了?
为解决上面的问题,我们可以使用Prometheus的自动发现的抓取配置的来解决这个问题,想要在 Prometheus当中去自动发现并监控具有prometheus.io/scrape=true这个 annotations 的 Service,我们定义的 Prometheus 的抓取配置如下:
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
cat >> additionalScrapeConfigs-values.yaml <<"EOF"
prometheus:
prometheusSpec:
additionalScrapeConfigs:
#让Prometheus Operator去自动发现并监控具有prometheus.io/scrape=true这个annotations的 Service
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
EOF
升级Prometheus-operater
helm upgrade prometheus-operator stable/prometheus-operator -f additionalScrapeConfigs-values.yaml -n monitoring
除了配置prometheus.prometheusSpec.additionalScrapeConfigs,也可以将上面的配置配置成k8s的secret,使用prometheus.prometheusSpec.additionalScrapeConfigsSecret配置。
以上是书写的Prometheus原生的配置文件,我们也可以将其转换为ServiceMonitor对象。
apiVersion: apps/v1
kind: Deployment
metadata:
name: mysqld-exporter
namespace: cattle-prometheus
spec:
replicas: 1
selector:
matchLabels:
app: mysqld-exporter
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
type: RollingUpdate
template:
metadata:
labels:
app: mysqld-exporter
spec:
containers:
- name: mysqld-exporter
image: prom/mysqld-exporter
imagePullPolicy: IfNotPresent
env:
- name: DATA_SOURCE_NAME
value: exporter:rcyj1qaz2wsx@(10.15.9.210:3306)/
# CREATE USER 'exporter'@'172.22.22.%' IDENTIFIED BY 'rcyj1qaz2wsx' WITH MAX_USER_CONNECTIONS 3;
# GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'172.22.22.%';
# CREATE USER 'exporter'@'10.15.9.%' IDENTIFIED BY 'rcyj1qaz2wsx' WITH MAX_USER_CONNECTIONS 3;
# GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'10.15.9.%';
ports:
- containerPort: 9104
protocol: TCP
name: metrics
resources:
limits:
cpu: 500m
memory: 500Mi
requests:
cpu: 100m
memory: 128Mi
# livenessProbe:
# failureThreshold: 3
# httpGet:
# path: /rcyj-platformB
# port: 8080
# scheme: HTTP
# initialDelaySeconds: 35
# periodSeconds: 5
# successThreshold: 1
# timeoutSeconds: 2
# readinessProbe:
# failureThreshold: 3
# httpGet:
# path: /rcyj-platformB
# port: 8080
# scheme: HTTP
# initialDelaySeconds: 35
# periodSeconds: 5
# successThreshold: 1
# timeoutSeconds: 2
---
apiVersion: v1
kind: Service
metadata:
annotations:
prometheus.io/scrape: "true" #带有这个注解的svc会自动被抓取
labels:
app: mysqld-exporter
name: mysqld-exporter
namespace: cattle-prometheus
spec:
ports:
- port: 9104
protocol: TCP
targetPort: 9104
name: metrics
selector:
app: mysqld-exporter
sessionAffinity: None
type: ClusterIP
如果现有服务的port-name不符合服务发现的预配置。可以通过设定注解来改变port-name的值。
__address__:当前Target实例的访问地址[host]:[port]__scheme__:采集目标服务访问地址的HTTP Scheme,HTTP或者HTTPS__metrics_path__:采集目标服务访问地址的访问路径
例如:
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "9121"
参考文档:
Prometheus Operator 自动发现以及数据持久化
自定义告警规则可以通过PrometheusRule对象实现。
PrometheusRule对象与Prometheus配置文件的关系:
创建一个 PrometheusRule 资源对象后,operater会将该对象会被转换成
configmap/prometheus-prometheus-operator-prometheus-rulefiles-0。然后自动在prometheus这个Pod的/etc/prometheus/rules/prometheus-prometheus-operator-prometheus-rulefiles-0/目录下面生成一个对应的<namespace>-<name>.yaml文件。
创建文件 prometheus-etcdRules.yaml,报警标准:不可用的 etcd 数量超过了一半那么就触发报警
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
labels:
prometheus: k8s
role: alert-rules
name: etcd-rules
namespace: monitoring
spec:
groups:
- name: etcd
rules:
- alert: EtcdClusterUnavailable
annotations:
summary: etcd cluster small
description: If one more etcd peer goes down the cluster will be unavailable
expr: |
count(up{job="etcd"} == 0) > (count(up{job="etcd"}) / 2 - 1)
for: 3m
labels:
severity: critical
注意 label 标签一定至少要有 prometheus=k8s 和 role=alert-rules,创建完成后,隔一会儿再去容器中查看下 rules 文件夹,会自动在上面的 prometheus-k8s-rulefiles-0 目录下面生成一个对应的<namespace>-<name>.yaml文件,
kubectl exec -it prometheus-k8s-0 /bin/sh -n monitoring
Defaulting container name to prometheus.
Use 'kubectl describe pod/prometheus-k8s-0 -n monitoring' to see all of the containers in this pod.
/prometheus $ ls /etc/prometheus/rules/prometheus-k8s-rulefiles-0/
monitoring-etcd-rules.yaml monitoring-prometheus-k8s-rules.yaml
可以看到我们创建的 rule 文件已经被注入到了对应的 rulefiles 文件夹下面了,证明我们上面的设想是正确的。然后再去 Prometheus Dashboard 的 Alert 页面下面就可以查看到上面我们新建的报警规则了:
直接修改PrometheusRule即可。
如果不是helm部署的,也可以通过如下方式:
1、 在prometheus-operator/contrib/kube-prometheus/manifests目录下(这个是Prometheus的k8s的yaml文件,不是helm的文件,可以通过helm template生成。)
先备份原来的prometheus-rules.yaml为prometheus-rules.yaml.default
2、修改vim prometheus-rules.yaml,修改完成后apply
3、 进入pod中
kubectl exec -it prometheus-k8s-0 /bin/sh -n monitoring查看上面注释删除掉告警是否已经不存在了
cat /etc/prometheus/rules/prometheus-k8s-rulefiles-0/monitoring-prometheus-k8s-rules.yaml |grep xx4、进入Prometheus web界面,查看规则是否已经删除。
监控域名访问可达性、监控证书过期
1、修改secret的yaml文件,添加监控地址(job_name: 'blackbox_http_2xx')
➜ ~ cat additional-scrape-configs.yaml
- job_name: 'istio/envoy-stats'
scrape_interval: 15s
metrics_path: /stats/prometheus
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_container_port_name]
action: keep
regex: '.*-envoy-prom'
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:15090
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: pod_name
- job_name: 'istio/istio-mesh'
scrape_interval: 15s
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-telemetry;prometheus
metric_relabel_configs:
- source_labels: [source_workload_namespace]
action: replace
target_label: namespace
- job_name: 'istio/istio-policy'
scrape_interval: 15s
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-policy;http-monitoring
- job_name: 'istio/istio-telemetry'
scrape_interval: 15s
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-telemetry;http-monitoring
- job_name: 'istio/pilot'
scrape_interval: 15s
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-pilot;http-monitoring
- job_name: 'istio/galley'
scrape_interval: 15s
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-galley;http-monitoring
- job_name: 'istio/citadel'
scrape_interval: 15s
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-citadel;http-monitoring
- job_name: 'prometheus-io-scrape'
kubernetes_sd_configs:
- role: pod
namespaces:
names:
- ingress-nginx
- ingress-controller
- kube-system
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- source_labels: [__meta_kubernetes_pod_node_name]
action: replace
target_label: node
regex: (.+)
replacement: $1
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: namespace
regex: (.+)
replacement: $1
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: pod
regex: (.+)
replacement: $1
- source_labels: [__meta_kubernetes_pod_ip]
action: replace
target_label: pod_ip
regex: (.+)
replacement: $1
- source_labels: [__meta_kubernetes_pod_host_ip]
action: replace
target_label: host_ip
regex: (.+)
replacement: $1
- source_labels: [__meta_kubernetes_pod_controller_kind]
action: replace
target_label: created_by_kind
regex: (.+)
replacement: $1
- source_labels: [__meta_kubernetes_pod_controller_name]
action: replace
target_label: created_by_kind
regex: (.+)
replacement: $1
- job_name: 'kubernetes-http-services'
metrics_path: /probe
params:
module: [http_2xx] # 使用定义的http模块
kubernetes_sd_configs:
- role: service # service 类型的服务发现
relabel_configs:
# 只有service的annotation中配置了 prometheus.io/http_probe=true 的才进行发现
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_http_probe]
action: keep
regex: true
- source_labels: [__address__]
target_label: __param_target
- target_label: __address__
replacement: blackbox:9115 #blackbox-exporter访问地址
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
target_label: kubernetes_name
- job_name: 'kubernetes-ingresses'
metrics_path: /probe
params:
module: [http_2xx] # 使用定义的http模块
kubernetes_sd_configs:
- role: ingress # ingress 类型的服务发现
relabel_configs:
# 只有ingress的annotation中配置了 prometheus.io/http_probe=true的才进行发现
- source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_http_probe]
action: keep
regex: true
- source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
regex: (.+);(.+);(.+)
replacement: ${1}://${2}${3}
target_label: __param_target
- target_label: __address__
replacement: blackbox:9115
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_ingress_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_ingress_name]
target_label: kubernetes_name
- job_name: 'blackbox_http_2xx'
scrape_interval: 45s
metrics_path: /probe
params:
module: [http_2xx] # Look for a HTTP 200 response.
static_configs:
- targets:
- https://oppc.rencaiyoujia.com/ping
- https://api.rencaiyoujia.com/ping
- https://www.rencaiyoujia.com
- https://m.rencaiyoujia.com
- https://www.m.rencaiyoujia.com
- https://fa.rencaiyoujia.com
- https://rencaiyoujia.com
- https://rcpgadmin.rencaiyoujia.com
- https://dsfh5.rencaiyoujia.com
- http://pubapi.rencaiyoujia.com/ping
- https://assessh5.rencaiyoujia.com
- https://ommp.rencaiyoujia.com
- https://ommp.rencaiyoujia.com
- https://risk-rule.rencaiyoujia.com/ping
- https://bcpay-ad-api.rencaiyoujia.com/ping
- https://bcpay-api.rencaiyoujia.com/ping
- https://walletapi.rencaiyoujia.com/ping
- https://risk-frontend.rencaiyoujia.com
- https://withdraw.rencaiyoujia.com
- https://wallet.rencaiyoujia.com
- https://nacos.rencaiyoujia.com/nacos
- https://s.rencaiyoujia.com
- https://www.xiaocaiyoujia.com
- https://www.xiaocaiyoujia.com/h5/
- https://www.m.govjob.rencaiyoujia.com/
- https://www.rencaiyoujia.com/govapi/test/ping
- https://govjob.rencaiyoujia.com/auth
- https://govjobadmin.rencaiyoujia.com/
relabel_configs:
- source_labels: [__address__]
target_label: __param_target
- source_labels: [__param_target]
target_label: instance
- target_label: __address__ #指定使用黑盒监控的target地址
replacement: blackbox:9115
2、删除老的secret,并生成新的secret。注意secret的名称要有Prometheus这个crd中定义的一致。
➜ ~ kubectl delete secret prometheus-cluster-monitoring-additional-scrape-configs -n cattle-prometheus
secret "prometheus-cluster-monitoring-additional-scrape-configs" deleted
➜ ~ kubectl create secret generic prometheus-cluster-monitoring-additional-scrape-configs --from-file=additional-scrape-configs.yaml -n cattle-prometheus
secret/prometheus-cluster-monitoring-additional-scrape-configs created
查看alertmanager的配置:
kubectl get secret alertmanager-prometheus-operator-alertmanager -o go-template='{{ index .data "alertmanager.yaml" }}' |base64 -D
更新alertmanager的配置:
直接更新此secret即可,自动生效。