Prometheus + Grafana monitoring with Alertmanager DingTalk alerts
An integrated deployment of Prometheus, DingTalk, Alertmanager, and Grafana
I. Install node-exporter
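Mount the host's /proc, /sys, and / read-only and point node-exporter at them, so that it reports host metrics rather than the container's own:
docker run -d -p 9100:9100 --name node-exporter -v /proc:/host/proc:ro -v /sys:/host/sys:ro -v /:/rootfs:ro prom/node-exporter --path.procfs=/host/proc --path.sysfs=/host/sys --path.rootfs=/rootfs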
II. Install Prometheus
2.1 Create /etc/prometheus/prometheus.yml
global:
  scrape_interval: 60s
  evaluation_interval: 60s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['prometheus-ip:prometheus-port']  # scrape Prometheus's own metrics
        labels:
          instance: prometheus
  - job_name: linux
    static_configs:
      - targets: ['node-exporter-ip:node-exporter-port']  # scrape the local host's metrics
        labels:
          instance: localhost
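Be sure to replace these placeholders with the real addresses:
prometheus-ip:prometheus-port (Prometheus is published on host port 9190 in step 2.2)
node-exporter-ip:node-exporter-port (node-exporter is published on host port 9100)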
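Filling in the placeholders by hand is easy to get wrong, so the file can also be generated from two variables. The following is a minimal sketch under stated assumptions: the function name `render_prometheus_config`, the sample address 192.0.2.10, and the scratch output file are all illustrative and not part of the original setup.

```shell
#!/bin/sh
# Sketch: render prometheus.yml from two addresses instead of editing placeholders by hand.
# The function name and the sample addresses are illustrative assumptions.

render_prometheus_config() {
  # $1: Prometheus address (host:port), $2: node-exporter address (host:port)
  cat <<EOF
global:
  scrape_interval: 60s
  evaluation_interval: 60s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['$1']
        labels:
          instance: prometheus
  - job_name: linux
    static_configs:
      - targets: ['$2']
        labels:
          instance: localhost
EOF
}

# Write to a scratch file first; copy it to /etc/prometheus/prometheus.yml once it looks right.
config_file="$(mktemp)"
render_prometheus_config "192.0.2.10:9190" "192.0.2.10:9100" > "$config_file"
cat "$config_file"
```

Writing to a scratch file keeps a broken render from overwriting a working configuration.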
2.2 Start Prometheus
docker run -d -p 9190:9090 --name prometheus -v /etc/prometheus:/etc/prometheus prom/prometheus
III. Start webhook-dingtalk
docker run -d -p 8060:8060 --name webhook timonwong/prometheus-webhook-dingtalk
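1. Find where the container's filesystem lives on the host, so the config file inside the container can be edited
docker inspect webhook | grep Dir
2. Change into <MergedDir>/etc/prometheus-webhook-dingtalk/, where <MergedDir> is the MergedDir path printed by the previous command
3. Edit the config file and update the webhook1 target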
[root@node1 prometheus-webhook-dingtalk]# vi config.yml
## Request timeout
# timeout: 5s
## Uncomment following line in order to write template from scratch (be careful!)
#no_builtin_template: true
## Customizable templates path
#templates:
#  - contrib/templates/legacy/template.tmpl
## You can also override default template using `default_message`
## The following example to use the 'legacy' template from v0.3.0
#default_message:
#  title: '{{ template "legacy.title" . }}'
#  text: '{{ template "legacy.content" . }}'
## Targets, previously was known as "profiles"
targets:
  webhook1:
    url: https://oapi.dingtalk.com/robot/send?access_token=ed49*************************7239514b010270f
    # secret for signature
    secret: SEC**********************
  webhook2:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
  webhook_legacy:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    # Customize template content
    message:
      # Use legacy template
      title: '{{ template "legacy.title" . }}'
      text: '{{ template "legacy.content" . }}'
  webhook_mention_all:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    mention:
      all: true
  webhook_mention_users:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    mention:
      mobiles: ['156xxxx8827', '189xxxx8325']
4. Restart the webhook container and check its logs
docker restart webhook
docker logs -f webhook
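The webhook1 URL is http://localhost:8060/dingtalk/webhook1/send; replace localhost with the server's external IP before using it elsewhere.
The startup log lists the URL of every configured target: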
ts=2024-04-11T02:28:21.340Z caller=main.go:113 component=configuration msg="Webhook urls for prometheus alertmanager" urls="http://localhost:8060/dingtalk/webhook_mention_all/send http://localhost:8060/dingtalk/webhook_mention_users/send http://localhost:8060/dingtalk/webhook1/send http://localhost:8060/dingtalk/webhook2/send http://localhost:8060/dingtalk/webhook_legacy/send"
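The URLs can be pulled out of that log line and rewritten in one step. This is a small sketch, assuming the log format shown above; the function name `extract_webhook_urls` and the sample IP 192.0.2.10 are illustrative.

```shell
#!/bin/sh
# Sketch: extract the webhook URLs from a startup log line and swap localhost
# for an externally reachable address. Function name and sample IP are illustrative.

extract_webhook_urls() {
  # $1: log text, $2: external IP or hostname
  printf '%s\n' "$1" \
    | grep -o 'http://localhost:[0-9]*/dingtalk/[A-Za-z0-9_-]*/send' \
    | sed "s#//localhost:#//$2:#"
}

# Shortened copy of the log line above, used as sample input.
sample_log='ts=2024-04-11T02:28:21.340Z caller=main.go:113 component=configuration msg="Webhook urls for prometheus alertmanager" urls="http://localhost:8060/dingtalk/webhook_mention_all/send http://localhost:8060/dingtalk/webhook1/send"'

extract_webhook_urls "$sample_log" "192.0.2.10"
```

For the sample line this prints the two URLs, one per line, with 192.0.2.10 in place of localhost. Against a running container the first argument could be `"$(docker logs webhook 2>&1)"`.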
IV. Start Alertmanager
1. Create the alertmanager.yml config
vim /etc/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
route:  # Alert routing: defines how alerts are grouped and dispatched
  receiver: webhook
  group_wait: 30s
  group_interval: 1m
  repeat_interval: 4h
  group_by: [alertname]
  routes:
    - receiver: webhook
      group_wait: 10s
receivers:  # Alert receivers: defines where notifications are delivered
  - name: webhook
    webhook_configs:
      - url: http://<server-ip>:8060/dingtalk/webhook1/send  # Alert webhook URL
        send_resolved: true  # If true, a notification is also sent when the alert is resolved
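Note that the webhook URL is one of the addresses printed in the webhook container's startup log earlier; just replace localhost with the server's IP.
2. Edit prometheus.yml and add the following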
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanagers-ip:alertmanagers-port"]
# Rule files
rule_files:
  - "/etc/prometheus/rules.yml"
3. Start Alertmanager
docker run -d -p 9093:9093 -v /etc/alertmanager/:/etc/alertmanager/ --name alertmanager prom/alertmanager
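Once Alertmanager is running, the path from Alertmanager through webhook-dingtalk to DingTalk can be checked without waiting for a real rule to fire, by posting a synthetic alert to Alertmanager's v2 API. This is a sketch under stated assumptions: the function names, the alert name `TestAlert`, and the label values are made up for illustration.

```shell
#!/bin/sh
# Sketch: push a synthetic alert into Alertmanager to exercise the notification path.
# Function names, the alert name, and the label values are illustrative assumptions.

build_test_alert() {
  # POST /api/v2/alerts accepts a JSON array of alerts; "labels" is the required part.
  cat <<'EOF'
[
  {
    "labels": { "alertname": "TestAlert", "team": "node", "instance": "manual-test" },
    "annotations": { "summary": "Manual test alert, safe to ignore" }
  }
]
EOF
}

send_test_alert() {
  # $1: Alertmanager address, e.g. 192.0.2.10:9093
  build_test_alert | curl -sS -X POST \
    -H 'Content-Type: application/json' \
    --data-binary @- \
    "http://$1/api/v2/alerts"
}

# Only print the payload here; run `send_test_alert <alertmanager-ip>:9093` to post it.
build_test_alert
```

With the routing configured above, the DingTalk message should arrive shortly after the `group_wait` of the matching route has elapsed. If nothing arrives, `docker logs webhook` is the first place to look.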
V. Add alerting rules
vim /etc/prometheus/rules.yml
# The first three rules query netdata_* metrics, so they only fire if a netdata endpoint is among the scrape targets.
groups:
  - name: host_monitoring
    rules:
      - alert: MemoryAlert
        expr: netdata_system_ram_MiB_average{chart="system.ram",dimension="free",family="ram"} < 800
        for: 2m
        labels:
          team: node
        annotations:
          Alert_type: Memory alert
          Server: '{{$labels.instance}}'
          #summary: "{{$labels.instance}}: High Memory usage detected"
          explain: "Free memory is below 800 MiB (remaining: {{ $value }} MiB)"
          #description: "{{$labels.instance}}: Memory usage is above 80% (current value is: {{ $value }})"
      - alert: CPUAlert
        expr: netdata_system_cpu_percentage_average{chart="system.cpu",dimension="idle",family="cpu"} < 20
        for: 2m
        labels:
          team: node
        annotations:
          Alert_type: CPU alert
          Server: '{{$labels.instance}}'
          explain: "CPU usage is above 80% (idle: {{ $value }}%)"
          #summary: "{{$labels.instance}}: High CPU usage detected"
          #description: "{{$labels.instance}}: CPU usage is above 80% (current value is: {{ $value }})"
      - alert: DiskAlert
        expr: netdata_disk_space_GiB_average{chart="disk_space._",dimension="avail",family="/"} < 4
        for: 2m
        labels:
          team: node
        annotations:
          Alert_type: Disk alert
          Server: '{{$labels.instance}}'
          explain: "Available disk space on / is below 4 GiB (remaining: {{ $value }} GiB)"
      - alert: ServiceAlert
        expr: up == 0
        for: 2s
        labels:
          team: node
        annotations:
          Alert_type: Service alert
          summary: 'instance {{$labels.instance}} down'
          description: "netdata service is down"
Restart Prometheus
docker restart prometheus
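The memory, CPU, and disk rules above query netdata_* metrics, which the node-exporter installed in the first step does not export. For a setup that scrapes only node-exporter, equivalent rules could look like the sketch below. It reuses the same thresholds; the group name, the alert names, and the scratch file are illustrative assumptions, and the expressions rely on standard node-exporter metric names.

```shell
#!/bin/sh
# Sketch: write an alternative rules file based on node-exporter metrics.
# Thresholds mirror the rules above; group name, alert names, and output path are illustrative.

write_node_rules() {
  # $1: destination file. The quoted heredoc keeps {{ $value }} literal.
  cat > "$1" <<'EOF'
groups:
  - name: node_exporter_monitoring
    rules:
      - alert: LowAvailableMemory
        expr: node_memory_MemAvailable_bytes / 1024 / 1024 < 800
        for: 2m
        labels:
          team: node
        annotations:
          explain: "Available memory is below 800 MiB (current: {{ $value }} MiB)"
      - alert: LowCpuIdle
        expr: avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100 < 20
        for: 2m
        labels:
          team: node
        annotations:
          explain: "CPU idle is below 20% (current: {{ $value }}%)"
      - alert: LowDiskSpace
        expr: node_filesystem_avail_bytes{mountpoint="/"} / 1024 / 1024 / 1024 < 4
        for: 2m
        labels:
          team: node
        annotations:
          explain: "Available space on / is below 4 GiB (current: {{ $value }} GiB)"
EOF
}

# Write to a scratch file first; merge into /etc/prometheus/rules.yml after review.
rules_file="$(mktemp)"
write_node_rules "$rules_file"
cat "$rules_file"
```

Before restarting, the rules file can be validated with `docker exec prometheus promtool check rules /etc/prometheus/rules.yml`, assuming the image in use ships promtool, as the official prom/prometheus image does.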
VI. Deploy Grafana
docker run -d -p 3000:3000 --name=grafana grafana/grafana
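Default credentials: admin/admin. Grafana asks you to change the password on first login.
1. Configure the data source (add a Prometheus data source pointing at http://<server-ip>:9190)
2. Import dashboards
Enter https://grafana.com/grafana/dashboards/405 and click Load to download the Node-Exporter dashboard.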
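The data source step can also be scripted through Grafana's HTTP API instead of the UI. The following is a rough sketch: the function names, the data source name "Prometheus", and the sample address are assumptions made for illustration, and the admin password is whatever was set at first login.

```shell
#!/bin/sh
# Sketch: register Prometheus as a Grafana data source via POST /api/datasources.
# Function names, the data source name, and the sample address are illustrative.

build_datasource_payload() {
  # $1: Prometheus URL as reachable from the Grafana container
  cat <<EOF
{
  "name": "Prometheus",
  "type": "prometheus",
  "access": "proxy",
  "url": "$1",
  "isDefault": true
}
EOF
}

create_datasource() {
  # $1: Grafana address (host:port), $2: admin password, $3: Prometheus URL
  build_datasource_payload "$3" | curl -sS -X POST \
    -u "admin:$2" \
    -H 'Content-Type: application/json' \
    --data-binary @- \
    "http://$1/api/datasources"
}

# Only print the payload here; run
#   create_datasource <server-ip>:3000 <admin-password> http://<server-ip>:9190
# to create the data source.
build_datasource_payload "http://192.0.2.10:9190"
```

Because Grafana runs in its own container, the Prometheus URL should use the server's IP rather than localhost, which would resolve to the Grafana container itself.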
Copyright notice: Unless otherwise stated, all articles on this site are licensed under CC BY-NC-SA 4.0. Please credit DTL when reposting.