Integrated Deployment of Prometheus, DingTalk, Alertmanager, and Grafana

I. Install node-exporter

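Mount the host's /proc, /sys, and / read-only and point the exporter at them, so that it reports host metrics rather than the container's:

docker run -d -p 9100:9100 --name node-exporter -v /proc:/host/proc:ro -v /sys:/host/sys:ro -v /:/rootfs:ro prom/node-exporter --path.procfs=/host/proc --path.sysfs=/host/sys --path.rootfs=/rootfs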

II. Install Prometheus

2.1 Create /etc/prometheus/prometheus.yml

global:
  scrape_interval:     60s
  evaluation_interval: 60s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['prometheus-ip:prometheus-port'] # scrape Prometheus's own metrics
        labels:
          instance: prometheus
  - job_name: linux
    static_configs:
      - targets: ['node-exporter-ip:node-exporter-port'] # scrape the local host's metrics
        labels:
          instance: localhost

Replace the following placeholders with the actual addresses:

prometheus-ip:prometheus-port
node-exporter-ip:node-exporter-port

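2.2 Start Prometheus

docker run -d -p 9190:9090 --name prometheus -v /etc/prometheus:/etc/prometheus prom/prometheus

This publishes the container's port 9090 on host port 9190, so from other machines Prometheus is reachable at http://<host-ip>:9190.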
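Once Prometheus is running, it can help to confirm that both scrape jobs are healthy. The following is a minimal sketch that summarizes the health field of the /api/v1/targets response. The helper name target_health_summary is hypothetical, and the pattern assumes the compact JSON that Prometheus normally returns.

```shell
# Hypothetical helper: read the /api/v1/targets JSON on stdin and
# count the targets per health state ("up", "down", "unknown").
target_health_summary() {
  grep -o '"health":"[a-z]*"' | sort | uniq -c
}
```

A possible invocation, assuming the host port 9190 from the command above: curl -s http://<prometheus-ip>:9190/api/v1/targets | target_health_summary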

III. Start webhook-dingtalk

docker run -d -p 8060:8060 --name webhook timonwong/prometheus-webhook-dingtalk

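1. Look up where the webhook container's filesystem is mapped on the host, so that the configuration file inside the container can be edited

docker inspect webhook|grep Dir

2. Change into <MergedDir>/etc/prometheus-webhook-dingtalk/, where <MergedDir> is the MergedDir path printed by the command above

3. Edit the configuration file and set the webhook1 target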

[root@node1 prometheus-webhook-dingtalk]# vi config.yml
## Request timeout
# timeout: 5s

## Uncomment following line in order to write template from scratch (be careful!)
#no_builtin_template: true

## Customizable templates path
#templates:
#  - contrib/templates/legacy/template.tmpl

## You can also override default template using `default_message`
## The following example to use the 'legacy' template from v0.3.0
#default_message:
#  title: '{{ template "legacy.title" . }}'
#  text: '{{ template "legacy.content" . }}'

## Targets, previously was known as "profiles"
targets:
  webhook1:
    url: https://oapi.dingtalk.com/robot/send?access_token=ed49*************************7239514b010270f
    # secret for signature
    secret: SEC**********************
  webhook2:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
  webhook_legacy:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    # Customize template content
    message:
      # Use legacy template
      title: '{{ template "legacy.title" . }}'
      text: '{{ template "legacy.content" . }}'
  webhook_mention_all:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    mention:
      all: true
  webhook_mention_users:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    mention:
      mobiles: ['156xxxx8827', '189xxxx8325']

4. Restart webhook and check its logs

docker restart webhook
docker logs -f webhook
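Edits made under MergedDir live in the container's writable layer and are lost when the container is removed. As an alternative, here is a minimal sketch that keeps config.yml on the host and bind-mounts it over the same path inside the container. The helper name start_webhook, the default host path, and the DRY_RUN switch are assumptions made for illustration.

```shell
# Hypothetical helper: start the webhook with a host-side config.yml
# bind-mounted read-only over the container's config file.
# Set DRY_RUN=1 to print the docker command instead of running it.
start_webhook() {
  conf="${1:-/etc/prometheus-webhook-dingtalk/config.yml}"
  set -- docker run -d -p 8060:8060 --name webhook \
    -v "$conf:/etc/prometheus-webhook-dingtalk/config.yml:ro" \
    timonwong/prometheus-webhook-dingtalk
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

If a container named webhook already exists, it has to be removed first (docker rm -f webhook) before calling start_webhook.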

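The startup log lists one URL per target (sample below). The address for webhook1 is http://localhost:8060/dingtalk/webhook1/send

When this address is used from another host or container, replace localhost with the server's external IP.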

ts=2024-04-11T02:28:21.340Z caller=main.go:113 component=configuration msg="Webhook urls for prometheus alertmanager" urls="http://localhost:8060/dingtalk/webhook_mention_all/send http://localhost:8060/dingtalk/webhook_mention_users/send http://localhost:8060/dingtalk/webhook1/send http://localhost:8060/dingtalk/webhook2/send http://localhost:8060/dingtalk/webhook_legacy/send"
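The URLs can also be pulled out of the log mechanically. The sketch below reads log lines on stdin, extracts every target URL, and swaps localhost for a given address. The helper name extract_webhook_urls is hypothetical, and the pattern assumes the log format shown in the sample line above.

```shell
# Hypothetical helper: print each DingTalk target URL found on stdin,
# one per line, with "localhost" replaced by the address given in $1.
extract_webhook_urls() {
  grep -o 'http://localhost:[0-9]*/dingtalk/[A-Za-z0-9_-]*/send' \
    | sed "s#//localhost:#//$1:#" \
    | sort -u
}
```

A possible invocation: docker logs webhook 2>&1 | extract_webhook_urls <server-ip>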

IV. Start Alertmanager

1. Create the Alertmanager configuration: vim /etc/alertmanager/alertmanager.yml

global:
  resolve_timeout: 5m

route: # alert routing: defines how alerts are grouped and delivered
  receiver: webhook
  group_wait: 30s
  group_interval: 1m
  repeat_interval: 4h
  group_by: [alertname]
  routes:
  - receiver: webhook
    group_wait: 10s

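receivers: # alert receivers: defines where notifications are sent
- name: webhook
  webhook_configs:
  - url: http://<server-ip>:8060/dingtalk/webhook1/send  # alert webhook URL (from the webhook startup log)
    send_resolved: true # if true, a notification is also sent when the alert is resolved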

Note that the webhook URL is one of the addresses printed in the webhook startup log earlier; just replace localhost with the server's IP.

2. Edit prometheus.yml and add the following

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanagers-ip:alertmanagers-port"]
# rule files
rule_files:
  - "/etc/prometheus/rules.yml"

3. Start Alertmanager

docker run -d -p 9093:9093 -v /etc/alertmanager/:/etc/alertmanager/ --name alertmanager prom/alertmanager
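To check the path from Alertmanager to DingTalk without waiting for a real rule to fire, one option is to post a hand-made alert to Alertmanager's v2 API. The sketch below only builds the request body. The helper name test_alert_payload and all label and annotation values are made up for the test.

```shell
# Hypothetical helper: print a minimal JSON body for POST /api/v2/alerts.
# The alert name comes from $1; the other values are placeholders.
test_alert_payload() {
  name="${1:-TestAlert}"
  printf '[{"labels":{"alertname":"%s","team":"node","instance":"test-host"},' "$name"
  printf '"annotations":{"summary":"manual test alert"}}]\n'
}
```

A possible invocation: test_alert_payload | curl -s -X POST -H 'Content-Type: application/json' -d @- http://<alertmanager-ip>:9093/api/v2/alerts. If the route and receiver are set up as above, a DingTalk message should follow once group_wait has elapsed.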

V. Add alert rules

vim /etc/prometheus/rules.yml
# Note: the netdata_* expressions below need a Netdata scrape job;
# node-exporter does not export these metrics.
groups:
  - name: host_monitoring
    rules:
      - alert: MemoryAlert
        expr: netdata_system_ram_MiB_average{chart="system.ram",dimension="free",family="ram"} < 800
        for: 2m
        labels:
          team: node
        annotations:
          Alert_type: Memory alert
          Server: '{{$labels.instance}}'
          #summary: "{{$labels.instance}}: High Memory usage detected"
          explain: "Memory usage exceeds 90%; currently remaining: {{ $value }}M"
          #description: "{{$labels.instance}}: Memory usage is above 80% (current value is: {{ $value }})"
      - alert: CPUAlert
        expr: netdata_system_cpu_percentage_average{chart="system.cpu",dimension="idle",family="cpu"} < 20
        for: 2m
        labels:
          team: node
        annotations:
          Alert_type: CPU alert
          Server: '{{$labels.instance}}'
          explain: "CPU usage exceeds 80%; currently remaining: {{ $value }}"
          #summary: "{{$labels.instance}}: High CPU usage detected"
          #description: "{{$labels.instance}}: CPU usage is above 80% (current value is: {{ $value }})"
      - alert: DiskAlert
        expr: netdata_disk_space_GiB_average{chart="disk_space._",dimension="avail",family="/"} < 4
        for: 2m
        labels:
          team: node
        annotations:
          Alert_type: Disk alert
          Server: '{{$labels.instance}}'
          explain: "Disk usage exceeds 90%; currently remaining: {{ $value }}G"
      - alert: ServiceAlert
        expr: up == 0
        for: 2s
        labels:
          team: node
        annotations:
          Alert_type: Service alert
          summary: 'instance {{$labels.instance}} down'
          description: "The netdata service is down"
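The memory, CPU, and disk rules above query Netdata metrics, while this guide deploys node-exporter. For a host that is scraped only through node-exporter, roughly equivalent rules might look like the sketch below. The helper name node_exporter_rules and the group name are hypothetical, the thresholds mirror the rules above, and node_memory_MemAvailable_bytes is used in place of free memory. The disk rule assumes the exporter sees the host's root filesystem as mountpoint "/".

```shell
# Hypothetical helper: print node-exporter based alert rules to stdout.
# The quoted heredoc delimiter keeps the PromQL text unexpanded.
node_exporter_rules() {
  cat <<'EOF'
groups:
  - name: host_monitoring_node_exporter
    rules:
      - alert: MemoryAlert
        # available memory in MiB
        expr: node_memory_MemAvailable_bytes / 1024 / 1024 < 800
        for: 2m
        labels:
          team: node
      - alert: CPUAlert
        # idle CPU percentage, averaged over all cores
        expr: avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100 < 20
        for: 2m
        labels:
          team: node
      - alert: DiskAlert
        # available space on / in GiB
        expr: node_filesystem_avail_bytes{mountpoint="/"} / 1024 / 1024 / 1024 < 4
        for: 2m
        labels:
          team: node
EOF
}
```

One way to use it is to write the output to a separate file, for example node_exporter_rules > /etc/prometheus/node_rules.yml (a made-up file name), and list that file under rule_files in prometheus.yml.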

Restart Prometheus

docker restart prometheus

VI. Deploy Grafana

docker run -d -p 3000:3000 --name=grafana grafana/grafana

Default credentials: admin/admin. The password must be changed at first login.

1. Configure the data source

2. Set up dashboards

Enter https://grafana.com/grafana/dashboards/405 and click Load to download the Node-Exporter dashboard
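The data source in step 1 can be added in the Grafana UI, or, as a sketch, through Grafana's HTTP API. The helper below only prints a request body for POST /api/datasources. The helper name datasource_payload, the data source name "Prometheus", and the default URL are assumptions.

```shell
# Hypothetical helper: print a JSON body that registers Prometheus
# as the default Grafana data source. $1 is the Prometheus base URL.
datasource_payload() {
  url="${1:-http://localhost:9190}"
  printf '{"name":"Prometheus","type":"prometheus","access":"proxy","url":"%s","isDefault":true}\n' "$url"
}
```

A possible invocation: datasource_payload http://<prometheus-ip>:9190 | curl -s -u admin:<password> -H 'Content-Type: application/json' -d @- http://<grafana-ip>:3000/api/datasources. Because Grafana runs in its own container here, the Prometheus URL should use the host's IP rather than localhost.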

Author: Administrator
Copyright notice: Unless otherwise stated, all articles on this site are licensed under CC BY-NC-SA 4.0. Please credit DTL when reposting.