How to Write Rules for Prometheus

2019-06-06 14:51:03

Prometheus supports two types of rules:

  • Recording rules
  • Alerting rules

Rules are evaluated at regular intervals and are loaded by listing rule files in the prometheus.yml configuration file:

rule_files:
  - /etc/prometheus/rules/*.rules
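
How often rules run is set by evaluation_interval in the global section of prometheus.yml; a rule group can override it with its own interval field, as the first example below does. A minimal sketch of the relevant global settings:

global:
  scrape_interval: 15s      # how often targets are scraped
  evaluation_interval: 15s  # how often rule groups are evaluated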

Recording Rules

Recording rules precompute an expression at each evaluation and store the result as a new time series. The example below records the current Shanghai hour; a second group then uses that series to restrict an HTTP check alert to daytime hours (06:00–22:00):

groups:
- name: recording_rules
  interval: 5s
  rules:
  - record: asia_shanghai_time
    expr: time()
  - record: asia_shanghai_hour
    # hour() works on UTC timestamps; add the UTC+8 offset and wrap past midnight
    expr: (hour(asia_shanghai_time) + 8) % 24
- name: School Http Status Check
  rules:
  - alert: HttpSchoolCheckStatus
    expr: probe_success{job="blackbox-school-http"} == 0 and on() asia_shanghai_hour >= 6 < 22
    for: 1m
    labels:
      severity: warning
      env: school
    annotations:
      description: "Host: {{ $labels.instance }}, job: {{ $labels.job }}, HTTP status code: {{ printf `probe_http_status_code{instance='%s'}` $labels.instance | query | first | value }}. HTTP check failed, please investigate!"
      summary: "HTTP check"
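
Beyond clock arithmetic, recording rules are mostly used to precompute expensive expressions so that alerts and dashboards read a cheap pre-aggregated series instead. A sketch following the level:metric:operations naming convention (the recorded name here is illustrative):

groups:
- name: cpu_precompute
  rules:
  # pre-aggregate per-instance CPU usage percent from the idle counter
  - record: instance:node_cpu_usage:avg_irate5m
    expr: 100 - avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100

The NodeCPUUsage alert in the next section could then be written as instance:node_cpu_usage:avg_irate5m > 80.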

Alerting Rules

An alerting rule fires when its expression has been true for the duration given by for; the listed labels and annotations are attached to the resulting alert. The examples below are grouped by subsystem.
  • CPU

    groups:
    - name: CPU alert rules
      rules:
      - alert: NodeCPUUsage
        expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m]))) * 100 > 80
        for: 30m
        labels:
          severity: warning
        annotations:
          description: "{{ $labels.instance }}: High CPU usage detected"
          summary: "{{ $labels.instance }}: CPU usage is above 80% (current value is: {{ $value }})"
      - alert: ContextSwitching
        expr: rate(node_context_switches_total[5m]) > 1000
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Context switching (instance {{ $labels.instance }})"
          description: "Context switching is growing on node (> 1000 / s)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
  • Memory

    groups:
    - name: Memory alert rules
      rules:
      - alert: NodeMemoryUsage
        expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 80
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "{{ $labels.instance }}: Memory usage is above 80% (current value is: {{ $value }})"
          summary: "{{ $labels.instance }}: High Memory usage detected"
  • Disk

    groups:
    - name: Disk alert rules
      rules:
      - alert: OutOfNodeDiskSpace
        expr: (node_filesystem_size_bytes - node_filesystem_avail_bytes) / node_filesystem_size_bytes * 100 > 80
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "{{ $labels.instance }}: Disk usage is above 80% (current value is: {{ $value }})"
          summary: "{{ $labels.instance }}: High Disk usage detected"
      - alert: UnusualDiskReadRate
        expr: sum by (instance) (irate(node_disk_read_bytes_total[2m])) / 1024 / 1024 > 50
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Unusual disk read rate (instance {{ $labels.instance }})"
          description: "Disk is probably reading too much data (> 50 MB/s)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
      - alert: UnusualDiskWriteRate
        expr: sum by (instance) (irate(node_disk_written_bytes_total[2m])) / 1024 / 1024 > 50
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Unusual disk write rate (instance {{ $labels.instance }})"
          description: "Disk is probably writing too much data (> 50 MB/s)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
      - alert: DiskWillFillIn16Hours
        expr: predict_linear(node_filesystem_free_bytes{fstype!~"tmpfs",mountpoint="/"}[1h], 16 * 3600) < 0 and on(instance, job) (time() - node_installation_time_seconds > 2 * 3600)
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Disk will fill in 16 hours (instance {{ $labels.instance }})"
          description: "{{ $labels.instance }} will soon be out of disk space."
      - alert: UnusualDiskReadLatency
        # the ratio is seconds per operation, so 100ms = 0.1
        expr: rate(node_disk_read_time_seconds_total[1m]) / rate(node_disk_reads_completed_total[1m]) > 0.1
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Unusual disk read latency (instance {{ $labels.instance }})"
          description: "Disk latency is growing (read operations > 100ms)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
      - alert: UnusualDiskWriteLatency
        expr: rate(node_disk_write_time_seconds_total[1m]) / rate(node_disk_writes_completed_total[1m]) > 0.1
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Unusual disk write latency (instance {{ $labels.instance }})"
          description: "Disk latency is growing (write operations > 100ms)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
  • Network

    groups:
    - name: Unusual network throughput
      rules:
      - alert: UnusualNetworkThroughputIn
        expr: sum by (instance) (irate(node_network_receive_bytes_total[2m])) / 1024 / 1024 > 100
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Unusual network throughput in (instance {{ $labels.instance }})"
          description: "Host network interfaces are probably receiving too much data (> 100 MB/s)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
      - alert: UnusualNetworkThroughputOut
        expr: sum by (instance) (irate(node_network_transmit_bytes_total[2m])) / 1024 / 1024 > 100
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Unusual network throughput out (instance {{ $labels.instance }})"
          description: "Host network interfaces are probably sending too much data (> 100 MB/s)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
  • blackbox for ssl_expiry

    groups:
    - name: ssl_expiry.rules
      rules:
      - alert: SSLCertExpiringSoon
        expr: probe_ssl_earliest_cert_expiry{job="blackbox-http"} - time() < 86400 * 30
        for: 30m
        labels:
          severity: info
        annotations:
          summary: "SSL certificate will expire soon (instance {{ $labels.instance }})"
          description: "SSL certificate expires within 30 days\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
      - alert: SslCertificateHasExpired
        expr: probe_ssl_earliest_cert_expiry - time() <= 0
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "SSL certificate has expired (instance {{ $labels.instance }})"
          description: "SSL certificate has expired already\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
  • blackbox for http

    groups:
    - name: Http Status Check
      rules:
      - alert: HttpStatusCheck
        expr: probe_success{job="blackbox-http"} == 0
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "Host: {{ $labels.instance }}, job: {{ $labels.job }}, HTTP status code: {{ printf `probe_http_status_code{instance='%s'}` $labels.instance | query | first | value }}. HTTP check failed, please investigate!"
          summary: "HTTP check"
  • mysql

    groups:
    - name: MySQLStatsAlert
      rules:
      - alert: MySQLIsDown
        expr: mysql_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} MySQL is down"
          description: "MySQL database is down. This requires immediate action!"
      - alert: Mysql_High_QPS
        expr: rate(mysql_global_status_questions[5m]) > 500
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }}: Mysql_High_QPS detected"
          description: "{{ $labels.instance }}: MySQL operations exceed 500 per second (current value is: {{ $value }})"
      - alert: Mysql_Too_Many_Connections
        # threads_connected is a gauge, so compare it directly rather than taking rate()
        expr: mysql_global_status_threads_connected > 200
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }}: Mysql Too Many Connections detected"
          description: "{{ $labels.instance }}: MySQL has more than 200 connections (current value is: {{ $value }})"
      - alert: Mysql_Too_Many_slow_queries
        expr: rate(mysql_global_status_slow_queries[5m]) > 3
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }}: Mysql_Too_Many_slow_queries detected"
          description: "{{ $labels.instance }}: MySQL slow_queries exceed 3 per second (current value is: {{ $value }})"
      - alert: SQLThreadStopped
        expr: mysql_slave_status_slave_sql_running == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} SQL thread stopped"
          description: "SQL thread has stopped. This is usually because it cannot apply a SQL statement received from the master."
      - alert: SlaveLaggingBehindMaster
        # seconds_behind_master is a gauge, so compare it directly rather than taking rate()
        expr: mysql_slave_status_seconds_behind_master > 30
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }} Slave lagging behind Master"
          description: "Slave is lagging behind Master. Please check if Slave threads are running and if there are performance issues!"
  • Ali Cloud (the aliyun_* metrics below are exposed by the aliyun-exporter listed in the references)

    groups:
    - name: slb
      rules:
      - alert: slb_5xx_percent:warning
        expr: |-
          sum(aliyun_acs_slb_dashboard_StatusCode5xx) by (vip, port) /
          sum(aliyun_acs_slb_dashboard_Qps) by (vip, port) > 0.01
        for: 5m
        labels:
          severity: "2"  # label values are strings; 0 = critical, 1 = high, 2 = warning
        annotations:
          summary: 'SLB {{ $labels.vip }}:{{ $labels.port }} 5xx percent > 1%'
      - alert: slb_5xx_percent:high
        expr: |-
          sum(aliyun_acs_slb_dashboard_StatusCode5xx) by (vip, port) /
          sum(aliyun_acs_slb_dashboard_Qps) by (vip, port) > 0.05
        for: 5m
        labels:
          severity: "1"
        annotations:
          summary: 'SLB {{ $labels.vip }}:{{ $labels.port }} 5xx percent > 5%'
      - alert: slb_5xx_percent:critical
        expr: |-
          sum(aliyun_acs_slb_dashboard_StatusCode5xx) by (vip, port) /
          sum(aliyun_acs_slb_dashboard_Qps) by (vip, port) > 0.1
        for: 5m
        labels:
          severity: "0"
        annotations:
          summary: 'SLB {{ $labels.vip }}:{{ $labels.port }} 5xx percent > 10%'
      - alert: slb_response_time:high
        expr: avg(aliyun_acs_slb_dashboard_Rt) by (vip, port) > 200
        for: 5m
        labels:
          severity: "1"
        annotations:
          summary: 'SLB {{ $labels.vip }}:{{ $labels.port }} RT > 200ms'
      - alert: slb_response_time:critical
        expr: avg(aliyun_acs_slb_dashboard_Rt) by (vip, port) > 500
        for: 5m
        labels:
          severity: "0"
        annotations:
          summary: 'SLB {{ $labels.vip }}:{{ $labels.port }} RT > 500ms'
      - alert: slb_tx_traffic_drop_percent:critical
        expr: |-
          sum(aliyun_acs_slb_dashboard_DropTrafficTX) by (vip, port) /
          sum(aliyun_acs_slb_dashboard_TrafficTXNew) by (vip, port) > 0.001
        for: 5m
        labels:
          severity: "0"
        annotations:
          summary: 'SLB {{ $labels.vip }}:{{ $labels.port }} tx traffic drop percent > 0.1%'
      - alert: slb_rx_traffic_drop_percent:critical
        expr: |-
          sum(aliyun_acs_slb_dashboard_DropTrafficRX) by (vip, port) /
          sum(aliyun_acs_slb_dashboard_TrafficRXNew) by (vip, port) > 0.001
        for: 5m
        labels:
          severity: "0"
        annotations:
          summary: 'SLB {{ $labels.vip }}:{{ $labels.port }} rx traffic drop percent > 0.1%'

    - name: ecs
      rules:
      - alert: ecs_cpu_pressure:warning
        expr: |-
          (aliyun_acs_ecs_dashboard_CPUUtilization > 80)
          * on (instanceId) group_left(VpcAttributes,HostName,InnerIpAddress)
          label_replace(aliyun_meta_ecs_info, "instanceId", "$1", "InstanceId", "(.*)")
        for: 5m
        labels:
          severity: "2"
        annotations:
          summary: 'ECS {{ $labels.HostName }} cpu usage > 80%'
      - alert: ecs_cpu_pressure:high
        expr: |-
          (aliyun_acs_ecs_dashboard_CPUUtilization > 95)
          * on (instanceId) group_left(VpcAttributes,HostName,InnerIpAddress)
          label_replace(aliyun_meta_ecs_info, "instanceId", "$1", "InstanceId", "(.*)")
        for: 5m
        labels:
          severity: "1"
        annotations:
          summary: 'ECS {{ $labels.HostName }} cpu usage > 95%'
      - alert: ecs_memory_pressure:warning
        expr: |-
          (aliyun_acs_ecs_dashboard_memory_usedutilization > 80)
          * on (instanceId) group_left(VpcAttributes,HostName,InnerIpAddress)
          label_replace(aliyun_meta_ecs_info, "instanceId", "$1", "InstanceId", "(.*)")
        for: 5m
        labels:
          severity: "2"
        annotations:
          summary: 'ECS {{ $labels.HostName }} memory usage > 80%'
      - alert: ecs_memory_pressure:high
        expr: |-
          (aliyun_acs_ecs_dashboard_memory_usedutilization > 95)
          * on (instanceId) group_left(VpcAttributes,HostName,InnerIpAddress)
          label_replace(aliyun_meta_ecs_info, "instanceId", "$1", "InstanceId", "(.*)")
        for: 5m
        labels:
          severity: "1"
        annotations:
          summary: 'ECS {{ $labels.HostName }} memory usage > 95%'
      - alert: ecs_load_avg:warning
        expr: |-
          (aliyun_acs_ecs_dashboard_load_5m > 10)
          * on (instanceId) group_left(VpcAttributes,HostName,InnerIpAddress)
          label_replace(aliyun_meta_ecs_info, "instanceId", "$1", "InstanceId", "(.*)")
        for: 5m
        labels:
          severity: "2"
        annotations:
          summary: 'ECS {{ $labels.HostName }} loadAvg5m > 10'
      - alert: ecs_load_avg:high
        expr: |-
          (aliyun_acs_ecs_dashboard_load_5m > 20)
          * on (instanceId) group_left(VpcAttributes,HostName,InnerIpAddress)
          label_replace(aliyun_meta_ecs_info, "instanceId", "$1", "InstanceId", "(.*)")
        for: 5m
        labels:
          severity: "1"
        annotations:
          summary: 'ECS {{ $labels.HostName }} loadAvg5m > 20'
      - alert: ecs_disk_pressure:high
        expr: |-
          (aliyun_acs_ecs_dashboard_diskusage_utilization > 90)
          * on (instanceId) group_left(VpcAttributes,HostName,InnerIpAddress)
          label_replace(aliyun_meta_ecs_info, "instanceId", "$1", "InstanceId", "(.*)")
        for: 5m
        labels:
          severity: "1"
        annotations:
          summary: 'ECS {{ $labels.HostName }} disk usage > 90%'
      - alert: ecs_disk_pressure:critical
        expr: |-
          (aliyun_acs_ecs_dashboard_diskusage_utilization > 95)
          * on (instanceId) group_left(VpcAttributes,HostName,InnerIpAddress)
          label_replace(aliyun_meta_ecs_info, "instanceId", "$1", "InstanceId", "(.*)")
        for: 5m
        labels:
          severity: "0"
        annotations:
          summary: 'ECS {{ $labels.HostName }} disk usage > 95%'
      - alert: ecs_too_many_connections:warning
        expr: |-
          (aliyun_acs_ecs_dashboard_tcpconnection{state="TCP_TOTAL"} > 1000)
          * on (instanceId) group_left(VpcAttributes,HostName,InnerIpAddress)
          label_replace(aliyun_meta_ecs_info, "instanceId", "$1", "InstanceId", "(.*)")
        for: 5m
        labels:
          severity: "2"
        annotations:
          summary: 'ECS {{ $labels.HostName }} tcp_total > 1000'
      - alert: ecs_too_many_connections:high
        expr: |-
          (aliyun_acs_ecs_dashboard_tcpconnection{state="TCP_TOTAL"} > 2000)
          * on (instanceId) group_left(VpcAttributes,HostName,InnerIpAddress)
          label_replace(aliyun_meta_ecs_info, "instanceId", "$1", "InstanceId", "(.*)")
        for: 5m
        labels:
          severity: "1"
        annotations:
          summary: 'ECS {{ $labels.HostName }} tcp_total > 2000'

    - name: rds
      rules:
      - alert: rds_cpu_pressure:high
        expr: |-
          sum(aliyun_acs_rds_dashboard_CpuUsage
          * on (instanceId) group_left(DBInstanceDescription,ZoneId)
          label_replace(aliyun_meta_rds_info, "instanceId", "$1", "DBInstanceId", "(.*)"))
          without (instance, userId, job) > 85
        for: 5m
        labels:
          severity: "1"
        annotations:
          summary: 'RDS {{ $labels.DBInstanceDescription }} under high cpu pressure > 85%'
      - alert: rds_cpu_pressure:critical
        expr: |-
          sum(aliyun_acs_rds_dashboard_CpuUsage
          * on (instanceId) group_left(DBInstanceDescription,ZoneId)
          label_replace(aliyun_meta_rds_info, "instanceId", "$1", "DBInstanceId", "(.*)"))
          without (instance, userId, job) > 95
        for: 5m
        labels:
          severity: "0"
        annotations:
          summary: 'RDS {{ $labels.DBInstanceDescription }} under critical cpu pressure > 95%'
      - alert: rds_memory_pressure:high
        expr: |-
          sum(aliyun_acs_rds_dashboard_MemoryUsage
          * on (instanceId) group_left(DBInstanceDescription,ZoneId)
          label_replace(aliyun_meta_rds_info, "instanceId", "$1", "DBInstanceId", "(.*)"))
          without (instance, userId, job) > 85
        for: 5m
        labels:
          severity: "1"
        annotations:
          summary: 'RDS {{ $labels.DBInstanceDescription }} under high memory pressure > 85%'
      - alert: rds_memory_pressure:critical
        expr: |-
          sum(aliyun_acs_rds_dashboard_MemoryUsage
          * on (instanceId) group_left(DBInstanceDescription,ZoneId)
          label_replace(aliyun_meta_rds_info, "instanceId", "$1", "DBInstanceId", "(.*)"))
          without (instance, userId, job) > 95
        for: 5m
        labels:
          severity: "0"
        annotations:
          summary: 'RDS {{ $labels.DBInstanceDescription }} under critical memory pressure > 95%'
      - alert: rds_iops_pressure:high
        expr: |-
          sum(aliyun_acs_rds_dashboard_IOPSUsage
          * on (instanceId) group_left(DBInstanceDescription,ZoneId)
          label_replace(aliyun_meta_rds_info, "instanceId", "$1", "DBInstanceId", "(.*)"))
          without (instance, userId, job) > 80
        for: 5m
        labels:
          severity: "1"
        annotations:
          summary: 'RDS {{ $labels.DBInstanceDescription }} under high iops pressure > 80%'
      - alert: rds_iops_pressure:critical
        expr: |-
          sum(aliyun_acs_rds_dashboard_IOPSUsage
          * on (instanceId) group_left(DBInstanceDescription,ZoneId)
          label_replace(aliyun_meta_rds_info, "instanceId", "$1", "DBInstanceId", "(.*)"))
          without (instance, userId, job) > 90
        for: 5m
        labels:
          severity: "0"
        annotations:
          summary: 'RDS {{ $labels.DBInstanceDescription }} under critical iops pressure > 90%'
      - alert: rds_disk_space_exhausted:warning
        expr: |-
          sum(aliyun_acs_rds_dashboard_DiskUsage
          * on (instanceId) group_left(DBInstanceDescription,ZoneId)
          label_replace(aliyun_meta_rds_info, "instanceId", "$1", "DBInstanceId", "(.*)"))
          without (instance, userId, job) > 85
        for: 5m
        labels:
          severity: "2"
        annotations:
          summary: 'RDS {{ $labels.DBInstanceDescription }} disk space under pressure > 85%'
      - alert: rds_disk_space_exhausted:critical
        expr: |-
          sum(aliyun_acs_rds_dashboard_DiskUsage
          * on (instanceId) group_left(DBInstanceDescription,ZoneId)
          label_replace(aliyun_meta_rds_info, "instanceId", "$1", "DBInstanceId", "(.*)"))
          without (instance, userId, job) > 95
        for: 5m
        labels:
          severity: "0"
        annotations:
          summary: 'RDS {{ $labels.DBInstanceDescription }} disk space will be exhausted soon > 95%'
      - alert: rds_connection_pressure:high
        expr: |-
          sum(aliyun_acs_rds_dashboard_ConnectionUsage
          * on (instanceId) group_left(DBInstanceDescription,ZoneId)
          label_replace(aliyun_meta_rds_info, "instanceId", "$1", "DBInstanceId", "(.*)"))
          without (instance, userId, job) > 85
        for: 5m
        labels:
          severity: "1"
        annotations:
          summary: 'RDS {{ $labels.DBInstanceDescription }} connection usage > 85%'
      - alert: rds_connection_pressure:critical
        expr: |-
          sum(aliyun_acs_rds_dashboard_ConnectionUsage
          * on (instanceId) group_left(DBInstanceDescription,ZoneId)
          label_replace(aliyun_meta_rds_info, "instanceId", "$1", "DBInstanceId", "(.*)"))
          without (instance, userId, job) > 95
        for: 5m
        labels:
          severity: "0"
        annotations:
          summary: 'RDS {{ $labels.DBInstanceDescription }} connection usage > 95%'

    - name: redis
      rules:
      - alert: redis_cpu_pressure:high
        expr: |-
          sum(aliyun_acs_kvstore_CpuUsage
          * on (instanceId) group_left(PrivateIp,InstanceName)
          label_replace(aliyun_meta_redis_info, "instanceId", "$1", "UserName", "(.*)"))
          without (instance, userId, job) > 85
        for: 5m
        labels:
          severity: "1"
        annotations:
          summary: 'Redis {{ $labels.InstanceName }} under high cpu pressure > 85%'
      - alert: redis_cpu_pressure:critical
        expr: |-
          sum(aliyun_acs_kvstore_CpuUsage
          * on (instanceId) group_left(PrivateIp,InstanceName)
          label_replace(aliyun_meta_redis_info, "instanceId", "$1", "UserName", "(.*)"))
          without (instance, userId, job) > 95
        for: 5m
        labels:
          severity: "0"
        annotations:
          summary: 'Redis {{ $labels.InstanceName }} under critical cpu pressure > 95%'
      - alert: redis_memory_pressure:high
        expr: |-
          sum(aliyun_acs_kvstore_MemoryUsage
          * on (instanceId) group_left(PrivateIp,InstanceName)
          label_replace(aliyun_meta_redis_info, "instanceId", "$1", "UserName", "(.*)"))
          without (instance, userId, job) > 85
        for: 5m
        labels:
          severity: "1"
        annotations:
          summary: 'Redis {{ $labels.InstanceName }} memory usage > 85%'
      - alert: redis_memory_pressure:critical
        expr: |-
          sum(aliyun_acs_kvstore_MemoryUsage
          * on (instanceId) group_left(PrivateIp,InstanceName)
          label_replace(aliyun_meta_redis_info, "instanceId", "$1", "UserName", "(.*)"))
          without (instance, userId, job) > 95
        for: 5m
        labels:
          severity: "0"
        annotations:
          summary: 'Redis {{ $labels.InstanceName }} memory usage > 95%'
      - alert: redis_connection_pressure:high
        expr: |-
          sum(aliyun_acs_kvstore_ConnectionUsage
          * on (instanceId) group_left(PrivateIp,InstanceName)
          label_replace(aliyun_meta_redis_info, "instanceId", "$1", "UserName", "(.*)"))
          without (instance, userId, job) > 85
        for: 5m
        labels:
          severity: "1"
        annotations:
          summary: 'Redis {{ $labels.InstanceName }} connection usage > 85%'
      - alert: redis_connection_pressure:critical
        expr: |-
          sum(aliyun_acs_kvstore_ConnectionUsage
          * on (instanceId) group_left(PrivateIp,InstanceName)
          label_replace(aliyun_meta_redis_info, "instanceId", "$1", "UserName", "(.*)"))
          without (instance, userId, job) > 95
        for: 5m
        labels:
          severity: "0"
        annotations:
          summary: 'Redis {{ $labels.InstanceName }} connection usage > 95%'
Checking Rule Files

Prometheus ships with promtool, a utility that validates the main configuration together with every rule file it references:

promtool check config /etc/prometheus/prometheus.yml
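
You can also validate rule files directly, and unit-test them with promtool test rules (available since Prometheus 2.5). The glob matches the rule_files setting above; the test file below is a minimal sketch for the MySQLIsDown alert, assuming the MySQL rules live in /etc/prometheus/rules/mysql.rules and using illustrative instance and job labels:

promtool check rules /etc/prometheus/rules/*.rules
promtool test rules mysql_rules_test.yml

# mysql_rules_test.yml
rule_files:
  - /etc/prometheus/rules/mysql.rules
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      - series: 'mysql_up{instance="db1:9104", job="mysql"}'
        values: '0 0 0'
    alert_rule_test:
      # the rule uses for: 1m, so the alert is firing by the 2m mark
      - eval_time: 2m
        alertname: MySQLIsDown
        exp_alerts:
          - exp_labels:
              severity: critical
              instance: db1:9104
              job: mysql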

References

  • Prometheus operators (in Chinese)
  • prometheus
  • Combining alert conditions
  • Prometheus in Practice
  • Time of day based notifications with Prometheus and Alertmanager
  • aliyun-exporter