How to configure Prometheus Alertmanager

This article covers configuring Prometheus and Alertmanager as deployed by the prometheus-operator Helm chart, located here: https://github.com/helm/charts/tree/master/stable/prometheus-operator

Who is who

  • Prometheus generates the alerts. You can find the alert rules in the chart's prometheus rules directory:

./templates/prometheus/rules/

You can also find them in the Prometheus web interface (the /alerts page).

  • Alertmanager does not generate alerts. It only sorts, groups and filters the alerts it receives (selecting a subset of them by matching rules) and sends notifications (by email, Slack and other methods) according to your routes (escalation).

Alertmanager

Here is an example of basic email routing for the prometheus-operator Helm chart. You can define it in the chart's values.yaml file (alertmanager section):

# alertmanager configuration
alertmanager:
  # Alertmanager config: global settings, routing tree and receivers
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['job']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 24h
      receiver: 'default'
      routes:
      - match:
          alertname: Watchdog
        receiver: 'null'
    receivers:
      - name: 'null'
      - name: 'default'
        email_configs:
          - send_resolved: true
            from: "alertmanager@example.com"  # placeholder sender address
            to: "you@example.com"             # placeholder recipient address
            smarthost: "mta:25"
            require_tls: false

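This config defines two receivers, null and default. The root route sends every alert to default, which delivers email through the mta:25 SMTP relay without TLS. The child route sends the always-firing Watchdog alert to null, which has no notification settings and therefore discards it.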
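Since Alertmanager can also notify through Slack, the same routing tree can be extended with a second child route. The following is only a sketch under assumptions: the slack-critical receiver name, the #alerts channel and the webhook URL are placeholders that do not come from the chart, and it assumes your rules set a severity label.

```yaml
alertmanager:
  config:
    route:
      receiver: 'default'
      routes:
      - match:
          alertname: Watchdog
        receiver: 'null'
      # Assumed extra route: alerts labelled severity=critical go to Slack
      - match:
          severity: critical
        receiver: 'slack-critical'
    receivers:
      - name: 'null'
      - name: 'default'
        # email_configs as in the example above
      - name: 'slack-critical'   # hypothetical receiver name
        slack_configs:
          - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'  # placeholder webhook URL
            channel: '#alerts'   # placeholder channel
            send_resolved: true
```

Routes are evaluated top to bottom and the first match wins, so a critical alert goes to Slack only, while everything else still falls through to the default email receiver.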

Read more about alertmanager configuration, routing and receivers:

  • https://prometheus.io/docs/alerting/configuration/#email_config
  • https://medium.com/curai-tech/constant-vigilance-a-step-by-step-guide-to-alerts-with-helm-and-prometheus-ae9554736031

Prometheus

  • Go to the Prometheus /alerts page and find the TargetDown rule. The source for this rule is in the prometheus-operator/templates/prometheus/rules/general.rules.yaml file. You can create similar YAML files for your own rules and add them to the same directory.

  • If you want to convert existing rules from a YAML file into a Helm template, you can use the script prometheus-operator/hack/sync_prometheus_rules.py, adding the URL of your rules to it:

    {
        'source': 'https://raw.githubusercontent.com/etcd-io/etcd/master/Documentation/op-guide/etcd3_alert.rules.yml',
        'destination': '../templates/prometheus/rules',
    },
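For a hand-written rule rather than a synced one, a standalone PrometheusRule manifest might look like the sketch below. The rule name, the my-app job and the release label value are hypothetical, and the labels assume the chart's rule selector matches on app and release; check your own Prometheus resource before relying on them.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-rules             # hypothetical name
  labels:
    app: prometheus-operator     # assumed to match the chart's rule selector
    release: my-release          # hypothetical Helm release name
spec:
  groups:
    - name: my-app.rules
      rules:
        - alert: MyAppInstanceDown
          # Fires when a target of the (hypothetical) my-app job is down
          expr: up{job="my-app"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            message: 'Instance {{ $labels.instance }} of job {{ $labels.job }} is down.'
```

This form is meant to be applied directly with kubectl. If the file is placed inside the chart's templates directory instead, the {{ ... }} expressions in the annotations would need to be escaped so that Helm does not try to render them.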

Here is the TargetDown rule as shown on the Prometheus /alerts page:

alert: TargetDown
expr: 100 * (count by(job, namespace, service) (up == 0) / count by(job, namespace, service) (up)) > 10
for: 10m
labels:
  severity: warning
annotations:
  message: '{{ $value }}% of the {{ $labels.job }} targets are down.'

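The rule is built on up == 0: it fires when more than 10% of the targets for a job, namespace and service combination have been down for 10 minutes. You can run the inner expression as a query on its own:

up == 0 returns the targets that are currently down.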

  • Now you can shut down some pods and check what happens.
    • At first the alert is in the PENDING state, because the rule has for: 10m and the condition must hold for 10 minutes before it fires.
    • Then it moves to the FIRING state, and you can find it on the Alertmanager alerts page.
    • After that, check your MTA logs and your mailbox.
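The PENDING to FIRING transition described above can also be checked offline with a promtool unit test, without shutting down real pods. This is a sketch under assumptions: it expects the TargetDown rule to be saved in plain Prometheus rule-file form (under a groups: key) in a file named targetdown.rules.yml, and both that filename and the series labels are hypothetical.

```yaml
# targetdown_test.yml, run with: promtool test rules targetdown_test.yml
rule_files:
  - targetdown.rules.yml   # hypothetical file holding the TargetDown rule

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # One of two targets is down for the whole test: 50% down
      - series: 'up{job="app", namespace="default", service="app", instance="a:9100"}'
        values: '0+0x15'
      - series: 'up{job="app", namespace="default", service="app", instance="b:9100"}'
        values: '1+0x15'
    alert_rule_test:
      # After 5 minutes the alert is still pending, so nothing is firing yet
      - eval_time: 5m
        alertname: TargetDown
        exp_alerts: []
      # After 15 minutes the 10m "for" period has elapsed and the alert fires
      - eval_time: 15m
        alertname: TargetDown
        exp_alerts:
          - exp_labels:
              severity: warning
              job: app
              namespace: default
              service: app
            exp_annotations:
              message: '50% of the app targets are down.'
```

A test like this only exercises the rule itself. The email delivery path through Alertmanager still has to be checked against a running cluster as described above.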
