How to configure Alertmanager

This article covers configuring Prometheus and Alertmanager with prometheus-operator, using the Helm chart located here:

Who is who

  • Prometheus generates the alerts. You can find the alert rules in the prometheus directory:


You can also find them in the Prometheus web interface (on the /alerts page).

  • Alertmanager only processes the alerts it receives: it sorts and groups them, splits them up according to your routing rules, and sends notifications (via email, Slack, and other methods) along your routes (escalation).
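To make the grouping step concrete, here is a simplified sketch of how alerts can be bucketed by the labels listed in group_by (the configuration below uses group_by: ['job']). It is not Alertmanager's code: the function name and the sample label values are hypothetical, and real Alertmanager also applies timing (group_wait, group_interval) before sending a group.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the group_by labels (simplified model)."""
    groups = defaultdict(list)
    for alert in alerts:
        # In this sketch, alerts missing a label are grouped under an empty value for it.
        key = tuple((name, alert.get(name, "")) for name in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical sample alerts: two TargetDown alerts share a job, one does not.
alerts = [
    {"alertname": "TargetDown", "job": "node-exporter", "instance": "10.0.0.1:9100"},
    {"alertname": "TargetDown", "job": "node-exporter", "instance": "10.0.0.2:9100"},
    {"alertname": "TargetDown", "job": "kube-state-metrics", "instance": "10.0.0.3:8080"},
]

for key, members in group_alerts(alerts, ["job"]).items():
    print(dict(key), len(members))
```

With group_by: ['job'], the two node-exporter alerts end up in one group and would be delivered as a single notification, while the kube-state-metrics alert gets its own.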


Here is an example of basic email routing for the prometheus-operator Helm chart. You can define it in the chart's values.yaml file (in the alertmanager section):

# alertmanager configuration
alertmanager:
  config:
    global:
      resolve_timeout: 5m
    # global route configuration
    route:
      group_by: ['job']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 24h
      receiver: 'default'
      routes:
      - match:
          alertname: Watchdog
        receiver: 'null'
    receivers:
      - name: 'null'
      - name: 'default'
        email_configs:
          - send_resolved: true
            from: "sender@example.com"     # placeholder address
            to: "recipient@example.com"    # placeholder address
            smarthost: "mta:25"
            require_tls: false

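This configuration defines two receivers, null and default. Alerts named Watchdog match the child route and go to null, which has no notification settings, so nothing is sent for them. All other alerts fall through to the root route's default receiver, which sends email through its email_configs settings (via the SMTP relay at mta:25).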
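The routing decision above can be sketched as a small tree walk. This is a simplified model, not Alertmanager's implementation: it only supports exact-match `match:` blocks, takes the first matching child route, and ignores `continue`, regex matchers, and grouping. The Route class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    receiver: str
    match: dict = field(default_factory=dict)   # exact label matches, like `match:` in the config
    routes: list = field(default_factory=list)  # child routes, checked in order


def matches(route: Route, labels: dict) -> bool:
    # A route with an empty match dict matches every alert (like the root route).
    return all(labels.get(name) == value for name, value in route.match.items())


def find_receiver(route: Route, labels: dict) -> str:
    # The first matching child wins; if no child matches, this route's receiver is used.
    for child in route.routes:
        if matches(child, labels):
            return find_receiver(child, labels)
    return route.receiver


# Mirrors the route section of the values.yaml example.
root = Route(
    receiver="default",
    routes=[Route(receiver="null", match={"alertname": "Watchdog"})],
)

print(find_receiver(root, {"alertname": "Watchdog", "severity": "none"}))   # null
print(find_receiver(root, {"alertname": "TargetDown", "job": "node-exporter"}))  # default
```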

Read more about Alertmanager configuration, routing, and receivers:



  • Go to the Prometheus /alerts page and find the TargetDown rule. Its source is in the prometheus-operator/templates/prometheus/rules/general.rules.yaml file, so you can create similar YAML files for your own rules and add them to the same directory.

  • To convert existing rules from a YAML file into a Helm template, you can use the script in prometheus-operator/hack/ and add the URL of your rules to its source list:

        'source': '',
        'destination': '../templates/prometheus/rules',
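One thing such a conversion has to handle is that Prometheus rule annotations use Go template braces (for example `{{ $value }}`), which Helm would otherwise try to evaluate. The sketch below shows one common escaping technique, wrapping each literal `{{` or `}}` in a Go-template raw string so Helm prints it verbatim. The function name is hypothetical, and this is not necessarily how the hack script does it.

```python
import re

# Matches a literal "{{" or "}}" in a Prometheus rule annotation.
_BRACES = re.compile(r"\{\{|\}\}")


def escape_for_helm(text: str) -> str:
    """Wrap each literal "{{" / "}}" in a Go-template raw string so Helm outputs it unchanged."""
    # A single regex pass means the braces added here are never re-escaped.
    return _BRACES.sub(lambda m: "{{`" + m.group(0) + "`}}", text)


message = "{{ $value }}% of the {{ $labels.job }} targets are down."
print(escape_for_helm(message))
```

When Helm renders the escaped string, each `{{`{{`}}` action emits a plain `{{`, so Prometheus receives the original annotation text.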

Here is an example of the TargetDown prometheus rule:

alert: TargetDown
expr: 100 * (count by(job, namespace, service) (up == 0)
  / count by(job, namespace, service) (up)) > 10
for: 10m
labels:
  severity: warning
annotations:
  message: '{{ $value }}% of the {{ $labels.job }} targets are down.'
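The arithmetic in this expression can be reproduced in plain Python: for each (job, namespace, service) group, divide the number of targets with up == 0 by the total number of targets, multiply by 100, and keep only groups above 10. The function names and sample targets below are hypothetical; the sketch only mirrors the PromQL logic.

```python
from collections import defaultdict


def target_down_percent(targets):
    """Mimic 100 * count by(job, namespace, service)(up == 0) / count by(...)(up)."""
    total = defaultdict(int)
    down = defaultdict(int)
    for t in targets:
        key = (t["job"], t["namespace"], t["service"])
        total[key] += 1
        if t["up"] == 0:
            down[key] += 1
    # Groups with no down targets produce no series for (up == 0), so they have no result.
    return {key: 100 * down[key] / total[key] for key in down}


def firing_groups(targets, threshold=10):
    # The "> 10" comparison filters out groups at or below the threshold.
    return {k: v for k, v in target_down_percent(targets).items() if v > threshold}


node = {"job": "node-exporter", "namespace": "monitoring", "service": "node-exporter"}
api = {"job": "api", "namespace": "default", "service": "api"}
targets = (
    [{**node, "up": 1}] * 8 + [{**node, "up": 0}] * 2   # 2 of 10 down -> 20%
    + [{**api, "up": 1}] * 19 + [{**api, "up": 0}] * 1  # 1 of 20 down -> 5%
)
print(firing_groups(targets))  # {('node-exporter', 'monitoring', 'node-exporter'): 20.0}
```

Note that a single down target out of ten (exactly 10%) would not trigger the rule, because the comparison is strictly greater than 10.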

The expression is built on up == 0, so you can test it by running this query:

up == 0 returns the targets that are currently down (their up metric is 0).
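The same query can also be run from a script through the Prometheus HTTP API (GET /api/v1/query). This is a minimal sketch using only the standard library; the base URL `http://localhost:9090` is an assumption (for example, after a `kubectl port-forward` to your Prometheus service), and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable here, e.g. through kubectl port-forward.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url: str, query: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})


def down_targets(response_body: str):
    """Extract (job, instance) pairs from an instant-query response for `up == 0`."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    return [
        (sample["metric"].get("job", ""), sample["metric"].get("instance", ""))
        for sample in payload["data"]["result"]
    ]


def query_down_targets(base_url: str = PROMETHEUS_URL):
    """Run `up == 0` against a live Prometheus and return the down targets."""
    with urllib.request.urlopen(build_query_url(base_url, "up == 0"), timeout=10) as resp:
        return down_targets(resp.read().decode("utf-8"))
```

Calling `query_down_targets()` while some pods are stopped should list the affected (job, instance) pairs; an empty list means every target is up.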

  • Now you can shut down some pods and watch what happens.
    • First, the alert enters the PENDING state, because the rule has a 10-minute for duration (and it only triggers once more than 10% of a job's targets are down).
    • If the condition still holds after 10 minutes, the alert switches to the FIRING state, and you can find it on the Alertmanager /alerts page.
    • After that, check your MTA logs and your mailbox.
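The PENDING to FIRING transition can be modeled as a small state machine driven by each rule evaluation. This is a simplified sketch, not Prometheus code: the class name is hypothetical, evaluations are given as explicit timestamps in seconds, and features such as keep_firing_for are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    for_seconds: float
    active_since: Optional[float] = None  # when the expression first became true

    def evaluate(self, now: float, condition_true: bool) -> str:
        """Return "inactive", "pending" or "firing" after one rule evaluation."""
        if not condition_true:
            self.active_since = None  # condition cleared: the alert resets
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


rule = AlertState(for_seconds=10 * 60)  # `for: 10m` from the TargetDown rule
for t, cond in [(0, True), (300, True), (600, True), (900, False)]:
    print(t, rule.evaluate(t, cond))
# 0 pending
# 300 pending
# 600 firing
# 900 inactive
```

When a firing alert goes back to inactive, Prometheus reports it to Alertmanager as resolved; with send_resolved: true in the email config, that produces a second "resolved" email.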


