Continuous Monitoring

The word continuous describes a process that never ends, marked by uninterrupted extension in space, time or sequence. In the world of DevOps many things are continuous: Continuous Integration, Continuous Server Deployment, Continuous Application Deployment, Continuous Everything. Following this definition, Continuous Monitoring is an ongoing process in which the monitoring software is constantly changed and adjusted. Well, it’s actually the configuration of the software that is being changed rather than the software itself. This brings me to the conclusion that everyone who’s doing monitoring is already doing Continuous Monitoring in some way.

The root of Continuous is automation. Keeping your monitoring configuration up to date, automatically and without any human interaction, in an infrastructure that changes more and more frequently is what I call Continuous Monitoring.

If you have 1000 servers and they are all the same, you will most likely be able to monitor all of them with reasonable effort. The more flavours you maintain, the more human interaction is necessary. Each change in the infrastructure requires a change in the monitoring configuration and therefore human interaction. In bigger environments, a group of users is responsible for keeping the monitoring up to date. Every user has a different mindset and comes to different conclusions. In the end, humans make mistakes without any bad intention. And it’s not only about mistakes: humans simply forget things, which is part of what makes us human. For the monitoring configuration, this can mean that very basic checks are missing for new hosts or services. Conventions and rules can help, but they do not solve the root cause. The only way out of this is automation.

Introducing Icinga 2

Icinga is an open source monitoring system that gives you detailed information about the availability of hosts and services in your environment. The data is either collected actively or received from agents and third-party software. Icinga tells you what is broken and forwards data to time series databases to help you discover why. Icinga integrates very well with existing DevOps tools, and communication is always SSL encrypted. Finally, it gives you a nice overview in a web interface with customizable dashboards. This is the basic idea behind this monitoring tool, but how does it help us to automate things?

Icinga 2 has a powerful DSL that allows you to express your monitoring configuration as code and automate tasks. It lets users define their monitoring configuration according to the principle Describe and Express. The goal is to write human readable expressions that describe what should be monitored in your environment and how. Before going through this, keep the following expression in mind: “Monitor MySQL on any database server that is in production and not located in London”.

Before this expression can work, you need to describe your servers. For the sample expression, a database server would look like the following:

object Host "SuperServer" {
  check_command = "hostalive"
  address = "127.0.0.1"
  address6 = "::1"
  vars.os = "Linux"
  vars.mysql = true
  vars.location = "Chicago"
  vars.environment = "prod"
}

You are completely free to define any custom variable that you may need later in your expressions. These variables are later used in so-called “apply rules”. Apply rules are the brain of your monitoring configuration. They decide what should be monitored on which host, who should be notified and whether there are any dependencies. The human readable expression mentioned earlier, translated into the Icinga DSL, looks like the following:

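apply Service "MySQL" {
  check_command = "mysql"
  // Match production hosts that are flagged as MySQL servers ...
  assign where host.vars.mysql == true && host.vars.environment == "prod"
  // ... except those located in London ("==" compares, "=" would assign).
  ignore where host.vars.location == "London"
}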

Here is the key point: every time you add a new host with the correct variables and matching values, the MySQL service will be monitored automatically. There is no need to define the service check over and over again. The same mechanism can also be used to define alerts. This is a very simple and basic example to demonstrate the principle behind the Icinga DSL. For more complex scenarios you would loop over arrays and dictionaries, and use exceptions, conditions, and time periods to achieve exactly what you want to express.
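As a rough sketch of the loop and alert cases mentioned above, the following fragment creates one service per entry of a dictionary and attaches a notification to the result. The host "db01", the `vars.databases` dictionary, the user "dba" and its address are hypothetical names made up for illustration. The example also assumes that a NotificationCommand called "mail-service-notification" exists, as it does in the sample configuration shipped with Icinga 2, and that your version of the "mysql" check command reads `mysql_database` and `mysql_port`.

```
// Hypothetical host: vars.databases is an illustrative dictionary.
object Host "db01" {
  check_command = "hostalive"
  address = "192.0.2.10"
  vars.environment = "prod"
  vars.databases["shop"] = { port = 3306 }
  vars.databases["billing"] = { port = 3307 }
}

// One service per dictionary entry: "mysql-shop" and "mysql-billing".
apply Service "mysql-" for (db_name => db_config in host.vars.databases) {
  check_command = "mysql"
  vars.mysql_database = db_name
  vars.mysql_port = db_config.port
  assign where host.vars.environment == "prod"
}

// Hypothetical contact for the alerts.
object User "dba" {
  email = "dba@example.com"
}

// Alerts follow the same apply mechanism as services.
apply Notification "mail-dba" to Service {
  command = "mail-service-notification" // assumed to be defined elsewhere
  users = [ "dba" ]
  assign where service.check_command == "mysql"
}
```

Adding a third entry to `vars.databases` would then be enough to get a third service and its notification, without touching the rules.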

You may think that this is only a partial automation, and you’re right. This only creates the services automatically, but what about the hosts? To define the hosts automatically and enrich them with valuable metadata you will need to do some extra work. Since every infrastructure is different, there is no one solution that fits all. Typically, users go through multiple stages of automation and in the end use a mix of all.

Stages of Automation

In the first stage of Continuous Monitoring, you should definitely combine your configuration management tool with Icinga. There are very well written Chef cookbooks in the Supermarket that let you automate almost any Icinga task. Because they are part of the Chef partner program, the cookbooks are held to a certain level of quality.

The Icinga cookbooks enable you to set up and manage your whole monitoring environment, from the installation to initial configuration and ongoing maintenance. Additionally, clients can be set up and connected to the Icinga master automatically. Because you’re using Chef, you can leverage all of its knowledge about your servers: What is it running? Is it a web server? Is it a database server? How many disks does it have? And any other information that is available. Any Icinga configuration can be generated with custom Chef resources.

icinga2_host 'superserver.example.com' do
  display_name 'SuperServer'
  address      '127.0.0.12'
  custom_vars  :check_tcp_ports => %w(6379), :application => 'redis', :environment => 'production', :cluster_name => 'prodredis0001'
end
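To make the link to the DSL visible, here is roughly what the resource above could correspond to on the Icinga side. This is a sketch under assumptions: the exact configuration the cookbook renders may differ, the `check_command` is not set in the Chef resource and is added here only so the object is complete, and the apply rule is a hypothetical example of how the `check_tcp_ports` array could be consumed.

```
// Assumed rendering of the icinga2_host resource; real cookbook output may differ.
object Host "superserver.example.com" {
  check_command = "hostalive" // assumption: not part of the Chef resource
  display_name = "SuperServer"
  address = "127.0.0.12"
  vars.check_tcp_ports = [ "6379" ]
  vars.application = "redis"
  vars.environment = "production"
  vars.cluster_name = "prodredis0001"
}

// Hypothetical rule: one TCP check per array entry, here "tcp-6379".
apply Service "tcp-" for (port in host.vars.check_tcp_ports) {
  check_command = "tcp"
  vars.tcp_port = port
}
```

Hosts that do not define `vars.check_tcp_ports` are simply skipped by the rule, so it can live next to hosts of any other kind.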

Icinga in combination with Chef is very powerful. The whole monitoring landscape can be updated automatically as your infrastructure grows. However, in some scenarios, even configuration management has its limitations. For example, you may want to pull the users who are notified during outages from an Active Directory. Some still have a CMDB running somewhere that is waiting to be migrated but still stores valuable information about certain servers. Cases like these cannot be handled with configuration management alone. This is where the second stage of automation kicks in: the Icinga Director.

The Director is an extension for the Icinga web interface. It enables users to perform configuration tasks through the web interface instead of configuration files. However, the true power of the Director is in its capability to run imports from various data sources and transform the data into Icinga configuration. Data sources can be MySQL, LDAP, Active Directory, plain text files and much more.

The imported data can be transformed, modified, and finally verified and merged with existing Icinga configuration. Additional tasks allow automatic deployment of configuration changes. And because automation tends to fail sometimes, each change is versioned and can be rolled back at any time.

All these capabilities give us the power to continually change and update our monitoring configuration at runtime and without any human interaction. Every business is different and has its own requirements. Some requirements are very special, such as connecting the monitoring software to custom written tools. To meet even the most exotic demands, Icinga provides HTTP-based RESTful APIs to retrieve the state or modify the configuration at runtime. Without going too deep into technical details, using the API is the last stage of automation that you can reach with Icinga.
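Staying on the configuration side, API access for a custom tool is itself described in the DSL. The fragment below is a minimal sketch: the user name "automation", the placeholder password and the chosen permission set are assumptions for illustration, and it presumes the API feature is already enabled on your Icinga 2 instance.

```
// Hypothetical API user for a custom tool; name and password are placeholders.
object ApiUser "automation" {
  password = "change-me"
  // Restrict the tool to reading state and managing hosts.
  permissions = [
    "objects/query/Host",
    "objects/query/Service",
    "objects/create/Host",
    "objects/modify/Host",
    "objects/delete/Host"
  ]
}
```

With a user like this, a tool could query or create hosts through endpoints such as /v1/objects/hosts on the API port (5665 by default), while being unable to touch anything outside its permissions.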

This was my journey through various stages of automation. There are plenty of tools out there that need to be monitored or connected to monitoring. Monitoring is a challenge that needs to be solved in any modern IT-related company. By replacing manual procedures with automated Monitoring-as-Code using Chef and Icinga, monitoring can stop being a source of frustration and become an instructive and rewarding process through Continuous Monitoring.

Blerim Sheqa

Blerim Sheqa works at NETWAYS, a company dedicated to open source software. He used to work as a Systems Engineer and help customers with their monitoring, logging and configuration management. As a Product Manager for Icinga he helps to develop the product strategy, conception and product management in general.