A Layman's Guide to OpenShift Monitoring and Logging
High-level view:
1. This blog will introduce the monitoring and logging capabilities, as well as the configuration knobs available for monitoring an OpenShift cluster.
2. The second article of the series will focus on logging and walk you through the steps to send logs to an external syslog server.
3. The third part of the blog will revisit monitoring and walk you through the steps to have an external application pull that data.
So you have an OpenShift cluster comprising multiple nodes, with a number of Containerized Network Functions (CNFs) running on it. How do you monitor this cluster? How do you gather the logs from all the nodes, and, especially, how do you collect the logs from specific CNFs and store/process those logs?
Finding these answers in the documentation can prove to be a daunting task for such a simple purpose. This blog, and its subsequent parts, will provide concise and simple answers to all of those questions.
Data collection from a cluster and its workloads can be placed in two separate categories:
- Monitoring data, in OpenShift lingo, refers to performance and telemetry information. This blog will explain how this data is collected and how it can be stored.
- Logs generated by the cluster infrastructure, the applications/CNFs/workloads running on the cluster, and any audit logs. Log collection will be discussed in detail, with examples, in the next blog.
Monitoring Data from an OpenShift Cluster
This is the relatively easier one to discuss, because basic monitoring doesn't need anything special to be done. The clients and applications needed for collecting monitoring data are already installed and enabled at installation time by an operator called the Cluster Monitoring Operator.
Collection of infrastructure metrics is enabled by default. At the time of installation, a namespace called openshift-monitoring is created and the monitoring-related pods are spun up in there. One set of those is the node-exporter pods, run by a daemonset and hence running a copy on each node of the cluster. These node-exporter pods are responsible for collecting data from each individual node and exposing it to Prometheus. The Prometheus pods are also spun up by default and scrape and store the monitoring metrics. The following output shows these pods in a 3-node compact cluster:
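An illustrative, trimmed listing is shown below (hash suffixes and the remaining columns are omitted, and the exact set of pods varies by OpenShift version); note the three node-exporter pods, one per node of the compact cluster:

```
$ oc get pods -n openshift-monitoring
NAME                                  STATUS    ...
alertmanager-main-0                   Running   ...
alertmanager-main-1                   Running   ...
cluster-monitoring-operator-<hash>    Running   ...
kube-state-metrics-<hash>             Running   ...
node-exporter-<hash>                  Running   ...
node-exporter-<hash>                  Running   ...
node-exporter-<hash>                  Running   ...
prometheus-k8s-0                      Running   ...
prometheus-k8s-1                      Running   ...
prometheus-operator-<hash>            Running   ...
telemeter-client-<hash>               Running   ...
thanos-querier-<hash>                 Running   ...
thanos-querier-<hash>                 Running   ...
```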
Other pods that can be seen here are the Thanos Querier, Telemeter Client and Alertmanager, each running in a redundant fashion, and all of them installed by the monitoring operator. Each has a different purpose; for example, the Telemeter Client is responsible for pushing the cluster's metrics to Red Hat, hence allowing the cluster to be monitored from the Red Hat console.
The collected data is available under the Observability menu item in the OpenShift console, as shown in the following screenshot:
Storing Monitoring Data
Data collected by Prometheus is not stored persistently by default. Persistent storage can be enabled; however, a block storage device is required for that.
The amount of data retained can be defined using a retention period, a storage capacity limit, or both. A config map (in the openshift-monitoring namespace) called cluster-monitoring-config is used for this purpose. The following example demonstrates how this configmap can be used to define a data retention policy of one day or 20 GiB:
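Here is a minimal sketch of that configmap. The storageClassName value (local-sc) is an assumption; substitute whichever block-storage class exists on your cluster. With both knobs set, data is pruned when either the one-day retention or the 20 GiB size limit is reached, whichever comes first:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      # Keep metrics for one day...
      retention: 24h
      # ...or until roughly 20 GiB has accumulated, whichever limit is hit first.
      retentionSize: 20GiB
      # Persistent volume claim so the data survives pod restarts.
      volumeClaimTemplate:
        spec:
          storageClassName: local-sc   # assumption: replace with your storage class
          resources:
            requests:
              storage: 20Gi
```

Apply it with `oc apply -f <file>` and the monitoring operator reconciles the Prometheus pods with the new settings.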
Monitoring User Workload Data
By default, monitoring is enabled only for the OpenShift infrastructure. However, monitoring can also be enabled for the pods running workloads or CNFs. This is done through another configuration knob: adding the following line to the same configmap:
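Sketched against the same cluster-monitoring-config configmap, the knob is a single boolean:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    # Turn on monitoring for user-defined projects (workloads/CNFs).
    enableUserWorkload: true
```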
This creates an additional monitoring namespace called openshift-user-workload-monitoring, and Prometheus pods for workload monitoring are started under that namespace, as shown here:
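An abbreviated, illustrative listing follows; replica counts, hash suffixes and the remaining columns will vary from cluster to cluster:

```
$ oc get pods -n openshift-user-workload-monitoring
NAME                             STATUS    ...
prometheus-operator-<hash>       Running   ...
prometheus-user-workload-0       Running   ...
prometheus-user-workload-1       Running   ...
thanos-ruler-user-workload-0     Running   ...
thanos-ruler-user-workload-1     Running   ...
```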
If the parameters for workload monitoring need to be configured or adjusted, that can be achieved by using a config map called user-workload-monitoring-config in the openshift-user-workload-monitoring namespace. This blog will not go into further detail about the various parameters and knobs available to advanced users, but this documentation page can be referenced for exploring those options.
Forwarding Monitoring Data
Oftentimes in a production environment, an external OSS/BSS system collects monitoring data across many devices and clusters. Forwarding the monitoring data to such an external data collection server can be achieved by defining a remoteWrite URL in the cluster-monitoring-config configmap, as shown here:
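A minimal sketch, again in the same configmap; the endpoint URL below is a placeholder for whatever remote-write-compatible receiver is used:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      remoteWrite:
        # Placeholder endpoint: point this at the external collector's
        # Prometheus remote-write receiver.
        - url: "https://metrics-collector.example.com/api/v1/write"
```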
The URL here points to the remote monitoring server, Kafka bus, etc.
Though this blog doesn't go into further detail on configuring authentication (if needed) and tags (if desired) for the metrics being sent to a remote location, these and other such details and examples can be found here.
That concludes all the basics one needs to understand OpenShift monitoring concepts and configuration. For next-level information on dealing with OpenShift logs and metrics, continue on to the subsequent parts of this article.