Scaling today is crucial. In an era of information explosion, where traffic and data grow unpredictably and systems often run under severe load, the ability to scale, both horizontally and vertically, must be an integral part of every infrastructure. Kubernetes does exactly this, providing confidence in continuous availability and an out-of-the-box ability to configure memory and CPU thresholds that tell the cluster when the configured services need to scale. Kubernetes assures all stakeholders, from DevOps to management, that the company is ready at all times to handle high traffic. But is this enough?

Prometheus and Kubernetes

A correctly configured Kubernetes cluster (or clusters) can provide confidence in availability and resilience to traffic changes. But you also need observability of application events and customer use cases so that R&D can identify and troubleshoot defects, incidents, and anomalies. Topics such as container health, average response time, error rate, and cluster capacity should be continuously measured and monitored in order to provide insight into the application's functionality and performance.

Prometheus is an open-source tool for collecting time-series metrics, hosted by the CNCF. It is one of the most popular and commonly used metrics systems due to its ease of use, its simple integration with Kubernetes and other components, and the value it adds for R&D. The observability provided by Prometheus via its metrics collection and alerts is crucial for R&D to be able to reduce MTTR, improve production SLAs, and create a more stable production environment.

Capacity Concerns

Still, Prometheus' capacity is limited. When Kubernetes is continuously hit with a large number of requests, e.g., millions of requests on hundreds of nodes, or when one Prometheus server supports a multi-cluster architecture, Prometheus' collectors are put under immense pressure, and the server's ability to collect, store, and process time series starts to lag. The latency created under such high traffic, in turn, affects the tool's ability to provide real-time metrics, decreasing R&D's ability to respond to events as they occur.

Scaling Prometheus for Kubernetes 

The solution to the capacity concerns discussed above is scaling, but you must first properly predict and understand how such high traffic may become a problem in the future for one or more of the Kubernetes clusters. You also need to plan for such a load from the very beginning. 

Prometheus federation is a solution for simple scaling, where one master server collects metrics from slave servers running in different datacenters.

Setting this architecture up is quite simple. A master server is deployed with a federation scrape job whose "targets" contain the list of slave Prometheus server URLs, like this:

      - job_name: federated_prometheus
        honor_labels: true
        metrics_path: /federate
        params:
          'match[]':
            - '{job="the-prom-job"}'
        static_configs:
          - targets:
            - prometheus-slave1:9090
            - prometheus-slave2:9090

The match[] param in the configuration instructs Prometheus to accumulate and store all the slave metrics for a specific job. You can also set it as a regex: {__name__=~"^job:.*"}. This will collect metrics from several different jobs that match the expression definition.
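For instance, the regex variant drops into the same federation job in place of the job-based matcher (the "job:" prefix here is illustrative; it matches recording rules named by job):

        params:
          'match[]':
            - '{__name__=~"^job:.*"}'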

The Prometheus slaves should have the following configuration, a relabeling rule that makes each slave keep only the targets assigned to it (shown here for slave1):

    relabel_configs:
      - source_labels: [_prometheus_slave]
        action: keep
        regex: slave1

Automating the deployment of Prometheus is simple with this configuration, as DevOps simply need to fill in the targets from a managed array of Prometheus slave “names” after scripting the Prometheus deployment. Then, they can automatically deploy slaves with their respective name using the same array. 
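As a sketch of that automation (the helper name, slave names, and port are assumptions, not official tooling), a script could render the master's target list from the managed array of slave names:

```python
# Hypothetical helper: build the master's federation target list from a
# managed array of Prometheus slave names. Port 9090 is Prometheus'
# default listen port and is an assumption here.
def federation_targets(slave_names, port=9090):
    return [f"{name}:{port}" for name in slave_names]

slaves = ["prometheus-slave1", "prometheus-slave2"]
print(federation_targets(slaves))
# → ['prometheus-slave1:9090', 'prometheus-slave2:9090']
```

The same `slaves` array can then drive the deployment of each slave under its respective name, keeping the master and slave configurations in sync.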

Prometheus Performance

A federated Prometheus solution helps to create high availability, but it also solves another problem that Prometheus experiences under high load: performance. 

In Prometheus 2.0, the project maintainers addressed this performance problem by adding support for big data architectures. This changed how Prometheus indexes data, so that the total number of time series no longer impacts server performance (see the KubeCon keynote). This was done with a new TSDB, written by Fabian Reinartz. Prometheus' data is now split into blocks, each covering a two-hour window. The current block is maintained in memory, written out to storage when completed, and compacted into larger blocks later on. Each block maintains an inverted index implemented via posting lists, and persisted data is accessed via memory-mapped files (mmap). This change reduced the amount of disk writes by roughly 10x.

Prometheus memory consumption has always been a pain point for DevOps. The new storage layer not only introduces significant memory savings, for both queried and unqueried data, but also a more stable, predictable allocation. This is especially useful in containerized environments, where DevOps need to enforce resource restrictions.

Figure 1: Memory usage comparison between Prometheus 1.5 and 2.0


Additionally, CPU usage saw a large decrease in Prometheus 2.0.   


Figure 2: CPU usage comparison between Prometheus 1.5 and 2.0

Still, this wasn't a complete solution, leading to yet another improvement. Prometheus federation, as of version 2.0, allows slaves to be focused on specific services, handling data load in a managed way. For example, one slave can monitor and collect metrics for one or two central services with the highest traffic, while another slave is configured to monitor other services with lower usage.


Another aspect that DevOps must take into consideration when planning for scale is storage. 

Prometheus has gone through three iterations of its storage management, each optimizing on the last:

  • Centralized time-series database with some in-memory buffering 
  • Custom file per time series, with LevelDB functionality for indexing 
  • Custom time-block design, allowing each time series to be saved in its own data chunk, with posting lists for improved indexing

Each revision improved on the previous one, taking performance from 50,000 samples per second to millions of samples per second; each also functioned better in more dynamic, frequently redeployed environments.

Still, as data size continuously grows, the storage backend must be able to handle that growth in order to supply reliable historical data. A simple solution is to use elastic cloud storage services such as AWS S3 or Google Cloud Storage as remote storage backends. These services provide virtually unlimited capacity for Prometheus servers that need to maintain large amounts of data, such as servers connected to a large Kubernetes cluster or to several clusters with one monitoring endpoint.

Prometheus itself provides a solution for advanced storage management: storage scalability with snapshots. By taking snapshots of Prometheus data and then deleting the original data via the storage retention configuration, users can archive data older than X days or months, or larger than a specific size, store it on a separate disk, and still have it available on demand. There are two steps for making this process effective.

The first step is taking snapshots of Prometheus data, which can be done using the Prometheus API. In order to use it, the admin API must first be enabled via a CLI flag:

./prometheus --storage.tsdb.path=data/ --web.enable-admin-api

The next step is to take the snapshot:

curl -XPOST http://{prometheus}:9090/api/v1/admin/tsdb/snapshot

Snapshots are stored in the data/snapshots directory.
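The snapshot endpoint responds with a JSON body naming the newly created snapshot directory. As a sketch (the response value below is illustrative, not a real snapshot name), a script could extract the directory to archive like so:

```python
import json

# Illustrative response body from the snapshot endpoint; in a real
# script this would be the output of:
#   curl -XPOST http://{prometheus}:9090/api/v1/admin/tsdb/snapshot
resp = '{"status": "success", "data": {"name": "20230101T000000Z-0123456789abcdef"}}'

# Extract the snapshot directory name and build its path on disk.
snapshot_name = json.loads(resp)["data"]["name"]
print(f"data/snapshots/{snapshot_name}")
```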

Automating this command, running it as part of a daily or monthly procedure, and moving the snapshots to an external storage solution allows DevOps to retain historical data without having to continuously monitor the storage usage of Prometheus. 

The second step is to configure the required retention period of the time series or the disk size that DevOps maintain. This can be configured easily using the --storage.tsdb.retention.size and --storage.tsdb.retention.time flags.
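For example, to drop data older than 30 days or beyond 50GB of disk, whichever threshold is hit first (the values here are illustrative):

./prometheus --storage.tsdb.path=data/ \
    --storage.tsdb.retention.time=30d \
    --storage.tsdb.retention.size=50GB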

After the time-series time period has passed, or the data size has reached the configured threshold, the data will be removed from the Prometheus server, will not consume disk resources, and will no longer be available. 

When R&D needs to access the historical data stored as a snapshot, Prometheus can simply be pointed to a specific snapshot using the --storage.tsdb.path attribute.

Thanos: The Alternative Tool

For DevOps who cannot invest the time or do not have hands-on experience with a scalable Prometheus solution, Thanos was created. Thanos simplifies the work behind multiple Prometheus deployments in order to achieve a complete picture of all metrics. Its sidecar component is deployed alongside each running Prometheus server and functions as a proxy that serves Prometheus' local data over the Thanos API. Thanos then allows you to select time-series data by labels and for a given time range, just like Prometheus does, without Prometheus having to serve the queries itself. Thanos can be deployed by running its Docker image. When starting the sidecar, the configuration should contain the Prometheus URL (usually localhost), the path to the Prometheus database (tsdb), and a file containing the object storage configuration:

thanos sidecar \
    --tsdb.path            /var/prometheus \
    --prometheus.url       "http://localhost:9090" \
    --objstore.config-file bucket_config.yaml
When Thanos is up and running, queries should no longer run against the Prometheus servers directly; instead, they should go through Thanos' query tools: the Store API and the Thanos web UI.


Monitoring, and metrics in particular, are an integral part of almost every DevOps team's work in the industry, and Prometheus helps provide better service and makes production and dev environments more accessible, useful, and effective. But in order to truly gain value from a monitoring system, the system itself needs to be monitored as well, and it has to stay coordinated with the application it monitors. As application backends scale, monitoring systems need to scale to match the application's new requirements. Having Epsagon as part of your monitoring stack helps DevOps gain visibility into any type of application, supporting any application size and architecture as well as providing real-time alerts while your application scales to endure high load.

Read More:

Prometheus and Grafana: The Perfect Combo

Observability Takes Too Much Developer Time, So Automate It

Tips for Running Containers and Kubernetes on AWS

Deeper Visibility into ECS and Fargate Monitoring