ARTIFACTORY: HA Metrics: Daily Aggregation and Counter Reset Handling with Prometheus and Grafana

Products
Frog_Artifactory
Content Type
Administration_Platform
AuthorFullName__c
Jeremy Leopold
articleNumber
000006541
FirstPublishedDate
2025-08-04T13:44:29Z
lastModifiedDate
2025-08-04
VersionNumber
1
Introduction

This article explains how to aggregate daily Artifactory download and upload metrics in a High Availability (HA) environment with the JFrog-supported Log Analytics and Metrics solution for Prometheus and Grafana, with a specific focus on how Prometheus handles counter resets.

Overview of Metric Aggregation for Daily Activity

Key activity indicators from Artifactory (number of downloads, bytes downloaded, number of uploads, and bytes uploaded) are listed in the documentation and are exposed as the following OpenMetrics counters:
  • jfsh_binaries_bytes_download_total
  • jfsh_binaries_bytes_upload_total
  • jfsh_binaries_download_total
  • jfsh_binaries_upload_total
Counter is a metric type defined by the OpenMetrics specification. These counters start at zero when Artifactory starts, accumulate while it runs, and reset to zero on the next restart. To measure activity over a specific period, such as the last 15 minutes or a full day, calculate the increase of each counter within that time range rather than reading its raw value.

Example Prometheus Query for Aggregation

This analysis relies on the increase() and sum() functions of PromQL (Prometheus Query Language). For example, to calculate the total bytes uploaded across your Artifactory instances:
sum(increase(jfsh_binaries_bytes_upload_total{namespace="$deployment"}[$__range]))

 

In this query:
  • increase(): This function computes how much a counter metric has grown over the specified time window, which is dynamically set by the Grafana dashboard's time range selector (represented by $__range).
  • sum(): This aggregates the calculated increases across all matching time series, effectively combining the metrics from multiple Artifactory pods in your deployment into a single total.
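
As a rough illustration of the query pattern described above, the following Python sketch builds the same sum(increase(...)) expression for each of the four counters with a fixed one-day window instead of $__range, and wraps it in a URL for Prometheus' instant-query endpoint (/api/v1/query). The helper names, the base URL, and the namespace value are hypothetical placeholders and not part of the JFrog solution.

```python
from urllib.parse import urlencode

# Counter metrics named in this article.
METRICS = [
    "jfsh_binaries_download_total",
    "jfsh_binaries_bytes_download_total",
    "jfsh_binaries_upload_total",
    "jfsh_binaries_bytes_upload_total",
]


def daily_increase_query(metric: str, namespace: str, window: str = "1d") -> str:
    """Return PromQL that sums the per-pod increase of one counter over a window."""
    return f'sum(increase({metric}{{namespace="{namespace}"}}[{window}]))'


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a URL for the Prometheus instant-query endpoint (/api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    # Hypothetical values: replace with your own Prometheus URL and namespace.
    for metric in METRICS:
        query = daily_increase_query(metric, namespace="my-artifactory")
        print(instant_query_url("http://prometheus.example:9090", query))
```

Evaluating such a query at the end of a day returns the growth over the preceding 24 hours, which corresponds to setting the dashboard time range to the last day.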
Handling Artifactory HA Metrics Collection

In an Artifactory High Availability (HA) setup, Prometheus collects metrics per instance. This means each Artifactory pod in your cluster reports its own metrics independently. Prometheus itself does not automatically aggregate these metrics across the entire cluster.
To get a complete view of daily activity across all instances, calculate the increase of each relevant metric per instance over the daily time range, then sum the results. The query above does exactly this: increase() runs on each pod's time series, and sum() combines the per-pod results into a single cluster-wide total.
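
The per-instance aggregation can be sketched in a few lines of Python. This is a simplified model, not how Prometheus is implemented: it assumes each pod's counter is known at the start and end of the window, that no resets occur, and it ignores the extrapolation that increase() applies at window boundaries. The pod names and values are made up for illustration.

```python
def cluster_increase(window_samples: dict[str, tuple[float, float]]) -> float:
    """Sum per-pod increases, mirroring sum(increase(...)) when no resets occur.

    window_samples maps a pod name to (first_value, last_value) of its counter
    within the time window.
    """
    return sum(last - first for first, last in window_samples.values())


if __name__ == "__main__":
    # Hypothetical download counters for three pods over one day.
    samples = {
        "artifactory-0": (1000.0, 1400.0),  # grew by 400
        "artifactory-1": (200.0, 350.0),    # grew by 150
        "artifactory-2": (50.0, 50.0),      # idle
    }
    print(cluster_increase(samples))
```

Computing the increase per pod before summing matters: each pod has its own counter, so the raw values cannot be meaningfully compared or subtracted across pods.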

Understanding Prometheus Counter Reset Handling

A critical aspect of monitoring with Prometheus is its built-in handling of counter resets. These resets commonly occur when an Artifactory pod restarts or a metric is reinitialized. As documented by Prometheus: "Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for."
When increase() evaluates the samples in its time window and finds a value lower than the previous one, it treats the drop as a counter reset: the counter is assumed to have restarted from zero, so the full value of the post-reset sample is counted as growth. As a result, increase(), and therefore the sum of increases, reflects the growth of the metric over time and does not turn negative because of restarts.
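
The reset adjustment can be approximated with the minimal Python sketch below. It is a simplified model, not Prometheus source code: it works on a plain list of scraped values for one time series and omits the extrapolation to window boundaries that the real increase() performs. The function name is hypothetical.

```python
def increase_with_resets(samples: list[float]) -> float:
    """Total growth of a counter across consecutive samples, adjusting for resets."""
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            # Normal case: the counter grew (or stayed flat) between scrapes.
            total += current - previous
        else:
            # A drop signals a reset: assume the counter restarted from zero,
            # so everything counted since the restart is growth.
            total += current
    return total


if __name__ == "__main__":
    scraped = [100.0, 150.0, 20.0, 60.0]  # the pod restarted before the third scrape
    print(increase_with_resets(scraped))   # reset-adjusted growth
    print(scraped[-1] - scraped[0])        # naive last-minus-first, negative here
```

For the sample series, the reset-adjusted result is 50 + 20 + 40 = 110, whereas the naive last-minus-first difference is negative.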

Potential Implications of Inter-Scrape Restarts

While Prometheus's counter reset handling is generally effective, one edge case can cause small data gaps: if an Artifactory pod restarts between two Prometheus scrapes, the activity counted after the last scrape and before the restart is never collected.
Consider the following scenario with a 15-second scraping interval:
  • 10:00:00: Prometheus scrapes the pod and records a counter value of 500 GB downloaded.
  • 10:00:06: The pod's internal counter reaches 500.125 GB (an additional 125 MB, not yet scraped).
  • 10:00:07: The Artifactory pod restarts and its internal counter resets to 0. The 125 MB accumulated since the 10:00:00 scrape was never collected and is lost.
  • 10:00:14: The counter, counting again from 0, reaches 375 MB.
  • 10:00:15: The next Prometheus scrape records 375 MB. Because this value is lower than the previous sample, the drop is treated as a reset and 375 MB is counted as the increase for this interval.
In this example, the actual increase over the 15-second period was 500 MB (125 MB before the restart plus 375 MB after it), but only 375 MB is reflected because the restart occurred between scrapes. Such inaccuracies are typically negligible for most use cases, since pod restarts are infrequent. For a more detailed technical explanation of how counters work, see the Prometheus documentation.
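
The arithmetic of this scenario can be checked with a short Python sketch. The variable names are illustrative, and decimal units (1 GB = 1000 MB) are assumed.

```python
MB_PER_GB = 1000  # assumption: decimal units

scrape_1_mb = 500 * MB_PER_GB    # counter value recorded at 10:00:00
unscraped_before_restart = 125   # MB counted between 10:00:00 and the restart
counted_after_restart = 375      # MB counted between the restart and 10:00:15
scrape_2_mb = counted_after_restart  # counter value recorded at 10:00:15

# Reset-aware increase between the two scrapes: a drop means "restarted from 0".
if scrape_2_mb < scrape_1_mb:
    observed_mb = scrape_2_mb
else:
    observed_mb = scrape_2_mb - scrape_1_mb

actual_mb = unscraped_before_restart + counted_after_restart
missed_mb = actual_mb - observed_mb

print(f"observed={observed_mb} MB, actual={actual_mb} MB, missed={missed_mb} MB")
```

The missed amount is bounded by what a pod can transfer within a single scrape interval, so a shorter scrape interval reduces the worst-case gap per restart.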
Further Resources
For reference, the example Grafana dashboard JSON below calculates the increase in the number of downloads, bytes downloaded, number of uploads, and bytes uploaded over the selected time range. The dashboard defaults to the last 15 minutes; adjust the range to cover a full day.

[Screenshot of the example dashboard]

{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "grafana",
          "uid": "-- Grafana --"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "target": {
          "limit": 100,
          "matchAny": false,
          "tags": [],
          "type": "dashboard"
        },
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": 12202,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "datasource": {
        "type": "prometheus",
        "uid": "global-view-thanos"
      },
      "fieldConfig": {
        "defaults": {
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "short"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 0
      },
      "id": 4,
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "auto",
        "orientation": "auto",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "auto"
      },
      "pluginVersion": "9.2.4",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "global-view-thanos"
          },
          "editorMode": "code",
          "expr": "sum(increase(jfsh_binaries_download_total{namespace=\"$deployment\"}[$__range]))",
          "legendFormat": "{{id}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Number of downloads",
      "type": "stat"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "global-view-thanos"
      },
      "fieldConfig": {
        "defaults": {
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "bytes"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 0
      },
      "id": 6,
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "auto",
        "orientation": "auto",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "auto"
      },
      "pluginVersion": "9.2.4",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "global-view-thanos"
          },
          "editorMode": "code",
          "expr": "sum(increase(jfsh_binaries_bytes_download_total{namespace=\"$deployment\"}[$__range]))",
          "legendFormat": "{{id}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Total bytes downloaded",
      "type": "stat"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "global-view-thanos"
      },
      "fieldConfig": {
        "defaults": {
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "short"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 8
      },
      "id": 2,
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "auto",
        "orientation": "auto",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "auto"
      },
      "pluginVersion": "9.2.4",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "global-view-thanos"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "sum(increase(jfsh_binaries_upload_total{namespace=\"$deployment\"}[$__range]))",
          "format": "time_series",
          "instant": false,
          "legendFormat": "{{name}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Number of uploads",
      "type": "stat"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "global-view-thanos"
      },
      "fieldConfig": {
        "defaults": {
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "bytes"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 8
      },
      "id": 8,
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "auto",
        "orientation": "auto",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "auto"
      },
      "pluginVersion": "9.2.4",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "global-view-thanos"
          },
          "editorMode": "code",
          "expr": "sum(increase(jfsh_binaries_bytes_upload_total{namespace=\"$deployment\"}[$__range]))",
          "legendFormat": "__auto",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Total bytes uploaded",
      "type": "stat"
    }
  ],
  "refresh": false,
  "schemaVersion": 37,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": [
      {
        "current": {
          "selected": false,
          "text": "GlobalView-Prometheus",
          "value": "GlobalView-Prometheus"
        },
        "hide": 0,
        "includeAll": false,
        "multi": false,
        "name": "datasource",
        "options": [],
        "query": "prometheus",
        "refresh": 1,
        "regex": "/^GlobalView-.*/",
        "skipUrlSync": false,
        "type": "datasource"
      },
      {
        "current": {
          "selected": false,
          "text": "psemea",
          "value": "psemea"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "${datasource}"
        },
        "definition": "label_values(jvm_info,namespace)",
        "hide": 0,
        "includeAll": false,
        "multi": false,
        "name": "deployment",
        "options": [],
        "query": {
          "query": "label_values(jvm_info,namespace)",
          "refId": "StandardVariableQuery"
        },
        "refresh": 1,
        "regex": "/^psemea.*/",
        "skipUrlSync": false,
        "sort": 0,
        "type": "query"
      }
    ]
  },
  "time": {
    "from": "now-15m",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "",
  "title": "artifactory_monitoring",
  "uid": "yaz3kRySk",
  "version": 6,
  "weekStart": ""
}
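
To make the dashboard above open on a full day instead of the last 15 minutes, the "time" block of the JSON can be edited by hand or with a small script. The Python sketch below is one possible approach; the function names and the file name artifactory_monitoring.json are hypothetical, and it assumes the JSON has been saved to a local file before import into Grafana.

```python
import json


def set_time_range(dashboard: dict, time_from: str = "now-24h", time_to: str = "now") -> dict:
    """Return a copy of the dashboard whose default time range is replaced."""
    updated = dict(dashboard)
    updated["time"] = {"from": time_from, "to": time_to}
    return updated


def panel_queries(dashboard: dict) -> dict[str, str]:
    """Map each panel title to its PromQL expression, for a quick review."""
    return {
        panel["title"].strip(): target["expr"]
        for panel in dashboard.get("panels", [])
        for target in panel.get("targets", [])
    }


if __name__ == "__main__":
    # Hypothetical file name: the dashboard JSON saved locally.
    with open("artifactory_monitoring.json", encoding="utf-8") as handle:
        dashboard = json.load(handle)

    daily = set_time_range(dashboard)
    for title, expr in panel_queries(daily).items():
        print(f"{title}: {expr}")

    with open("artifactory_monitoring_daily.json", "w", encoding="utf-8") as handle:
        json.dump(daily, handle, indent=2)
```

Because every panel uses $__range, changing the default time range is enough; the queries themselves do not need to be edited.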