How to Visualize Prometheus Histogram Percentiles in Kibana (Metricbeat to Elasticsearch)

I'm using the Prometheus Metricbeat module to scrape /metrics endpoints from a Go service running in EKS, and ship the metrics to Elasticsearch. Everything works fine and the data is available in Kibana.

I'm particularly interested in visualizing response time percentiles based on histogram data emitted by the application. The histogram metrics are stored in Elasticsearch as multiple documents, one per bucket, each carrying an le (less than or equal to) label. For example:

```json
{
  "_id": "dOcFZocBqL_gAwhbb8g5",
  "prometheus": {
    "labels": {
      "job": "prometheus",
      "api_latency_status": "Success",
      "api_latency_orgID": "1670749265",
      "le": "500",
      "instance": "<masked>:15020"
    },
    "metrics": {
      "api_latency_response_time_histogram_bucket": 14
    }
  }
}
```

```json
{
  "_id": "eecFZocBqL_gAwhbb8g6",
  "prometheus": {
    "labels": {
      "job": "prometheus",
      "api_latency_status": "Success",
      "api_latency_orgID": "1670749265",
      "le": "1000",
      "instance": "<masked>:15020"
    },
    "metrics": {
      "api_latency_response_time_histogram_bucket": 34
    }
  }
}
```

This means 14 events had response times ≤ 500ms, and 34 events had response times ≤ 1000ms, so 20 events fell in the (500ms, 1000ms] range.
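To spell out how I'm reading the cumulative buckets, here's a quick sketch (Python; the counts are taken from the two documents above):

```python
# Prometheus histogram buckets are cumulative: the count at le=X includes
# every observation <= X, so per-bucket counts are successive differences.
cumulative = {500: 14, 1000: 34}        # le (ms) -> cumulative count

prev = 0
for le in sorted(cumulative):
    in_bucket = cumulative[le] - prev   # observations in (previous le, le]
    print(f"le={le}ms: cumulative {cumulative[le]}, in this bucket {in_bucket}")
    prev = cumulative[le]
# -> 14 observations in (0, 500], 20 in (500, 1000]
```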

I want to know how to visualize percentiles like p50, p90, and p99 from this histogram data in Kibana (Lens or TSVB). Since the histogram is already bucketed by le, is there a way to compute percentiles from this format directly in Elasticsearch?

Any advice or guidance would be appreciated.

Let’s see :rofl:

You can't really calculate specific percentiles accurately from a set of fixed, effectively arbitrary cumulative thresholds, and it appears that's all you've got. Any accuracy (or lack thereof) comes down to luck: your actual latencies might happen to be helpfully distributed relative to the le values, but you simply can't know.

To use your example, IF this is all you had, then:

The 14 events might have had a latency of, say, 5ms each, and the other 20 had 505ms.

OR, the 14 had 495ms and the other 20 had 505ms.

OR, the 14 had 495ms and the other 20 had 995ms.

OR, the 14 had 495ms and the other 20 split as 10 × 505ms and 10 × 995ms.

Every one of these is equally consistent with the bucket counts you have, yet the true p50/p90/p99 differ wildly between them.

Obviously, having more le values helps a bit, and huge total counts help too (law of large numbers), but you still have to make so many assumptions/approximations/interpolations that I wouldn't put much trust in the generated values for p50, p90, and p99. I certainly wouldn't accept them for, say, validating SLA compliance (though I accept most people probably would).
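To make the interpolation point concrete, here's a rough sketch (Python, simplified) of the same uniform-within-bucket linear interpolation that Prometheus's histogram_quantile() performs; the real function works on rates and needs a +Inf bucket, but the leap of faith is the line marked below:

```python
def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (le, count) pairs, using the
    uniform-within-bucket assumption that histogram_quantile() also makes."""
    buckets = sorted(buckets)          # [(le_ms, cumulative_count), ...]
    total = buckets[-1][1]
    rank = q * total                   # target rank among all observations
    prev_le, prev_count = 0.0, 0
    for le, count in buckets:
        if rank <= count:
            # The leap of faith: pretend observations are spread evenly
            # across (prev_le, le]; in reality they could sit anywhere.
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return float(buckets[-1][0])       # rank beyond the last finite bucket

buckets = [(500, 14), (1000, 34)]      # the two documents from the question
for q in (0.50, 0.90, 0.99):
    print(f"p{int(q * 100)} ~ {estimate_quantile(q, buckets):.0f}ms")
```

This prints p50 ~ 575ms, p90 ~ 915ms, p99 ~ 992ms: perfectly plausible-looking numbers, and every single scenario in the list above would produce exactly the same output.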

In your shoes I'd probably just present the data I've got, or ask the app team to build support for proper percentiles into their reporting, or just send all the raw latencies to Elasticsearch and let it compute the percentiles for you (I think Prometheus can do this too if fed the raw data).
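For that last option, something like this is all it takes once raw per-request latencies are indexed (Kibana Dev Tools syntax; the index name and the response_time_ms field are hypothetical stand-ins for however you'd ingest them), and Lens/TSVB can then chart the results directly:

```json
POST my-latency-index/_search
{
  "size": 0,
  "aggs": {
    "latency_percentiles": {
      "percentiles": {
        "field": "response_time_ms",
        "percents": [50, 90, 99]
      }
    }
  }
}
```

Note the percentiles aggregation is itself an approximation (TDigest under the hood by default), but it works on the raw observations rather than on pre-chosen bucket boundaries, which makes a world of difference.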