Reducing high cardinality in Prometheus
As it stands, we have four Prometheus instances running in production, each requiring a significant amount of resources. This forces us to create and manage high-spec instances in the cloud.
1. Identifying metrics with high cardinality
The first goal is to identify which metrics are putting significant strain on Prometheus in terms of memory usage. These are usually metrics with high cardinality (many unique label combinations), which causes Prometheus to store a very large number of time series in its database.
a) Prometheus TSDB API: Prometheus exposes an endpoint that reports cardinality statistics for the current head block, such as the metrics with the highest time series counts, the labels with the most unique values, and so on.
Endpoint: <prometheus-url>/api/v1/status/tsdb
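As a minimal sketch of how this can be consumed (assuming a Prometheus server reachable at http://localhost:9090, Python with the requests package installed, and the seriesCountByMetricName / labelValueCountByLabelName fields present in the response as in recent Prometheus releases):

import requests

PROMETHEUS_URL = "http://localhost:9090"  # assumption: replace with your Prometheus URL

# Fetch cardinality statistics for the current head block
resp = requests.get(f"{PROMETHEUS_URL}/api/v1/status/tsdb", timeout=10)
resp.raise_for_status()
stats = resp.json()["data"]

# Metrics holding the most time series in the head block
print("Top metrics by series count:")
for entry in stats["seriesCountByMetricName"]:
    print(f'  {entry["name"]}: {entry["value"]}')

# Labels with the most unique values, a common driver of cardinality
print("Top labels by unique value count:")
for entry in stats["labelValueCountByLabelName"]:
    print(f'  {entry["name"]}: {entry["value"]}')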
After identifying labels with a high value count, we can use the Prometheus API to view their live values and get a glimpse of what data is being exposed.
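For example (a minimal sketch, reusing the same server assumption and a hypothetical label called "path" flagged as high cardinality), the standard label-values endpoint lists everything currently stored for that label:

import requests

PROMETHEUS_URL = "http://localhost:9090"  # assumption: replace with your Prometheus URL
LABEL = "path"  # hypothetical label flagged as high cardinality

# List every live value for the label
resp = requests.get(f"{PROMETHEUS_URL}/api/v1/label/{LABEL}/values", timeout=10)
resp.raise_for_status()
values = resp.json()["data"]

print(f"{LABEL} has {len(values)} unique values, for example:")
for value in values[:20]:
    print(" ", value)

Seeing the raw values usually makes the culprit obvious, for instance unbounded IDs or full URLs being used as label values.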
b) Upgrading the Docker image
We were running Prometheus on tag v2.40.3 while the latest release was v2.46.0, which meant we were only 6 minor…