
Integrate TiDB Cloud with Prometheus and Grafana

TiDB Cloud provides a Prometheus API endpoint. If you have a Prometheus service, you can monitor key metrics of TiDB Cloud from the endpoint easily.

This document describes how to configure your Prometheus service to read key metrics from the TiDB Cloud endpoint and how to view the metrics using Grafana.

Prometheus integration version

TiDB Cloud has supported the project-level Prometheus integration (Beta) since March 15, 2022. Starting from October 21, 2025, TiDB Cloud introduces the cluster-level Prometheus integration (Preview).

  • Cluster-level Prometheus integration (Preview): if your organization has no remaining legacy project-level Prometheus integration as of October 21, 2025, TiDB Cloud provides the cluster-level Prometheus integration (Preview) for your organization so that you can experience the latest enhancements.

  • Legacy project-level Prometheus integration (Beta): if your organization still has at least one legacy project-level Prometheus integration as of October 21, 2025, TiDB Cloud keeps both existing and new integrations at the project level for your organization to avoid affecting current dashboards.

Prerequisites

  • To integrate TiDB Cloud with Prometheus, you must have a self-hosted or managed Prometheus service.

  • To set up third-party metrics integrations for TiDB Cloud, you must have Organization Owner or Project Owner access in TiDB Cloud. To view the integration page, you need at least the Project Viewer role to access the target clusters in the project.

Limitations

  • Prometheus and Grafana integrations are currently available only for TiDB Cloud Dedicated clusters.

  • Prometheus and Grafana integrations are not available when the cluster status is CREATING, RESTORING, PAUSED, or RESUMING.

Steps

Step 1. Get a scrape_config file for Prometheus

Before configuring your Prometheus service to read metrics of TiDB Cloud, you need to generate a scrape_config YAML file in TiDB Cloud first. The scrape_config file contains a unique bearer token that allows the Prometheus service to monitor your target clusters.

Depending on your Prometheus integration version, the steps to get the scrape_config file for Prometheus and access the integration page are different.

For the cluster-level Prometheus integration (Preview):

    1. In the TiDB Cloud console, navigate to the Clusters page of your project, and then click the name of your target cluster to go to its overview page.
    2. In the left navigation pane, click Settings > Integrations.
    3. On the Integrations page, click Integration to Prometheus (Preview).
    4. Click Add File to generate and show the scrape_config file for the current cluster.
    5. Make a copy of the scrape_config file content for later use.

For the legacy project-level Prometheus integration (Beta):

    1. In the TiDB Cloud console, switch to your target project using the combo box in the upper-left corner.
    2. In the left navigation pane, click Project Settings > Integrations.
    3. On the Integrations page, click Integration to Prometheus (BETA).
    4. Click Add File to generate and show the scrape_config file for the current project.
    5. Make a copy of the scrape_config file content for later use.

Step 2. Integrate with Prometheus

    1. In the monitoring directory specified by your Prometheus service, locate the Prometheus configuration file.

      For example, /etc/prometheus/prometheus.yml.

    2. In the Prometheus configuration file, locate the scrape_configs section, and then copy the scrape_config file content obtained from TiDB Cloud to the section.

    3. In your Prometheus service, check Status > Targets to confirm that the new scrape_config file has been read. If not, you might need to restart the Prometheus service.
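After you paste the generated content, the scrape_configs section of your configuration typically looks like the following sketch. The job name, endpoint host, and token below are illustrative placeholders, not the exact values TiDB Cloud generates; always use the content of your own scrape_config file.

```yaml
# /etc/prometheus/prometheus.yml (excerpt)
global:
  scrape_interval: 30s

scrape_configs:
  # --- content copied from the TiDB Cloud scrape_config file (placeholder values) ---
  - job_name: "tidbcloud-<cluster-or-project-id>"    # placeholder job name
    scheme: https
    metrics_path: "/metrics"                          # path given in the generated file
    authorization:
      type: Bearer
      credentials: "<bearer-token-from-tidb-cloud>"   # unique token from the generated file
    static_configs:
      - targets:
          - "<endpoint-host-from-tidb-cloud>"         # endpoint host given in the generated file
```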

Step 3. Use Grafana GUI dashboards to visualize the metrics

After your Prometheus service is reading metrics from TiDB Cloud, you can use Grafana GUI dashboards to visualize the metrics as follows:

    1. Depending on your Prometheus integration version, the link to download the Grafana dashboard JSON of TiDB Cloud for Prometheus is different.

      • For the cluster-level Prometheus integration (Preview), download the Grafana dashboard JSON file here.
      • For the legacy project-level Prometheus integration (Beta), download the Grafana dashboard JSON file here.

    2. Import this JSON file to your own Grafana GUI to visualize the metrics.

    3. (Optional) Customize the dashboard as needed by adding or removing panels, changing data sources, and modifying display options.

For more information about how to use Grafana, see the Grafana documentation.
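If you prefer to import the dashboard programmatically instead of through the Grafana GUI, a minimal sketch using Grafana's dashboard-import HTTP endpoint might look like the following. The file name, Grafana host, and token are assumptions for illustration only:

```python
import json

def build_import_payload(dashboard: dict) -> dict:
    """Wrap a dashboard definition in the request body expected by
    Grafana's POST /api/dashboards/db endpoint."""
    dashboard = dict(dashboard)
    dashboard["id"] = None  # let Grafana assign a new internal ID on import
    return {
        "dashboard": dashboard,
        "overwrite": True,   # replace an existing dashboard with the same uid
        "folderId": 0,       # 0 is the General folder
    }

if __name__ == "__main__":
    # In practice, load the JSON file downloaded from TiDB Cloud, for example:
    # dashboard = json.load(open("tidbcloud-dashboard.json"))  # placeholder file name
    dashboard = {"uid": "tidbcloud", "title": "TiDB Cloud", "panels": []}
    print(json.dumps(build_import_payload(dashboard), indent=2))
    # To import, POST this payload to https://<your-grafana-host>/api/dashboards/db
    # with an "Authorization: Bearer <token>" header.
```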

Best practice of rotating scrape_config

To improve data security, it is a general best practice to periodically rotate scrape_config file bearer tokens.

    1. Follow Step 1 to create a new scrape_config file for Prometheus.
    2. Add the content of the new file to your Prometheus configuration file.
    3. Once you have confirmed that your Prometheus service is still able to read from TiDB Cloud, remove the content of the old scrape_config file from your Prometheus configuration file.
    4. On the Integrations page of your project or cluster, delete the corresponding old scrape_config file to block anyone else from using it to read from the TiDB Cloud Prometheus endpoint.
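The rotation sequence above can be sketched as a check on your Prometheus configuration file: while rotating, both the old and new scrape jobs coexist; after cleanup, only the new one remains. The job names below are hypothetical, and the string-based scan is a sketch (a real implementation would parse the YAML):

```python
def scrape_jobs(config_text: str) -> list[str]:
    """Return the job_name values found in prometheus.yml-style text."""
    jobs = []
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("- job_name:"):
            jobs.append(line.split(":", 1)[1].strip().strip('"'))
    return jobs

# During rotation, the old and new TiDB Cloud jobs coexist (hypothetical names).
during_rotation = """
scrape_configs:
  - job_name: "tidbcloud-old"
  - job_name: "tidbcloud-new"
"""
assert scrape_jobs(during_rotation) == ["tidbcloud-old", "tidbcloud-new"]

# After confirming the new token works, remove the old block (step 3),
# then delete the old file in the TiDB Cloud console (step 4).
after_cleanup = """
scrape_configs:
  - job_name: "tidbcloud-new"
"""
assert scrape_jobs(after_cleanup) == ["tidbcloud-new"]
```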

Metrics available to Prometheus

Prometheus tracks the following metric data for your TiDB clusters.

| Metric name | Metric type | Labels | Description |
|---|---|---|---|
| tidbcloud_db_queries_total | count | sql_type: Select\|Insert\|...<br/>cluster_name: <cluster name><br/>instance: tidb-0\|tidb-1…<br/>component: tidb | The total number of statements executed |
| tidbcloud_db_failed_queries_total | count | type: planner:xxx\|executor:2345\|...<br/>cluster_name: <cluster name><br/>instance: tidb-0\|tidb-1…<br/>component: tidb | The total number of execution errors |
| tidbcloud_db_connections | gauge | cluster_name: <cluster name><br/>instance: tidb-0\|tidb-1…<br/>component: tidb | Current number of connections in your TiDB server |
| tidbcloud_db_query_duration_seconds | histogram | sql_type: Select\|Insert\|...<br/>cluster_name: <cluster name><br/>instance: tidb-0\|tidb-1…<br/>component: tidb | The duration histogram of statements |
| tidbcloud_changefeed_latency | gauge | changefeed_id | The data replication latency between the upstream and the downstream of a changefeed |
| tidbcloud_changefeed_checkpoint_ts | gauge | changefeed_id | The checkpoint timestamp of a changefeed, representing the largest TSO (Timestamp Oracle) successfully written to the downstream |
| tidbcloud_changefeed_replica_rows | gauge | changefeed_id | The number of replicated rows that a changefeed writes to the downstream per second |
| tidbcloud_node_storage_used_bytes | gauge | cluster_name: <cluster name><br/>instance: tikv-0\|tikv-1…\|tiflash-0\|tiflash-1…<br/>component: tikv\|tiflash | The disk usage bytes of TiKV/TiFlash nodes |
| tidbcloud_node_storage_capacity_bytes | gauge | cluster_name: <cluster name><br/>instance: tikv-0\|tikv-1…\|tiflash-0\|tiflash-1…<br/>component: tikv\|tiflash | The disk capacity bytes of TiKV/TiFlash nodes |
| tidbcloud_node_cpu_seconds_total | count | cluster_name: <cluster name><br/>instance: tidb-0\|tidb-1…\|tikv-0…\|tiflash-0…<br/>component: tidb\|tikv\|tiflash | The CPU usage of TiDB/TiKV/TiFlash nodes |
| tidbcloud_node_cpu_capacity_cores | gauge | cluster_name: <cluster name><br/>instance: tidb-0\|tidb-1…\|tikv-0…\|tiflash-0…<br/>component: tidb\|tikv\|tiflash | The CPU limit cores of TiDB/TiKV/TiFlash nodes |
| tidbcloud_node_memory_used_bytes | gauge | cluster_name: <cluster name><br/>instance: tidb-0\|tidb-1…\|tikv-0…\|tiflash-0…<br/>component: tidb\|tikv\|tiflash | The used memory bytes of TiDB/TiKV/TiFlash nodes |
| tidbcloud_node_memory_capacity_bytes | gauge | cluster_name: <cluster name><br/>instance: tidb-0\|tidb-1…\|tikv-0…\|tiflash-0…<br/>component: tidb\|tikv\|tiflash | The memory capacity bytes of TiDB/TiKV/TiFlash nodes |
| tidbcloud_node_storage_available_bytes | gauge | instance: tidb-0\|tidb-1\|...<br/>component: tikv\|tiflash<br/>cluster_name: <cluster name> | The available disk space in bytes for TiKV/TiFlash nodes |
| tidbcloud_disk_read_latency | histogram | instance: tidb-0\|tidb-1\|...<br/>component: tikv\|tiflash<br/>cluster_name: <cluster name><br/>device: nvme.*\|dm.* | The read latency in seconds per storage device |
| tidbcloud_disk_write_latency | histogram | instance: tidb-0\|tidb-1\|...<br/>component: tikv\|tiflash<br/>cluster_name: <cluster name><br/>device: nvme.*\|dm.* | The write latency in seconds per storage device |
| tidbcloud_kv_request_duration | histogram | instance: tidb-0\|tidb-1\|...<br/>component: tikv<br/>cluster_name: <cluster name><br/>type: BatchGet\|Commit\|Prewrite\|... | The duration in seconds of TiKV requests by type |
| tidbcloud_component_uptime | histogram | instance: tidb-0\|tidb-1\|...<br/>component: tidb\|tikv\|tiflash<br/>cluster_name: <cluster name> | The uptime in seconds of TiDB components |
| tidbcloud_ticdc_owner_resolved_ts_lag | gauge | changefeed_id: <changefeed-id><br/>cluster_name: <cluster name> | The resolved timestamp lag in seconds for the changefeed owner |
| tidbcloud_changefeed_status | gauge | changefeed_id: <changefeed-id><br/>cluster_name: <cluster name> | Changefeed status:<br/>-1: Unknown<br/>0: Normal<br/>1: Warning<br/>2: Failed<br/>3: Stopped<br/>4: Finished<br/>6: Warning<br/>7: Other |
| tidbcloud_resource_manager_resource_unit_read_request_unit | gauge | cluster_name: <cluster name><br/>resource_group: <group-name> | The read request units consumed by Resource Manager |
| tidbcloud_resource_manager_resource_unit_write_request_unit | gauge | cluster_name: <cluster name><br/>resource_group: <group-name> | The write request units consumed by Resource Manager |
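As an example of how these metrics combine, tidbcloud_node_cpu_seconds_total is a counter, so CPU utilization is derived from its rate of change divided by tidbcloud_node_cpu_capacity_cores (in PromQL, roughly rate(tidbcloud_node_cpu_seconds_total[5m]) / tidbcloud_node_cpu_capacity_cores). A minimal sketch of the same arithmetic over two counter samples:

```python
def cpu_utilization(cpu_seconds_t0: float, cpu_seconds_t1: float,
                    interval_seconds: float, capacity_cores: float) -> float:
    """Fraction of CPU capacity used between two samples of
    tidbcloud_node_cpu_seconds_total for a single node."""
    busy_cores = (cpu_seconds_t1 - cpu_seconds_t0) / interval_seconds
    return busy_cores / capacity_cores

# A node with 8 capacity cores (tidbcloud_node_cpu_capacity_cores) that
# accumulated 120 CPU-seconds over a 60-second window averaged 2 busy cores:
print(cpu_utilization(1000.0, 1120.0, 60.0, 8.0))  # → 0.25
```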

For the cluster-level Prometheus integration, the following additional metrics are also available:

| Metric name | Metric type | Labels | Description |
|---|---|---|---|
| tidbcloud_dm_task_status | gauge | instance: instance<br/>task: task<br/>cluster_name: <cluster name> | Task state of Data Migration:<br/>0: Invalid<br/>1: New<br/>2: Running<br/>3: Paused<br/>4: Stopped<br/>5: Finished<br/>15: Error |
| tidbcloud_dm_syncer_replication_lag_bucket | gauge | instance: instance<br/>cluster_name: <cluster name> | Replication lag (bucket) of Data Migration |
| tidbcloud_dm_syncer_replication_lag_gauge | gauge | instance: instance<br/>task: task<br/>cluster_name: <cluster name> | Replication lag (gauge) of Data Migration |
| tidbcloud_dm_relay_read_error_count | gauge | instance: instance<br/>cluster_name: <cluster name> | Failures to read binlog from the master |

FAQ

  • Why does the same metric have different values on Grafana and the TiDB Cloud console at the same time?

    The aggregation calculation logic is different between Grafana and TiDB Cloud, so the displayed aggregated values might differ. You can adjust the Min step option in Grafana to get more fine-grained metric values.
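To see why the rendering step changes displayed values, consider averaging the same raw series at two different step sizes: each chart only aggregates within its own windows, so a short spike is diluted differently. A sketch with hypothetical sample values:

```python
def downsample_mean(samples: list[float], step: int) -> list[float]:
    """Average raw samples within each window of `step` points, as a chart
    might when rendering a series at a coarser resolution."""
    return [
        sum(samples[i:i + step]) / len(samples[i:i + step])
        for i in range(0, len(samples), step)
    ]

# The same raw series rendered at two different step sizes
# produces different displayed values around the spike.
raw = [1.0, 1.0, 10.0, 1.0, 1.0, 1.0]
print(downsample_mean(raw, 2))  # → [1.0, 5.5, 1.0]
print(downsample_mean(raw, 3))  # → [4.0, 1.0]
```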
