---
sidebar_label: Monitoring and Data Visualization
sidebar_position: 6
---

Monitoring and Data Visualization

As a distributed file system hosting massive amounts of data, JuiceFS needs to make status changes of the entire system visible in terms of capacity, files, CPU load, disk I/O, cache, and so on. JuiceFS exposes real-time status data through a Prometheus-compatible API. Simply add it to your own Prometheus Server, which scrapes and stores the time series data, and then use tools like Grafana to easily visualize and monitor the JuiceFS file system.

Get started

It is assumed here that the Prometheus Server, Grafana, and the JuiceFS client are all running on the same host.

  • Prometheus Server: Scrapes and stores the time series data of various metrics. For installation, please refer to the official documentation.
  • Grafana: Loads and visualizes the time series data from Prometheus. For installation, please refer to the official documentation.

Ⅰ. Access to real-time data

JuiceFS exposes metrics through a Prometheus-compatible API. After the file system is mounted, the client's real-time monitoring data is available at http://localhost:9567/metrics by default.
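
To quickly verify that the endpoint is serving data, you can query it directly (the metric lines below are only illustrative; the exact names depend on the client version):

$ curl -s http://localhost:9567/metrics | head -n 3
# HELP juicefs_uptime Total running time in seconds
# TYPE juicefs_uptime gauge
juicefs_uptime 1234.5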

Ⅱ. Add API to Prometheus Server

Edit the Prometheus configuration file to add a new job pointing to the JuiceFS API address, e.g.:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "juicefs"
    static_configs:
      - targets: ["localhost:9567"]

Assuming a configuration file named prometheus.yml, load this configuration to start the service:

./prometheus --config.file=prometheus.yml

Visit http://localhost:9090 to see the Prometheus interface.
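
If Prometheus fails to start or the juicefs target does not appear, you can validate the configuration file with promtool, which ships with the Prometheus distribution:

$ ./promtool check config prometheus.yml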

Ⅲ. Visualize Prometheus data with Grafana

As shown in the figure below, create a new Data Source.

  • Name: For identification purposes, you can fill in the name of the file system.
  • URL: Data interface for Prometheus, default is http://localhost:9090
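
Alternatively, if you manage Grafana declaratively, the data source can be added with a provisioning file instead of through the UI. A minimal sketch, assuming Grafana's default provisioning directory and the Prometheus address above (the file path and data source name are only examples):

# /etc/grafana/provisioning/datasources/juicefs.yml (example path)
apiVersion: 1
datasources:
  - name: JuiceFS              # any identifying name, e.g. the file system name
    type: prometheus
    access: proxy
    url: http://localhost:9090 # the Prometheus data interface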

Then, create a dashboard using grafana_template.json and visit it to see a visual overview of the file system.

Collecting monitoring metrics

There are different ways to collect monitoring metrics depending on how JuiceFS is deployed, which are described below.

Mount point

When the JuiceFS file system is mounted via the juicefs mount command, you can collect monitoring metrics via the address http://localhost:9567/metrics, or you can customize it via the --metrics option. For example:

$ juicefs mount --metrics localhost:9567 ...

You can view these monitoring metrics using the command line tool:

$ curl http://localhost:9567/metrics

In addition, the root directory of each JuiceFS file system has a hidden file called .stats, through which you can also view monitoring metrics. For example (assuming here that the path to the mount point is /jfs):

$ cat /jfs/.stats

Kubernetes

The JuiceFS CSI Driver provides monitoring metrics on port 9567 of the mount pod by default, or you can customize this by adding the metrics option to the mountOptions (please refer to the CSI Driver documentation for how to modify mountOptions), e.g.:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: juicefs-pv
  labels:
    juicefs-name: ten-pb-fs
spec:
  ...
  mountOptions:
    - metrics=0.0.0.0:9567

Add a scrape job to prometheus.yml to collect monitoring metrics:

scrape_configs:
  - job_name: 'juicefs'
    kubernetes_sd_configs:
    - role: pod
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
      action: keep
      regex: juicefs-mount
    - source_labels: [__address__]
      action: replace
      regex: ([^:]+)(:\d+)?
      replacement: $1:9567
      target_label: __address__
    - source_labels: [__meta_kubernetes_pod_node_name]
      target_label: node
      action: replace

The above assumes that the Prometheus server is running inside the Kubernetes cluster. If your Prometheus server is running outside the Kubernetes cluster, make sure the cluster nodes are reachable from the Prometheus server and, as described in this issue, add the api_server and tls_config client auth to the above configuration, like this:

scrape_configs:
  - job_name: 'juicefs'
    kubernetes_sd_configs:
    - api_server: <Kubernetes API Server>
      role: pod
      tls_config:
        ca_file: <...>
        cert_file: <...>
        key_file: <...>
        insecure_skip_verify: false
    relabel_configs:
    ...

S3 Gateway

:::note This feature requires JuiceFS client version 0.17.1 or above. :::

The JuiceFS S3 Gateway provides monitoring metrics at the address http://localhost:9567/metrics by default, or you can customize it with the --metrics option. For example:

$ juicefs gateway --metrics localhost:9567 ...

If you are deploying JuiceFS S3 Gateway in Kubernetes, you can refer to the Prometheus configuration in the Kubernetes section to collect monitoring metrics (the difference is mainly in the regular expression for the label __meta_kubernetes_pod_label_app_kubernetes_io_name), e.g.:

scrape_configs:
  - job_name: 'juicefs-s3-gateway'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
        action: keep
        regex: juicefs-s3-gateway
      - source_labels: [__address__]
        action: replace
        regex: ([^:]+)(:\d+)?
        replacement: $1:9567
        target_label: __address__
      - source_labels: [__meta_kubernetes_pod_node_name]
        target_label: node
        action: replace

Collect via Prometheus Operator

Prometheus Operator enables users to quickly deploy and manage Prometheus in Kubernetes. With the ServiceMonitor CRD provided by Prometheus Operator, the scrape configuration can be generated automatically. For example (assuming that the Service of the JuiceFS S3 Gateway is deployed in the kube-system namespace):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: juicefs-s3-gateway
spec:
  namespaceSelector:
    matchNames:
      - kube-system
  selector:
    matchLabels:
      app.kubernetes.io/name: juicefs-s3-gateway
  endpoints:
    - port: metrics
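
The ServiceMonitor above selects Services by label and scrapes the endpoint port named metrics, so the S3 Gateway Service needs to expose a port with that name. A minimal sketch of such a Service (the names, labels, and namespace are examples and must match your actual deployment):

apiVersion: v1
kind: Service
metadata:
  name: juicefs-s3-gateway
  namespace: kube-system
  labels:
    app.kubernetes.io/name: juicefs-s3-gateway
spec:
  selector:
    app.kubernetes.io/name: juicefs-s3-gateway
  ports:
    - name: metrics      # must match the port name referenced in the ServiceMonitor endpoints
      port: 9567
      targetPort: 9567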

For more information about Prometheus Operator, please check the official documentation.

Hadoop

The JuiceFS Hadoop Java SDK supports reporting monitoring metrics to Pushgateway and Graphite.

Pushgateway

Report metrics to Pushgateway:

<property>
  <name>juicefs.push-gateway</name>
  <value>host:port</value>
</property>

The frequency of reporting metrics can be modified through the juicefs.push-interval configuration; the default is to report once every 10 seconds.
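
For example, to report once every 30 seconds (the interval is given in seconds, consistent with the 10-second default; 30 here is just an illustrative value):

<property>
  <name>juicefs.push-interval</name>
  <value>30</value>
</property>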

:::info According to the recommendation in the official Pushgateway documentation, Prometheus's scrape configuration needs to set honor_labels: true.

It is important to note that the timestamp of the metrics scraped by Prometheus from Pushgateway is not the time when the JuiceFS Hadoop Java SDK reported them, but the time when they were scraped. For details, please refer to the official Pushgateway documentation.

By default, Pushgateway only saves metrics in memory. If you need to persist them to disk, you can specify the file path with the --persistence.file option and the save frequency with the --persistence.interval option (the default save interval is 5 minutes). :::
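
A Pushgateway scrape job with honor_labels enabled might look like this (the Pushgateway address is an example):

scrape_configs:
  - job_name: "pushgateway"
    honor_labels: true
    static_configs:
      - targets: ["localhost:9091"]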

:::note Each process using the JuiceFS Hadoop Java SDK produces its own unique metrics, and Pushgateway keeps all collected metrics indefinitely. This leads to a continuous accumulation of metrics that takes up too much memory and also slows down Prometheus scraping. It is recommended to clean up metrics on Pushgateway regularly.

Use the following command regularly to clean up the metrics of Pushgateway. Clearing the metrics does not affect the running JuiceFS Hadoop Java SDK, which will continue to report data. Note that the --web.enable-admin-api option must be specified when Pushgateway is started, and the following command clears all monitoring metrics in Pushgateway.

$ curl -X PUT http://host:9091/api/v1/admin/wipe

:::

For more information about Pushgateway, please check the official documentation.

Graphite

Report metrics to Graphite:

<property>
  <name>juicefs.push-graphite</name>
  <value>host:port</value>
</property>

The frequency of reporting metrics can be modified through the juicefs.push-interval configuration; the default is to report once every 10 seconds.

For all configurations supported by the JuiceFS Hadoop Java SDK, please refer to the documentation.

Use Consul as registration center

:::note This feature requires JuiceFS client version 1.0.0 or above. :::

JuiceFS supports using Consul as the registration center for the metrics API. The default Consul address is 127.0.0.1:8500, and you can customize it through the --consul option, e.g.:

$ juicefs mount --consul 1.2.3.4:8500 ...

When the Consul address is configured, the --metrics option does not need to be set: JuiceFS will automatically choose a metrics URL according to its own network and port conditions. If --metrics is set at the same time, it will first try to listen on the configured metrics URL.

For each instance registered to Consul, its serviceName is juicefs, and the format of serviceId is <IP>:<mount-point>, for example: 127.0.0.1:/tmp/jfs.

The meta of each instance contains two keys: hostname and mountpoint. When mountpoint is s3gateway, the instance is an S3 gateway.
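
You can inspect the registered instances through Consul's catalog API. A sketch of what an entry might look like, with fields abbreviated and values following the conventions described above:

$ curl -s http://127.0.0.1:8500/v1/catalog/service/juicefs
[
  {
    "ServiceName": "juicefs",
    "ServiceID": "127.0.0.1:/tmp/jfs",
    "ServiceMeta": {
      "hostname": "myhost",
      "mountpoint": "/tmp/jfs"
    }
  }
]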

Visualize monitoring metrics

Grafana dashboard template

JuiceFS provides some dashboard templates for Grafana, which can be imported to visualize the metrics collected in Prometheus. The dashboard templates currently available are:

| Name | Description |
|------|-------------|
| grafana_template.json | Shows metrics collected from the mount point, the S3 gateway (non-Kubernetes deployment) and the Hadoop Java SDK |
| grafana_template_k8s.json | Shows metrics collected from the Kubernetes CSI Driver and the S3 gateway (Kubernetes deployment) |

A sample Grafana dashboard looks like this:

JuiceFS Grafana dashboard

Monitoring metrics reference

Please refer to the "JuiceFS Metrics" document.