Observability

Overview

For now, this page captures requirements for the observability of the EMCO services (https://gitlab.com/groups/project-emco/-/epics/7).

Front-ending the services with Istio provides a useful set of metrics and tracing. Adding the collectors provided by the Prometheus client library to each service extends that with other fundamental metrics. The open question is which additional metrics and traces would be useful to EMCO operators.

Metrics

The following items are based on Prometheus recommendations for instrumentation.

Queries, errors, and latency

Istio provides these for both the client side and the server side: https://istio.io/latest/docs/reference/config/metrics/

Istio metrics can be customized to include other attributes from Envoy, such as the subject field of the peer certificate: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/advanced/attributes

In-progress requests

These do not appear to be available from Istio; further investigation is required.

Queries, errors, and latencies of resources external to the process (network, disk, IPC, etc.)

It is not yet clear which external resources need this coverage. Note that downstream HTTP and gRPC requests are already covered by Istio.

The Prometheus Go client library provides built-in collectors for various process and Go runtime metrics: https://pkg.go.dev/github.com/prometheus/client_golang@v1.12.2/prometheus/collectors.

Internal errors and latency

Internal errors could be tracked simply by counting error logs; however, this omits the number of attempts, which makes it hard to calculate a success ratio.

Totals of info/error/warning logs

It is unclear whether this is a useful metric.

Any general statistics

This bucket includes EMCO-specific information such as the number of projects and the errors and latency of deployment intent group instantiation. Cache and thread pool metrics should also be considered. Feedback is welcome on any other general metrics of interest to EMCO operators.

Tracing

Istio provides some tracing support, but it appears rudimentary: there are no detailed spans for EMCO-related operations.
