Overview

This page captures requirements for the observability of the EMCO services (https://gitlab.com/groups/project-emco/-/epics/7).

...

Example PromQL

Service	Type	PromQL	Notes
HTTP/gRPC (the request_protocol label distinguishes HTTP from gRPC)	Queries	sum(irate(istio_requests_total{reporter="destination",destination_workload=~"services-orchestrator"}[5m]))	inbound
	Queries	sum(irate(istio_requests_total{reporter="source",source_workload="services-orchestrator"}[5m])) by (destination_workload)	outbound
	Errors	sum(irate(istio_requests_total{reporter="destination",destination_workload=~"services-orchestrator",response_code=~"5.*"}[5m])) / sum(irate(istio_requests_total{reporter="destination",destination_workload=~"services-orchestrator"}[5m]))	inbound, fraction of requests with a 5xx response
	Errors	sum(irate(istio_requests_total{reporter="source",source_workload=~"services-orchestrator",response_code=~"5.*"}[5m])) by (destination_workload) / sum(irate(istio_requests_total{reporter="source",source_workload=~"services-orchestrator"}[5m])) by (destination_workload)	outbound, fraction of requests with a 5xx response
	Latency	histogram_quantile(0.90, sum(irate(istio_request_duration_milliseconds_bucket{reporter="destination",destination_workload="services-orchestrator"}[1m])) by (le)) / 1000	P90, in seconds
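
To make the latency row easier to read, the following is a minimal Python sketch of the bucket interpolation that histogram_quantile performs, simplified from the behaviour described in the Prometheus documentation. The function name and the bucket values are hypothetical illustration data, not EMCO measurements, and the sketch assumes 0 < q <= 1 and cumulative buckets that end with a +Inf bucket.

```python
import math

# Made-up cumulative buckets: (upper bound in milliseconds, cumulative count).
# The last bucket must be the +Inf bucket, as in a Prometheus histogram.
EXAMPLE_BUCKETS = [(100.0, 50.0), (250.0, 80.0), (500.0, 95.0), (math.inf, 100.0)]


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative histogram buckets.

    Simplified sketch: the observations inside a bucket are assumed to be
    spread evenly between the bucket's lower and upper bound.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The quantile falls in the +Inf bucket: report the highest
                # finite bound instead of extrapolating.
                return buckets[-2][0]
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


# P90 in seconds, mirroring the "/ 1000" in the query above.
p90_seconds = histogram_quantile(0.90, EXAMPLE_BUCKETS) / 1000
```

Because the estimate is interpolated inside a bucket, its accuracy depends on how closely the bucket bounds bracket the latencies of interest.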

In-progress requests

...

Saturation

Queries, errors, and latencies of resources external to process (network, disk, IPC, etc.)

...

The Prometheus Go client library provides built-in collectors for various process and Go runtime metrics: https://pkg.go.dev/github.com/prometheus/client_golang@v1.12.2/prometheus/collectors. The metrics provided by cAdvisor are listed at https://github.com/google/cadvisor/blob/master/docs/storage/prometheus.md. Additional Kubernetes-specific metrics can be enabled with the kube-state-metrics project (https://github.com/kubernetes/kube-state-metrics).

Example PromQL

Note: queries that use kube_* metrics (such as kube_pod_container_resource_limits) require that kube-state-metrics is also deployed.

Pod Resource	Type	PromQL
CPU	Utilization	sum(rate(container_cpu_usage_seconds_total{namespace="emco"}[5m])) by (pod)
	Saturation	sum(rate(container_cpu_cfs_throttled_seconds_total{namespace="emco"}[5m])) by (pod)
	Errors
Memory	Utilization	sum(container_memory_working_set_bytes{namespace="emco"}) by (pod)
	Saturation	sum(container_memory_working_set_bytes{namespace="emco"}) by (pod) / sum(kube_pod_container_resource_limits{namespace="emco",resource="memory",unit="byte"}) by (pod)
	Errors
Disk	Utilization	sum(irate(container_fs_reads_bytes_total{namespace="emco"}[5m])) by (pod, device)
	Utilization	sum(irate(container_fs_writes_bytes_total{namespace="emco"}[5m])) by (pod, device)
	Saturation
	Errors
Network	Utilization	sum(rate(container_network_receive_bytes_total{namespace="emco"}[1m])) by (pod)
	Utilization	sum(rate(container_network_transmit_bytes_total{namespace="emco"}[1m])) by (pod)
	Saturation
	Errors	sum(container_network_receive_errors_total{namespace="emco"}) by (pod)
	Errors	sum(container_network_transmit_errors_total{namespace="emco"}) by (pod)
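
The queries in the table above can also be evaluated outside of a dashboard through the Prometheus HTTP API instant-query endpoint (/api/v1/query). The following is a minimal Python sketch of building such a request and reading the per-pod values from the response. The Prometheus address, the helper names, and the sample response (pod name and value) are hypothetical; only the endpoint path, the query parameter, and the response layout come from the Prometheus HTTP API.

```python
import json
from urllib.parse import urlencode

# Hypothetical Prometheus address; substitute the one used in your cluster.
PROMETHEUS_URL = "http://prometheus.example:9090"

# Memory saturation query from the table above.
MEMORY_SATURATION = (
    'sum(container_memory_working_set_bytes{namespace="emco"}) by (pod)'
    " / sum(kube_pod_container_resource_limits"
    '{namespace="emco",resource="memory",unit="byte"}) by (pod)'
)

# Made-up response body in the shape returned for an instant vector.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"pod": "example-pod"}, "value": [1661212800, "0.42"]},
        ],
    },
})


def instant_query_url(promql, base_url=PROMETHEUS_URL):
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def values_by_pod(response_body):
    """Map each pod label to its sample value from an instant-vector response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return {
        sample["metric"].get("pod", ""): float(sample["value"][1])
        for sample in payload["data"]["result"]
    }
```

In practice the body passed to values_by_pod would come from an HTTP GET on instant_query_url(MEMORY_SATURATION), for example with urllib.request.urlopen; the sketch leaves the network call out so that it runs without a Prometheus server.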

Internal errors and latency

...

Also, keep in mind this cautionary note from the Prometheus project:

CAUTION: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.

In the EMCO APIs, unbounded sets of values include project names, intent names, and the like.
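
One common way to respect this caution is to label requests with a route template rather than the concrete path, so that names chosen by users never become label values. The following is a minimal Python sketch of that idea. The path layout, the placeholder names, and the route_label helper are assumptions made for illustration and are not taken from the EMCO code.

```python
# Hypothetical mapping from a collection segment to the placeholder that
# replaces the user-chosen name following it. The path layout is assumed
# for illustration only.
PLACEHOLDERS = {
    "projects": "{project}",
    "composite-apps": "{composite-app}",
    "deployment-intent-groups": "{deployment-intent-group}",
}


def route_label(path):
    """Collapse a concrete request path into a bounded route template."""
    normalized = []
    previous = None
    for segment in path.strip("/").split("/"):
        if previous in PLACEHOLDERS:
            # This segment is a user-chosen name: replace it.
            normalized.append(PLACEHOLDERS[previous])
            previous = None
        else:
            normalized.append(segment)
            previous = segment
    return "/" + "/".join(normalized)


# Two different projects and composite apps map to the same label value.
labels = {
    route_label("/v2/projects/alpha/composite-apps/shop"),
    route_label("/v2/projects/beta/composite-apps/blog"),
}
```

With this approach the number of time series grows with the number of routes, which is fixed by the API, rather than with the number of projects or intents that users create.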

...

Version	Old Version 11	New Version 12
Changes made by	Todd Malsbary	Todd Malsbary
Saved on	Aug 22, 2022	Aug 23, 2022
