Overview

This page currently captures requirements for the observability of the EMCO services (https://gitlab.com/groups/project-emco/-/epics/7).

...

Istio metrics can be customized to include other Envoy attributes, such as the subject field of the peer certificate: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/advanced/attributes

Example PromQL

| Service | Type | PromQL | Notes |
| --- | --- | --- | --- |
| HTTP/gRPC* | Queries | sum(irate(istio_requests_total{reporter="destination",destination_workload=~"services-orchestrator"}[5m])) | inbound |
| | | sum(irate(istio_requests_total{reporter="source",source_workload="services-orchestrator"}[5m])) by (destination_workload) | outbound |
| | Errors | sum(irate(istio_requests_total{reporter="destination",destination_workload=~"services-orchestrator",response_code!~"5.*"}[5m])) / sum(irate(istio_requests_total{reporter="destination",destination_workload=~"services-orchestrator"}[5m])) | inbound (as written, the ratio of non-5xx responses, i.e. the success rate) |
| | | sum(irate(istio_requests_total{reporter="source",source_workload=~"services-orchestrator",response_code!~"5.*"}[5m])) by (destination_workload) / sum(irate(istio_requests_total{reporter="source",source_workload=~"services-orchestrator"}[5m])) by (destination_workload) | outbound (as written, the ratio of non-5xx responses, i.e. the success rate) |
| | Latency | histogram_quantile(0.90, sum(irate(istio_request_duration_milliseconds_bucket{reporter="destination",destination_workload="services-orchestrator"}[1m])) by (le)) / 1000 | P90 |
| | Saturation | | |

*The request_protocol label can be used to distinguish between HTTP and gRPC.
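
As a rough illustration, the inbound query and latency expressions above can be parameterized by workload and turned into requests against the Prometheus HTTP API (/api/v1/query). The Python sketch below only builds strings; the helper names and the base URL http://prometheus:9090 are assumptions made for the example and are not part of EMCO or Istio.

```python
from urllib.parse import urlencode


def inbound_rate(workload, window="5m"):
    """Inbound request rate for a workload, as reported by the destination proxy."""
    return (
        'sum(irate(istio_requests_total{reporter="destination",'
        f'destination_workload="{workload}"}}[{window}]))'
    )


def latency_quantile(workload, quantile=0.90, window="1m"):
    """Request latency quantile in seconds (the histogram buckets are in milliseconds)."""
    return (
        f"histogram_quantile({quantile}, "
        'sum(irate(istio_request_duration_milliseconds_bucket{reporter="destination",'
        f'destination_workload="{workload}"}}[{window}])) by (le)) / 1000'
    )


def query_url(base_url, promql):
    """URL for an instant query against the Prometheus HTTP API."""
    params = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


if __name__ == "__main__":
    # Hypothetical Prometheus address; replace with the real one.
    print(query_url("http://prometheus:9090", inbound_rate("services-orchestrator")))
    print(query_url("http://prometheus:9090", latency_quantile("services-orchestrator")))
```

Keeping the workload name as a parameter makes it straightforward to reuse the same expressions for the other EMCO services.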

Queries, errors, and latencies of resources external to process (network, disk, IPC, etc.)

...

Also, keep in mind this cautionary note from the Prometheus project:

CAUTION: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.

However, note that well-known projects such as Istio and kube-state-metrics appear to disregard this advice, so the motivation behind the note may need further investigation.
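
To make the caution concrete: the number of time series for one metric is bounded by the product of the number of distinct values of each label. The Python sketch below works through that arithmetic; the label names and value counts are made-up figures for illustration, not measurements from EMCO.

```python
from math import prod


def series_upper_bound(distinct_values_per_label):
    """Upper bound on time series for one metric: product of distinct values per label."""
    return prod(distinct_values_per_label.values())


# Hypothetical label sets with bounded value counts.
bounded = {"reporter": 2, "response_code": 10, "destination_workload": 20}
print(series_upper_bound(bounded))  # 400

# Adding one unbounded label (e.g. 100,000 user IDs) multiplies the total.
with_user_id = dict(bounded, user_id=100_000)
print(series_upper_bound(with_user_id))  # 40000000
```

In practice not every combination occurs, so the real count is usually lower, but the bound shows why a single high-cardinality label dominates storage cost.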

...

The resources of a service can be identified from the resources exposed by its HTTP API. The URL parameters can serve as the initial labels.

| Service | Resource | Labels |
| --- | --- | --- |
| orchestrator | controller | name |
| | project | name |
| | compositeApp | version, name, project |
| | app | name, composite_app_version, composite_app, project |
| | dependency | name, app, composite_app_version, composite_app, project |
| | compositeProfile | name, composite_app_version, composite_app, project |
| | appProfile | name, composite_profile, composite_app_version, composite_app, project |
| | deploymentIntentGroup | name, composite_app_version, composite_app, project |
| | genericPlacementIntent | name, deployment_intent_group, composite_app_version, composite_app, project |
| | genericAppPlacementIntent | name, generic_placement_intent, deployment_intent_group, composite_app_version, composite_app, project |
| | groupIntent | name, deployment_intent_group, composite_app_version, composite_app_name, project |
| dcm | emco_logical_cloud_resource | project, name, namespace, status |
| clm | emco_cluster_provider_resource | name |
| | emco_cluster_resource | name, clusterprovider |
| ncm | emco_cluster_network_resource | clusterprovider, cluster, name, cnitype |
| | emco_cluster_provider_network_resource | clusterprovider, cluster, name, cnitype, nettype, vlanid, providerinterfacename, logicalinterfacename, vlannodeselector |
| dtc | emco_dig_traffic_group_intent_resource | name, project, composite_app, composite_app_version, dig |
| | emco_dig_inbound_intent_resource | name, project, composite_app, composite_app_version, dig, traffic_group_intent, spec_app, app_label, serviceName, externalName, port, protocol, externalSupport, serviceMesh, sidecarProxy, tlsType |
| | emco_dig_inbound_intent_client_resource | name, project, composite_app, composite_app_version, dig, traffic_group_intent, inbound_intent, spec_app, app_label, serviceName |
| | emco_dig_inbound_intent_client_access_point_resource | name, project, composite_app, composite_app_version, dig, traffic_group_intent, inbound_intent, client_name, action |
| ovnaction | emco_network_controller_intent_resource | name, project, composite_app, composite_app_version, dig |
| | emco_workload_intent_resource | name, project, composite_app, composite_app_version, dig, network_controller_intent, app_label, workload_resource, type |
| | emco_workload_interface_intent_resource | name, project, composite_app, composite_app_version, dig, network_controller_intent, workload_intent, interface, network_name, default_gateway, ip_address, mac_address |
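
One way to derive labels from URL parameters is to match the request path against a route template and use the captured parameters as label values. The Python sketch below illustrates this for the app resource; the route template and the function names are assumptions chosen to match the label names in the table, not the actual EMCO routes.

```python
import re

# Hypothetical route template; parameter names mirror the "app" row of the table.
APP_TEMPLATE = (
    "/v2/projects/{project}/composite-apps/{composite_app}"
    "/{composite_app_version}/apps/{name}"
)


def template_to_regex(template):
    """Turn each {param} into a named group matching one path segment.

    Assumes the template contains no other regex metacharacters.
    """
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile(f"^{pattern}$")


def labels_from_path(template, path):
    """Return the URL parameters as a label dict, or None if the path does not match."""
    match = template_to_regex(template).match(path)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(labels_from_path(
        APP_TEMPLATE, "/v2/projects/p1/composite-apps/collection/v1/apps/web"
    ))
```

Because the label set comes straight from the route, each resource type gets a fixed set of label names, although the label values themselves remain as unbounded as the resource names.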

The metrics for these resources should capture the state of each resource, such as its creation and deletion (emco_controller_creation_timestamp, emco_controller_deletion_timestamp, etc.), as described in the guidelines. This approach is suggested because it is unclear how metrics that capture resource utilization would apply to these resources.
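
A minimal sketch of such state metrics, assuming each event is stored as a Unix timestamp keyed by metric name and labels and rendered in the Prometheus text exposition format. It is written in plain Python for illustration; the class and method names are hypothetical, and a real service would more likely use a Prometheus client library.

```python
import time


class ResourceStateMetrics:
    """Records creation and deletion timestamps for one resource type."""

    def __init__(self, resource):
        self.resource = resource
        self.samples = {}  # (metric name, sorted label items) -> Unix timestamp

    def _record(self, event, labels, now):
        name = f"emco_{self.resource}_{event}_timestamp"
        key = (name, tuple(sorted(labels.items())))
        self.samples[key] = time.time() if now is None else now

    def created(self, labels, now=None):
        self._record("creation", labels, now)

    def deleted(self, labels, now=None):
        self._record("deletion", labels, now)

    def render(self):
        """Render samples as 'name{label="value"} timestamp' lines.

        Label values are not escaped here; a real exporter must escape them.
        """
        lines = []
        for (name, labels), value in sorted(self.samples.items()):
            label_text = ",".join(f'{k}="{v}"' for k, v in labels)
            lines.append(f"{name}{{{label_text}}} {value}")
        return "\n".join(lines)


if __name__ == "__main__":
    metrics = ResourceStateMetrics("controller")
    metrics.created({"name": "rsync"})
    print(metrics.render())
```

Storing timestamps rather than counters lets queries derive resource age or time since deletion without tracking utilization.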

...