Overview
This page currently captures requirements for the observability of the EMCO services (https://gitlab.com/groups/project-emco/-/epics/7).
...
Istio metrics can be customized to include other attributes from Envoy, such as the subject field of the peer certificate. See https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/advanced/attributes
Example PromQL
Service | Type | PromQL | Notes |
---|---|---|---|
HTTP/gRPC* | Queries | sum(irate(istio_requests_total{reporter="destination",destination_workload=~"services-orchestrator"}[5m])) | inbound |
HTTP/gRPC* | Queries | sum(irate(istio_requests_total{reporter="source",source_workload="services-orchestrator"}[5m])) by (destination_workload) | outbound |
HTTP/gRPC* | Errors | sum(irate(istio_requests_total{reporter="destination",destination_workload=~"services-orchestrator",response_code=~"5.*"}[5m])) / sum(irate(istio_requests_total{reporter="destination",destination_workload=~"services-orchestrator"}[5m])) | inbound |
HTTP/gRPC* | Errors | sum(irate(istio_requests_total{reporter="source",source_workload=~"services-orchestrator",response_code=~"5.*"}[5m])) by (destination_workload) / sum(irate(istio_requests_total{reporter="source",source_workload=~"services-orchestrator"}[5m])) by (destination_workload) | outbound |
HTTP/gRPC* | Latency | histogram_quantile(0.90, sum(irate(istio_request_duration_milliseconds_bucket{reporter="destination",destination_workload="services-orchestrator"}[1m])) by (le)) / 1000 | P90, in seconds |
HTTP/gRPC* | Saturation | | |

*The request_protocol label can be used to distinguish between HTTP and gRPC.
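The error-ratio queries follow a fixed pattern: the rate of requests answered with a 5xx response code divided by the rate of all requests for the same workload. As a minimal sketch, the inbound form can be generated for any workload with a small helper. The function name `inbound_error_ratio` and its `window` parameter are hypothetical and not part of EMCO or Istio; only the metric and label names come from the table.

```python
def inbound_error_ratio(workload: str, window: str = "5m") -> str:
    """Build a PromQL expression for the share of inbound requests answered with 5xx.

    Hypothetical helper for illustration; it only formats a query string.
    """
    # Label matchers shared by the numerator and the denominator.
    base = f'reporter="destination",destination_workload="{workload}"'
    # Numerator: only requests whose response code starts with 5.
    errors = f'sum(irate(istio_requests_total{{{base},response_code=~"5.*"}}[{window}]))'
    # Denominator: all requests received by the workload.
    total = f'sum(irate(istio_requests_total{{{base}}}[{window}]))'
    return f"{errors} / {total}"


print(inbound_error_ratio("services-orchestrator"))
```

The resulting string can be pasted into the Prometheus expression browser or used as the expression of an alerting rule.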
Queries, errors, and latencies of resources external to the process (network, disk, IPC, etc.)
...
Also, keep in mind this cautionary note from the Prometheus project:
CAUTION: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.
Note, however, that well-known projects such as Istio and kube-state-metrics appear to disregard this guidance, so the motivation behind the caution may need further investigation.
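To make the cardinality concern concrete, the worst-case number of time series for a single metric is the product of the number of distinct values of each label. The sketch below works through that arithmetic. The label names and counts are made-up illustrations, not measurements from an EMCO deployment.

```python
from math import prod


def series_upper_bound(label_values: dict[str, int]) -> int:
    """Upper bound on time series for one metric: product of distinct values per label."""
    return prod(label_values.values())


# Hypothetical label sets: the counts are assumptions chosen for illustration.
bounded = {"reporter": 2, "response_code": 10, "destination_workload": 20}
unbounded = {**bounded, "user_id": 100_000}

print(series_upper_bound(bounded))    # 400
print(series_upper_bound(unbounded))  # 40000000
```

Adding one unbounded label multiplies the series count by the number of values that label takes, which is why the caution singles out dimensions such as user IDs.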
...
The resources of a service can be identified from the HTTP resources it exposes. The URL parameters of each resource can serve as the initial set of labels.
Service | Resource | Labels |
---|---|---|
orchestrator | controller | name |
orchestrator | project | name |
orchestrator | compositeApp | version, name, project |
orchestrator | app | name, composite_app_version, composite_app, project |
orchestrator | dependency | name, app, composite_app_version, composite_app, project |
orchestrator | compositeProfile | name, composite_app_version, composite_app, project |
orchestrator | appProfile | name, composite_profile, composite_app_version, composite_app, project |
orchestrator | deploymentIntentGroup | name, composite_app_version, composite_app, project |
orchestrator | genericPlacementIntent | name, deployment_intent_group, composite_app_version, composite_app, project |
orchestrator | genericAppPlacementIntent | name, generic_placement_intent, deployment_intent_group, composite_app_version, composite_app, project |
orchestrator | groupIntent | name, deployment_intent_group, composite_app_version, composite_app_name, project |
dcm | emco_logical_cloud_resource | project, name, namespace, status |
clm | emco_cluster_provider_resource | name |
clm | emco_cluster_resource | name, clusterprovider |
ncm | emco_cluster_network_resource | clusterprovider, cluster, name, cnitype |
ncm | emco_cluster_provider_network_resource | clusterprovider, cluster, name, cnitype, nettype, vlanid, providerinterfacename, logicalinterfacename, vlannodeselector |
dtc | emco_dig_traffic_group_intent_resource | name, project, composite_app, composite_app_version, dig |
dtc | emco_dig_inbound_intent_resource | name, project, composite_app, composite_app_version, dig, traffic_group_intent, spec_app, app_label, serviceName, externalName, port, protocol, externalSupport, serviceMesh, sidecarProxy, tlsType |
dtc | emco_dig_inbound_intent_client_resource | name, project, composite_app, composite_app_version, dig, traffic_group_intent, inbound_intent, spec_app, app_label, serviceName |
dtc | emco_dig_inbound_intent_client_access_point_resource | name, project, composite_app, composite_app_version, dig, traffic_group_intent, inbound_intent, client_name, action |
ovnaction | emco_network_controller_intent_resource | name, project, composite_app, composite_app_version, dig |
ovnaction | emco_workload_intent_resource | name, project, composite_app, composite_app_version, dig, network_controller_intent, app_label, workload_resource, type |
ovnaction | emco_workload_interface_intent_resource | name, project, composite_app, composite_app_version, dig, network_controller_intent, workload_intent, interface, network_name, default_gateway, ip_address, mac_address |
The metrics for these resources should capture the state of each resource, for example its creation and deletion times (emco_controller_creation_timestamp, emco_controller_deletion_timestamp, etc.), as described in the guidelines. This approach is suggested because it is unclear how metrics that capture resource utilization would apply to these resources.
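As a rough illustration of what such a state metric could look like when scraped, the sketch below renders a gauge in the Prometheus text exposition format using only the standard library. It is not EMCO code: the helper names `escape_label_value` and `render_gauge`, the help text, the controller name `rsync`, and the timestamp value are all assumptions, and a real service would more likely use a Prometheus client library. The sketch also assumes the timestamp is reported as Unix seconds.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline as the text exposition format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, samples: list[tuple[dict[str, str], float]]) -> str:
    """Render one gauge metric, with one line per labelled sample."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        # Sort label names so the output is deterministic.
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
        )
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical sample: a controller named "rsync" created at Unix time 1700000000.
print(
    render_gauge(
        "emco_controller_creation_timestamp",
        "Unix time at which the controller was created",
        [({"name": "rsync"}, 1700000000)],
    ),
    end="",
)
```

The labels attached to each sample would come from the Labels column of the table above, so a resource deeper in the hierarchy carries the names of its parents as additional labels.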
...