Overview

This page currently captures requirements related to the observability of the EMCO services (https://gitlab.com/groups/project-emco/-/epics/7).

...

This bucket includes EMCO-specific information such as the number of projects and the errors and latency of deployment intent group instantiation. Also consider any cache or thread pool metrics. Feedback is welcome on any general metrics of interest to EMCO operators.

Preliminary guidelines:

  • Distinguish between resources and actions. 
  • Action metrics will record requests, errors, and latency similar to general network requests.
  • Resource metrics will record creation, deletion, and possible modification.  
  • Metrics will be labeled with project, composite-app, deployment intent group, etc.
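
As a rough illustration of the guidelines above, the sketch below separates action metrics (requests, errors, latency) from resource metrics (creation, deletion, modification). The `Registry` class is only an in-memory stand-in for a real metrics client library, and the metric names (`emco_action_requests_total`, `emco_resource_events_total`, etc.) and label names are assumptions made for this example, not agreed names.

```python
from collections import defaultdict


class Registry:
    """Tiny in-memory stand-in for a metrics client library (illustration only)."""

    def __init__(self):
        self.counters = defaultdict(int)    # series key -> running count
        self.latencies = defaultdict(list)  # series key -> observed durations

    @staticmethod
    def key(metric, labels):
        # Sort the labels so the same label set always maps to the same series.
        return (metric, tuple(sorted(labels.items())))

    def inc(self, metric, labels):
        self.counters[self.key(metric, labels)] += 1

    def observe(self, metric, labels, seconds):
        self.latencies[self.key(metric, labels)].append(seconds)


def record_action(reg, action, labels, seconds, failed=False):
    """Action metrics: requests, errors, and latency (hypothetical metric names)."""
    labels = {**labels, "action": action}
    reg.inc("emco_action_requests_total", labels)
    if failed:
        reg.inc("emco_action_errors_total", labels)
    reg.observe("emco_action_duration_seconds", labels, seconds)


def record_resource(reg, kind, event, labels):
    """Resource metrics: creation, deletion, and modification events."""
    reg.inc("emco_resource_events_total", {**labels, "kind": kind, "event": event})


reg = Registry()
record_resource(reg, "project", "created", {"project": "testvfw"})
record_action(reg, "instantiate", {"project": "testvfw"}, seconds=1.8)
record_action(reg, "instantiate", {"project": "testvfw"}, seconds=0.4, failed=True)
```

In a real service the registry would be replaced by the counters and histograms of whichever Prometheus client library is chosen; the point here is only the split between action and resource metrics and the shared label set.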

...

Also, keep in mind this cautionary note from the Prometheus project:

CAUTION: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.

Unbounded sets of values in the EMCO APIs would include values such as project names, intent names, etc. However, note that well-known projects such as Istio and kube-state-metrics appear to disregard this, so further investigation may be needed on the motivations behind this note.
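
To make the cardinality concern concrete, here is a small sketch that computes the upper bound on the number of time series for one metric as the product of the number of distinct values per label. The label names and the figure of 1000 projects are assumptions chosen only for illustration.

```python
from math import prod


def max_series(label_values):
    """Upper bound on time series for one metric: product of per-label cardinalities."""
    return prod(len(values) for values in label_values.values())


# Labels with a small, fixed set of values.
bounded = {
    "action": ["instantiate", "terminate", "update"],
    "status": ["success", "error"],
}

# The same labels plus an unbounded one (hypothetical: 1000 project names).
unbounded = dict(bounded, project=[f"project-{i}" for i in range(1000)])

print(max_series(bounded))    # 6
print(max_series(unbounded))  # 6000
```

Each additional unbounded label multiplies the bound again, which is the growth the Prometheus note warns about.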

Preliminary metrics

This section contains some of the considerations of the guidelines above applied to the orchestrator service.

...

The status of a deployment intent group deserves special consideration. The suggested approach is to support the labels necessary to execute queries equivalent to those shown in EMCO Status Queries. This would enable alerting on the various states of the resources composing a deployment intent group, for example on failed resources.

Metric | Type | Description | Labels
--- | --- | --- | ---
emco_deployment_intent_group_resource | GAUGE | 0 or 1 | project, composite_app, composite_app_version, composite_profile, name, deployed_status, ready_status, app, cluster_provider, cluster, connectivity, resource_gvk, resource, resource_deployed_status, resource_ready_status

The deployment intent group shown in Example query - status=deployed would create the following metrics:

emco_deployment_intent_group_resource{project="testvfw",composite_app="compositevfw",composite_app_version="v1",composite_profile="vfw_composite-profile",name="vfw_deployment_intent_group",deployed_status="instantiated",ready_status="ready",app="firewall",cluster_provider="vfw-cluster-provider",cluster="edge01",connectivity="available",resource_gvk="ConfigMap.v1",resource="firewall-scripts-configmap",resource_deployed_status="applied",resource_ready_status="ready"}
emco_deployment_intent_group_resource{project="testvfw",composite_app="compositevfw",composite_app_version="v1",composite_profile="vfw_composite-profile",name="vfw_deployment_intent_group",deployed_status="instantiated",ready_status="ready",app="firewall",cluster_provider="vfw-cluster-provider",cluster="edge01",connectivity="available",resource_gvk="Deployment.v1.apps",resource="fw0-firewall",resource_deployed_status="applied",resource_ready_status="ready"}
emco_deployment_intent_group_resource{project="testvfw",composite_app="compositevfw",composite_app_version="v1",composite_profile="vfw_composite-profile",name="vfw_deployment_intent_group",deployed_status="instantiated",ready_status="ready",app="firewall",cluster_provider="vfw-cluster-provider",cluster="edge02",connectivity="available",resource_gvk="ConfigMap.v1",resource="firewall-scripts-configmap",resource_deployed_status="applied",resource_ready_status="ready"}
emco_deployment_intent_group_resource{project="testvfw",composite_app="compositevfw",composite_app_version="v1",composite_profile="vfw_composite-profile",name="vfw_deployment_intent_group",deployed_status="instantiated",ready_status="ready",app="firewall",cluster_provider="vfw-cluster-provider",cluster="edge02",connectivity="available",resource_gvk="Deployment.v1.apps",resource="fw0-firewall",resource_deployed_status="applied",resource_ready_status="ready"}
emco_deployment_intent_group_resource{project="testvfw",composite_app="compositevfw",composite_app_version="v1",composite_profile="vfw_composite-profile",name="vfw_deployment_intent_group",deployed_status="instantiated",ready_status="ready",app="packetgen",cluster_provider="vfw-cluster-provider",cluster="edge01",connectivity="available",resource_gvk="Deployment.v1.apps",resource="fw0-packetgen",resource_deployed_status="applied",resource_ready_status="ready"}
emco_deployment_intent_group_resource{project="testvfw",composite_app="compositevfw",composite_app_version="v1",composite_profile="vfw_composite-profile",name="vfw_deployment_intent_group",deployed_status="instantiated",ready_status="ready",app="packetgen",cluster_provider="vfw-cluster-provider",cluster="edge01",connectivity="available",resource_gvk="ConfigMap.v1",resource="packetgen-scripts-configmap",resource_deployed_status="applied",resource_ready_status="ready"}
emco_deployment_intent_group_resource{project="testvfw",composite_app="compositevfw",composite_app_version="v1",composite_profile="vfw_composite-profile",name="vfw_deployment_intent_group",deployed_status="instantiated",ready_status="ready",app="packetgen",cluster_provider="vfw-cluster-provider",cluster="edge01",connectivity="available",resource_gvk="Service.v1",resource="packetgen-service",resource_deployed_status="applied",resource_ready_status="ready"}
...
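
The series above could be produced by combining the labels of the deployment intent group with the labels of each of its resources. The sketch below shows one way to do that; the split into `dig_labels` and per-resource dictionaries is a hypothetical input shape, not the actual EMCO status structure. A full exposition line would also carry the gauge value (0 or 1) after the closing brace.

```python
def escape(value):
    # The text exposition format escapes backslash, double quote, and newline in label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def series(metric, labels):
    """Render a metric name and its labels, keeping the labels in insertion order."""
    body = ",".join(f'{name}="{escape(value)}"' for name, value in labels.items())
    return f"{metric}{{{body}}}"


def resource_series(dig_labels, resources):
    """Yield one series per resource, prefixed with the deployment intent group labels."""
    for resource in resources:
        yield series("emco_deployment_intent_group_resource", {**dig_labels, **resource})


# Hypothetical input shape, filled with values from the example above.
dig_labels = {
    "project": "testvfw",
    "composite_app": "compositevfw",
    "composite_app_version": "v1",
    "composite_profile": "vfw_composite-profile",
    "name": "vfw_deployment_intent_group",
    "deployed_status": "instantiated",
    "ready_status": "ready",
}
resources = [
    {"app": "firewall", "cluster_provider": "vfw-cluster-provider", "cluster": "edge01",
     "connectivity": "available", "resource_gvk": "Deployment.v1.apps",
     "resource": "fw0-firewall", "resource_deployed_status": "applied",
     "resource_ready_status": "ready"},
]

for line in resource_series(dig_labels, resources):
    print(line)
```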

Some example queries:

Description | PromQL
--- | ---
deployedCounts | count(emco_deployment_intent_group_resource{project="testvfw",composite_app="compositevfw",composite_app_version="v1",composite_profile="vfw_composite-profile",name="vfw_deployment_intent_group",resource_deployed_status="applied"})
readyCounts (ready) | count(emco_deployment_intent_group_resource{project="testvfw",composite_app="compositevfw",composite_app_version="v1",composite_profile="vfw_composite-profile",name="vfw_deployment_intent_group",resource_ready_status="ready"})
readyCounts (notready) | count(emco_deployment_intent_group_resource{project="testvfw",composite_app="compositevfw",composite_app_version="v1",composite_profile="vfw_composite-profile",name="vfw_deployment_intent_group",resource_ready_status="notready"})
apps | count(emco_deployment_intent_group_resource{project="testvfw",composite_app="compositevfw",composite_app_version="v1",composite_profile="vfw_composite-profile",name="vfw_deployment_intent_group"}) by (app)
clusters filtered by the sink and firewall apps | count(emco_deployment_intent_group_resource{project="testvfw",composite_app="compositevfw",composite_app_version="v1",composite_profile="vfw_composite-profile",name="vfw_deployment_intent_group",app="sink"} or emco_deployment_intent_group_resource{project="testvfw",composite_app="compositevfw",composite_app_version="v1",composite_profile="vfw_composite-profile",name="vfw_deployment_intent_group",app="firewall"}) by (cluster_provider,cluster)
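
For readers less familiar with PromQL, the sketch below mimics what these queries compute, using plain Python over the seven example series listed earlier on this page (only the labels relevant to the queries are kept). The helper names `select` and `count_by` are made up for this illustration and only approximate label matching and `count(...) by (...)`.

```python
from collections import Counter

COMMON = {
    "cluster_provider": "vfw-cluster-provider",
    "resource_deployed_status": "applied",
    "resource_ready_status": "ready",
}

# The seven series listed in the example: four firewall resources, three packetgen resources.
SERIES = [
    {**COMMON, "app": "firewall", "cluster": cluster, "resource": resource}
    for cluster in ("edge01", "edge02")
    for resource in ("firewall-scripts-configmap", "fw0-firewall")
] + [
    {**COMMON, "app": "packetgen", "cluster": "edge01", "resource": resource}
    for resource in ("fw0-packetgen", "packetgen-scripts-configmap", "packetgen-service")
]


def select(series, **matchers):
    """Keep the series whose labels equal every matcher (like {label="value"})."""
    return [s for s in series if all(s.get(k) == v for k, v in matchers.items())]


def count_by(series, *labels):
    """Count series per distinct combination of the given labels (like count(...) by (...))."""
    return Counter(tuple(s[label] for label in labels) for s in series)


deployed = len(select(SERIES, resource_deployed_status="applied"))
not_ready = len(select(SERIES, resource_ready_status="notready"))
apps = count_by(SERIES, "app")
clusters = count_by(
    select(SERIES, app="sink") + select(SERIES, app="firewall"),
    "cluster_provider", "cluster",
)
```

One difference worth keeping in mind for alerting: when no series match, this sketch yields 0, whereas PromQL's count() over an empty selection returns no result at all.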

Tracing

Istio provides a starting point for tracing by creating a trace for each request in the sidecars. This is insufficient, however, because it does not include the outgoing requests made while handling an inbound request. What we'd like to see is a complete trace of, for example, an instantiate request to the orchestrator that includes the requests made to any controllers, etc.
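
One common way to link outgoing requests to the inbound trace under Istio is for the service to copy the trace context headers from the inbound request onto every outgoing request it makes. The sketch below illustrates the idea with plain dictionaries; the `propagate` helper is hypothetical, and the exact header set depends on the tracing backend in use (B3 and W3C Trace Context headers are assumed here).

```python
# Trace context headers to forward (assumed set: request ID, B3, and W3C Trace Context).
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "b3",
    "traceparent",
    "tracestate",
)


def propagate(inbound, outbound=None):
    """Return the outbound headers with the inbound trace context headers copied in."""
    result = dict(outbound or {})
    # HTTP header names are case-insensitive, so compare in lower case.
    lowered = {name.lower(): value for name, value in inbound.items()}
    for header in TRACE_HEADERS:
        if header in lowered:
            result[header] = lowered[header]
    return result


inbound = {"X-B3-TraceId": "463ac35c9f6413ad", "X-B3-SpanId": "a2fb4a1d1a96d312",
           "Authorization": "Bearer example"}
outgoing = propagate(inbound, {"content-type": "application/json"})
```

Only the trace headers are copied; unrelated inbound headers such as `Authorization` are deliberately left out of the outgoing request.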

...