Overview

This page currently captures requirements related to observability of the EMCO services (https://gitlab.com/groups/project-emco/-/epics/7).

...

Also, keep in mind this cautionary note from the Prometheus project:

CAUTION: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.

In the EMCO APIs, unbounded sets of values include project names, intent names, etc.
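To make the caution concrete, the following sketch computes the upper bound on the number of time series for a single metric as the product of the distinct values of each label. The label names and all counts are hypothetical and chosen only for illustration; they are not measurements from an EMCO deployment.

```python
from math import prod
from typing import Dict


def max_series(distinct_values: Dict[str, int]) -> int:
    """Upper bound on time series for one metric: the product of the
    number of distinct values each label can take."""
    return prod(distinct_values.values())


# All counts below are hypothetical and chosen only for illustration.
bounded = {"service": 2, "action": 9, "response_code": 5}
with_names = {**bounded, "project": 1000, "intent": 50}

print(max_series(bounded))     # 90
print(max_series(with_names))  # 4500000
```

Adding two labels drawn from user-chosen names multiplies the bound by the number of names in use, which is why such values are poor label candidates.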

...

This section applies some of the guidelines above to the orchestrator and rsync services.

The actions of a service can be identified from the gRPC requests and HTTP lifecycle requests:

Service        Action
orchestrator   approve, instantiate, migrate, rollback, stop, terminate, update, StatusRegister, StatusDeregister
rsync          InstallApp, UninstallApp, ReadAppContext, Alert, Unsubscribe, UpdateApp, RollbackApp

The requests, errors, and latency can be modeled after Istio's istio_requests_total and istio_request_duration_milliseconds, with an additional action name label.
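As a rough sketch of that model, the class below keeps a request counter and a latency histogram in memory, both carrying an action label. It does not use a Prometheus client library. The bucket bounds, the label set, and the treatment of HTTP 5xx as errors are assumptions made for illustration. As with Istio's metrics, errors are derived from the response code label on the request counter rather than tracked as a separate metric.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical histogram bounds in milliseconds.
BUCKETS_MS = (5, 25, 100, 500, 2500)


class RequestMetrics:
    """In-memory stand-in for a request counter and a latency histogram."""

    def __init__(self):
        # Counter keyed by (service, action, protocol, response_code).
        self.requests_total = defaultdict(int)
        # Per (service, action, protocol): one slot per bound plus a +Inf slot.
        self.buckets = defaultdict(lambda: [0] * (len(BUCKETS_MS) + 1))

    def record(self, service, action, protocol, response_code, duration_ms):
        key = (service, action, protocol, str(response_code))
        self.requests_total[key] += 1
        # bisect_left picks the first bound >= duration_ms, or the +Inf slot.
        slot = bisect_left(BUCKETS_MS, duration_ms)
        self.buckets[(service, action, protocol)][slot] += 1

    def errors(self, service, action):
        """Count errors from the response_code label (HTTP 5xx only here)."""
        return sum(
            n
            for (svc, act, proto, code), n in self.requests_total.items()
            if svc == service and act == action
            and proto == "http" and code.startswith("5")
        )

    def cumulative(self, service, action, protocol):
        """Cumulative counts per upper bound, as a histogram exposes them."""
        running, out = 0, {}
        bounds = (*BUCKETS_MS, float("inf"))
        for bound, n in zip(bounds, self.buckets[(service, action, protocol)]):
            running += n
            out[bound] = running
        return out


metrics = RequestMetrics()
metrics.record("orchestrator", "instantiate", "http", 202, 42.0)
metrics.record("orchestrator", "instantiate", "http", 500, 700.0)
metrics.record("rsync", "InstallApp", "grpc", 0, 3.0)
```

Every label here takes values from a small fixed set (services, actions, protocols, response codes), so the series count stays bounded.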

The resources of a service can be identified from the HTTP resources.  The initial labels can be the URL parameters.

Service        Resource                    Labels
orchestrator   controller                  name
               project                     name
               compositeApp                version, name, project
               app                         name, composite_app_version, composite_app, project
               dependency                  name, app, composite_app_version, composite_app, project
               compositeProfile            name, composite_app_version, composite_app, project
               appProfile                  name, composite_profile, composite_app_version, composite_app, project
               deploymentIntentGroup       name, composite_app_version, composite_app, project
               genericPlacementIntent      name, deployment_intent_group, composite_app_version, composite_app, project
               genericAppPlacementIntent   name, generic_placement_intent, deployment_intent_group, composite_app_version, composite_app, project
               groupIntent                 name, deployment_intent_group, composite_app_version, composite_app_name, project
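One way to derive the labels in the table from URL parameters is to match the request path against a template. The sketch below does this for the app resource. The template string is hypothetical: its path segments are an assumption made for illustration and should be checked against the actual EMCO routes.

```python
from typing import Dict, Optional

# Hypothetical URL template; the segment names are assumptions, not the
# confirmed EMCO route.
APP_TEMPLATE = (
    "/v2/projects/{project}/composite-apps/{composite_app}"
    "/{composite_app_version}/apps/{name}"
)


def labels_from_path(template: str, path: str) -> Optional[Dict[str, str]]:
    """Return the label values captured by {placeholders}, or None when the
    path does not match the template."""
    t_parts = template.strip("/").split("/")
    p_parts = path.strip("/").split("/")
    if len(t_parts) != len(p_parts):
        return None
    labels = {}
    for t, p in zip(t_parts, p_parts):
        if t.startswith("{") and t.endswith("}"):
            labels[t[1:-1]] = p      # placeholder: capture the segment
        elif t != p:
            return None              # literal segment must match exactly
    return labels


print(labels_from_path(APP_TEMPLATE, "/v2/projects/p1/composite-apps/ca/v1/apps/web"))
```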

The metrics for these resources should capture the state of the resource, i.e. metrics for creation, deletion, etc. (emco_controller_creation_timestamp, emco_controller_deletion_timestamp, etc.) as described in the guidelines. This approach is suggested as it is unclear how to apply metrics capturing resource utilization to these resources.
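A minimal sketch of such state metrics for the controller resource follows. It records creation and deletion times and renders them as lines in the Prometheus text exposition format. The class and method names are hypothetical, and label values are assumed not to need escaping.

```python
import time
from typing import Dict, List, Optional


class ControllerStateMetrics:
    """Tracks creation and deletion timestamps per controller name."""

    def __init__(self):
        self.created: Dict[str, float] = {}
        self.deleted: Dict[str, float] = {}

    def on_create(self, name: str, now: Optional[float] = None) -> None:
        self.created[name] = time.time() if now is None else now
        self.deleted.pop(name, None)  # a re-created resource is no longer deleted

    def on_delete(self, name: str, now: Optional[float] = None) -> None:
        self.deleted[name] = time.time() if now is None else now

    def render(self) -> List[str]:
        """Render samples as Prometheus text exposition lines."""
        lines = []
        for name, ts in sorted(self.created.items()):
            lines.append(
                f'emco_controller_creation_timestamp{{name="{name}"}} {ts:.0f}'
            )
        for name, ts in sorted(self.deleted.items()):
            lines.append(
                f'emco_controller_deletion_timestamp{{name="{name}"}} {ts:.0f}'
            )
        return lines


state = ControllerStateMetrics()
state.on_create("rsync", now=1700000000)
state.on_delete("rsync", now=1700000100)
print("\n".join(state.render()))
```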

The status of a deployment intent group deserves special consideration. The initial idea would be to add metrics describing the contents of the status. This would enable alerting on failed resources, for example.

Metric                             Labels
deployment_intent_group_resource   name, cluster, cluster_provider, app, deployment_intent_group, composite_profile, composite_app_version, composite_app, project

It's not clear to me yet whether the rsyncStatus value should be part of the metric name (deployment_intent_group_resource_applied) or a label. Following the kube-state-metrics model would make it part of the metric name. Further complicating the question is the readyStatus field of the cluster.
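To compare the two options side by side, the sketch below renders the same resource status both ways. The status values ("applied", "failed") are an assumed subset used for illustration, the rsync_status label name is hypothetical, and only two of the labels from the table are shown to keep the output short.

```python
from typing import Dict, List

# Assumed subset of rsyncStatus values, for illustration only.
STATUSES = ("applied", "failed")


def _label_body(labels: Dict[str, str]) -> str:
    return ",".join(f'{k}="{v}"' for k, v in labels.items())


def status_in_metric_name(labels: Dict[str, str], status: str) -> List[str]:
    """Option 1: one metric per status value, e.g. ..._resource_applied."""
    body = _label_body(labels)
    return [
        f"deployment_intent_group_resource_{s}{{{body}}} {int(s == status)}"
        for s in STATUSES
    ]


def status_as_label(labels: Dict[str, str], status: str) -> List[str]:
    """Option 2: one metric with a (hypothetical) rsync_status label."""
    body = _label_body(labels)
    return [
        f'deployment_intent_group_resource{{{body},rsync_status="{s}"}} '
        f"{int(s == status)}"
        for s in STATUSES
    ]


resource = {"name": "web", "cluster": "c1"}
print("\n".join(status_in_metric_name(resource, "failed")))
print("\n".join(status_as_label(resource, "failed")))
```

Both shapes produce one series per status value per resource, so the choice mainly affects how queries and alerts are written rather than the amount of data stored.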

Tracing

Istio provides a starting point for tracing by creating a trace for each request in the sidecars. This is insufficient, however, because it does not include the outgoing requests made while handling an inbound request. What we'd like to see is a complete trace of, for example, an instantiate request to the orchestrator that includes the requests made to any controllers, etc.
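One common way to let the sidecars join inbound and outbound requests into a single trace is for the service to copy the trace context headers from the inbound request onto every outgoing request it makes on that request's behalf. The sketch below shows only that header-copying step. The header list follows the B3 and W3C trace context names used with Istio; treat the exact set as an assumption to verify against the tracing backend in use. The function name is hypothetical.

```python
from typing import Dict

# Trace context headers to forward. The exact set depends on the tracer
# configured in the mesh and should be verified (assumption).
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "traceparent",
    "tracestate",
)


def propagate_trace_headers(inbound: Dict[str, str]) -> Dict[str, str]:
    """Return the trace headers from an inbound request that should be set
    on outgoing requests. Header names are matched case-insensitively."""
    lowered = {k.lower(): v for k, v in inbound.items()}
    return {h: lowered[h] for h in TRACE_HEADERS if h in lowered}


inbound_headers = {
    "X-Request-Id": "abc",
    "X-B3-TraceId": "463ac35c9f6413ad",
    "Content-Type": "application/json",
}
print(propagate_trace_headers(inbound_headers))
```

For gRPC calls, the same values would travel as request metadata rather than HTTP headers.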

...