...
Internal errors and latency
Measuring internal errors could be as simple as counting error logs; however, this leaves out the count of attempts, which prevents easy calculation of the success rate. Attempts (or successes) should therefore be counted as well, so that an error ratio can be calculated.
Totals of info/error/warning logs
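To illustrate why attempts need to be counted alongside errors, here is a minimal sketch in Python using only the standard library. The class name `OperationStats` and the operation name `instantiate` are hypothetical and chosen for illustration; they are not part of EMCO.

```python
from collections import Counter


class OperationStats:
    """Counts attempts and errors per operation so a success ratio can be derived."""

    def __init__(self):
        self.attempts = Counter()
        self.errors = Counter()

    def record(self, operation, ok):
        # Every call counts as an attempt; only failures count as errors.
        self.attempts[operation] += 1
        if not ok:
            self.errors[operation] += 1

    def success_ratio(self, operation):
        # Return None when nothing was attempted, so "no data" is not
        # mistaken for 0% or 100% success.
        total = self.attempts[operation]
        if total == 0:
            return None
        return (total - self.errors[operation]) / total


stats = OperationStats()
for ok in (True, True, True, False):
    stats.record("instantiate", ok)  # hypothetical operation name
print(stats.success_ratio("instantiate"))  # 0.75
```

Counting error logs alone would report one error here, but not that it was one failure out of four attempts.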
...
This bucket includes EMCO-specific information such as the number of projects, and the errors and latency of deployment intent group instantiation. Cache and thread pool metrics should also be considered. Looking for feedback here on any general metrics of interest to EMCO operators.
Preliminary guidelines:
- Distinguish between resources and actions.
- Action metrics will record requests, errors, and latency similar to general network requests.
- Resource metrics will record creation, deletion, and possibly modification.
- Metrics will be labeled with project, composite-app, deployment intent group, etc.
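The guidelines above could be sketched roughly as follows. This is an in-memory illustration using only the Python standard library, not a proposal for the actual implementation; a real deployment would more likely use a metrics client library. The class `Metrics`, the metric name suffixes, and the label values are all assumptions made for the example.

```python
import time
from collections import Counter, defaultdict
from contextlib import contextmanager


class Metrics:
    """Hypothetical in-memory registry keyed by metric name plus labels."""

    def __init__(self):
        self.counters = Counter()            # (name, labels) -> count
        self.latencies = defaultdict(list)   # (name, labels) -> seconds

    @staticmethod
    def _key(name, labels):
        # Sort labels so the same label set always maps to the same key.
        return (name, tuple(sorted(labels.items())))

    def inc(self, name, **labels):
        self.counters[self._key(name, labels)] += 1

    def get(self, name, **labels):
        return self.counters[self._key(name, labels)]

    @contextmanager
    def action(self, name, **labels):
        # Action metrics: requests, errors, and latency.
        self.inc(name + "_requests_total", **labels)
        start = time.monotonic()
        try:
            yield
        except Exception:
            self.inc(name + "_errors_total", **labels)
            raise
        finally:
            key = self._key(name + "_latency_seconds", labels)
            self.latencies[key].append(time.monotonic() - start)

    def resource(self, kind, event, **labels):
        # Resource metrics: creation, deletion, and possibly modification.
        self.inc(f"{kind}_{event}_total", **labels)


metrics = Metrics()
# Illustrative label values only.
labels = {"project": "proj1", "composite_app": "app1",
          "deployment_intent_group": "dig1"}

metrics.resource("deployment_intent_group", "created", **labels)

with metrics.action("instantiate", **labels):
    pass  # a successful action

try:
    with metrics.action("instantiate", **labels):
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

print(metrics.get("instantiate_requests_total", **labels))  # 2
print(metrics.get("instantiate_errors_total", **labels))    # 1
```

Keeping actions and resources as separate metric families means an operator can ask both "how often does instantiation fail?" and "how many deployment intent groups were created?" without one count polluting the other.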
Tracing
Istio provides some tracing support, but it appears rudimentary (no detailed spans for EMCO-related operations).
Preliminary guidelines:
- Follow the flow of all external and internal API calls.
- Filter by caller.
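As a rough illustration of these two guidelines, the sketch below records nested spans so that the flow of calls can be followed, and tags each span with its caller so that spans can be filtered. It uses only the Python standard library and is single-threaded for simplicity; the `Tracer` and `Span` types and the span and caller names are hypothetical, not an existing EMCO or Istio API.

```python
import itertools
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: int
    parent_id: Optional[int]   # None for the root span of a flow
    name: str
    caller: str
    start: float
    end: Optional[float] = None


class Tracer:
    """Hypothetical single-threaded tracer that records parent/child spans."""

    def __init__(self):
        self.spans = []
        self._stack = []
        self._ids = itertools.count(1)

    @contextmanager
    def span(self, name, caller):
        # The innermost open span, if any, becomes the parent.
        parent = self._stack[-1].span_id if self._stack else None
        s = Span(next(self._ids), parent, name, caller, time.monotonic())
        self.spans.append(s)
        self._stack.append(s)
        try:
            yield s
        finally:
            s.end = time.monotonic()
            self._stack.pop()

    def by_caller(self, caller):
        return [s for s in self.spans if s.caller == caller]


tracer = Tracer()
# Illustrative names: an external API call that triggers an internal one.
with tracer.span("external: instantiate request", caller="cli"):
    with tracer.span("internal: install app", caller="orchestrator"):
        pass

for s in tracer.by_caller("orchestrator"):
    print(s.name, "parent:", s.parent_id)
```

The parent identifier is what allows an internal call to be linked back to the external request that caused it.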
Logging
Each log message must contain a timestamp and information identifying the resource involved; in the case of orchestration, for example, the project, composite application, etc.
Priority is placed on error logs; logging other significant actions is secondary.
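A log entry meeting these requirements might look like the output of the sketch below, which uses Python's standard `logging` module with a small JSON formatter. The formatter class, the field names, and the logger name are assumptions for illustration and do not describe EMCO's existing log format.

```python
import io
import json
import logging
from datetime import datetime, timezone


class ResourceJSONFormatter(logging.Formatter):
    """Hypothetical formatter: timestamp, level, message, resource identifiers."""

    # Assumed field names for the identifying information.
    RESOURCE_FIELDS = ("project", "composite_app", "deployment_intent_group")

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Fields passed through `extra=` become attributes on the record.
        for field in self.RESOURCE_FIELDS:
            value = getattr(record, field, None)
            if value is not None:
                entry[field] = value
        return json.dumps(entry)


stream = io.StringIO()  # stands in for stderr or a log file
handler = logging.StreamHandler(stream)
handler.setFormatter(ResourceJSONFormatter())

logger = logging.getLogger("emco.example")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.error(
    "instantiation failed",
    extra={"project": "proj1", "composite_app": "app1"},
)

entry = json.loads(stream.getvalue())
print(entry["level"], entry["project"], entry["composite_app"])
```

Emitting the identifiers as separate fields rather than embedding them in the message text makes it straightforward to filter error logs by project or composite application.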