Partners and prospective customers have asked whether EMCO can be deployed in production. Several areas of EMCO need enhancement before it is production-ready. Ideally, the community can collaborate on an initiative that identifies these gaps, determines the enhancements needed to close them, and delivers those enhancements across multiple EMCO releases.
Two important areas that need enhancement to bring EMCO closer to production readiness are Observability and Resiliency.
...
Recovery from crashes/disruptions
Scenarios to validate:
- Restart each microservice while it is processing a request
- In particular, restart the orchestrator while a DIG instantiate request is in flight
- Restart all microservices together
- Restart the node on which the EMCO pods are running (assuming a single node for now)
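The in-flight orchestrator scenario can be pictured with a small simulation of what the validation should check: after a restart, no DIG is left stuck mid-instantiation. This is a minimal sketch, assuming the orchestrator persists a per-DIG status before acting and that the deploy step is idempotent. `DurableStore`, `Orchestrator`, `recover`, and the status strings are hypothetical and are not EMCO's actual API or status model.

```python
from dataclasses import dataclass, field

# Hypothetical status values; EMCO's real DIG status model may differ.
INSTANTIATING, INSTANTIATED = "Instantiating", "Instantiated"


@dataclass
class DurableStore:
    """Stand-in for a persistent database that survives a process restart (assumption)."""
    status: dict = field(default_factory=dict)


class CrashDuringRequest(Exception):
    """Raised to simulate the orchestrator process dying mid-request."""


class Orchestrator:
    """Hypothetical orchestrator; only `store` survives a restart."""

    def __init__(self, store):
        self.store = store

    def instantiate(self, dig, crash_midway=False):
        # Record intent durably *before* doing any work, so a restarted process can find it.
        self.store.status[dig] = INSTANTIATING
        if crash_midway:
            raise CrashDuringRequest(dig)
        self._deploy(dig)

    def _deploy(self, dig):
        # Placeholder for pushing resources to clusters; assumed idempotent.
        self.store.status[dig] = INSTANTIATED

    def recover(self):
        # On startup, re-drive any request the previous process left in flight.
        resumed = [d for d, s in self.store.status.items() if s == INSTANTIATING]
        for dig in resumed:
            self._deploy(dig)
        return resumed


def restart_during_instantiate():
    store = DurableStore()
    try:
        Orchestrator(store).instantiate("dig-1", crash_midway=True)
    except CrashDuringRequest:
        pass  # the "pod" died: in-memory state is gone, the store is not
    stuck = store.status["dig-1"]
    resumed = Orchestrator(store).recover()  # a fresh process after the restart
    return stuck, resumed, store.status["dig-1"]


print(restart_during_instantiate())
```

The design choice illustrated here is writing the intent before the side effect: a restart then loses only in-memory progress, and a startup scan can finish the work. The other scenarios (restarting all microservices, or the node) would apply the same check, provided the persistent store itself survives the restart.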
...