Partners and prospective customers have asked whether EMCO can be deployed in production. EMCO needs enhancements in several areas before it is production-ready. Hopefully, the community can come together and contribute to an initiative that identifies the gaps, determines the enhancements needed to fill them, and delivers those enhancements across multiple EMCO releases.
Two important areas that need enhancement to bring EMCO closer to a production state are Observability and Resiliency.
...
rsync can restart after a crash. Aarna, as part of its EMCO backup/restore presentation, has tested deleting the entire EMCO namespace (including the EMCO pods and database) and restoring it.
Some known gaps:
- MongoDB consistency
...
- Some microservices may make multiple database writes for a single API call. If the microservice crashes in the middle of that API call, MongoDB is left with a partial, inconsistent update. We need to audit the code for such scenarios and fix them.
- Resource Cleanup: The orchestrator creates entries in the appcontext during DIG instantiation; it needs to clean up any stale context on restart after a crash.
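
One possible way to approach both the partial-write gap and the stale-context gap is an intent marker: record a marker before the first write of a multi-write operation, remove it after the last write, and on restart roll back anything that still has a marker. The Go sketch below illustrates only the idea. `Store`, `Begin`, `Write`, `Commit`, and `Recover` are hypothetical names over an in-memory map; they are not EMCO's database or appcontext interfaces, and the rollback assumes the writes create new documents rather than update existing ones.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Store is a hypothetical in-memory stand-in for the database.
// It is NOT EMCO's db or appcontext interface.
type Store struct {
	docs    map[string]string   // documents keyed by name
	pending map[string][]string // operation ID -> keys written so far
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{docs: map[string]string{}, pending: map[string][]string{}}
}

// Begin records an intent marker before any document is written.
func (s *Store) Begin(opID string) {
	s.pending[opID] = []string{}
}

// Write stores one document and remembers its key under the marker.
func (s *Store) Write(opID, key, val string) error {
	if _, ok := s.pending[opID]; !ok {
		return errors.New("unknown operation: " + opID)
	}
	s.docs[key] = val
	s.pending[opID] = append(s.pending[opID], key)
	return nil
}

// Commit removes the marker once every write has succeeded.
func (s *Store) Commit(opID string) {
	delete(s.pending, opID)
}

// Recover is meant to run at startup. Any operation that still has a
// marker did not finish, so its documents are deleted (assumption: the
// writes were creates, not updates). It returns the rolled-back
// operation IDs in sorted order.
func (s *Store) Recover() []string {
	ids := make([]string, 0, len(s.pending))
	for id, keys := range s.pending {
		for _, k := range keys {
			delete(s.docs, k)
		}
		ids = append(ids, id)
	}
	for _, id := range ids {
		delete(s.pending, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	s := NewStore()

	// A completed two-write operation.
	s.Begin("op-1")
	_ = s.Write("op-1", "a", "1")
	_ = s.Write("op-1", "b", "2")
	s.Commit("op-1")

	// A simulated crash: one write lands, Commit is never reached.
	s.Begin("op-2")
	_ = s.Write("op-2", "c", "3")

	rolled := s.Recover()
	fmt.Println(rolled, len(s.docs)) // prints "[op-2] 2"
}
```

In a real deployment the marker itself would have to be persisted alongside the data, and a database-native transaction may be a better fit where the deployment supports it; the sketch only shows how a restart can detect and undo unfinished work.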
Graceful handling of cluster connectivity failure
...