This document is open for review until October 26. Please leave your comments inline.
1.
...
Introduction – why telecom industry needs Cloud Native PaaS
Since the NFV architecture was first proposed as an ETSI standard in 2012, the telecommunication industry and telecom operators have been moving the network to the cloud, migrating network functions from dedicated physical devices to standard virtualized environments. By 2021, the majority of global telecom operators had built a Network Cloud, a private cloud carrying 4G/5G network functions, value-added network functions, network management systems, network orchestration systems, etc. Representative operators include AT&T, Verizon, Telstra, BT, DT, China Mobile, China Telecom, China Unicom, and NTT. Take China Mobile as an example: by the end of 2020, it had built Network Cloud in 8 major districts within China to carry 37 types of network functions, including 5GC, IMS, and EPC. The network cloud proportion has reached 75% and keeps increasing with the construction of edge cloud.
...
Based on the above analysis, cloud native PaaS, meaning PaaS that provides cloud native capabilities such as containers, microservices, automation tools, and cloud native network function components, is necessary and worth researching in the telecommunication industry. Exploring cloud native PaaS in the open-source community can provide reusable capabilities and reference implementations for the industry.
2.
...
PaaS Condition in Network Cloud
Although cloud native PaaS can be expected to improve the flexibility of the network cloud, the current situation in the telecommunication industry leaves it unclear how to introduce PaaS into the network cloud, or which PaaS capabilities the network requires. Most network clouds already in use follow the NFV architecture and do not treat PaaS as an independent system or visible object. Their operators and vendors use virtual machines and containers directly as infrastructure, and package all required functional modules and software dependencies into the VNF delivery image. Figure 1 shows a simple diagram of the current VNF situation, in which only resources are reused among VNFs while VNF-related software is dedicated to each VNF.
...
- PaaS capabilities required to implement NF functions: This type of PaaS capability is necessary to achieve NF functions and logic, for example database, load balancer, protocol processing capability (PFCP, GTP-U…), message bus, possible NF functional modules, and some infrastructure-related capabilities (such as hardware acceleration). The users of this type of capability are mainly developers. These capabilities could be common among different NFs but unique to different vendors/operators.
- PaaS capabilities required to manage NF functions: This type of PaaS capability helps to optimize the operation and management of the network cloud, but has no direct influence on NF logic. Possible capabilities include observability capabilities, CI/CD tools, testing tools, FCAPS management tools, etc. The users of this type of capability are mainly developers and operations staff. These capabilities can also be used to support NFV management systems such as NFVO, VNFM, EMS, and OSS.
- PaaS capabilities to expose NF service to external customers: Representative capabilities of this type include bandwidth management capability, user identification capability, mobility management capability, UPF traffic routing function, and other edge computing network functions. The users of this type of capability are external customers. Currently, this type of capability is commonly carried and provided by the MEC platform.
3.
...
XGVela Overview
As the previous chapters show, PaaS-related standards and research are progressing relatively slowly, so a good way forward for the telecommunication industry is to start with existing open-source PaaS capabilities and build reference implementations. We can explore both the enhancement of existing PaaS capabilities for telecom scenarios and new PaaS capabilities dedicated to the telco network cloud. This is the reason we started the XGVela project.
...
General PaaS provides the abstract tools, services and environment required for the development, deployment, running, operation and management of applications and their associated services. It is the baseline platform for the other XGVela PaaS categories. The users of General PaaS are generally IT applications, which can use such capabilities through standard PaaS service APIs and PaaS management processes. General PaaS capabilities are standard cloud capabilities that can be offered by any cloud provider and can be shared by a number of industry-specific PaaS, including Telco PaaS.
In short, General PaaS refers to the PaaS capabilities commonly used by IT across all industries (e.g. service mesh, API GW, LB, observability) and existing open-source PaaS implementations (e.g. Istio, Envoy, Zookeeper, Grafana). XGVela takes those implementations as references for integration instead of reinventing the wheel.
...
- Define the PaaS platform architecture, necessary functions/interfaces, processes, common software, etc. for telecom scenarios. The content will cover General PaaS, the adaptation layer and Telco PaaS, of which General PaaS capabilities refer to existing implementations.
- Explore the requirements of the adaptation layer and Telco PaaS based on telecom use cases, and implement functionality, interfaces, etc.
- Build reference implementation of telecom cloud native PaaS platform.
4.
...
XGVela Technical Insight
4.1 Technical Architecture
...
- Service Discovery functions shall maintain real-time microservice access information, which includes adding the address of a new microservice, updating the address of a microservice instance, deleting the information of a faulty microservice, etc.
- Commonly used open-source software includes CoreDNS, etcd, Zookeeper, Netflix, and Nacos, among which CoreDNS, etcd, and Zookeeper are popular choices.
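The add, update and delete operations required of Service Discovery can be sketched with a small in-memory registry. This is only an illustration of the bookkeeping involved: the class `ServiceRegistry`, its method names, and the service names and addresses are hypothetical, and a real deployment would rely on one of the tools listed above rather than on code like this.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRegistry:
    """In-memory stand-in for a registry such as etcd or Zookeeper (illustrative only)."""

    # service name -> {instance id -> address}
    _entries: dict = field(default_factory=dict)

    def register(self, service: str, instance_id: str, address: str) -> None:
        """Add a new instance, or update the address of an existing one."""
        self._entries.setdefault(service, {})[instance_id] = address

    def deregister(self, service: str, instance_id: str) -> None:
        """Remove a faulty or terminated instance; unknown instances are ignored."""
        instances = self._entries.get(service, {})
        instances.pop(instance_id, None)
        if not instances:
            # Drop the service entry once its last instance is gone.
            self._entries.pop(service, None)

    def lookup(self, service: str) -> list:
        """Return the addresses of all currently registered instances, sorted."""
        return sorted(self._entries.get(service, {}).values())


registry = ServiceRegistry()
registry.register("smf", "smf-0", "10.0.0.11:8080")
registry.register("smf", "smf-1", "10.0.0.12:8080")
registry.register("smf", "smf-0", "10.0.0.21:8080")  # instance moved: address updated
registry.deregister("smf", "smf-1")                  # faulty instance removed
print(registry.lookup("smf"))
```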
4.2.3 Telco PaaS Requirements
As defined in Chapter 3.1, the key objective of Telco PaaS is to implement the functions necessary to support telco workloads and procedures in a cloud native environment. It can be used independently or together with available General PaaS capabilities/functions.
4.2.3.1 General Requirements on Telco PaaS
As with General PaaS, before discussing detailed functional requirements, there are common requirements applicable to all Telco PaaS functions.
- Telco PaaS shall implement functions necessary to support Telco workloads and procedures in cloud native environment.
- Telco PaaS shall not duplicate functions already available in the General PaaS layer; instead, it shall interface with, enhance or adapt General PaaS functions where available.
- Telco PaaS shall support Telco standards compliant information and object model.
- Telco PaaS shall support Telco standards compliant NBIs:
  - NetConf for configuration management
  - VES for event notification
  - TM Forum OpenAPIs
  - 3GPP 5G Service Based Interfaces (SBI)
  - NFV orchestration – Or-Vnfm-EM/Vnf???
- It is recommended that Telco PaaS use the CNF packaging model and align with NFV.
The following requirements are similar to those of General PaaS (see Chapter 4.2.2.1 for a detailed description).
- Telco PaaS shall be deployable on any Kubernetes-based CaaS and General PaaS distribution, whether it runs in the cloud or on bare metal.
- Telco PaaS shall complete service lifecycle management through PaaS Management.
- Telco PaaS services shall be packaged as an Operator or Helm Chart and stored in the Image & Package repository.
- Telco PaaS services shall generate metrics, logs, alarms and events, and report these data to PaaS Management.
- Telco PaaS shall support custom configuration.
4.2.3.2 Possible Functional Solution for Telco PaaS
Until now, the development of telecom network functions and related systems has made it possible to summarize the General PaaS capabilities/functions, while the type and number of Telco PaaS functions in use remain unclear. This does not mean that Telco PaaS has not been used; rather, there is no open-source reference implementation of Telco PaaS. Therefore, this chapter summarizes common Telco PaaS functions based on development and management experience. The number and categories of Telco PaaS functions will keep increasing as more use cases are explored.
Chapter 2 identified three possible types of cloud native PaaS capabilities in the network cloud: PaaS capabilities required to implement NF functions, PaaS capabilities required to manage NF functions, and PaaS capabilities to expose NF services to external customers. Among the three, the capabilities required to manage NF functions work in bypass mode, have little impact on the business logic and functions of the NF, and are therefore a good choice for initial exploration. XGVela thus starts from this type of PaaS capability and explores the management-related Telco PaaS functions.
Management-related Telco PaaS functions implement services for, but not limited to, configuration management, fault management, log management, performance management, topology management and high availability for the managed NFs. There are opportunities to generalize certain other telco-specific functions such as subscriber tracing, lawful intercept (LI), call data records (CDR), etc.
- Topology Management Service: This service models network functions, components within each network function and associated resources as Managed Objects and makes it possible to manage the objects individually or as a group of related objects.
- Configuration Management Service: This service manages configuration of Network Functions and µServices. Configuration is described using Yang and encoded in JSON. NetConf and CLI are exposed for configuration management.
- Fault Management Service: This service implements a 3GPP-compliant fault and alarm management model. It provides interfaces for application µServices to publish and subscribe to various events, interfaces with the metrics management system (Prometheus) for TCA (threshold crossing alert) events, and exposes a VES-compliant NBI for notifications.
- Metrics Management Service: This service implements Prometheus at the core for metrics collection and monitoring and provides necessary correlations to 3GPP managed object model, events and clock aligned measurements.
- High Availability Service: Kubernetes does not meet the fast HA requirements of certain stateful telco services. HAaaS provides an overlay HA capability by supporting an n:m Active-Standby model in a distributed, scalable fashion.
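To make the n:m Active-Standby idea concrete, the sketch below keeps n active instances backed by m shared standbys and promotes a standby when an active instance fails. It illustrates the model only: the source does not describe how HAaaS detects failures or selects a standby, so the class `ActiveStandbyPool`, the first-in-line promotion rule and the instance names are all assumptions.

```python
from typing import Optional


class ActiveStandbyPool:
    """Toy n:m Active-Standby pool: n active instances share m standbys."""

    def __init__(self, active: list, standby: list) -> None:
        self.active = list(active)
        self.standby = list(standby)

    def fail(self, instance: str) -> Optional[str]:
        """Handle the failure of an active instance.

        Returns the name of the promoted standby, or None when no standby
        is left and the pool has to run with reduced capacity.
        """
        if instance not in self.active:
            raise ValueError(f"{instance} is not an active instance")
        self.active.remove(instance)
        if not self.standby:
            return None
        promoted = self.standby.pop(0)  # assumed policy: first standby in line
        self.active.append(promoted)
        return promoted


# 3:1 example: three active instances protected by a single shared standby.
pool = ActiveStandbyPool(["a1", "a2", "a3"], ["s1"])
print(pool.fail("a2"), pool.active)
```

Because the standbys are shared rather than paired one-to-one, m can be much smaller than n, at the cost of running degraded once more than m instances have failed.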
4.2.3.3 Telco PaaS Solution contributed by Mavenir MTCIL
The first batch of XGVela Telco PaaS functions comes from the seed code contributed from Mavenir's product “MTCIL”, which mainly provides management-related Telco PaaS capabilities. The related functions are introduced below.
Figure 4-2 Architecture of Mavenir Contribution
Figure 4-2 shows the functional architecture of the Mavenir contribution. In the figure, the Telco PaaS functions include CMaaS (Configuration Management as a Service), TMaaS (Topology Management as a Service) and FMaaS (Fault Management as a Service), plus two auxiliary management capabilities: VESGW (ONAP VES Gateway) and CIM (CNF Interface Module).
CIM
The CNF Interface Module, a key auxiliary component of Telco PaaS, provides a single integration point and API for NFs or 3rd party applications. It has the following features:
- Deployed as a sidecar to application containers.
- Implements various single node design patterns to enable loose coupling of application containers to the infrastructure.
- Interfaces with applications over REST for APIs and NATS for messaging and events.
- It is the local agent for other management Telco PaaS capabilities to manage NFs and get NF information.
CMaaS
Configuration Management as a Service manages the configuration of network functions and applications, and can be treated as the configuration management center for CNFs (Cloud Native Network Functions). It has the following features:
- Exposes NetConf NBI for orchestration and management systems like ONAP to push configuration for NFs/applications and related microservices.
- Yang is supported as the data model for configuration.
- Translates configuration in Yang model and NetConf protocol into simple JSON/REST, and pushes the translated configuration to NF/application containers.
- Supports two methods to update Day-2 configuration:
  - One method is to deliver configuration via a K8S rolling update. This method requires the NF/application pod to restart and re-read the ConfigMap, and is mostly adopted by IT applications and stateless applications.
  - The other method is direct API calls to the NF/application container via etcd and CIM, as the application needs. This method lets the NF/application pod update its configuration without a restart. As the structure and configuration of telecom network functions are complex, configuration is generally done after the network functions are deployed, and the network functions should remain in a stable running state. Therefore, this method is more suitable for telecom configuration.
Figure 4-3 CMaaS Diagram
Figure 4-3 shows the working principle of CMaaS. First, the CMaaS pod receives the configuration sent by a management system (such as ONAP) through the NBI, formatted in the Yang model and transmitted over the NETCONF protocol. Then, the CMaaS module parses the configuration and stores it in etcd as key-value pairs, where CMaaS maintains the version of the configuration. After that, CMaaS notifies the CIM module pre-integrated in the NF/application that a new configuration needs to be processed, and triggers the application to update its configuration.
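The store-version-notify sequence described above can be sketched as follows. This is a minimal illustration, not the CMaaS implementation: `ConfigStore` is an in-memory stand-in for etcd, the callback plays the role of the CIM sidecar, and the key path and configuration fields are hypothetical. NETCONF/Yang parsing is left out; the sketch starts from configuration that has already been translated to JSON-like data.

```python
import json
from typing import Callable, Optional


class ConfigStore:
    """In-memory stand-in for etcd that keeps every version of each key."""

    def __init__(self) -> None:
        self._history: dict = {}   # key -> list of serialized configurations
        self._watchers: dict = {}  # key -> list of callbacks

    def watch(self, key: str, callback: Callable[[str, int, dict], None]) -> None:
        """Register a callback invoked on every new version of a key."""
        self._watchers.setdefault(key, []).append(callback)

    def put(self, key: str, config: dict) -> int:
        """Store a new version and notify watchers; returns the version number."""
        versions = self._history.setdefault(key, [])
        versions.append(json.dumps(config, sort_keys=True))
        version = len(versions)  # versions are numbered from 1
        for callback in self._watchers.get(key, []):
            callback(key, version, config)
        return version

    def get(self, key: str, version: Optional[int] = None) -> dict:
        """Return the latest configuration, or a specific earlier version."""
        versions = self._history[key]
        raw = versions[-1] if version is None else versions[version - 1]
        return json.loads(raw)


applied = {}


def cim_on_change(key: str, version: int, config: dict) -> None:
    """Hypothetical CIM hook: hand the new configuration to the running application."""
    applied[key] = (version, config)


store = ConfigStore()
store.watch("/nf/amf/config", cim_on_change)
store.put("/nf/amf/config", {"log-level": "info"})
store.put("/nf/amf/config", {"log-level": "debug"})
print(applied["/nf/amf/config"])
```

Keeping the full version history is what makes it possible to read back, or roll back to, an earlier configuration without redeploying the pod.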
TMaaS
Topology Management as a Service constructs a complete 3GPP network function topology, which combines the network function topology, microservice topology and resource (pod) topology. This function gives K8S an NF-level topology view. It supports the following features:
- Interfaces with K8S for auto-discovery of services, and obtains from K8S the resource topology showing the relationships among deployments/statefulsets, services, pods, and containers.
- Builds 3GPP model, and constructs NF topology of NF, NF microservices and NF microservice instances. It can also combine this NF topology with resource topology.
- Exposes REST and also interfaces with CMaaS to expose Topology data over NetConf.
Figure 4-4 TMaaS Diagram
Figure 4-4 shows the basic working principle of TMaaS. In the figure, the TMaaS GW pod obtains deployment, service, pod and other information directly from K8S to form the microservice instance topology and the resource topology. The TMaaS pod predefines the NF topology relationship between the network element and its microservices, and this relationship is marked in the deployment/statefulset or other files through annotations. TMaaS GW sends the resource topology to TMaaS. TMaaS combines the resource topology with the predefined NF topology through the annotations, and outputs the merged topology through the NBI, which is still being defined.
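The annotation-based merge described above can be sketched as a grouping step. The sketch assumes that each discovered resource carries two annotations naming its NF and its microservice; the annotation keys `example.org/nf` and `example.org/microservice`, the resource records and the output shape are all hypothetical, since the source does not name the real keys and the NBI is still being defined.

```python
from collections import defaultdict

# Hypothetical annotation keys; the real keys are not given in the source.
NF_KEY = "example.org/nf"
MS_KEY = "example.org/microservice"


def merge_topology(resources: list) -> dict:
    """Group discovered K8S resources under NF -> microservice using annotations."""
    topology = defaultdict(lambda: defaultdict(list))
    for res in resources:
        notes = res.get("annotations", {})
        nf = notes.get(NF_KEY)
        ms = notes.get(MS_KEY)
        if nf is None or ms is None:
            continue  # resource does not belong to any NF; skipped in this sketch
        topology[nf][ms].append(f'{res["kind"]}/{res["name"]}')
    # Convert to plain dicts with sorted members for stable output.
    return {
        nf: {ms: sorted(items) for ms, items in services.items()}
        for nf, services in topology.items()
    }


discovered = [
    {"kind": "Deployment", "name": "amf-sbi",
     "annotations": {NF_KEY: "amf", MS_KEY: "amf-sbi"}},
    {"kind": "Pod", "name": "amf-sbi-0",
     "annotations": {NF_KEY: "amf", MS_KEY: "amf-sbi"}},
    {"kind": "Pod", "name": "coredns-1", "annotations": {}},
]
print(merge_topology(discovered))
```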
TMaaS uses a unified management model. In a networked environment, the majority of the network functions (NF), especially carrier class functions, consist of many interrelated components that need to be individually managed by management services (MnS). Management network functions (MnF) are NFs which implement MnS.
For each function in the network, there are many different components that need to be discovered, monitored and managed. For the management services to manage a set of network functions, the network functions are represented or modeled as managed objects (MO) which may be stored in a database to monitor state and perform management operations.
All network functions are componentized, virtualized and grouped into CNF. Management services manage functions based on CNF view.
- Each NF is disaggregated into sub-services (micro-services for containerization) of various types.
- One or more network functions can be deployed on Telco PaaS.
- One or more management functions may be deployed on Telco PaaS.
- Telco PaaS instance is deployed on General PaaS/CaaS and runs in its own namespace.
- Each NF instance deployed on Telco PaaS runs in its own namespace.
TMaaS follows a generic ManagedObject model schema for describing and exporting the cluster and NF topology. The topology is rendered as a hierarchical structure of ManagedObjects.
A ManagedObject models basic properties and relationships. These are updated with actual NF, µService resources (containers, pod, volumes, configuration, etc) properties and relationships upon discovery of the same via K8s APIs.
The Managed Object model is based on, and for the most part derived from, the 3GPP NRM (Release 16).
Figure 4-5 Management Model followed by NF to deploy on XGVela
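The hierarchical ManagedObject structure described above can be sketched as a simple tree. This is a simplified illustration: the field names (`name`, `kind`, `properties`, `children`) and the example objects are assumptions, not the actual TMaaS schema or the 3GPP NRM definitions.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedObject:
    """Simplified managed object with basic properties and child relationships."""

    name: str
    kind: str
    properties: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def add(self, child: "ManagedObject") -> "ManagedObject":
        """Attach a child object and return it, so calls can be chained."""
        self.children.append(child)
        return child

    def render(self, depth: int = 0) -> list:
        """Render the subtree as indented lines, one line per object."""
        lines = ["  " * depth + f"{self.kind}:{self.name}"]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines


# NF -> microservice -> pod; the pod properties would be filled in on discovery.
nf = ManagedObject("amf-1", "NF")
microservice = nf.add(ManagedObject("amf-sbi", "Microservice"))
microservice.add(ManagedObject("amf-sbi-0", "Pod", {"node": "worker-1"}))
print("\n".join(nf.render()))
```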
FMaaS
Fault Management as a Service is a telecom event management module, which collects events from the system, translates them into VES format, and reports them to management systems for information and analysis. It has the following features:
- Implements 3GPP compliant fault and alarm management model.
- Provides interfaces for application µService to publish and subscribe to various events.
- Interfaces with the metrics management system (Prometheus) for TCA events.
- Exposes a VES-compliant NBI for notifications.
Figure 4-6 FMaaS Diagram
Figure 4-6 shows the basic working principle of FMaaS. FMaaS collects events from Prometheus, application pods and XMaaS pods, and translates them into VES format. After the translation, the events can be pushed to management systems (such as ONAP and a VES collector) through VES GW. During this process, FMaaS can also send alert rules to Prometheus.
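The translation step can be sketched as wrapping an internal event in a VES-style envelope. The field names below are modelled on the VES common event header and fault fields, but only a reduced subset is shown and it should be checked against the VES specification before use. The function names, the fixed "MAJOR" severity and the threshold rule are assumptions made for illustration.

```python
import time
from typing import Optional


def to_ves_like_event(source: str, name: str, severity: str, sequence: int,
                      now_us: Optional[int] = None) -> dict:
    """Wrap an internal event in a VES-style envelope (reduced field set)."""
    timestamp = int(time.time() * 1_000_000) if now_us is None else now_us
    return {
        "event": {
            "commonEventHeader": {
                "domain": "fault",
                "eventId": f"{source}-{sequence}",
                "eventName": name,
                "sourceName": source,
                "sequence": sequence,
                "startEpochMicrosec": timestamp,
                "lastEpochMicrosec": timestamp,
            },
            "faultFields": {"eventSeverity": severity, "specificProblem": name},
        }
    }


def threshold_crossed(metric: str, value: float, threshold: float, source: str,
                      sequence: int, now_us: Optional[int] = None) -> Optional[dict]:
    """Return a TCA-style event when value exceeds threshold, otherwise None."""
    if value <= threshold:
        return None
    return to_ves_like_event(source, f"{metric}ThresholdCrossed", "MAJOR",
                             sequence, now_us)


event = threshold_crossed("cpu", 0.93, 0.80, "amf-sbi-0", 1, now_us=1000)
print(event["event"]["commonEventHeader"]["eventName"])
```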
4.2.3.4 Telco PaaS Solution Proposed by Intel
The network is the most important and complex infrastructure resource in cloud computing. Compared with IT applications, telecom network functions place more complex requirements on the network. Telecom network functions generally need multiple network planes, each carrying its own traffic flow. For example, in the telecom Core Cloud, network functions are usually designed with three network planes (control, data/user, and storage) to carry management, data plane, and storage traffic separately. In the telecom Edge Cloud, network functions are designed with at least two network planes: a data/user network plane, used for high-speed user plane data forwarding at the edge, and a merged network plane for management and storage, which are combined to save resources because both carry little traffic.
From the infrastructure perspective, multiple network planes are generally realized by attaching multiple network interfaces/cards to a network function instance, with each network interface/card assigned to one network plane to forward the corresponding traffic flows. For virtual machines, multiple vNICs can be set up for multiple network planes. Containers are expected to use the same approach as virtual machines.
However, in Kubernetes each pod typically has only one network interface (apart from loopback), which is not enough for a production-ready telco network function. An open-source solution, Multus, addresses this problem. Multus is a container network interface (CNI) plugin for Kubernetes that enables attaching multiple network interfaces to pods. With Multus, you can create a multi-homed pod that has multiple interfaces. Multus CNI already supports many network CNIs, including Calico, Flannel, Userspace CNI (ovs-dpdk/vpp), OVN-Multi CNI, SRIOV-NIC CNI and SmartNIC CNI. Developers can specify a different network CNI for each network plane to realize multiple network planes for containers. For example, the control network plane can choose traditional CNIs with lower performance, such as Calico or Flannel, while the data/user network plane can choose CNIs with better forwarding performance, such as SRIOV-NIC CNI.
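As a minimal sketch of what a multi-homed pod looks like, the helper below builds a Pod manifest that asks Multus for additional interfaces through the `k8s.v1.cni.cncf.io/networks` annotation. The network names `control-net` and `data-net` are hypothetical and are assumed to exist as NetworkAttachmentDefinition objects in the cluster; the pod name and image are placeholders as well.

```python
import json


def multi_homed_pod(name: str, image: str, networks: list) -> dict:
    """Build a Pod manifest requesting extra interfaces from Multus.

    Each entry in `networks` is assumed to be the name of an existing
    NetworkAttachmentDefinition in the pod's namespace.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "annotations": {"k8s.v1.cni.cncf.io/networks": ",".join(networks)},
        },
        "spec": {"containers": [{"name": name, "image": image}]},
    }


# One extra interface per network plane, in addition to the default pod network.
pod = multi_homed_pod("upf-0", "registry.example.org/upf:latest",
                      ["control-net", "data-net"])
print(json.dumps(pod, indent=2))
```

The printed JSON can be applied to a cluster like any other manifest. Which CNI backs each network is decided in the corresponding NetworkAttachmentDefinition, not in the pod itself.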
As described above, Multus lets a pod have multiple virtual network interfaces/cards and thereby implements multiple network planes for containers at the infrastructure level. However, developers of network functions still need to understand the working principles and usage of the different CNIs, so the difficulty of network development has not been reduced. Therefore, based on the Multus solution, a Telco PaaS function named NMaaS is proposed in this chapter.
NMaaS, Network Management as a Service, is proposed to expose container multi-plane networking as a service. It has the following features:
- Exposes NB APIs for orchestration/management systems or developers to configure the infrastructure NIC, add/delete interfaces to an NF at runtime, configure SRIOV, etc.
- Hides the differences in usage between CNIs and provides a consistent user experience for NIC management. For example, developers can simply specify the number of vNICs and the SLA requirements on these vNICs to complete container vNIC configuration.
- Supports customized values for vNIC parameters such as latency, jitter, bandwidth, throughput and performance, and triggers the CaaS layer to set up the required underlying driver/software based on these requirements.
Figure 4-7 Proposed NMaaS Diagram
Figure 4-7 shows the basic working principle of NMaaS. The NMaaS pod receives a vNIC request from orchestration/management systems or the command line through the standard NMaaS NBI. NMaaS converts the vNIC request into content that can be understood by the CaaS layer, including vNIC type, number, configuration, etc. The converted information is sent to Multus to set up the required underlying driver/software. Then, NMaaS triggers K8S to create the new vNIC for the target Pod and updates the traffic and vNIC configuration in the Pod.
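The conversion from an SLA-style vNIC request to CaaS-level settings could look like the sketch below. Since NMaaS is only a proposal, everything here is hypothetical: the request fields, the 10 Gbps cut-off, and the rule of mapping high-bandwidth planes to an SR-IOV CNI and the rest to Flannel are illustrative assumptions, not a defined NMaaS behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VnicRequest:
    """Hypothetical NBI request: what the developer asks for, in SLA terms."""

    plane: str                 # e.g. "control" or "data"
    count: int                 # number of vNICs wanted on this plane
    min_bandwidth_gbps: float  # required bandwidth per vNIC


def select_cni(request: VnicRequest) -> str:
    """Assumed rule of thumb: demanding planes get SR-IOV, others a basic CNI."""
    return "sriov" if request.min_bandwidth_gbps >= 10 else "flannel"


def to_caas_plan(requests: list) -> list:
    """Expand the requests into one CaaS-level entry per interface."""
    plan = []
    for request in requests:
        cni = select_cni(request)
        for index in range(request.count):
            plan.append({"interface": f"{request.plane}{index}", "cni": cni})
    return plan


plan = to_caas_plan([
    VnicRequest(plane="control", count=1, min_bandwidth_gbps=1.0),
    VnicRequest(plane="data", count=2, min_bandwidth_gbps=25.0),
])
print(plan)
```

The point of the sketch is the division of labour: the developer states the number of vNICs and their requirements, and the choice of CNI stays inside the platform.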
4.2.4 XGVela PaaS Workflow
Having summarized the technical architecture and functional requirements, this chapter covers the XGVela-related workflows. These workflows are applicable to all PaaS platforms, including XGVela. This chapter covers only the simple and general workflows that have no telecom-specific features; the interaction with NFVO, VNFM and other telecom systems will be considered in a future release.
Figure 4-8 PaaS Platform Workflow
We simplify the PaaS technical architecture and its relationship with external management systems into the blocks in Figure 4-8, which include NF/application, User, Operator/Management systems, PaaS Service, PaaS Management, and CaaS. The major workflows are listed below.
Service Order Process
The Service Order Process allows Users to order PaaS Services. First, the User browses all PaaS Services through the Service Catalog in PaaS Management and selects the required PaaS Service. Then, the User configures the user-defined parameters of the selected PaaS Service, such as interface, performance parameters, reliability parameters, service model (dedicated or shared), etc.
Service Instantiation Process
The Service Instantiation Process instantiates a PaaS Service. After ordering the required PaaS Service, the User triggers the service instantiation, and the Service Lifecycle Management in PaaS Management instantiates the selected PaaS Service. For a shared PaaS Service, instantiation means selecting an existing, running PaaS Service on the PaaS platform, completing configuration such as RBAC and access, and returning the access API to the User. For a dedicated PaaS Service, instantiation means selecting images/packages/descriptors in the Image & Package Repository of PaaS Management, sending these files together with the user-defined configuration to CaaS, and triggering K8S to complete the PaaS Service instantiation. The CaaS layer returns the access API and monitoring data of the PaaS Service to PaaS Management.
Service Unsubscribe Process
The Service Unsubscribe Process lets Users unsubscribe from an ordered and instantiated PaaS Service through PaaS Management. For a shared PaaS Service, PaaS Management stops the PaaS Service for the User by changing the PaaS Service configuration, such as RBAC. For a dedicated PaaS Service, PaaS Management deletes the PaaS Service instance on CaaS.
Service Configuration Updating Process
The Service Configuration Updating Process helps Users, Operators and Management Systems update the configuration of a PaaS Service through the configuration center in PaaS Management. The Service LCM and CaaS then apply the new configuration to the Service and Pod.
Adding Service Process
The Adding Service Process lets Operators add a new PaaS Service to the PaaS platform. The process includes uploading the images and packages of the new PaaS Service into the Image & Package Repository, onboarding the PaaS Service in the Service Catalog, and setting the basic configuration (including configuration items and parameters) and the user-defined configuration items.
Deleting Service Process
The Deleting Service Process lets Operators take a PaaS Service offline and remove it from the PaaS platform. PaaS Management first checks whether the PaaS Service selected for deletion is in use. A PaaS Service in use cannot be deleted. If the PaaS Service is not in use, PaaS Management removes it from the Service Catalog and deletes the related images and packages from the Image & Package Repository.
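The add, instantiate, unsubscribe and delete workflows above can be summarized in a small state-keeping sketch. It is a simplified model under stated assumptions: the class `PaasManagement`, its method names and the returned access strings are hypothetical, and the interaction with CaaS and the Image & Package Repository is reduced to bookkeeping.

```python
class PaasManagement:
    """Toy model of the Service Catalog and its subscriptions."""

    def __init__(self) -> None:
        self.catalog = {}        # service name -> "shared" or "dedicated"
        self.subscriptions = {}  # service name -> set of subscribed users

    def add_service(self, name: str, model: str) -> None:
        """Operator onboards a service into the catalog."""
        if model not in ("shared", "dedicated"):
            raise ValueError(f"unknown service model: {model}")
        self.catalog[name] = model
        self.subscriptions[name] = set()

    def instantiate(self, name: str, user: str) -> str:
        """Subscribe the user and return a (made-up) access endpoint."""
        model = self.catalog[name]
        self.subscriptions[name].add(user)
        if model == "shared":
            return f"{name}.shared.svc"  # reuse the running shared instance
        return f"{name}-{user}.svc"      # stand-in for a per-user instance on CaaS

    def unsubscribe(self, name: str, user: str) -> None:
        """Revoke access (shared) or delete the instance (dedicated)."""
        self.subscriptions[name].discard(user)

    def delete_service(self, name: str) -> bool:
        """Remove the service unless it is still in use; returns True on success."""
        if self.subscriptions[name]:
            return False
        del self.catalog[name]
        del self.subscriptions[name]
        return True


management = PaasManagement()
management.add_service("message-bus", "shared")
endpoint = management.instantiate("message-bus", "alice")
print(endpoint, management.delete_service("message-bus"))
```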
Service Operation and Maintenance Process
The Service & Resource Monitoring/Log/Event function in PaaS Management performs health checks, monitoring, log management and alarm management for PaaS Services through its interfaces with the PaaS Service instances.
The O&M interface, data model, data generation, etc. should be pre-developed based on PaaS Management requirements during design and development stage of PaaS Service.