1.0 Introduction – why the telecom industry needs Cloud Native PaaS
Since the NFV architecture was first proposed at ETSI in 2012, the telecommunication industry and telecom operators have been moving the network to the cloud, migrating network functions from dedicated physical devices to standard virtualized environments. By 2021, the majority of global telecom operators had built a Network Cloud, a private cloud carrying 4G/5G network functions, value-added network functions, network management systems, network orchestration systems, etc. Representative operators include AT&T, Verizon, Telstra, BT, DT, China Mobile, China Telecom, China Unicom, and NTT. Take China Mobile as an example: by the end of 2020, it had built Network Cloud in 8 major districts within China to carry 37 types of network functions, including 5GC, IMS, and EPC. The share of its network carried on Network Cloud is up to 75% and keeps increasing with the construction of edge cloud.
Network Cloud currently follows the NFV (Network Function Virtualization) architecture (see ETSI GS NFV 002). Network functions are implemented as VNFs (Virtualized Network Functions), which use virtual machines as virtualized infrastructure and OpenStack to manage virtual resources.
With the maturity and popularity of 5G networks and edge computing, the customer base of the telecommunication network is expanding from individual users to enterprise and industry users. The functions and configurations of the core network, which were relatively fixed while it mainly served individual users, now need to vary with diverse vertical industry use cases. Take 5G core slicing as an example: use cases in different industries need different 5G core networks. A network slice used by NB-IoT applications, such as smart metering, mainly requires signaling functions and bandwidth of only up to several kbps, while a network slice used by VR and video broadcast mainly requires data transmission capabilities and bandwidth varying from 10 Mbps to 100 Mbps. To better meet customers’ requirements, Network Cloud needs to improve its agility, flexibility and reliability.
However, existing Network Cloud has the following problems:
- Inflexible network function architecture: VNFs are currently developed and delivered at a coarse granularity, which makes precise upgrading, scaling in/out, and fault location hard to achieve. Network functions are also unable to support the customized design, deployment, and configuration needed to meet diverse requirements from vertical industries.
- Limited capability of network cloud: the current network cloud provides only infrastructure resources, which does not support convenient innovation of network functions/applications. Using pure infrastructure is complicated and demands a high level of skill from developers and operators. Besides, a telco’s network cloud involves multiple network functions and infrastructures from multiple vendors, each designed in its own form and shape, and these cross-vendor gaps need to be filled by a PaaS above the infrastructure.
- Complex, costly and slow delivery process: the business processes of existing telecommunication network functions strictly follow the sequence of requirements analysis, design, development, integration, testing and delivery. Product delivery duration is usually measured in months. However, the network requirements of future use cases/scenarios are changeable, and customer requirements on the network may change continuously as their services grow. Traditional development and management processes need to become agile and automated.
Cloud Native, as the best practice of cloud computing in the IT industry, can help the network cloud achieve agility, flexibility and reliability. Technologies and concepts such as containers, microservices and DevOps can effectively alleviate the above problems. Containers, as lightweight, flexible, and commonly used infrastructure, can increase agility. Microservice architecture is an effective way to implement a complex software stack and offers great flexibility in function isolation and evolution. Microservice-based network functions support isolated management, fast customized design/configuration, and function combination under different use cases. DevOps can build CI/CD pipelines and establish continuous feedback during the development and construction of the network cloud, which improves end-to-end automation and the speed of response to vertical industry needs. Cloud native is the inevitable direction of network cloud evolution.
Cloud native requires the cloud platform to provide as many of the capabilities required by applications as possible, and requires applications to use the capabilities provided by the cloud as much as possible, so that they grow on the cloud. PaaS, as the platform that carries required capabilities such as microservice architecture, CI/CD pipelines, network management functionalities and reusable network function modules, is acknowledged as the enabling platform of cloud native for the network cloud and the telecommunication industry.
Based on the above analysis, cloud native PaaS, meaning a PaaS that provides cloud native capabilities such as containers, microservices, automation tools, and cloud native network function components, is necessary and worth researching in the telecommunication industry. Exploring cloud native PaaS in the open source community can provide reusable capabilities and reference implementations for the industry.
2.0 PaaS Condition in Network Cloud
Although cloud native PaaS can be expected to improve the flexibility of the network cloud, the current situation of the telecommunication industry shows that it is not clear how to introduce PaaS into the network cloud, or which PaaS capabilities the network requires. Most network clouds that have been put into use follow the NFV architecture and do not treat PaaS as an independent system or visible object. Their operators and vendors use virtual machines and containers directly as infrastructure, and package all required functional modules and software dependencies into the VNF delivery image. Figure 2-1 shows a simple diagram of the current VNF condition, in which only resources are reused among VNFs while VNF-related software is dedicated to each VNF.
Figure 2-1 VNF Diagram
However, each vendor has its own internal PaaS to support development/maintenance and to provide the capabilities/modules required by NFs (Network Functions), and some operators have designed internal CI/CD pipelines and testing tools. Few of these capabilities are referred to as PaaS capabilities in the network cloud at the current stage.
In the field of standardization, ETSI GR NFV-IFA 029 is the representative standard on PaaS for the network cloud. It points out the role of PaaS, which includes: 1) eliminating the complex operation of NFVI platforms for network function developers and supporting automatic management/orchestration of NFVI; 2) shielding infrastructure differences between vendors; 3) providing NFV network services to external customers (e.g. governments, banks or enterprises in a vertical industry). ETSI GR NFV-IFA 029 also proposes three potential NFV architectures enhanced with PaaS, but no decision has been made. No detailed PaaS capabilities are defined in existing standards.
In the field of open source, CNCF provides a large number of cloud native PaaS capabilities that can be used in practical product implementations, either directly or indirectly (with enhancement). However, there is no specific indication of which software is necessary or suitable to serve as a specific PaaS capability in the network cloud. As open source can provide reference implementations and is closely tied to practical operation, it is a good starting point for exploring cloud native PaaS for the network cloud.
Based on a non-exhaustive investigation across industry, standards and open source, the possible types of cloud native PaaS capabilities for the network cloud are listed below:
- PaaS capabilities required to implement NF functions: This type of PaaS capability is necessary to implement NF functions and logic, for example databases, load balancers, protocol processing capabilities (PFCP, GTP-U…), message buses, possible NF functional modules, and some infrastructure-related capabilities (such as hardware acceleration). The users of this type of capability are mainly developers. These capabilities could be common among different NFs but unique to different vendors/operators.
- PaaS capabilities required to manage NF functions: This type of PaaS capability helps to optimize the operation and management of the network cloud, but has no direct influence on NF logic. Possible capabilities include observability capabilities, CI/CD tools, testing tools, FCAPS management tools, etc. The users of this type of capability are mainly developers and operations staff. These capabilities can also be used to support NFV management systems such as NFVO, VNFM, EMS, and OSS.
- PaaS capabilities to expose NF services to external customers: Representative capabilities of this type include bandwidth management, user identification, mobility management, UPF traffic routing, and other edge computing network functions. The users of this type of capability are external customers. Currently this type of capability is commonly carried and provided by the MEC platform.
3.0 XGVela Overview
As the previous chapters show, PaaS-related standards and research are progressing relatively slowly, so a good way for the telecommunication industry to proceed is to start with existing open-source PaaS capabilities and build reference implementations. We can explore the enhancement of existing PaaS capabilities for telecom scenarios as well as new PaaS capabilities dedicated to the telco network cloud. This is the reason we started the XGVela project.
XGVela is a telecom cloud native PaaS platform for 5G and future network cloud. It targets delivering the common and reusable PaaS capabilities required in the processes of network function development/running, network cloud management/maintenance, network cloud capability exposure, etc., so that applications are lightweight and contain only the code that delivers the intended business logic.
XGVela was first launched in April 2020. It joined Linux Foundation Networking as a Sandbox project in January 2021. China Mobile, Mavenir, Red Hat, Huawei, Sgiscale, Intel, ZTE, STC, Ericsson, China Telecom, China Unicom, Nokia, and Wind River formed the first TSC group. XGVela got its first batch of seed code, which delivers telecom management PaaS functions, from Mavenir in December 2020.
3.1 XGVela High-Level Architecture
Figure 3-1 XGVela High-Level Architecture
Figure 3-1 shows the high-level architecture of XGVela.
XGVela, as a cloud native PaaS platform, runs in a container environment by default and interworks with Kubernetes (K8S) to realize the orchestration and management of containers. Network Functions (NFs) and applications obtain the common PaaS capabilities they need from XGVela. Upper-layer telco management systems can select XGVela PaaS capabilities to act as their sub-modules to achieve O&M functions; XGVela PaaS capabilities can also be treated as “platform” resources that can be orchestrated by management systems.
To distinguish from the existing PaaS implementations in the open-source communities, XGVela divides PaaS capabilities into three categories:
- Category 1: General PaaS
Represented by the blue block in Figure 3-1.
General PaaS provides the abstract tools, services and environment required in the development, deployment, running, operation and management of applications and their associated services. It is the baseline platform for the other XGVela PaaS categories.
The users of General PaaS are generally IT applications, which can use such capabilities through standard PaaS service APIs and PaaS management processes. In short, General PaaS refers to PaaS capabilities commonly used in IT (e.g. service mesh, API GW, LB, observability) and existing open-source PaaS implementations (e.g. Istio, Envoy, Zookeeper, Grafana). XGVela takes those implementations as reference instead of reinventing the wheel.
- Category 2: Adaptation Layer
Represented by the yellow block in Figure 3-1.
Adaptation Layer is the enhancement that a General PaaS capability needs specifically when it is applied to telecom scenarios. To avoid coupling with General PaaS, the enhancement is implemented in the form of plug-ins, drivers, and other non-invasive forms.
In telecom scenarios, a General PaaS capability is usually used in combination with the corresponding adaptation layer enhancement points. For example, many load balancers support HTTP, TCP, and UDP, while 5G network functions such as UPF use protocols like PFCP and GTP-U in control plane and data plane flows. If a developer wants to use an existing open-source load balancer in the 5G core, the LB should be enhanced with PFCP and GTP-U protocol analysis capability, and the enhancement should be delivered together with the LB to provide service.
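The plug-in idea described above can be sketched as follows. This is only an illustration under stated assumptions, not XGVela code: `ProtocolRegistry`, `parse_gtpu_teid`, `parse_pfcp_seid` and `pick_backend` are hypothetical names, and the backend names are made up. The sketch relies on the standard header layouts (the TEID occupies bytes 4 to 7 of a GTPv1-U header, the SEID occupies bytes 4 to 11 of a PFCP header when the S flag is set) and on the registered UDP ports 2152 (GTP-U) and 8805 (PFCP).

```python
import struct
from typing import Callable, Dict, List, Optional

# A parser maps a raw UDP payload to a session key, or None if it cannot parse it.
Parser = Callable[[bytes], Optional[int]]


class ProtocolRegistry:
    """Hypothetical plug-in registry that lives outside the load balancer core."""

    def __init__(self) -> None:
        self._parsers: Dict[int, Parser] = {}

    def register(self, udp_port: int, parser: Parser) -> None:
        self._parsers[udp_port] = parser

    def session_key(self, udp_port: int, payload: bytes) -> Optional[int]:
        parser = self._parsers.get(udp_port)
        return parser(payload) if parser else None


def parse_gtpu_teid(payload: bytes) -> Optional[int]:
    """Return the TEID (header bytes 4..7) of a GTPv1-U packet."""
    if len(payload) < 8 or payload[0] >> 5 != 1:   # top 3 bits carry the version
        return None
    return struct.unpack_from("!I", payload, 4)[0]


def parse_pfcp_seid(payload: bytes) -> Optional[int]:
    """Return the SEID (header bytes 4..11) of a PFCP message with the S flag set."""
    if len(payload) < 12 or payload[0] >> 5 != 1 or not payload[0] & 0x01:
        return None
    return struct.unpack_from("!Q", payload, 4)[0]


def pick_backend(key: int, backends: List[str]) -> str:
    """Keep all packets of one tunnel/session on the same backend."""
    return backends[key % len(backends)]


registry = ProtocolRegistry()
registry.register(2152, parse_gtpu_teid)   # GTP-U
registry.register(8805, parse_pfcp_seid)   # PFCP

# Minimal G-PDU header: flags 0x30 (version 1, PT=1), type 0xFF, length 0, TEID 0x1234.
gtpu_packet = bytes([0x30, 0xFF, 0x00, 0x00]) + struct.pack("!I", 0x1234)
teid = registry.session_key(2152, gtpu_packet)
backend = pick_backend(teid, ["upf-0", "upf-1", "upf-2"])
```

Because the parsers are registered from outside, the load balancer core stays unchanged, which is the non-invasive property the Adaptation Layer aims for.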
- Category 3: Telco PaaS
Represented by the green block in Figure 3-1.
Telco PaaS focuses on delivering telecom-specific PaaS capabilities, which implement telecom features such as multi-tenancy, multiple network planes, network function topology, network function configuration, etc. These capabilities serve telecom network functions and telecom management systems. Some Telco PaaS capabilities need to interwork with General PaaS to deliver a complete PaaS service.
The above three categories of PaaS capabilities together constitute XGVela, which provides all "platform" services for cloud native telecom workloads. Developers can freely combine General PaaS, General PaaS + Adaptation Layer, and Telco PaaS based on their requirements.
3.2 XGVela Project Scope
- Define PaaS platform architecture, necessary functions/interfaces, processes, common software, etc. for telecom scenarios. The content will cover General PaaS, Adaptation Layer and Telco PaaS, where General PaaS capabilities refer to existing implementations.
- Explore requirements of Adaptation Layer and Telco PaaS based on telecom use cases, and implement functionality, interfaces, etc.
- Build reference implementation of telecom cloud native PaaS platform.
4.0 XGVela Technical Insight
4.1 Technical Architecture
Figure 4-1 XGVela Technical Architecture
As a PaaS platform for telecom scenarios, XGVela has a functional framework that is basically consistent with most commercial PaaS platforms. As shown in Figure 4-1, the PaaS platform contains two parts: PaaS Management and PaaS Service.
PaaS Management is responsible for managing PaaS services and ensures that the PaaS platform can provide PaaS services to customers. PaaS Management completes the onboarding, orchestration, monitoring and maintenance of PaaS services in the background, makes sure that applications and development/operations staff can select and use PaaS services, and helps to manage the operational status of PaaS services. Figure 4-1 shows the most basic functions that PaaS Management should have: service catalog, service lifecycle management, image and package repository, API gateway, service & resource monitoring, and service & resource log & event. For a detailed description and the requirements of each function, please refer to chapter 4.2.1.
PaaS Service represents the capabilities/functions required by applications, developers and operations staff, and delivers the core value of the PaaS platform. It is managed and orchestrated by PaaS Management. The three types of PaaS capabilities summarized in chapter 2.0 belong to PaaS Service, which is also divided into three categories (General PaaS, General PaaS + Adaptation Layer, Telco PaaS) based on usage scenarios. The number of PaaS services keeps increasing as use cases diversify, and PaaS services keep being upgraded based on customers’ needs. Currently, some functions/services have been summarized for the different categories. For a detailed description and the requirements of each service, please refer to chapters 4.2.2 and 4.2.3.
In existing commercial PaaS platforms, PaaS Management and General PaaS usually together constitute the PaaS platform of the IT industry. Therefore, the blue block is used to represent these two together. Corresponding to chapter 3.2, what XGVela does is pick the General PaaS capabilities (blue block) required by telco use cases, then identify the enhancement points of these General PaaS capabilities as the adaptation layer, and explore Telco PaaS capabilities.
4.2 XGVela PaaS Functional Requirements
4.2.1 PaaS Management
PaaS Management is responsible for the management of PaaS services. It is required to include at least the following capabilities:
Service Catalog
Service Catalog is a directory of services, which lists all PaaS services provided by the PaaS platform to customers. Specific requirements of Service Catalog include:
- It is required that Service Catalog support adding new PaaS services, and deleting, modifying, and querying existing PaaS services. The added PaaS services can exist in multiple forms: local Operators or Helm packages, as well as PaaS services on public cloud that are adapted remotely through interfaces.
- It is required that Service Catalog support users/applications/external systems in ordering PaaS services on demand, and trigger the instantiation and configuration of the selected PaaS services.
- It is recommended to provide Service Catalog with a well-designed UI.
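The catalog requirements above (add, delete, modify, query, and order-on-demand that triggers instantiation) can be sketched as a small in-memory model. All names here (`CatalogEntry`, `ServiceCatalog`, `fake_lcm`) and the example services and versions are hypothetical; a real catalog would persist entries and hand orders to Service LCM rather than to a stub.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    form: str       # "operator", "helm", or "remote" (service adapted from public cloud)
    version: str


class ServiceCatalog:
    """In-memory sketch; persistence and access control are left out."""

    def __init__(self, instantiate: Callable[[CatalogEntry, dict], str]) -> None:
        self._entries: Dict[str, CatalogEntry] = {}
        self._instantiate = instantiate   # assumed hook into Service LCM

    def add(self, entry: CatalogEntry) -> None:
        if entry.name in self._entries:
            raise ValueError(f"{entry.name} already exists")
        self._entries[entry.name] = entry

    def delete(self, name: str) -> None:
        del self._entries[name]

    def modify(self, name: str, **changes) -> None:
        self._entries[name] = replace(self._entries[name], **changes)

    def query(self, form: Optional[str] = None) -> List[CatalogEntry]:
        return [e for e in self._entries.values() if form is None or e.form == form]

    def order(self, name: str, config: dict) -> str:
        """Order a service on demand and trigger its instantiation."""
        return self._instantiate(self._entries[name], config)


def fake_lcm(entry: CatalogEntry, config: dict) -> str:
    # Stand-in for Service LCM: returns an instance identifier.
    return f"{entry.name}-{entry.version}-{config.get('size', 'small')}"


catalog = ServiceCatalog(instantiate=fake_lcm)
catalog.add(CatalogEntry("mysql", "helm", "8.0"))
catalog.add(CatalogEntry("kafka", "operator", "3.6"))
catalog.modify("mysql", version="8.4")
instance_id = catalog.order("mysql", {"size": "large"})
```

Passing the instantiation hook into the constructor keeps the catalog independent of how Service LCM is actually implemented.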
Service Lifecycle Management
Service Lifecycle Management (Service LCM) is responsible for managing the lifecycle of PaaS services. Specific requirements include:
- It is required that Service LCM select images and packages from Image & Package Repository based on the user’s selection in Service Catalog, and complete the instantiation of PaaS services, which are packaged as Operators or Helm Charts. Service LCM is also required to start, stop, upgrade, and delete PaaS services.
- Service LCM uses the Kubernetes interface to request resources for PaaS services and to orchestrate them.
- It is required that Service LCM manage scaling in/out of PaaS service instances based on monitored KPIs and the scaling policy set by customers.
- It is recommended to implement a UI for Service LCM.
- It is required that Service LCM support access control, providing a unified user management and authentication mechanism and supporting RBAC. Users can obtain management permissions for PaaS services through role settings.
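The KPI-driven scaling requirement above can be illustrated with a small decision function. The source does not specify a scaling algorithm; this sketch borrows the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × observed / target)) and clamps the result to customer-set bounds. `ScalingPolicy` and `desired_replicas` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    target_value: float   # desired per-instance KPI value, e.g. 60 (% CPU)
    min_replicas: int
    max_replicas: int


def desired_replicas(current: int, observed_value: float, policy: ScalingPolicy) -> int:
    """Proportional scaling rule, clamped to the customer-defined bounds."""
    raw = math.ceil(current * observed_value / policy.target_value)
    return max(policy.min_replicas, min(policy.max_replicas, raw))


policy = ScalingPolicy(target_value=60.0, min_replicas=2, max_replicas=5)
scale_out = desired_replicas(current=3, observed_value=70.0, policy=policy)   # 3 * 70 / 60 = 3.5
scale_in = desired_replicas(current=4, observed_value=15.0, policy=policy)    # 4 * 15 / 60 = 1.0
```

With this policy the first call asks for 4 instances, while the second is held at the configured minimum of 2.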
Image and Package Repository
Image & Package Repository is responsible for the storage and management of PaaS service images and deployment packages. The images and packages can be stored locally in the PaaS platform repository, as well as remotely in cloud repositories (GitHub, Docker Hub, etc.). When a user selects PaaS services in Service Catalog and triggers the service instantiation process, Service LCM asks Image & Package Repository for the images and packages.
API Gateway
API Gateway is a PaaS service as well as an important PaaS Management function. When used as a PaaS Management function, API Gateway is responsible for exposing the service APIs of PaaS services and routing user traffic to the target processing unit. Specific requirements include:
- It is required that API Gateway support identity authentication and authorization of API calls, verifying the legitimacy and authority of the API calling entity (user, other software, etc.), realizing secure access control, and avoiding security threats to PaaS services.
- It is required that API Gateway support forwarding user requests to the correct processing unit. The forwarding process supports load balancing and traffic control actions based on pre-defined traffic management policies, which may include health checks, rate limiting, timeout & retry, circuit breaking, etc.
- It is recommended that API Gateway support protocol processing, including HTTP, HTTP/2, gRPC, WebSocket, etc.
- It is required that API Gateway support managing the service APIs of PaaS services, including API definition (path, parameters, etc.) and API publishing/suspending/online/offline/withdrawal, etc. This feature is mainly used by developers/operators to provide API-type PaaS services for external usage.
- It is required that API Gateway support monitoring of API usage and API data analysis, covering performance, usage, alarms, logs, etc.
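Two of the gateway requirements above, authentication of the calling entity and rate limiting as a traffic management policy, can be sketched together. The token bucket is one common rate-limiting technique and is chosen here as an assumption; the source does not prescribe it. `TokenBucket`, `Gateway`, the API key and the caller name are all hypothetical, and time is passed in explicitly so the behavior is deterministic.

```python
from typing import Dict


class TokenBucket:
    """Allows bursts of up to `capacity` calls and refills at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def allow(self, now: float) -> bool:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class Gateway:
    def __init__(self, api_keys: Dict[str, str], rate: float, burst: float) -> None:
        self._api_keys = api_keys                  # API key -> caller identity
        self._rate = rate
        self._burst = burst
        self._buckets: Dict[str, TokenBucket] = {}

    def handle(self, api_key: str, now: float) -> int:
        """Return an HTTP-style status code for one API call."""
        caller = self._api_keys.get(api_key)
        if caller is None:
            return 401                             # unknown caller: reject
        if caller not in self._buckets:
            self._buckets[caller] = TokenBucket(self._rate, self._burst, now)
        return 200 if self._buckets[caller].allow(now) else 429


gateway = Gateway({"key-1": "app-a"}, rate=1.0, burst=2.0)
rejected = gateway.handle("wrong-key", now=0.0)
burst_results = [gateway.handle("key-1", now=0.0) for _ in range(3)]
after_refill = gateway.handle("key-1", now=1.0)
```

The third call of the burst is throttled because the bucket holds only two tokens; one second later a single token has been refilled and the call succeeds again.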
Service & Resource Monitoring
Service & Resource Monitoring is responsible for monitoring PaaS service instances and the resources they use, collecting and displaying monitoring data through a well-designed dashboard, and setting alarms. The monitoring KPIs and data are determined by the PaaS services themselves. This function helps PaaS platform managers and PaaS service consumers track the status of PaaS services and resources.
Monitoring contents usually include:
- Service-level: running status and performance of PaaS service instance.
- Resource-level: running status and performance of resource used by PaaS Service.
Service & Resource Log & Event
Service & Resource Log & Event is responsible for the log & event management of PaaS services and the resources they use. Log management includes log collection, log storage, log indexing, etc. Event management records all changes to PaaS services and related resources. An event usually carries the event content (what), event object (who), event time (when), event type, event status, etc. This function helps PaaS platform managers and PaaS service users efficiently find, locate, and solve problems.
To conclude, almost every PaaS platform provides the above management functions, either manually or automatically. Mature representatives include the open-source-based OpenShift, public clouds such as AWS, Azure, Alibaba Cloud, and Tencent Cloud, and many other commercial products.
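As an illustration of the event fields listed above, the record and a simple query could be modelled as follows. This is a sketch only: `Event`, `EventLog`, the field names and the example type/status values are assumptions, not a defined XGVela schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    content: str      # what happened
    obj: str          # which service or resource it concerns
    time: datetime    # when it happened
    type: str         # example values: "lifecycle", "config", "fault"
    status: str       # example values: "succeeded", "failed"


class EventLog:
    def __init__(self) -> None:
        self._events: List[Event] = []

    def record(self, event: Event) -> None:
        self._events.append(event)

    def find(self, obj: Optional[str] = None, since: Optional[datetime] = None) -> List[Event]:
        """Filter by object and/or earliest time to help locate problems."""
        return [
            e for e in self._events
            if (obj is None or e.obj == obj) and (since is None or e.time >= since)
        ]


log = EventLog()
t0 = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)
t1 = datetime(2021, 1, 1, 12, 5, tzinfo=timezone.utc)
log.record(Event("instance created", "mysql-1", t0, "lifecycle", "succeeded"))
log.record(Event("upgrade to new version", "mysql-1", t1, "lifecycle", "failed"))
log.record(Event("instance created", "kafka-1", t1, "lifecycle", "succeeded"))
recent_mysql = log.find(obj="mysql-1", since=t1)
```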
4.2.2 General PaaS Requirements
As described in Chapter 3.1, General PaaS represents all PaaS services that are not industry-specific. It can serve a variety of applications that require common PaaS services, and it can also be used as the basis for other industry-specific PaaS services/platforms.
4.2.2.1 General Requirements on General PaaS
Before analyzing in detail the functions that General PaaS should provide, let’s take a look at the general requirements for all General PaaS services:
- General PaaS shall complete service lifecycle management through PaaS Management. There are usually two ways for users to use a PaaS service: calling the API of an API-type PaaS service, and creating an instance of an instance-type PaaS service. API-type PaaS services (such as AI services for facial recognition, image processing, etc.) are managed by the API Gateway of PaaS Management. Instance-type PaaS services (such as DB, LB, etc.) are managed by Service Lifecycle Management.
- General PaaS services shall be packaged as Operators or Helm Charts. The related images and packages can be stored in the local Image & Package Repository, or in a remote repository to which automatic access is pre-configured.
- All General PaaS services shall generate monitoring data, logs, events, alarms, etc., and report them to the Service & Resource Monitoring function and the Service & Resource Log & Event function in PaaS Management.
- General PaaS services shall be deployable on any Kubernetes-based CaaS (Container as a Service) layer.
- General PaaS shall support custom configuration. The configuration parameters are designed by developers, and the configuration contents are provided by users according to application requirements, following certain rules.
- It is recommended to select General PaaS software from commonly used CNCF projects.
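The custom configuration requirement above separates two roles: developers design the parameters and their rules, and users supply the contents. A minimal sketch of that split is shown below. `Param`, `SCHEMA`, `resolve_config` and the three example parameters are hypothetical; a real service would more likely express this in a Helm values schema or an Operator CRD.

```python
from typing import Any, Callable, Dict, NamedTuple


class Param(NamedTuple):
    kind: type                      # expected Python type of the value
    default: Any                    # value used when the user supplies none
    check: Callable[[Any], bool]    # developer-defined rule


# Designed by the service developer (example parameters only).
SCHEMA: Dict[str, Param] = {
    "replicas": Param(int, 1, lambda v: 1 <= v <= 10),
    "storage_gb": Param(int, 20, lambda v: v > 0),
    "tls": Param(bool, False, lambda v: True),
}


def resolve_config(user_values: Dict[str, Any],
                   schema: Dict[str, Param] = SCHEMA) -> Dict[str, Any]:
    """Merge user-provided contents with defaults and enforce the developer's rules."""
    unknown = set(user_values) - set(schema)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    resolved = {}
    for name, param in schema.items():
        value = user_values.get(name, param.default)
        # Exact type match, so that True is not accepted where an int is expected.
        if type(value) is not param.kind or not param.check(value):
            raise ValueError(f"invalid value for {name}: {value!r}")
        resolved[name] = value
    return resolved


config = resolve_config({"replicas": 3, "tls": True})
```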
4.2.2.2 Functional Requirements on General PaaS
The number and types of General PaaS services will keep increasing as more use cases and user requirements are explored. In June 2020, XGVela and Anuket conducted a joint survey on commonly used General PaaS functions and software. Here we provide requirements on the most used General PaaS functions based on the survey results.
Data Store/Databases
General PaaS shall provide mainstream relational and non-relational databases. Each database shall support its own user management, configuration management and monitoring. Detailed requirements are listed below.
- Data Store/Database types & software:
| Relational DB | Includes but is not limited to MySQL, MariaDB, PostgreSQL. |
| Non-relational DB | Includes but is not limited to MongoDB, Redis, etcd. |
| Mainstream commercial DB | Includes but is not limited to Oracle, SQL Server. |
| Distributed data storage | Includes but is not limited to Ceph. |
- It is recommended that databases provide the following monitoring KPIs: number of queries, response time, number of errors, throughput, number of concurrent queries, number of tables.
- General PaaS shall support both manual and automatic backup of data stores and databases. Backups need to be downloadable and usable for data recovery.
Streaming & Messaging
General PaaS shall provide Streaming & Messaging functions and software, which can provide services through APIs or instantiation. Detailed requirements are listed below.
- Commonly used Streaming & Messaging software and protocols include, but are not limited to, AMQP, Apache Kafka, Apache Spark, RabbitMQ, and NATS.
- Streaming & Messaging functions provided by General PaaS should support message caching, transmission, replication, distribution, encryption and compression.
- It is recommended that Streaming & Messaging software provide the following KPIs: total number of messages, message publish rate, message delivery rate, etc.
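The rate KPIs listed above are typically derived from monotonically increasing counters sampled at two points in time. The following sketch shows that arithmetic; `Snapshot` and `messaging_kpis` are hypothetical names and the numbers are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Snapshot:
    time_s: float     # sample time in seconds
    published: int    # messages published so far
    delivered: int    # messages delivered so far


def messaging_kpis(prev: Snapshot, curr: Snapshot) -> Dict[str, float]:
    """Derive per-second rates from two counter snapshots."""
    interval = curr.time_s - prev.time_s
    if interval <= 0:
        raise ValueError("snapshots must be taken in increasing time order")
    return {
        "total_messages": curr.published,
        "publish_rate": (curr.published - prev.published) / interval,
        "delivery_rate": (curr.delivered - prev.delivered) / interval,
    }


kpis = messaging_kpis(Snapshot(0.0, 1000, 900), Snapshot(10.0, 1600, 1450))
```

Here 600 messages were published and 550 delivered within 10 seconds, giving a publish rate of 60 and a delivery rate of 55 messages per second; a persistent gap between the two rates indicates a growing backlog.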
Service Proxy & Load Balancing
General PaaS shall provide Service Proxy & Load Balancing functions and software. The service can be provided through container instances or service APIs. The Service Proxy & Load Balancing functions should support customized configuration management, monitoring and operation. This function can be used to manage east-west traffic within a K8S cluster, as well as north-south traffic across multiple K8S clusters. Detailed requirements are listed below.
- This function should dynamically update the status of service backends, including updating service IP addresses and traffic management such as flow forwarding, flow control, ACL, etc.
- Commonly used Service Proxy & Load Balancing software includes, but is not limited to, Istio, Contour, MetalLB, NGINX, Envoy, and Linkerd.
- It is recommended that this function support statistics and analysis of flows, including the number of messages (request/response/error/etc.), the number of sent/received data packets, delay, etc.
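The dynamic backend update and flow statistics described above can be sketched with a round-robin balancer. Round robin is used here only as the simplest policy; `RoundRobinBalancer` and the addresses are hypothetical, and real proxies such as those listed above implement far richer policies.

```python
from collections import Counter
from typing import List


class RoundRobinBalancer:
    def __init__(self) -> None:
        self._backends: List[str] = []
        self._index = 0
        self.stats: Counter = Counter()   # per-outcome message counts

    def update_backends(self, addresses: List[str]) -> None:
        """Called whenever the set of healthy backend addresses changes."""
        self._backends = list(addresses)
        self._index = 0

    def forward(self) -> str:
        """Pick the backend for the next request and record statistics."""
        if not self._backends:
            self.stats["error"] += 1
            raise RuntimeError("no healthy backend")
        backend = self._backends[self._index % len(self._backends)]
        self._index += 1
        self.stats["request"] += 1
        return backend


lb = RoundRobinBalancer()
lb.update_backends(["10.0.0.1", "10.0.0.2"])
before = [lb.forward() for _ in range(3)]
lb.update_backends(["10.0.0.2", "10.0.0.3"])   # 10.0.0.1 failed, 10.0.0.3 was added
after = [lb.forward() for _ in range(2)]
```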
Observability
General PaaS shall provide tools that help developers, operators, and users realize observability of applications running on the PaaS platform, that is, obtain the metrics/logs/alarms/link status of applications and visualize these data, so that different groups of people can gain insight into application running status, locate faults and analyze problems. Observability helps to maintain the stability and reliability of applications. It usually includes monitoring, logging, and tracing.
The life cycle of application software and systems can be roughly divided into two stages: the development stage and the running stage. In the development stage, the application software designer/developer should implement the generation of metrics, logs, and events, and design the output of these contents according to the requirements of the General PaaS tools (including data format mapping, General PaaS agent integration, etc.). In the running stage, the application software generates metrics, logs, and events. These data can be collected by an agent and reported to the matching General PaaS tool, or the General PaaS tool can pull the data directly from the agent. At the running stage, data generation and collection are completed automatically with the help of the General PaaS tools.
Detailed requirements are listed below:
- General PaaS shall provide monitoring software:
  - Monitoring software should support metric collection, including federating and streaming metrics across clusters.
  - The collected metrics should support visualization through dashboards.
  - It is recommended that monitoring software support customized configuration of alarm and aggregation rules, and alert subscription.
  - Commonly used monitoring software includes Prometheus, Cortex, Thanos, Grafana, Kiali, Zabbix, and Collectd, of which Prometheus and Grafana are the most popular. The above software can be used independently or in combination.
- General PaaS shall provide software for log management, which includes collecting, storing, displaying and other operations on software and system logs.
  - Commonly used logging software includes Fluentd, Elasticsearch, Logstash, Fluent Bit, and ELK, among which Elasticsearch, Fluentd and ELK are the most used.
  - The collected logs should be visualized.
  - Besides collecting, storing, indexing, and displaying logs, the log management software can also support log analysis, alarms and alarm subscription.
- General PaaS shall provide tracing functions to record all operations in the whole service life cycle, so that it can provide information for problem analysis and O&M.
  - Commonly used tracing software includes Jaeger, OpenTracing, OpenCensus, OpenTelemetry, and Zipkin, among which Jaeger, OpenTelemetry, and OpenTracing are the most popular.
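The "data format mapping" step of the development stage can be illustrated with Prometheus, the most popular monitoring tool named above. Prometheus pulls metrics as plain text with `# HELP` and `# TYPE` lines followed by samples. The sketch below renders that text format by hand to show what the mapping involves; in practice a client library would normally be used. `render_metric` and the metric and label names are hypothetical, and label values are assumed to contain no quotes, backslashes or newlines (which would need escaping).

```python
from typing import Dict, List, Tuple

# One sample: a label set and its value.
Sample = Tuple[Dict[str, str], float]


def render_metric(name: str, help_text: str, metric_type: str,
                  samples: List[Sample]) -> str:
    """Render one metric family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value:g}")
    return "\n".join(lines) + "\n"


exposition = render_metric(
    "nf_requests_total",
    "Total requests handled.",
    "counter",
    [({"nf": "smf", "code": "200"}, 1027), ({"nf": "smf", "code": "500"}, 3)],
)
```

The resulting text would be served on an HTTP endpoint that the monitoring tool scrapes, which corresponds to the pull mode described for the running stage.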
DevOps
The software lifecycle generally includes requirement determination, user experience design, development, testing, deployment, and continuous O&M (operation and management). The objective of DevOps is to connect these six steps into an automated workflow, so that developers can focus on coding and continuously receive operational feedback, which ultimately shortens the product delivery cycle and improves delivery quality.
General PaaS shall provide DevOps tools to help enterprises build automated workflows and implement DevOps concepts, including:
- Continuous integration and continuous delivery tools, which help to build automated pipelines for integration, testing, deployment, and upgrade.
- Project management tools, which help developers create independent workspaces for operations including code changes, builds, automated testing, integration, release, etc.
- Code management tools, which provide code repository and code quality management functions. These tools should support maintaining detailed records of application code changes and authorization management of code branches.
- Automated testing tools, which automatically execute test cases according to user-defined test contents and generate visualized test results.
Commonly used open-source software includes Jenkins, GitLab, Maven, Argo, and Sonar.
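The automated workflow described above boils down to running stages in order, stopping at the first failure, and feeding the result back to developers. The sketch below models only that control flow; `run_pipeline` and the stage names are hypothetical, and tools such as Jenkins or GitLab CI express the same idea declaratively.

```python
from typing import Callable, List, Tuple

# A stage is a name plus an action that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[Tuple[str, str]]:
    """Run stages in order; after the first failure, later stages are skipped."""
    report = []
    failed = False
    for name, action in stages:
        if failed:
            report.append((name, "skipped"))
            continue
        ok = action()
        report.append((name, "passed" if ok else "failed"))
        failed = not ok
    return report


report = run_pipeline([
    ("build", lambda: True),
    ("test", lambda: False),    # a failing test stage blocks deployment
    ("deploy", lambda: True),
])
```

The report is the feedback loop: it tells the developer which stage failed and confirms that nothing after it was deployed.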
Service Mesh
General PaaS shall provide Service Mesh functions and software. Detailed requirements are listed below.
- General PaaS service mesh shall support traffic management (TCP proxying, load balancing, traffic splitting, mirroring, circuit breaking, fault injection, filters, external routing, ingress, etc.), security (mTLS, certificate rotation), proxy injection, CNI plugins, and multi-cluster support.
- General PaaS service mesh shall provide a dashboard for visualizing the mesh, the various communications, link load conditions, etc.
- Commonly used open-source tools are Linkerd, Consul, Istio, and Envoy, among which Istio + Envoy is a popular choice.
API Gateway
The API Gateway described in this section is a General PaaS service, which is mainly responsible for exposing the service APIs of user-developed applications/systems. Its requirements are basically the same as those of API Gateway as a PaaS Management function.
Commonly used open-source tools are Kong, Tyk, 3scale, Istio, and EMCO.
Service Discovery
General PaaS shall provide Service Discovery functions and software to help the microservices of an application/system obtain each other's access information.
- Service Discovery functions shall maintain real-time microservice access information, which includes adding the addresses of new microservices, updating microservice instance addresses, deleting the information of faulty microservices, etc.
- Commonly used open-source software includes CoreDNS, etcd, Zookeeper, Netflix, and Nacos, among which CoreDNS, etcd, and Zookeeper are popular choices.
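The maintenance of real-time access information described above (adding new instances, updating addresses, removing faulty ones) can be sketched with a heartbeat-based registry. Expiring instances after a time-to-live is one common approach and is an assumption here, not something the source specifies. `ServiceRegistry`, the service name and the addresses are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
from typing import Dict, List, Tuple


class ServiceRegistry:
    def __init__(self, ttl_s: float) -> None:
        self._ttl_s = ttl_s
        # (service, address) -> time of the last heartbeat
        self._instances: Dict[Tuple[str, str], float] = {}

    def register(self, service: str, address: str, now: float) -> None:
        self._instances[(service, address)] = now

    def heartbeat(self, service: str, address: str, now: float) -> None:
        self.register(service, address, now)

    def deregister(self, service: str, address: str) -> None:
        self._instances.pop((service, address), None)

    def lookup(self, service: str, now: float) -> List[str]:
        """Return live addresses, dropping instances whose heartbeat is too old."""
        expired = [key for key, seen in self._instances.items()
                   if now - seen > self._ttl_s]
        for key in expired:
            del self._instances[key]
        return sorted(addr for (svc, addr) in self._instances if svc == service)


registry = ServiceRegistry(ttl_s=30.0)
registry.register("smf", "10.0.0.1:8080", now=0.0)
registry.register("smf", "10.0.0.2:8080", now=0.0)
registry.heartbeat("smf", "10.0.0.1:8080", now=25.0)
live = registry.lookup("smf", now=40.0)   # 10.0.0.2 missed its heartbeat
```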
4.2.3 Telco PaaS Requirements
4.2.4 XGVela PaaS Interface and Workflow